File operations and command line

We will learn how to read and write files from Python, and how to write a Python program and run it from the command line.

Reading and writing files

Python handles files through file objects.

The open(filename[, mode]) function opens a file and returns a file object (or raises an error). The mode can be 'r' (read), 'w' (write), 'r+' (read and write), 'a' (append), 'a+' (append and read), or for binary files 'rb', 'wb', 'r+b', 'ab', 'a+b'.
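A minimal sketch of how some of these modes behave, using a throwaway directory and a hypothetical file name (demo.txt) rather than the data file used in this section:

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # Opening a missing file for reading raises FileNotFoundError.
    try:
        open(path, 'r')
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

    # 'w' creates the file (or truncates an existing one).
    f = open(path, 'w')
    f.write('first\n')
    f.close()

    # 'a' appends to the end instead of truncating.
    f = open(path, 'a')
    f.write('second\n')
    f.close()

    # 'r' reads the content back.
    f = open(path, 'r')
    text = f.read()
    f.close()

print(missing_raises)
print(repr(text))
```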

In [1]:
f = open('E0.csv')  # open for reading, returns a file object
print(f)
print(type(f))
<_io.TextIOWrapper name='E0.csv' mode='r' encoding='UTF-8'>
<class '_io.TextIOWrapper'>

The file object is not very useful on its own. This file contains the English Premier League statistics for the 2015/16 season. The read() method reads the whole file into a string. We print only the first 100 characters:

In [2]:
f = open('E0.csv')
content = f.read()
print(content[:100])
Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR

Note: to open a text file for reading we may write open('E0.csv', 'r'), but reading is the default mode, so you may drop the 'r'.

Read in the first and second lines with the readline() method.

In [3]:
f = open('E0.csv')
first_line = f.readline()
print(first_line)
second_line = f.readline()
print(second_line)
Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,LBH,LBD,LBA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,Bb1X2,BbMxH,BbAvH,BbMxD,BbAvD,BbMxA,BbAvA,BbOU,BbMx>2.5,BbAv>2.5,BbMx<2.5,BbAv<2.5,BbAH,BbAHh,BbMxAHH,BbAvAHH,BbMxAHA,BbAvAHA

E0,08/08/15,Bournemouth,Aston Villa,0,1,A,0,0,D,M Clattenburg,11,7,2,3,13,13,6,3,3,4,0,0,2,3.6,4,2,3.3,3.7,2.1,3.3,3.3,2.05,3.3,4,1.95,3.65,4.27,1.91,3.5,4,2,3.5,4.2,45,2.1,1.96,3.65,3.48,4.33,3.98,43,2.11,2.02,1.88,1.79,26,-0.5,1.98,1.93,1.99,1.92

The file object is iterable row-wise.

Mind the newline character at the end of each line.

In [4]:
f = open('E0.csv')
L = []
for line in f:
    L.append(line)
print(L[:4])
['Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee,HS,AS,HST,AST,HF,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A,BWH,BWD,BWA,IWH,IWD,IWA,LBH,LBD,LBA,PSH,PSD,PSA,WHH,WHD,WHA,VCH,VCD,VCA,Bb1X2,BbMxH,BbAvH,BbMxD,BbAvD,BbMxA,BbAvA,BbOU,BbMx>2.5,BbAv>2.5,BbMx<2.5,BbAv<2.5,BbAH,BbAHh,BbMxAHH,BbAvAHH,BbMxAHA,BbAvAHA\n', 'E0,08/08/15,Bournemouth,Aston Villa,0,1,A,0,0,D,M Clattenburg,11,7,2,3,13,13,6,3,3,4,0,0,2,3.6,4,2,3.3,3.7,2.1,3.3,3.3,2.05,3.3,4,1.95,3.65,4.27,1.91,3.5,4,2,3.5,4.2,45,2.1,1.96,3.65,3.48,4.33,3.98,43,2.11,2.02,1.88,1.79,26,-0.5,1.98,1.93,1.99,1.92\n', 'E0,08/08/15,Chelsea,Swansea,2,2,D,2,1,H,M Oliver,11,18,3,10,15,16,4,8,1,3,1,0,1.36,5,11,1.4,4.75,9,1.33,4.8,8.3,1.4,4.5,10,1.39,4.92,10.39,1.4,4,10,1.4,5,9.5,45,1.43,1.37,5,4.66,11.26,9.57,43,1.88,1.8,2.07,1.99,27,-1.5,2.24,2.16,1.8,1.73\n', 'E0,08/08/15,Everton,Watford,2,2,D,0,1,A,M Jones,10,11,5,5,7,13,8,2,1,2,0,0,1.7,3.9,5.5,1.7,3.5,5,1.7,3.6,4.7,1.75,3.8,5,1.7,3.95,5.62,1.73,3.5,5,1.73,3.9,5.4,45,1.75,1.69,4,3.76,5.77,5.25,44,1.93,1.84,2.03,1.96,26,-1,2.28,2.18,1.76,1.71\n']

The list L now contains the rows of the file. You can split the lines into cells with .split(","), but later we will show another method.
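As a small illustration of the splitting idea, the sketch below strips the trailing newline and splits on commas. The two lines are shortened copies of the first rows of the file (only the first seven columns), typed in by hand so the example runs without E0.csv:

```python
# Shortened sample lines, as they would come from iterating over the file.
lines = [
    'Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR\n',
    'E0,08/08/15,Bournemouth,Aston Villa,0,1,A\n',
]

# Remove the newline at the end of each line, then split into cells.
rows = [line.rstrip('\n').split(',') for line in lines]

print(rows[0])
print(rows[1])
```

Without the rstrip('\n') call the last cell of every row would still end with the newline character.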

Writing a file

Let's say that you care only about the results of the team 'Liverpool'. Write the appropriate data into a file named 'Liverpool.csv'.

For writing we open with open(filename, 'w'), but if you do so, don't forget to close it!

In [5]:
f = open('Liverpool.csv', 'w')
f.write('YNWA')  # You'll Never Walk Alone
f.close()

Alternatively, you may use the with statement, which calls close() automatically when execution leaves its block. This is a more readable and safer solution:

In [6]:
with open('Liverpool.csv', 'w') as f:
    f.write('When you walk through a storm\n')
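The sketch below checks that claim through the closed attribute of the file object. It writes to a hypothetical demo.txt in a temporary directory, and the RuntimeError is only a simulated failure for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    with open(path, 'w') as f:
        f.write('When you walk through a storm\n')
        inside = f.closed        # still open inside the block
    after = f.closed             # closed once the block is left

    # The file is closed even if the block is left by an exception.
    try:
        with open(path, 'a') as g:
            raise RuntimeError('simulated failure')
    except RuntimeError:
        closed_after_error = g.closed

print(inside, after, closed_after_error)
```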

Let us add a new line to the closed file:

In [7]:
with open('Liverpool.csv', 'a') as f:
    f.write('Hold your head up high\n')

Let's read the results row by row, choose the rows containing the word 'Liverpool', and save those lines in 'Liverpool.csv'. The header of the file will be the same!

In [8]:
with open('E0.csv') as f:
    L = [f.readline()]   # this is the header

    for line in f:
        if 'Liverpool' in line:
            L.append(line)

with open('Liverpool.csv', 'w') as out:
    for line in L:
        out.write(line)

Reading and writing binary files

Write some numbers to a file as bytes, then read them back!

In [9]:
with open("binfile.bin", "wb") as f:
    numbs = [1, 2, 4, 8, 16, 32, 31]
    arr = bytearray(numbs)
    f.write(arr)
In [10]:
with open("binfile.bin", "rb") as f:
    numbs = list(f.read())
    print(numbs)
[1, 2, 4, 8, 16, 32, 31]
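Each element of a bytearray must fit into one byte, that is, be in range(0, 256). For larger integers one possible approach (a sketch, not the only option) is to store every number on a fixed number of bytes with int.to_bytes and decode with int.from_bytes. The 4-byte width and big-endian order below are arbitrary choices made for this example:

```python
import os
import tempfile

# A value above 255 does not fit into a single byte.
try:
    bytearray([256])
    too_big_raises = False
except ValueError:
    too_big_raises = True

numbers = [1, 255, 256, 70000]
WIDTH = 4  # bytes per number (an assumption made for this sketch)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'binfile.bin')

    with open(path, 'wb') as f:
        for n in numbers:
            f.write(n.to_bytes(WIDTH, byteorder='big'))

    with open(path, 'rb') as f:
        raw = f.read()

# Cut the byte string into WIDTH-sized pieces and decode each one.
restored = [int.from_bytes(raw[i:i + WIDTH], byteorder='big')
            for i in range(0, len(raw), WIDTH)]

print(too_big_raises)
print(restored)
```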

Handling csv files in Python

The previous file was a comma-separated values file with the .csv extension (see Wikipedia). Each line of this type of file is a data record. Each record consists of one or more fields, separated by commas. The comma is the default separator, but another character may be used instead. Spreadsheet programs (like Excel or LibreOffice) can save files in this format.

Python can handle this format with the csv module (see Python documentation).

In [11]:
import csv

L = []
with open('E0.csv', 'r') as csvfile:
    reader = csv.reader(csvfile) #, delimiter=',', quotechar='"')
    for row in reader:
        L.append(row)

print(L[0])
print(L[19])
['Div', 'Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR', 'HTHG', 'HTAG', 'HTR', 'Referee', 'HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HC', 'AC', 'HY', 'AY', 'HR', 'AR', 'B365H', 'B365D', 'B365A', 'BWH', 'BWD', 'BWA', 'IWH', 'IWD', 'IWA', 'LBH', 'LBD', 'LBA', 'PSH', 'PSD', 'PSA', 'WHH', 'WHD', 'WHA', 'VCH', 'VCD', 'VCA', 'Bb1X2', 'BbMxH', 'BbAvH', 'BbMxD', 'BbAvD', 'BbMxA', 'BbAvA', 'BbOU', 'BbMx>2.5', 'BbAv>2.5', 'BbMx<2.5', 'BbAv<2.5', 'BbAH', 'BbAHh', 'BbMxAHH', 'BbAvAHH', 'BbMxAHA', 'BbAvAHA']
['E0', '16/08/15', 'Man City', 'Chelsea', '3', '0', 'H', '1', '0', 'H', 'M Atkinson', '18', '10', '8', '3', '19', '13', '5', '1', '4', '2', '0', '0', '2.1', '3.5', '3.75', '2.1', '3.4', '3.7', '2.1', '3.3', '3.3', '2.1', '3.4', '3.75', '2.08', '3.56', '3.87', '2.15', '3.2', '3.6', '2.1', '3.5', '3.9', '43', '2.17', '2.09', '3.56', '3.4', '3.9', '3.66', '42', '2.05', '1.98', '1.88', '1.82', '28', '-0.5', '2.12', '2.06', '1.87', '1.81']

This result differs from the previous one: csv.reader() turns each line into a list of strings. The comment shows two optional parameters: delimiter changes the separator from the default comma to another character, and quotechar sets the character that encloses fields. Quoting is useful when a field itself contains the delimiter.
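To illustrate the delimiter and quotechar options, here is a small sketch that reads from an in-memory string instead of a file. The semicolon-separated text is made up for this example and is not part of E0.csv:

```python
import csv
import io

# Made-up sample: fields separated by ';', one field enclosed in single quotes
# because it contains the separator itself.
text = "HomeTeam;Referee\nBournemouth;'Clattenburg; M'\n"

reader = csv.reader(io.StringIO(text), delimiter=';', quotechar="'")
rows = list(reader)

print(rows)
```

The quoted field stays in one piece even though it contains a semicolon.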

Reading csv format into a dictionary

If you look at the data closely, you can see that a dictionary would be even better: it is more convenient to refer to the cells by name than by index.

This format uses the first line (the header) as dictionary keys. The DictReader class of the csv module does this:

In [12]:
import csv

L=[]
with open('E0.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        L.append(row)

print(L[0])
OrderedDict([('Div', 'E0'), ('Date', '08/08/15'), ('HomeTeam', 'Bournemouth'), ('AwayTeam', 'Aston Villa'), ('FTHG', '0'), ('FTAG', '1'), ('FTR', 'A'), ('HTHG', '0'), ('HTAG', '0'), ('HTR', 'D'), ('Referee', 'M Clattenburg'), ('HS', '11'), ('AS', '7'), ('HST', '2'), ('AST', '3'), ('HF', '13'), ('AF', '13'), ('HC', '6'), ('AC', '3'), ('HY', '3'), ('AY', '4'), ('HR', '0'), ('AR', '0'), ('B365H', '2'), ('B365D', '3.6'), ('B365A', '4'), ('BWH', '2'), ('BWD', '3.3'), ('BWA', '3.7'), ('IWH', '2.1'), ('IWD', '3.3'), ('IWA', '3.3'), ('LBH', '2.05'), ('LBD', '3.3'), ('LBA', '4'), ('PSH', '1.95'), ('PSD', '3.65'), ('PSA', '4.27'), ('WHH', '1.91'), ('WHD', '3.5'), ('WHA', '4'), ('VCH', '2'), ('VCD', '3.5'), ('VCA', '4.2'), ('Bb1X2', '45'), ('BbMxH', '2.1'), ('BbAvH', '1.96'), ('BbMxD', '3.65'), ('BbAvD', '3.48'), ('BbMxA', '4.33'), ('BbAvA', '3.98'), ('BbOU', '43'), ('BbMx>2.5', '2.11'), ('BbAv>2.5', '2.02'), ('BbMx<2.5', '1.88'), ('BbAv<2.5', '1.79'), ('BbAH', '26'), ('BbAHh', '-0.5'), ('BbMxAHH', '1.98'), ('BbAvAHH', '1.93'), ('BbMxAHA', '1.99'), ('BbAvAHA', '1.92')])

The result is not what we hoped for: it is not an ordinary dictionary but a so-called ordered dictionary. An OrderedDict is a dictionary subclass that remembers the order in which its items were added; otherwise it supports the same methods as dict. (Since Python 3.8, DictReader returns ordinary dict objects, so the output may look different in newer versions.) To convert it to an ordinary dictionary, use the dict function:

In [13]:
print(dict(L[0]))
{'Div': 'E0', 'Date': '08/08/15', 'HomeTeam': 'Bournemouth', 'AwayTeam': 'Aston Villa', 'FTHG': '0', 'FTAG': '1', 'FTR': 'A', 'HTHG': '0', 'HTAG': '0', 'HTR': 'D', 'Referee': 'M Clattenburg', 'HS': '11', 'AS': '7', 'HST': '2', 'AST': '3', 'HF': '13', 'AF': '13', 'HC': '6', 'AC': '3', 'HY': '3', 'AY': '4', 'HR': '0', 'AR': '0', 'B365H': '2', 'B365D': '3.6', 'B365A': '4', 'BWH': '2', 'BWD': '3.3', 'BWA': '3.7', 'IWH': '2.1', 'IWD': '3.3', 'IWA': '3.3', 'LBH': '2.05', 'LBD': '3.3', 'LBA': '4', 'PSH': '1.95', 'PSD': '3.65', 'PSA': '4.27', 'WHH': '1.91', 'WHD': '3.5', 'WHA': '4', 'VCH': '2', 'VCD': '3.5', 'VCA': '4.2', 'Bb1X2': '45', 'BbMxH': '2.1', 'BbAvH': '1.96', 'BbMxD': '3.65', 'BbAvD': '3.48', 'BbMxA': '4.33', 'BbAvA': '3.98', 'BbOU': '43', 'BbMx>2.5': '2.11', 'BbAv>2.5': '2.02', 'BbMx<2.5': '1.88', 'BbAv<2.5': '1.79', 'BbAH': '26', 'BbAHh': '-0.5', 'BbMxAHH': '1.98', 'BbAvAHH': '1.93', 'BbMxAHA': '1.99', 'BbAvAHA': '1.92'}

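To make an ordered dictionary ourselves, we have to import the collections module (the csv module does this internally). The next example shows the difference between dict and OrderedDict: comparing two dict objects ignores the order of the items, while comparing two OrderedDict objects does not.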

In [14]:
import collections

d1 = {}
d1['x'] = 'X'
d1['y'] = 'Y'

d2 = {}
d2['y'] = 'Y'
d2['x'] = 'X'

print(d1 == d2)

d1 = collections.OrderedDict()
d1['x'] = 'X'
d1['y'] = 'Y'

d2 = collections.OrderedDict()
d2['y'] = 'Y'
d2['x'] = 'X'

print(d1 == d2)
True
False
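Whichever dictionary type DictReader returns, every value in a row is a string, so numeric columns have to be converted before doing arithmetic. A minimal sketch, using a hand-typed excerpt of the first two matches (four columns only) and an in-memory string in place of the file:

```python
import csv
import io

# Shortened excerpt of the data, typed in so the example is self-contained.
sample = ('HomeTeam,AwayTeam,FTHG,FTAG\n'
          'Bournemouth,Aston Villa,0,1\n'
          'Chelsea,Swansea,2,2\n')

total_goals = 0
for row in csv.DictReader(io.StringIO(sample)):
    # row['FTHG'] and row['FTAG'] are strings such as '2'; convert to int.
    total_goals += int(row['FTHG']) + int(row['FTAG'])

print(total_goals)
```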

Let us go back to our file and store the data of Liverpool matches only. We will write the 'Date', 'HomeTeam', 'AwayTeam', 'FTHG' (Full Time Home Goals), 'FTAG' (Full Time Away Goals) and 'FTR' (Full Time Result) values into a file! We will use csv.DictWriter to write the data: the writeheader() method writes the header first, then the writerows() method writes the actual data.

The fieldnames parameter tells which fields (columns in the table) to use. The extrasaction='ignore' ignores the other fields.

In [15]:
import csv

L = []
with open('E0.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for x in reader:
        if x['HomeTeam'] == 'Liverpool' or x['AwayTeam'] == 'Liverpool':
            L.append(x)

with open('Liverpool.csv', 'w', newline='') as output:  # newline='' avoids blank rows on Windows
    fields = ['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']
    writer = csv.DictWriter(output, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(L)
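The effect of extrasaction can be seen in a small sketch that writes into an in-memory buffer instead of a file. The single row is a shortened copy of the first match, with the 'Referee' field playing the role of an extra column:

```python
import csv
import io

row = {'Date': '08/08/15', 'HomeTeam': 'Bournemouth',
       'AwayTeam': 'Aston Villa', 'Referee': 'M Clattenburg'}
fields = ['Date', 'HomeTeam', 'AwayTeam']

# With extrasaction='ignore' the extra 'Referee' key is silently dropped.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction='ignore')
writer.writeheader()
writer.writerows([row])
written_lines = buffer.getvalue().splitlines()
print(written_lines)

# With the default extrasaction='raise' the same row is rejected.
strict = csv.DictWriter(io.StringIO(), fieldnames=fields)
try:
    strict.writerow(row)
    extra_raises = False
except ValueError:
    extra_raises = True
print(extra_raises)
```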

Handling json format in Python

JSON (JavaScript Object Notation) is a lightweight, language-independent data-interchange format. Its key structures are the list and the dictionary, and their notation is the same as in Python: a list is a comma-separated sequence of items in square brackets [ ], and a dictionary contains the usual key:value pairs in curly brackets { }.

Here is an example; you may copy it into a file named Liverpool.json:

{
    "Liverpool" : {
        "Players": [
            "Steven Gerrard",
            "Bill Shankly"
        ],
        "Results" : [
            {
                "HomeTeam":"Liverpool",
                "AwayTeam":"Tottenham",
                "HTG":1,
                "ATG":1
            },
            {
                "HomeTeam":"West Ham",
                "AwayTeam":"Liverpool",
                "HTG":2,
                "ATG":0
            }
        ],
        "Points":1,
        "Goals Scored":1,
        "Goals Condceded":3
    }
}

Python can handle this format with the json module. After reading the file, the data is stored in Python objects.

In [16]:
import json

with open('Liverpool.json') as data_file:    
    data = json.load(data_file)

print(data)
print(data['Liverpool']['Players'])
{'Liverpool': {'Players': ['Steven Gerrard', 'Bill Shankly'], 'Results': [{'HomeTeam': 'Liverpool', 'AwayTeam': 'Tottenham', 'HTG': 1, 'ATG': 1}, {'HomeTeam': 'West Ham', 'AwayTeam': 'Liverpool', 'HTG': 2, 'ATG': 0}], 'Points': 1, 'Goals Scored': 1, 'Goals Condceded': 3}}
['Steven Gerrard', 'Bill Shankly']

Now let's write a JSON file! To make the output look better you may use the sort_keys, indent and separators parameters. The json.dumps(obj) function returns a string that encodes an object (obj) in JSON format. We write that string into a file, and that's all.

In [17]:
import json

with open('Liverpool.json') as data_file:    
    data = json.load(data_file)

with open('Liverpool_matches.json', 'w') as f:
    f.write(json.dumps(data['Liverpool']['Results'], 
            sort_keys=True, indent=4, separators=(',', ': ')))

There are several ways to handle the JSON format:

  • json.dumps(obj): encodes obj to a JSON formatted string
  • json.dump(obj, file): encodes obj in JSON format and writes it into a file
  • json.loads(JSON_formatted_string): converts a JSON formatted string into a python object
  • json.load(file): reads the content of file to a python object (it can be a complex python data)

More details can be found in the Python documentation.
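The four functions can be tried together in a short round trip. This sketch uses one of the match records from the example above and an in-memory io.StringIO buffer standing in for a real file:

```python
import io
import json

match = {'HomeTeam': 'West Ham', 'AwayTeam': 'Liverpool', 'HTG': 2, 'ATG': 0}

# dumps / loads work with strings.
text = json.dumps(match, sort_keys=True)
restored = json.loads(text)

# dump / load work with file objects (here an in-memory buffer).
buffer = io.StringIO()
json.dump(match, buffer)
buffer.seek(0)                 # rewind before reading back
from_file = json.load(buffer)

print(text)
print(restored == match, from_file == match)
```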

Command line arguments

We will run Python code as standalone programs!

The sys module

Write some Python code and save it with the .py extension. Your OS may recognise it as a Python program, or you can run it with an interpreter.

You can communicate with your program via the input() function or with command line arguments. Our very first program will print the number of command line arguments and the list of them. The first element of this list is the name of the program file; the others are optional. To do this we have to import sys first. The list sys.argv stores these parameters. Important note: this is a list of strings! Even numbers are stored as strings, so to use them as numbers we have to convert them.

Save the following into a file named cli.py and run it from the command line.

import sys

print('Number of arguments:', len(sys.argv))
print('List of arguments:  ', sys.argv)
In [18]:
!python3 cli.py arg1 arg2
Number of arguments: 3
List of arguments:   ['cli.py', 'arg1', 'arg2']

The ! tells the notebook to run in command line, not here as a python code.

You can use the values in sys.argv in your program. We call them positional parameters, since you refer to them by their position in the list sys.argv.
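Because the arguments arrive as strings, using them without conversion gives string operations rather than arithmetic. The sketch below uses a hand-made list in place of sys.argv so it can be run anywhere; the program name sum.py and the function sum_arguments are hypothetical:

```python
def sum_arguments(argv):
    """Add up all arguments after the program name, treating them as floats."""
    return sum(float(arg) for arg in argv[1:])

# Stand-in for sys.argv after running: python3 sum.py 1.5 2 3
example_argv = ['sum.py', '1.5', '2', '3']

# Without conversion, + concatenates the strings.
concatenated = example_argv[2] + example_argv[3]
print(concatenated)

# After conversion we get real arithmetic.
total = sum_arguments(example_argv)
print(total)
```

In a real script you would pass sys.argv to the function instead of example_argv.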

Exercise: calculate the power of a number. Write a Python program that takes two command line arguments: base and exponent.

If the numbers are integers then calculate as integers, otherwise calculate with floats.

Save the following into a file named power.py.

import sys

def is_intstring(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

a = []

for i in range(1,3):
    if is_intstring(sys.argv[i]):
        a.append(int(sys.argv[i]))
    else:
        a.append(float(sys.argv[i]))

print(a[0] ** a[1])

This is how to run it:

In [19]:
!python3 power.py 4.2 3

!python3 power.py 2 100
74.08800000000001
1267650600228229401496703205376
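The program above fails with an IndexError if it is started with fewer than two arguments, and with a ValueError if an argument is not a number. One possible way to guard against this is sketched below; the functions parse_number and main and the wording of the messages are choices made for this example, not part of the original power.py:

```python
def parse_number(s):
    """Return an int if the string looks like one, otherwise a float."""
    try:
        return int(s)
    except ValueError:
        return float(s)   # raises ValueError if s is not a number at all

def main(argv):
    """Return the text the program would print for the given argument list."""
    if len(argv) != 3:
        return 'Usage: python3 power.py BASE EXPONENT'
    try:
        base = parse_number(argv[1])
        exponent = parse_number(argv[2])
    except ValueError:
        return 'Both arguments must be numbers.'
    return str(base ** exponent)

# Stand-ins for sys.argv; in a script you would call print(main(sys.argv)).
print(main(['power.py', '2', '100']))
print(main(['power.py']))
print(main(['power.py', 'two', '3']))
```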