We will learn how to read and write files from Python, and how to write a Python program and run it from the command line.
Python handles files through file objects.
The open(filename[, mode]) function opens a file and returns a file object (or raises an error).
The mode can be 'r' (read), 'w' (write), 'r+' (read and write), 'a' (append), 'a+' (append and read), or, for binary files, 'rb', 'wb', 'r+b', 'ab', 'a+b'.
f = open('E0.csv') # open for reading, returns a file object
print(f)
print(type(f))
The file object is not very useful on its own; we need its methods to get at the content.
This file contains the English Premier League statistics from the 2015/16 season.
The read() method reads the whole file into a string. We print only the first 100 characters:
f = open('E0.csv')
content = f.read()
print(content[:100])
Note: to open a text file for reading we may write open('E0.csv', 'r'), but reading is the default mode, so you may drop the 'r'.
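If a file cannot be opened, open() raises an exception instead of returning a file object. The following is a minimal sketch of catching that error when a file opened for reading does not exist. The file name is hypothetical, and it is placed in a fresh temporary directory so that it is certain to be missing.

```python
import os
import tempfile

# 'no_such_file.csv' is a hypothetical name inside a new, empty temporary
# directory, so the file is guaranteed not to exist.
missing_name = os.path.join(tempfile.mkdtemp(), 'no_such_file.csv')

try:
    f = open(missing_name)          # the mode defaults to 'r'
except FileNotFoundError as err:
    opened = False
    print('Could not open file:', err)
else:
    opened = True
    f.close()
```
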
Read in the first and second lines with the readline()
method.
f = open('E0.csv')
first_line = f.readline()
print(first_line)
second_line = f.readline()
print(second_line)
The file object is iterable row-wise.
Mind the newline character at the end of each line.
f = open('E0.csv')
L = []
for line in f:
    L.append(line)
print(L[:4])
The list L now contains the rows of the file. You can split the lines into cells with .split(","), but later we will show another method.
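As a small illustration of splitting a row into cells, the sketch below uses a made-up line (not taken from E0.csv) with the same kind of layout. It assumes that no field contains a comma itself, since a plain split would cut such a field in two.

```python
# A made-up line in the style of a CSV row (hypothetical data).
line = '01/01/16,Home FC,Away FC,2,1,H\n'

# strip() removes the trailing newline, split(',') cuts at every comma.
cells = line.strip().split(',')
print(cells)

# Every cell is still a string; convert the ones you need as numbers.
home_goals = int(cells[3])
away_goals = int(cells[4])
print(home_goals + away_goals)
```
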
Let's say that you care only about the results of the team 'Liverpool'.
Write the appropriate data into a file named 'Liverpool.csv'
.
For writing we open the file with open(filename, 'w'), which creates the file or empties an existing one. If you open a file this way, don't forget to close it!
f = open('Liverpool.csv', 'w')
f.write('YNWA') # You'll Never Walk Alone
f.close()
Alternatively, you may use the with statement, which calls close() automatically when execution leaves its block. This is a more readable and safer solution:
with open('Liverpool.csv', 'w') as f:
    f.write('When you walk through a storm\n')
Let us add a new line to the closed file:
with open('Liverpool.csv', 'a') as f:
    f.write('Hold your head up high\n')
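To make the difference between the 'w' and 'a' modes visible, here is a short sketch that writes, appends, and then rewrites a scratch file. The file name is hypothetical and the file lives in a temporary directory, so nothing of yours is overwritten.

```python
import os
import tempfile

# Work in a temporary directory; 'modes_demo.txt' is a hypothetical name.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as f:      # 'w' creates the file (or empties an existing one)
    f.write('first\n')
with open(path, 'a') as f:      # 'a' keeps the content and writes at the end
    f.write('second\n')
with open(path) as f:
    after_append = f.read()

with open(path, 'w') as f:      # opening with 'w' again discards the old content
    f.write('third\n')
with open(path) as f:
    after_rewrite = f.read()

print(repr(after_append))
print(repr(after_rewrite))
```
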
Let's read the results row by row, choose the rows containing the word 'Liverpool', and save those lines in 'Liverpool.csv'. The header of the file will be the same!
with open('E0.csv') as f:
    L = [f.readline()] # this is the header
    for line in f:
        if 'Liverpool' in line:
            L.append(line)
with open('Liverpool.csv', 'w') as f:
    for row in L:
        f.write(row)
Write some numbers into bytes, and read them back!
with open("binfile.bin", "wb") as f:
    numbs = [1, 2, 4, 8, 16, 32, 31]
    arr = bytearray(numbs) # each number must fit into one byte (0..255)
    f.write(arr)
with open("binfile.bin", "rb") as f:
    numbs = list(f.read())
print(numbs)
csv files in Python
The previous file was a comma-separated values file with the .csv extension (see Wikipedia).
Each line of this type of file is a data record. Each record consists of one or more fields, separated by commas. The comma is the default separator, but other characters may be used instead. Spreadsheet programs (like Excel or LibreOffice) can save files in this format.
Python can handle this format with the csv
module (see Python documentation).
import csv
L = []
with open('E0.csv', 'r') as csvfile:
    reader = csv.reader(csvfile) #, delimiter=',', quotechar='"')
    for row in reader:
        L.append(row)
print(L[0])
print(L[19])
This result differs from the previous one: csv.reader() turns each line into a list of strings, one per field. The comment shows two optional parameters: delimiter changes the default separator (the comma) to another character, and quotechar defines which character encloses a field. Quoting is useful when a field contains the separator itself, or when numbers and strings are mixed in the file.
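The effect of the two options can be tried without any file on disk. The sketch below uses made-up sample text in which ';' separates the fields and a single quote encloses a field that contains the separator; io.StringIO stands in for an open text file.

```python
import csv
import io

# Made-up sample data: ';' separates the fields, and "'" quotes a field
# that itself contains the separator.
text = "name;city;goals\n'Smith; John';Liverpool;3\n"

# io.StringIO lets us treat the string as if it were an open text file.
reader = csv.reader(io.StringIO(text), delimiter=';', quotechar="'")
rows = list(reader)
print(rows)
```

Note that the quoted field stays in one piece and that the number 3 is still returned as the string '3'.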
If you look at the data closely, you can see that a dictionary would be even better: it is clearer to refer to the cells by name than by index.
The DictReader class of the csv module does this, using the first line (the header) as the dictionary keys:
import csv
L=[]
with open('E0.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        L.append(row)
print(L[0])
Depending on your Python version, the result may not be what we hoped. In Python 3.6 and 3.7 each row is not an ordinary dictionary but a so-called ordered dictionary. An OrderedDict is a dictionary subclass that remembers the order in which its items were added; otherwise we may use exactly the same methods. (Since Python 3.8, DictReader returns ordinary dictionaries.) To convert an OrderedDict to an ordinary dictionary use the dict function:
print(dict(L[0]))
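To make an ordered dictionary ourselves we have to import the collections module (when csv returns ordered dictionaries, it does this import itself). The next example shows the difference between dict and OrderedDict: two ordinary dictionaries with the same items are equal whatever the insertion order, while two ordered dictionaries are equal only if the order matches as well: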
import collections
d1 = {}
d1['x'] = 'X'
d1['y'] = 'Y'
d2 = {}
d2['y'] = 'Y'
d2['x'] = 'X'
print(d1 == d2)
d1 = collections.OrderedDict()
d1['x'] = 'X'
d1['y'] = 'Y'
d2 = collections.OrderedDict()
d2['y'] = 'Y'
d2['x'] = 'X'
print(d1 == d2)
Let us go back to our file and store the data of the Liverpool matches only. We will write the 'Date', 'HomeTeam', 'AwayTeam', 'FTHG' (Full Time Home Goals), 'FTAG' (Full Time Away Goals) and 'FTR' (Full Time Result) values into a file!
We will use csv.DictWriter to write the data: the writeheader() method writes the header first, then the writerows() method writes the actual data.
The fieldnames parameter tells which fields (columns in the table) to use.
The extrasaction='ignore' option skips any other fields found in the rows.
import csv
L = []
with open('E0.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for x in reader:
        if x['HomeTeam'] == 'Liverpool' or x['AwayTeam'] == 'Liverpool':
            L.append(x)
# newline='' avoids blank lines between rows on Windows
with open('Liverpool.csv', 'w', newline='') as output:
    fields = ['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']
    writer = csv.DictWriter(output, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(L)
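To see what extrasaction changes, the following sketch writes one made-up record that has an extra key. With the default setting ('raise') the extra key causes a ValueError; with 'ignore' it is dropped. The record and its 'Referee' key are hypothetical, and io.StringIO is used in place of a real output file.

```python
import csv
import io

# Made-up record with one more key ('Referee') than we want to write.
record = {'HomeTeam': 'Home FC', 'AwayTeam': 'Away FC', 'Referee': 'N. N.'}
fields = ['HomeTeam', 'AwayTeam']

# Default behaviour (extrasaction='raise'): the extra key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames=fields)
try:
    strict.writerow(record)
    raised = False
except ValueError:
    raised = True

# With extrasaction='ignore' the extra key is silently dropped.
buffer = io.StringIO()
lenient = csv.DictWriter(buffer, fieldnames=fields, extrasaction='ignore')
lenient.writeheader()
lenient.writerow(record)
written = buffer.getvalue()

print(raised)
print(repr(written))
```
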
json format in Python
JSON (JavaScript Object Notation) is a lightweight, language-independent data-interchange format. Its key structures are the list and the dictionary, and their notation is much the same as in Python: a list is a comma-separated sequence in square brackets [ ], and a dictionary contains the usual key:value pairs in curly brackets { }.
This is an example; you may copy it into a file named Liverpool.json:
{
    "Liverpool": {
        "Players": [
            "Steven Gerrard",
            "Bill Shankly"
        ],
        "Results": [
            {
                "HomeTeam": "Liverpool",
                "AwayTeam": "Tottenham",
                "HTG": 1,
                "ATG": 1
            },
            {
                "HomeTeam": "West Ham",
                "AwayTeam": "Liverpool",
                "HTG": 2,
                "ATG": 0
            }
        ],
        "Points": 1,
        "Goals Scored": 1,
        "Goals Conceded": 3
    }
}
Python can handle this format with the json module.
After reading the file, the data is stored in Python objects (dictionaries, lists, strings and numbers).
import json
with open('Liverpool.json') as data_file:
    data = json.load(data_file)
print(data)
print(data['Liverpool']['Players'])
Now let's write a JSON file! To make the output easier to read you may use the sort_keys, indent and separators parameters.
The json.dumps(obj) function returns a string which encodes the object obj in JSON format.
We write that string into a file and that's all.
import json
with open('Liverpool.json') as data_file:
    data = json.load(data_file)
with open('Liverpool_matches.json', 'w') as f:
    f.write(json.dumps(data['Liverpool']['Results'],
                       sort_keys=True, indent=4, separators=(',', ': ')))
There are several ways to handle the JSON format:
json.dumps(obj): encodes obj to a JSON formatted string
json.dump(obj, file): encodes obj and writes it into a file
json.loads(JSON_formatted_string): converts a JSON formatted string into a Python object
json.load(file): reads the content of file into a Python object (it can be a complex Python data structure)
More details in the Python docs.
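The four functions can be compared side by side in a short round trip. The sketch below uses one match record from the example above; io.StringIO plays the role of a file so the code runs without touching the disk.

```python
import io
import json

# One match record taken from the example above.
match = {'HomeTeam': 'Liverpool', 'AwayTeam': 'Tottenham', 'HTG': 1, 'ATG': 1}

# dumps / loads work with strings.
as_text = json.dumps(match, sort_keys=True)
from_text = json.loads(as_text)

# dump / load work with file objects; io.StringIO stands in for a real file.
fake_file = io.StringIO()
json.dump(match, fake_file)
fake_file.seek(0)               # rewind before reading back
from_file = json.load(fake_file)

print(as_text)
print(from_text == match, from_file == match)
```
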
We will run Python code as standalone programs!
sys module
Write some Python code and save it with the .py extension.
Your OS can recognise it as a Python program, or you can run it with an interpreter.
You can communicate with your program via input()
function or with command line arguments.
Our very first program will write out the number of command line arguments and the list of them.
The first element in this list is the name of the program file. The others are optional.
To do this we have to import sys
first. The list sys.argv
will store these parameters. Important note: this is a list of strings! Even numbers are stored as strings, so to use them as numbers we have to convert them.
Save the following into a file named cli.py and run it from the command line.
import sys
print('Number of arguments:', len(sys.argv))
print('List of arguments: ', sys.argv)
!python3 cli.py arg1 arg2
The ! tells the notebook to run the line in the command line, not here as Python code.
The values in sys.argv are called positional parameters, since you refer to them by their place in the list.
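Because every element of sys.argv is a string, numeric arguments have to be converted before calculating with them. The sketch below shows this with a hand-made list shaped like sys.argv; the function name sum_arguments and the file name add.py are hypothetical, chosen only for illustration.

```python
def sum_arguments(argv):
    """Return the sum of the arguments after the program name, read as integers."""
    # argv[0] is the program name, so the numbers start at index 1.
    return sum(int(arg) for arg in argv[1:])

# A hand-made list shaped like sys.argv, so the function can be tried anywhere.
example_argv = ['add.py', '2', '40']        # 'add.py' is a hypothetical file name
print(example_argv[1] + example_argv[2])    # string concatenation: '240'
print(sum_arguments(example_argv))          # integer addition: 42

# In a real script you would import sys and call sum_arguments(sys.argv).
```
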
Exercise: calculate the power of a number. Write a Python program which takes two command line arguments: base and exponent.
If the numbers are integers then calculate with integers, otherwise calculate with floats.
Save the following into a file named power.py.
import sys
def is_intstring(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
a = []
for i in range(1,3):
    if is_intstring(sys.argv[i]):
        a.append(int(sys.argv[i]))
    else:
        a.append(float(sys.argv[i]))
print(a[0] ** a[1])
This is how to run it:
!python3 power.py 4.2 3
!python3 power.py 2 100