Data structures

Arrays (from lists)

In Python the easiest way to implement a 2D array is a list of lists. For a 3D array, use a list of lists of lists, and so on.

In [1]:
M = [[1, 2], [3, 4]]

You can access the elements by chaining bracket [ ] indexing:

In [2]:
print(M[0], M[0][1])
[1, 2] 2
In [3]:
# a 3x2x2 array
M = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 9], [0, 0]]]
M[2][0][1]
Out[3]:
9

Exercise: Write a function that prints a 2D array in a table-like format:

1   2
3   4
In [4]:
def array_print(M):
    for i in range(len(M)):
        for j in range(len(M[i])):
            print(M[i][j], end='\t')   # TAB character
        print()

Be careful: M[i, j] does not work for a list of lists, you have to write M[i][j]!
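
The difference between the two index forms can be checked directly. The following is a minimal sketch; the variable name caught is only used for illustration. Writing M[0, 1] passes the tuple (0, 1) as one single index, and a list does not accept a tuple as an index.

```python
M = [[1, 2], [3, 4]]

# M[0][1] indexes twice: first the row, then the element inside the row
print(M[0][1])

# M[0, 1] hands the tuple (0, 1) to the list as a single index
try:
    M[0, 1]
except TypeError as error:
    caught = str(error)
    print("TypeError:", caught)
```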

In [5]:
array_print([[1, 2, 3, 3.5], [4, 5, 6, 6.5], [7, 8, 9, 9.5]])
1	2	3	3.5	
4	5	6	6.5	
7	8	9	9.5	
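
The same table can also be printed by looping over the rows directly instead of over index ranges. This is only a sketch of an alternative; the names row_to_line and array_print_rows are made up for this example. Unlike array_print above, it does not leave a TAB at the end of each line.

```python
def row_to_line(row):
    """Join the elements of one row with TAB characters."""
    return "\t".join(str(element) for element in row)

def array_print_rows(M):
    """Print a list of lists, one row per line."""
    for row in M:              # row is the inner list itself, not an index
        print(row_to_line(row))

array_print_rows([[1, 2], [3, 4]])
```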

Embedded data structures

Exercise: Let's say that a store would like to set up a discount system for regular customers. The store's database holds the name of every customer (a unique string) and their shopping costs so far. The customers are stored in a list; every element of the list is a pair: the name and the list of shopping costs. For example, one customer entry looks like this:

["Anna", [54, 23, 12, 56, 12, 71]]

The discounts are calculated in the following way (the highest threshold that the total exceeds applies):

Total shopping cost > 200: $10\%$

Total shopping cost > 500: $15\%$

Total shopping cost > 1000: $20\%$

Otherwise no discount ($0\%$).

Write a function that calculates the discount for every customer.

  • The input is the list of customer entries
  • The output is a list with one length-2 list per customer: the name and the discount
  • For example, one element of the output looks like this: ["Anna", 10]

How to begin? Break the task down into subtasks!

  1. Decide the discount from the total shopping cost ( 228 -> 10 )
  2. Calculate the total shopping cost (sum the list of costs)
  3. Do this for every customer and collect the results in a result list

There are two ways to achieve this (design approaches):

  • top-down: first write the final function assuming that the smaller subtasks (functions) are already done, then write the smaller functions
  • bottom-up: write the smaller subtasks first, then work your way up to the final task
In [6]:
# top-down

def discount(customers):
    result = []
    for customer in customers:
        result.append(calculate_discount(customer))
    return result

def calculate_discount(customer):
    name = customer[0]
    total_cost = 0
    for shopping in customer[1]:
        total_cost += shopping
    return [name, discount_from_total(total_cost)]

def discount_from_total(total):
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0
In [7]:
discount([["Anna", [54, 23, 12, 56, 12, 71]],
          ["Bill", [11, 3, 12, 1, 12, 55]],
          ["Hagrid", [111, 545, 343, 56, 12, 66]],
          ["Not_a_wizard", [54, 222, 65, 56, 43, 71]]])
Out[7]:
[['Anna', 10], ['Bill', 0], ['Hagrid', 20], ['Not_a_wizard', 15]]
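
For comparison, here is a bottom-up sketch of the same task: the smallest subtask is written first and every later function only uses what already exists. The function names are reused from the top-down cell. As an assumption of this sketch, the explicit summing loop is replaced by the built-in sum() function and the customer pair is unpacked into two names.

```python
# bottom-up

def discount_from_total(total):
    """Map a total shopping cost to a discount percentage."""
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0

def calculate_discount(customer):
    """Return [name, discount] for one [name, costs] entry."""
    name, costs = customer                 # unpack the pair
    return [name, discount_from_total(sum(costs))]

def discount(customers):
    """Return the [name, discount] pair of every customer entry."""
    result = []
    for customer in customers:
        result.append(calculate_discount(customer))
    return result

result = discount([["Anna", [54, 23, 12, 56, 12, 71]],
                   ["Bill", [11, 3, 12, 1, 12, 55]]])
print(result)
```

Both orders lead to the same three functions; they only differ in which function you write and test first.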

Tuple

We have already seen strings and lists, which are similar to the tuple.

An n-tuple can be created with a comma-separated list in parentheses or with the tuple() function:

In [8]:
t = (1, 5, 6, 2, 1)
print(t[2])
type(t)
6
Out[8]:
tuple
In [9]:
l = [1, 2, 3]
t = tuple(l)
print(t)
(1, 2, 3)

A tuple is iterable, so you can use a for loop as before:

In [10]:
for e in t:
    print(e, end=" ")
1 2 3 

But you cannot assign to individual elements (just like with strings):

In [11]:
t[1] = 4   # not allowed, tuple is immutable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-2183a780838b> in <module>
----> 1 t[1] = 4   # not allowed, tuple is immutable

TypeError: 'tuple' object does not support item assignment

Parentheses are also used for grouping operations (like in $2*(3+4)$), but that use is different from the parentheses of a tuple. In some cases you don't have to write the parentheses at all.

In [12]:
# a tuple
x = 2, 3, 4
print(x)
(2, 3, 4)
In [13]:
# two assignments in one line, x and y are numbers, not tuples
x, y = 2, 3
print(x)
print(y)
2
3
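
The same comma notation is handy in two common situations, shown in this short sketch: swapping two variables without a temporary one, and splitting a pair into two names. The customer entry is taken from the exercise above.

```python
x, y = 2, 3
x, y = y, x          # the right side is evaluated first, then assigned
print(x, y)

# splitting a customer entry into its two parts
name, costs = ["Anna", [54, 23, 12, 56, 12, 71]]
print(name, costs)
```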

A 1-tuple is different from a single object in parentheses: you construct a 1-tuple with a trailing comma inside the parentheses.

In [14]:
print(type((1)))    # this is a number
print(type((1,)))   # this is a one-element tuple
print(type(()))     # this is an empty tuple
<class 'int'>
<class 'tuple'>
<class 'tuple'>

Mutable and immutable types

The tuple is almost identical to the list except that it is immutable: you cannot assign a single element. The list is mutable.
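
A minimal sketch of what this difference means in practice; the names alias and same are chosen only for this example. A list can be changed in place, so every name that refers to it sees the change. A tuple can only be replaced by a new tuple, so the other name keeps the old value.

```python
l = [1, 2, 3]
alias = l            # a second name for the same list object
l[1] = 20            # in-place change, allowed for a list
print(l, alias)      # both names show the change

t = (1, 2, 3)
same = t             # a second name for the same tuple object
t = t + (4,)         # builds a new tuple and rebinds the name t
print(t, same)       # same still refers to the original tuple
```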

You cannot change a tuple once it is created; you can only create a new one, as with strings:

In [15]:
s = ("h", "e", "l", "l", "o")
print(s[1])
s = s[:1] + ("a",) + s[2:]
s
e
Out[15]:
('h', 'a', 'l', 'l', 'o')
In [16]:
tuple("hallo")
Out[16]:
('h', 'a', 'l', 'l', 'o')
In [17]:
for e in s:
    print(e, end=' ')
h a l l o 
In [18]:
s[1] = "e"   
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-120e0203faa7> in <module>
----> 1 s[1] = "e"

TypeError: 'tuple' object does not support item assignment

Dictionary

A dictionary is a collection of pairs, which we call key-value pairs. Every key occurs at most once. The value can be of any type; the key can be of any type as long as it is immutable.

You can construct dictionaries with curly brackets { } or with the dict() function.

In [19]:
# how many animals do you have?
d = {"puppy": 2, "goldfish": 6}
type(d)
Out[19]:
dict

You can access the values by their keys in brackets.

In [20]:
d["puppy"]
Out[20]:
2

You can add a pair to the dictionary by assigning the value to the bracketed key.

In [21]:
d["cat"] = 1    # add a new key-value pair
d
Out[21]:
{'puppy': 2, 'goldfish': 6, 'cat': 1}
In [22]:
d = {}          # empty dictionary
di = dict()     # also an empty dictionary
In [23]:
d
Out[23]:
{}
In [24]:
di
Out[24]:
{}

You can use different types of keys (as long as they are immutable):

In [25]:
d = dict([[5, "five"], ["pi", 3.14159]])
d[(1, 5)] = "tuple"
d["today"] = "Monday"
print(d)
{5: 'five', 'pi': 3.14159, (1, 5): 'tuple', 'today': 'Monday'}
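
The restriction on keys can be tried out directly. In this small sketch a tuple is accepted as a key, while a list with the same contents is rejected with a TypeError. The variable name message is only used for illustration.

```python
d = {}
d[(1, 5)] = "tuple"          # fine, a tuple of numbers is immutable

try:
    d[[1, 5]] = "list"       # a list is mutable, so it cannot be a key
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

print(d)
```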

A for loop iterates over the keys:

In [26]:
for key in d:
    print(key, ":", d[key])
5 : five
pi : 3.14159
(1, 5) : tuple
today : Monday
In [27]:
customers = {"Anna": [54, 23, 12, 130],
             "Bill": [11, 3, 12, 1, 12, 55],
             "Hagrid": [111, 545, 343, 56, 12, 66],
             "Not_a_wizard": [54, 222, 165, 56]}
print(customers)
print()
print(customers["Bill"])
{'Anna': [54, 23, 12, 130], 'Bill': [11, 3, 12, 1, 12, 55], 'Hagrid': [111, 545, 343, 56, 12, 66], 'Not_a_wizard': [54, 222, 165, 56]}

[11, 3, 12, 1, 12, 55]

Exercise: Write a program that saves the discount for every customer! Rewrite the previous one so that it works on this dictionary!

In [28]:
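def discount(customers):
    # replaces every shopping list with a dictionary:
    # "sh" holds the shoppings, "d" holds the discount
    # (customers is changed in place, so call this only once per dictionary)
    for customer in customers:
        customers[customer] = {"sh": customers[customer]}
        customers[customer]["d"] = calculate_disc(customers[customer]["sh"])
        print(customer, customers[customer]["d"])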

def calculate_disc(sh):
    total_cost = 0
    for shopping in sh:
        total_cost += shopping
    return discount_from_total(total_cost)

def discount_from_total(total):
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0
In [29]:
discount(customers)
Anna 10
Bill 0
Hagrid 20
Not_a_wizard 10
In [30]:
customers.keys()
Out[30]:
dict_keys(['Anna', 'Bill', 'Hagrid', 'Not_a_wizard'])
In [31]:
customers.values()
Out[31]:
dict_values([{'sh': [54, 23, 12, 130], 'd': 10}, {'sh': [11, 3, 12, 1, 12, 55], 'd': 0}, {'sh': [111, 545, 343, 56, 12, 66], 'd': 20}, {'sh': [54, 222, 165, 56], 'd': 10}])
In [32]:
for i in customers.values():
    print(i)
{'sh': [54, 23, 12, 130], 'd': 10}
{'sh': [11, 3, 12, 1, 12, 55], 'd': 0}
{'sh': [111, 545, 343, 56, 12, 66], 'd': 20}
{'sh': [54, 222, 165, 56], 'd': 10}
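
Besides keys() and values(), a dictionary also has an items() method that yields the key-value pairs together, so the loop does not need to look up d[key] again. The sketch below uses a small dictionary in the original name-to-costs format; the names shoppings and totals are chosen only for this example.

```python
shoppings = {"Anna": [54, 23, 12, 130],
             "Bill": [11, 3, 12, 1, 12, 55]}

totals = {}
for name, costs in shoppings.items():   # every item is a (key, value) pair
    totals[name] = sum(costs)
    print(name, totals[name])
```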

Hash function

The dictionary uses a so-called hash function to calculate where to put the elements. In this way it can find every key-value pair quickly.

You can observe the hash values yourself with the hash() function:

In [33]:
print(hash((1, 5)))
print(hash(5), hash(0), hash(False), hash(True))
print(hash((5,)))
print(hash("puppy"))
print(hash("puppz"))
print(hash("muppy"))
3713081631939823281
5 0 0 1
3430023387570
-2961552395247026828
-8885092286189886893
313590609254500820

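The built-in immutable objects are hashable (that is, the hash() function can be applied to them), provided that everything they contain is hashable too. Mutable containers such as lists are not hashable: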

In [34]:
hash([1, 2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-4b420d0158ba> in <module>
----> 1 hash([1, 2])

TypeError: unhashable type: 'list'
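
Immutability alone is not quite enough: a tuple is hashed through its elements, so a tuple that contains a list cannot be hashed either. A short sketch to check this (the name message is only used for illustration):

```python
# nested tuples of numbers are fine
print(hash((1, (2, 3))) == hash((1, (2, 3))))

# a tuple holding a list is not hashable
try:
    hash((1, [2, 3]))
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```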

The dictionary uses a hash function, but similar functions are common in both theoretical and applied computer science. Advanced algorithm courses cover hash functions in detail, for example Advanced Datastructures and Techniques for Analysis of Algorithms (Katalin Friedl, Gyula Katona).

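What is a hash function?

  • it assigns a number to every piece of data (the value has a fixed-width bit representation: 64, 128, 256, ... bits; Python's hash() displays it as a signed integer)
    • it is a function, so it assigns the same value to the same data every time (Python's hash() guarantees this within one run of the program; the hash values of strings differ between runs by default)
  • it is nearly unique: different pieces of data can share a hash value (a collision), but this should be rare
  • it can be calculated quickly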
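
These properties can be seen on a deliberately simple example. The function toy_hash below is hypothetical and much weaker than the hash function Python really uses: it adds up the byte values of a string and keeps the lowest 16 bits. It is fast and always gives the same value for the same input, but collisions are easy to find, because the order of the characters does not matter.

```python
def toy_hash(text):
    """Sum of the UTF-8 byte values, cut down to 16 bits."""
    return sum(text.encode("utf-8")) % 2**16

print(toy_hash("puppy") == toy_hash("puppy"))   # same data, same value
print(toy_hash("puppy") == toy_hash("puppz"))   # different data, different value here
print(toy_hash("ab") == toy_hash("ba"))         # different data, same value: a collision
```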

There may be extra requirements in other applications (cryptographic hash functions):

  • infeasible to invert (finding the data from the hash is not impossible, but far too slow)
  • (pseudo)random and non-continuous: similar data get very different hash values
  • infeasible to forge (finding a piece of data that matches a given hash value is far too slow)
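
Python's standard library offers cryptographic hash functions in the hashlib module. The sketch below uses SHA-256 as one example; the choice of SHA-256 and the variable names are assumptions of this example. The two inputs differ in a single letter, yet the printed digests look unrelated, and the digest of a given input is the same in every run.

```python
import hashlib

# hashlib works on bytes, hence the b"..." literals
digest_1 = hashlib.sha256(b"puppy").hexdigest()
digest_2 = hashlib.sha256(b"puppz").hexdigest()

print(digest_1)
print(digest_2)
print(len(digest_1) * 4, "bits")   # every hexadecimal digit stands for 4 bits
```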

Applications

  • checksum (file hash)
  • indexing, fast access (dict)
  • pseudorandom number generators (cryptography)
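
The "indexing, fast access" application can be sketched with a toy table. The hash value of the key selects one of a fixed number of buckets, so a lookup only has to search one short bucket instead of all pairs. All names here (NUM_BUCKETS, buckets, put, get) are hypothetical, and this is not how Python's dict is laid out internally; it only illustrates the idea.

```python
NUM_BUCKETS = 8                       # assumed fixed table size
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_index(key):
    """Choose a bucket from the hash value of the key."""
    return hash(key) % NUM_BUCKETS

def put(key, value):
    """Store a key-value pair, overwriting the value of an existing key."""
    bucket = buckets[bucket_index(key)]
    for pair in bucket:
        if pair[0] == key:
            pair[1] = value
            return
    bucket.append([key, value])

def get(key):
    """Look up a value by searching only the bucket of the key."""
    for stored_key, value in buckets[bucket_index(key)]:
        if stored_key == key:
            return value
    raise KeyError(key)

put("puppy", 2)
put("goldfish", 6)
put("puppy", 3)                       # overwrites the earlier value
print(get("puppy"), get("goldfish"))
```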

Iterating over built-in containers and their functions

A data type is iterable if an object of that type can return its members one at a time, that is, it can be iterated over with a for loop. Examples:

  • list
  • string (characters)
  • tuple
  • dict (keys)

for x in <iterable>:
    <do stuff>

There are some useful operations that can be used on some iterables:

In [35]:
# repetition (except for dict)
print("puppy "*3)
print((1, 2, 3)*3)
print([1, 2, 3]*3)
puppy puppy puppy 
(1, 2, 3, 1, 2, 3, 1, 2, 3)
[1, 2, 3, 1, 2, 3, 1, 2, 3]
In [36]:
# concatenation (except for dict)
(1, 2) + (2, 4, 6)
Out[36]:
(1, 2, 2, 4, 6)
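
Dictionaries do not support + because a key can occur only once, so it has to be decided what happens with shared keys. One possible way to combine two dictionaries is the update() method, sketched below with made-up data: on a shared key the value from the second dictionary wins.

```python
a = {"puppy": 2, "goldfish": 6}
b = {"cat": 1, "puppy": 3}

merged = dict(a)      # copy, so that a itself stays unchanged
merged.update(b)      # add the pairs of b, overwriting shared keys
print(merged)
print(a)
```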

There are some useful functions that can be used on iterables:

sum: returns the sum of the contents of an iterable

sorted: returns a list of the sorted contents of an iterable

any: returns True if any item is true, and stops the iteration at the first such item

all: returns True if all items in the iterable are true

max: returns the largest value

min: returns the smallest value

In [37]:
# universal (boolean and)
all((False, True, True)) # True only if every item is true
Out[37]:
False
In [38]:
all((0, 1, 1))
Out[38]:
False
In [39]:
# existential (boolean or)
any((0, 1, 1))    # True if at least one item is true
Out[39]:
True
In [40]:
# these are False values (it would be True if any of them were True)
any((0, 0.0, None, False, [], {}, (), ""))
Out[40]:
False
In [41]:
# these are all True values
all((1, 5.2, True, [0], [False], {0:0}, (0,), "a"))
Out[41]:
True
In [42]:
# sum (for numbers, not for strings)
sum((1, 2, 3))
Out[42]:
6
In [43]:
min([4, 1, 2, 3])
Out[43]:
1
In [44]:
sorted((3, 1, 2))
Out[44]:
[1, 2, 3]
In [45]:
sorted("alphabet")
Out[45]:
['a', 'a', 'b', 'e', 'h', 'l', 'p', 't']
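
Since a for loop over a dictionary goes through the keys, these functions also work on the keys when they are given a dictionary. A short sketch with the animal dictionary from before; use values() to work on the values, and the key parameter of max to compare the keys by their values.

```python
d = {"puppy": 2, "goldfish": 6, "cat": 1}

print(sorted(d))             # the keys in alphabetical order
print(max(d))                # the largest key, compared as strings
print(sum(d.values()))       # the total of the values
print(max(d, key=d.get))     # the key that has the largest value
```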