Data structures

Arrays

In Python the easiest way to implement a 2D array is a list of lists. For a 3D array, use a list of lists of lists, and so on.

In [1]:
M = [[1, 2], [3, 4]]

You can access the elements with repeated bracket [ ] indexing:

In [2]:
M[0][1]
Out[2]:
2
In [3]:
M = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 9], [0, 0]]]  # 3x2x2 array
M[2][0][1]
Out[3]:
9
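
A minimal sketch of building such nested lists programmatically, assuming we want a rows x cols array filled with zeros. The helper name zeros_2d is hypothetical (it is not part of these notes). The second half shows a common pitfall: repeating the outer list with * copies references to one and the same inner list.

```python
def zeros_2d(rows, cols):
    """Return a rows x cols list of lists filled with 0 (hypothetical helper)."""
    # The comprehension builds a fresh inner list for every row.
    return [[0] * cols for _ in range(rows)]


A = zeros_2d(2, 3)
A[0][1] = 7
print(A)        # only the first row changes

# Pitfall: the outer * repeats a reference to ONE inner list.
B = [[0] * 3] * 2
B[0][1] = 7
print(B)        # both rows appear changed, because they are the same object
```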

Exercise: Write a function that prints a 2D array in a table-like format:

1   2
3   4
In [4]:
def array_print(M):
    for i in range(len(M)):
        for j in range(len(M[i])):
            print(M[i][j], end='\t')   # TAB character
        print()
In [5]:
array_print([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
1	2	3	
4	5	6	
7	8	9	
In [6]:
array_print([[1, 2, 3, 3.5], [4, 5, 6, 6.5], [7, 8, 9, 9.5]])
1	2	3	3.5	
4	5	6	6.5	
7	8	9	9.5	
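
The same layout can be produced without index arithmetic. A sketch, assuming the rows may have different lengths; array_to_text is a hypothetical name, and unlike array_print it returns the text instead of printing it and leaves no trailing TAB at the end of a row.

```python
def array_to_text(M):
    """Format a 2D list as TAB-separated rows (hypothetical helper)."""
    lines = []
    for row in M:                       # iterate over the rows directly
        # str() is needed because join only accepts strings
        lines.append("\t".join(str(x) for x in row))
    return "\n".join(lines)


print(array_to_text([[1, 2], [3, 4]]))
```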

Embedded data structures

Exercise: Suppose a store would like to set up a discount system for regular customers. The store's database contains the names of the customers (unique strings) and their shopping costs so far. The customers are stored in a list, and every element of the list is a pair: the customer's name and the list of their shopping costs. For example, one customer entry looks like this:

["Anna", [54, 23, 12, 56, 12, 71]]

The discounts are calculated in the following way:

Total spending > 200: $10\%$

Total spending > 500: $15\%$

Total spending > 1000: $20\%$

Otherwise no discount ($0\%$). When several conditions hold, the largest discount applies.

Write a function that calculates the discount for every customer.

  • The input is the list of customer entries
  • The output is a list with one length-2 list per customer: the name and the discount
  • For example, one element of the output is ["Anna", 10]

How to begin? Break the problem down into subtasks!

  1. Decide the discount from the total shopping cost ( 228 -> 10 )
  2. Calculate the total shopping cost (sum the list of costs)
  3. Do this for every customer and return the results in a result list

There are two ways to achieve this (design strategies):

  • top-down: first write the final function assuming that the smaller subtasks (functions) are already done, then write the smaller functions
  • bottom-up: write the smaller subtasks first, then work your way up to the final task
In [7]:
# top-down


def discount(customers):
    result = []
    for customer in customers:
        result.append(calculate_discount(customer))
    return result


def calculate_discount(customer):
    name = customer[0]
    total_cost = 0
    for shopping in customer[1]:
        total_cost += shopping
    return [name, discount_from_total(total_cost)]


def discount_from_total(total):
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0
In [8]:
discount([["Anna", [54, 23, 12, 56, 12, 71]],
          ["Bill", [11, 3, 12, 1, 12, 55]],
          ["Hagrid", [111, 545, 343, 56, 12, 66]],
          ["Not_a_wizard", [54, 222, 65, 56, 43, 71]]])
Out[8]:
[['Anna', 10], ['Bill', 0], ['Hagrid', 20], ['Not_a_wizard', 15]]
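
For comparison, a sketch of the same solution written in bottom-up order: the smallest piece comes first and each later function uses only what is already defined. The function names are hypothetical and chosen so that they do not overwrite the top-down version; the built-in sum() replaces the explicit loop.

```python
# bottom-up: smallest pieces first (function names here are hypothetical)


def discount_percent(total):
    """Step 1: map a total cost to a discount percentage."""
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0


def customer_discount(customer):
    """Step 2: sum one customer's costs and look up the discount."""
    name, costs = customer          # unpack the [name, costs] pair
    return [name, discount_percent(sum(costs))]


def all_discounts(customers):
    """Step 3: repeat for every customer entry."""
    return [customer_discount(c) for c in customers]


print(all_discounts([["Anna", [54, 23, 12, 56, 12, 71]],
                     ["Bill", [11, 3, 12, 1, 12, 55]]]))
```

Each function can be tried out as soon as it is written, which is the main practical advantage of the bottom-up order.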

Tuple

We have already seen strings and lists, which are similar to tuples.

An n-tuple can be created with a comma-separated list in parentheses or with the tuple() function:

In [9]:
t = (1, 5, 6, 2, 1)
print(t[2])
type(t)
6
Out[9]:
tuple
In [10]:
l = [1, 2, 3]
t = tuple(l)
print(t)
(1, 2, 3)

You can use a for loop as before:

In [11]:
for e in t:
    print(e, end=" ")
1 2 3 

But you cannot assign to individual elements (just as with strings):

In [12]:
t[1] = 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-87b0f225887f> in <module>
----> 1 t[1] = 4

TypeError: 'tuple' object does not support item assignment

Parentheses are also used for grouping operations (like in $2\times(3+4)$), but that is a different use from the parentheses of a tuple. In some cases you don't have to write the parentheses at all.

In [13]:
x = 2, 3, 4
print(x)
(2, 3, 4)
In [14]:
x, y = 2, 3
print(x)
print(y)
2
3
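
Because the right-hand side is packed into a tuple before the names on the left are assigned, the same syntax can swap two variables or unpack pairs in a loop header. A short sketch with made-up values:

```python
a, b = 1, 2
a, b = b, a            # the right side (2, 1) is built first, then unpacked
print(a, b)

# unpacking also works in a for loop over pairs
for name, age in [("Anna", 31), ("Bill", 45)]:
    print(name, "is", age)
```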

A 1-tuple is different from a single object in parentheses. You can construct a 1-tuple with a trailing comma inside the parentheses.

In [15]:
print(type((1)))
print(type((1,)))
<class 'int'>
<class 'tuple'>

Mutable and immutable types

The tuple is almost identical to the list, except that it is immutable: you cannot assign to a single element. The list is mutable.

You cannot change a tuple once it is created; you can only create a new one, as with strings:

In [16]:
s = ("h", "e", "l", "l", "o")
print(s[2])
s = ("h", "a", "l", "l", "o")

string = "puppy"
string2 = string[:2] + "ff" + string[4:]
print(string2)
l
puffy
In [17]:
for e in s:
    print(e, end=' ')
h a l l o 
In [18]:
s[1] = "e"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-6ca1c1f7962b> in <module>
----> 1 s[1] = "e"

TypeError: 'tuple' object does not support item assignment
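
Mutability matters most when two names refer to the same object. A minimal sketch (the variable names are illustrative only): a change made to a list through one name is visible through the other, while a "change" to a tuple always builds a new object.

```python
a = [1, 2, 3]
b = a                  # b is another name for the same list object
b.append(4)
print(a)               # the change is visible through a as well

t = (1, 2, 3)
u = t
u = u + (4,)           # builds a new tuple; t is untouched
print(t)
print(u)
```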

Dictionary

A dictionary is a collection of pairs, called key-value pairs. The values can be of any type, but the keys have to be immutable (more precisely, hashable).

You can construct dictionaries with curly brackets { } or with the dict() function.

In [19]:
d = {"puppy": 5}
type(d)
Out[19]:
dict

You can add a pair to the dictionary by assigning the value to the bracketed key. Assigning to a key that already exists overwrites its value.

In [20]:
d = {}             # empty dictionary
d["cat"] = [1, 5]  # add a new key-value pair
d["puppy"] = 3
In [21]:
d
Out[21]:
{'cat': [1, 5], 'puppy': 3}

You can access the elements by their keys in a bracket.

In [22]:
d["cat"]
Out[22]:
[1, 5]

You can use different types of keys (as long as they are immutable):

In [23]:
d = dict()
d[5] = 7
d[(1, 5)] = "puppy"
d[(1, 5)] = "snake"
d["cat"] = 3.14
print(d)
{5: 7, (1, 5): 'snake', 'cat': 3.14}
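
A sketch of two other ways to call dict(), and of what happens with a mutable key. The error is caught here only so that the cell runs to the end.

```python
d = dict([("cat", 3), ("puppy", 5)])   # from a list of (key, value) pairs
e = dict(cat=3, puppy=5)               # keyword form, string keys only
print(d == e)

try:
    d[[1, 5]] = "snake"                # a list is mutable, so it cannot be a key
except TypeError as err:
    print("TypeError:", err)

d[(1, 5)] = "snake"                    # the tuple version works
print(d)
```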

A for loop iterates over the keys:

In [24]:
for key in d:
    print(key, ":", d[key])
5 : 7
(1, 5) : snake
cat : 3.14
In [25]:
customers = {"Anna": [54, 23, 12, 130],
             "Bill": [11, 3, 12, 1, 12, 55],
             "Hagrid": [111, 545, 343, 56, 12, 66],
             "Not_a_wizard": [54, 222, 165, 56]}
print(customers)
print()
print(customers["Bill"])
{'Anna': [54, 23, 12, 130], 'Bill': [11, 3, 12, 1, 12, 55], 'Hagrid': [111, 545, 343, 56, 12, 66], 'Not_a_wizard': [54, 222, 165, 56]}

[11, 3, 12, 1, 12, 55]

Exercise: Write a program that saves the discount for every customer! Rewrite the previous solution to use dictionaries.

In [26]:
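def discount(customers):
    # Replaces each customer's list of costs with a small dictionary:
    # "shl" is the shopping list, "d" is the discount in percent.
    # The input dictionary is modified in place, so call this only once.
    for customer in customers:
        customers[customer] = {"shl": customers[customer]}
        customers[customer]["d"] = calculate_disc(customers[customer]["shl"])
        print(customer, customers[customer]["d"])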

def calculate_disc(shl):
    total_cost = 0
    for shopping in shl:
        total_cost += shopping
    return discount_from_total(total_cost)

def discount_from_total(total):
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0
In [27]:
discount(customers)
Anna 10
Bill 0
Hagrid 20
Not_a_wizard 10
In [28]:
customers.keys()
Out[28]:
dict_keys(['Anna', 'Bill', 'Hagrid', 'Not_a_wizard'])
In [29]:
customers.values()
Out[29]:
dict_values([{'shl': [54, 23, 12, 130], 'd': 10}, {'shl': [11, 3, 12, 1, 12, 55], 'd': 0}, {'shl': [111, 545, 343, 56, 12, 66], 'd': 20}, {'shl': [54, 222, 165, 56], 'd': 10}])
In [30]:
for i in customers.values():
    print(i)
{'shl': [54, 23, 12, 130], 'd': 10}
{'shl': [11, 3, 12, 1, 12, 55], 'd': 0}
{'shl': [111, 545, 343, 56, 12, 66], 'd': 20}
{'shl': [54, 222, 165, 56], 'd': 10}
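
Besides keys() and values(), a dictionary also has an items() method that yields (key, value) pairs, which can be unpacked directly in the loop header. A sketch on a small stand-alone dictionary (the data is made up to mirror the structure above):

```python
sample = {"Anna": {"shl": [54, 23, 12, 130], "d": 10},
          "Bill": {"shl": [11, 3, 12, 1, 12, 55], "d": 0}}

for name, record in sample.items():     # unpack each (key, value) pair
    print(name, sum(record["shl"]), record["d"])

# get() returns a default instead of raising KeyError for a missing key
print(sample.get("Hagrid", "unknown customer"))
```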

Hash function

The dictionary uses a so-called hash function to calculate where to put the elements. This way it can find every key-value pair quickly.

You can observe the hash values yourself with the hash() function. The values for strings change between Python sessions, so your output will differ:

In [31]:
print(hash((1, 5)))
print(hash(5), hash(0), hash(False), hash(True))
print(hash((5,)))
print(hash("puppy"))
print(hash("puppz"))
print(hash("muppy"))
print(hash("puppy, cat, python, galahad ..."))
3713081631939823281
5 0 0 1
3430023387570
-5006443583613666025
2966548838902777491
7008207655977323352
-4478024969822308348

The dictionary uses a hash function, and similar functions are common in both theoretical and applied computer science. They are covered in advanced algorithm courses, for example Advanced Datastructures and Techniques for Analysis of Algorithms (Katalin Friedl, Gyula Katona).

What is a hash function:

  • it assigns a number to every piece of data (the value has a fixed-width bit representation: 64, 128, 256 ... )
    • it is a function, so it assigns the same value to the same thing (every time).
  • it is sort-of unique: different pieces of data may get the same hash value, but this should happen rarely
  • it should be quick to calculate
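
To make these properties concrete, here is a toy sketch. toy_hash and the bucket layout are hypothetical and far simpler than what Python's dict actually does; they only illustrate determinism, collisions, and how a hash value can select a storage slot.

```python
def toy_hash(text, n_buckets=8):
    """Sum of character codes modulo the number of buckets (toy example)."""
    return sum(ord(ch) for ch in text) % n_buckets


buckets = [[] for _ in range(8)]        # 8 empty slots
for word in ["puppy", "cat", "act", "python"]:
    buckets[toy_hash(word)].append(word)

print(toy_hash("cat") == toy_hash("cat"))   # same input, same value
print(toy_hash("cat") == toy_hash("act"))   # different inputs can collide
print("cat" in buckets[toy_hash("cat")])    # look only in one bucket
```

Since "cat" and "act" contain the same characters, this toy function always gives them the same value. A lookup then has to compare the few words inside one bucket instead of scanning all stored words.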

There may be extra requirements in other applications (cryptographic hash function):

  • infeasible to invert (find out the data from the hash): not impossible, but far too slow in practice
  • (pseudo)random and non-continuous
  • infeasible to mimic given data (find a piece of data that matches a given hash value)
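
Python's built-in hash() is not a cryptographic hash. The standard library's hashlib module provides such functions; a minimal sketch with SHA-256 shows that changing a single character gives an unrelated-looking digest. The helper name sha256_hex is hypothetical.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("puppy")
b = sha256_hex("puppz")
print(a)
print(b)
print(len(a), a == b)
```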

Applications

  • checksum (file hash)
  • indexing, fast access (dict)
  • pseudorandom number generators (cryptography)

Iterating and functions of containers

A data type is iterable if it can be iterated over with a for loop. Examples:

  • list
  • string (characters)
  • tuple
  • dict
    for x in <iterable>:
        <do stuff>
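
A small sketch that runs the same loop over one value of each type listed above (the sample values are made up). For a dict the loop visits the keys.

```python
for container in [[1, 2], "ab", (3, 4), {"x": 1, "y": 2}]:
    items = []
    for x in container:          # the same loop works for every iterable
        items.append(x)
    print(type(container).__name__, items)
```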

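There are some useful functions and operators that work on iterables. Repetition and concatenation need a sequence (string, list or tuple), so they do not work on a dict: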

In [32]:
# repetition
print("puppy "*3)
print((1, 2, 3)*3)
print([1, 2, 3]*3)
puppy puppy puppy 
(1, 2, 3, 1, 2, 3, 1, 2, 3)
[1, 2, 3, 1, 2, 3, 1, 2, 3]
In [33]:
# concatenation (except for dict)
(1, 2) + (2, 4, 6)
Out[33]:
(1, 2, 2, 4, 6)
In [34]:
# universal (boolean and)
print(all((False, True, True, True)))
print(all((0, 1, 1, 1)))
False
False
In [35]:
# existential (boolean or)
any((0, 1, 1, 1))
Out[35]:
True
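
all() and any() are often combined with a generator expression (a comprehension written without brackets) that tests a condition on each element. A sketch with made-up shopping costs:

```python
costs = [54, 23, 12, 130]

print(all(c > 0 for c in costs))      # is every cost positive?
print(any(c > 100 for c in costs))    # is there at least one cost over 100?

# edge cases on an empty iterable
print(all([]), any([]))
```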
In [36]:
# "transpose"
zip([1, 2, 3], [11, 12, 13])
Out[36]:
<zip at 0x7f8dbdaacf08>
In [37]:
list(_)
Out[37]:
[(1, 11), (2, 12), (3, 13)]
In [38]:
array_print(_)    # use our previously written function
1	11	
2	12	
3	13	
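
The same idea transposes a whole 2D array: the * operator passes each row of M to zip as a separate argument. A sketch (note that the rows of the result are tuples, and that zip stops at the shortest row):

```python
M = [[1, 2, 3], [4, 5, 6]]
T = list(zip(*M))          # same as zip(M[0], M[1])
print(T)

# convert the tuples back to lists if a list of lists is needed
T_lists = [list(row) for row in T]
print(T_lists)
```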
In [39]:
# sum (for numbers, not for strings)
sum((1, 2, 3))
Out[39]:
6
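
sum() adds with + starting from 0, which is why it rejects strings. A sketch of the usual alternatives: str.join for strings, and the optional start argument of sum. The error is caught only so that the cell runs to the end.

```python
words = ["pup", "py"]
print("".join(words))              # concatenate strings with join

try:
    sum(words)                     # 0 + "pup" is not defined
except TypeError as err:
    print("TypeError:", err)

print(sum([1, 2, 3], 10))          # the second argument is the start value
print(sum([[1, 2], [3]], []))      # lists can be summed from an empty list
```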