{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data structures\n",
"\n",
"## Arrays (from lists)\n",
"\n",
"In Python the easiest way to implement a 2D array is a list of lists.\n",
"For a 3D array: list of lists of lists and so on..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"M = [[1, 2], [3, 4]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access the emelents with multiple bracket [ ]
indexing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(M[0], M[0][1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a 3x2x2 array\n",
"M = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 9], [0, 0]]]\n",
"M[2][0][1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Write a function that prints a 2D array in a tabular like format:\n",
"\n",
" 1 2\n",
" 3 4\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def array_print(M):\n",
" for i in range(len(M)):\n",
" for j in range(len(M[i])):\n",
" print(M[i][j], end='\\t') # TAB character\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Be careful, ```M[i, j]``` instead of ```M[i][j]``` does not work here!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"array_print([[1, 2, 3, 3.5], [4, 5, 6, 6.5], [7, 8, 9, 9.5]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Embedded data structures\n",
"\n",
"**Exercise:** Let's say that a store would like to set up a discount system for regular customers.\n",
"The store's database has the name of the customers (a unique string) and their shopping costs so far.\n",
"The customers are stored in a list, every element of the list is a pair: their name and their list of shopping costs.\n",
"For example one customer entry would look like this:\n",
"\n",
"[\"Anna\", [54, 23, 12, 56, 12, 71]]
\n",
"\n",
"The discounts are calculated in the following way:\n",
"\n",
"Total shoppings > 200: $10\\%$\n",
"\n",
"Total shoppings > 500: $15\\%$\n",
"\n",
"Total shoppings > 1000: $20\\%$\n",
"\n",
"Otherwise no discount ($0\\%$).\n",
"\n",
"Write a function that calculates the discount for every customer.\n",
"* The input is the list of customer entries\n",
"* The output is the list of the discounts in a length 2 list: name and the discount\n",
"* For example a list of these: [\"Anna\", 10]
\n",
"\n",
"How to begin? Break down to subtasks!\n",
"1. Decide the discount from the total shopping cost ( `228 -> 10` )\n",
"1. calculate the total shopping cost (sum the list of costs)\n",
"1. Do this for every customer and return the results in a result list\n",
"\n",
"There are two ways to achive this ([design pattern](https://en.wikipedia.org/wiki/Top-down_and_bottom-up_design#Programming)):\n",
"* **top-down**: first write the final function assuming that the smaller subtask (functions) are already done,\n",
" then do the smaller tasks\n",
"* **bottom-up**: write the smaller subtasks first,\n",
" then work your way up to the final task"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# top-down\n",
"\n",
"def discount(customers):\n",
" result = []\n",
" for customer in customers:\n",
" result.append(calculate_discount(customer))\n",
" return result\n",
"\n",
"def calculate_discount(customer):\n",
" name = customer[0]\n",
" total_cost = 0\n",
" for shopping in customer[1]:\n",
" total_cost += shopping\n",
" return [name, discount_from_total(total_cost)]\n",
"\n",
"def discount_from_total(total):\n",
" if total > 1000:\n",
" return 20\n",
" if total > 500:\n",
" return 15\n",
" if total > 200:\n",
" return 10\n",
" return 0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"discount([[\"Anna\", [54, 23, 12, 56, 12, 71]],\n",
" [\"Bill\", [11, 3, 12, 1, 12, 55]],\n",
" [\"Hagrid\", [111, 545, 343, 56, 12, 66]],\n",
" [\"Not_a_wizard\", [54, 222, 65, 56, 43, 71]]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tuple\n",
"We have already seen strings and lists which are similar to the tuple.\n",
"\n",
"An **_n_-tuple** can be created with a comma separeted list in a parenthesis or with the tuple()
function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t = (1, 5, 6, 2, 1)\n",
"print(t[2])\n",
"type(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"l = [1, 2, 3]\n",
"t = tuple(l)\n",
"print(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tuple is iterable, so you may use for
loop as before:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for e in t:\n",
" print(e, end=\" \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But you cannot assign individual elements (like the string)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t[1] = 4 # not allowed, tuple is immutable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Parenthesis** is also used for grouping operations (like in $2*(3+4)$), but that is different from the parenthesis of the tuple.\n",
"Also in some cases you don't have to write the parenthesis at all."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a tuple\n",
"x = 2, 3, 4\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# two assigments in one line, no tuples\n",
"x, y = 2, 3\n",
"print(x)\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1-tuples are different than a single object in a parenthesis, you can construct a 1-tuple with an ending comma inside a parenthesis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(type((1))) # this is a number\n",
"print(type((1,))) # this is a onelement tuple\n",
"print(type(())) # this is an empty tuple"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mutable and immutable types\n",
"The *tuple* is almost identical to the list except that it is *immutable*: you cannot assign a single element.\n",
"The list is *mutable*.\n",
"\n",
"You cannot change a tuple once created, except creating a new one, like in case of strings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s = (\"h\", \"e\", \"l\", \"l\", \"o\")\n",
"print(s[1])\n",
"s = s[:1] + (\"a\",) + s[2:]\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tuple(\"hallo\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for e in s:\n",
" print(e, end=' ')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s[1] = \"e\" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionary\n",
"A __dictionary__ is a series of pairs, we call them key-value pairs.\n",
"A dictionary can have any type of key and any type of value, but __the key have to be immutable__.\n",
"\n",
"You can contruct dictionaries with a curly bracket { }
or with the dict()
function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# how many animals do you have?\n",
"d = {\"puppy\": 2, \"goldfish\": 6}\n",
"type(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access the elements by their keys in a bracket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d[\"puppy\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add a pair to the dictionary by assigning the value to the bracketed key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d[\"cat\"] = 1 # add a new key-value pair\n",
"d"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d = {} # empty dictionary\n",
"di = dict() #"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"di"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use different type of keys (as long as they are immutable):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d = dict([[5, \"five\"], [\"pi\", 3.14159]])\n",
"d[(1, 5)] = \"tuple\"\n",
"d[\"today\"] = \"Monday\"\n",
"print(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A for
loop iterates over the keys:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key in d:\n",
" print(key, \":\", d[key])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers = {\"Anna\": [54, 23, 12, 130],\n",
" \"Bill\": [11, 3, 12, 1, 12, 55],\n",
" \"Hagrid\": [111, 545, 343, 56, 12, 66],\n",
" \"Not_a_wizard\": [54, 222, 165, 56]}\n",
"print(customers)\n",
"print()\n",
"print(customers[\"Bill\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise:** Write a program, which save the discount for every costumers! Rewrite the previous one!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def discount(customers):\n",
" for customer in customers:\n",
" customers[customer] = {\"sh\": customers[customer]}\n",
" customers[customer][\"d\"] = calculate_disc(customers[customer][\"sh\"])\n",
" print(customer, customers[customer][\"d\"])\n",
"\n",
"def calculate_disc(sh):\n",
" total_cost = 0\n",
" for shopping in sh:\n",
" total_cost += shopping\n",
" return discount_from_total(total_cost)\n",
"\n",
"def discount_from_total(total):\n",
" if total > 1000:\n",
" return 20\n",
" if total > 500:\n",
" return 15\n",
" if total > 200:\n",
" return 10\n",
" return 0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"discount(customers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers.keys()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers.values()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in customers.values():\n",
" print(i)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hash function\n",
"The dictionary uses a so-called **hash** function to calculate where to put the elements.\n",
"In this way it can find every key-value quickly.\n",
"\n",
"You can observe the hash values yourself with the hash()
function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(hash((1, 5)))\n",
"print(hash(5), hash(0), hash(False), hash(True))\n",
"print(hash((5,)))\n",
"print(hash(\"puppy\"))\n",
"print(hash(\"puppz\"))\n",
"print(hash(\"muppy\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every immutable object is hashable (that is the ```hash()``` function can be applied on it)!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hash([1, 2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dictionary uses the hash function, but a similar function is common in both theoretical and applied computer science.\n",
"There are advanced algorithm courses about hash functions.\n",
"\n",
"The hash() functions are important in other areas of computer science [Advances Datastructures and Techniques for Analysis of Algorithms, Katalin Friedl, Gyula Katona](http://www.cs.bme.hu/haladoadat/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is a hash function:\n",
"* assign a natural number to every piece of data (the value has a fixed width bit representation: 64, 128, 256 ... )\n",
" * it is a function so it assigns the same value to the same thing (every time).\n",
"* Sort-of unique meaning that there are a few different data with the same hash value\n",
"* it should be calculated quickly\n",
"\n",
"There may be extra requirements in other applications ([cryptographic hash function](https://en.wikipedia.org/wiki/Cryptographic_hash_function)):\n",
"* infeasable to invert (find out the data from the hash), not impossible but rather slow\n",
"* (pseudo)random and non-continuous\n",
"* infeasable to mimic a given data (find a piece of data to match a given hash value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Applications\n",
"* checksum (file hash)\n",
"* indexing, fast access (`dict`)\n",
"* pseudo random number generators (cryptography)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iterating over and functions of builtin containers\n",
"A data type is **iterable** if an object of that type is capable of returning its members one at a time, that is it can be iterated over with a for
loop.\n",
"Example:\n",
"* list\n",
"* string (characters)\n",
"* tuple\n",
"* dict\n",
"\n",
"```Python\n",
" for x in :\n",
" \n",
"```\n",
"\n",
"There are some useful operations that can be used on some iterables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# repetition (except for dict)\n",
"print(\"puppy \"*3)\n",
"print((1, 2, 3)*3)\n",
"print([1, 2, 3]*3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# concatenation (except for dict)\n",
"(1, 2) + (2, 4, 6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are some useful functions that can be used on iterables:\n",
"\n",
"```sum```: sum the contents of an iterable.\n",
"\n",
"```sorted```: return a list of the sorted contents of an interable\n",
"\n",
"```any```: returns True if any item is True and stops the iteration\n",
"\n",
"```all```: returns True if all items are True in the iterable\n",
"\n",
"```max```: return the largest value\n",
"\n",
"```min```: return the smallest value"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# universal (boolean and)\n",
"all((False, True, True)) # True if all is true "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all((0, 1, 1))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# existential (boolean or)\n",
"any((0, 1, 1)) # True if any one of it is True"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# these are False values (it would be True if any of them were True)\n",
"any((0, 0.0, None, False, [], {}, (), \"\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# these are all True values\n",
"all((1, 5.2, True, [0], [False], {0:0}, (0,), \"a\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sum (for numbers, not for strings)\n",
"sum((1, 2, 3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"min([4, 1, 2, 3])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted((3, 1, 2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sorted(\"alphabet\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}