Let us consider a list of $n$ different real numbers and we want to find a given number in this list. If we iterate over the elements to find a particular one then it takes as much comparisons as the length of the list at most. So the necessary number of operations is a linear function of $n$, that is it can be estimated above by a constant times $n$. We express it with saying that the time complexity of this algorithm is linear, or $O(n)$ (this is the so called big O notation)
But you can do better with the binary search if the list is already sorted. The times complexity of it is $O(\log(n))$.
Binary search
import sys
def binary_search(l, x):
a = 0
b = len(l)-1
while a <= b:
i = (a + b)//2
print(i, end=' ', file=sys.stderr)
if l[i] < x:
a = i+1
elif l[i] > x:
b = i-1
else:
print(file=sys.stderr)
return i
print(file=sys.stderr)
return None
stderr
(short for standard error) is the default file descriptor where a process can write error messages. In jupyter we can separate the stderr
messages from the stdout
(short for standard output) messages. The background color of stdout
messages is different. We will use it to trace midle term indexes. Writing to stdout
we import the sys
module.
l = [3, 10, -1, 3.14, 20, 2, 11, 12, 1]
l.sort()
print(l)
b = binary_search(l, 10)
print(b, l[b])
[-1, 1, 2, 3, 3.14, 10, 11, 12, 20] 5 10
4 6 5
b = binary_search(l, 13)
print(b)
None
4 6 7 8
This search takes at most ceiling (upper integer part) of $\log_2(n)$ steps, since the list get halved at every step.
for i in range(-1, 11):
print(binary_search(list(range(10)), i), end=" ")
None 0 1 2 3 4 5 6 7 8 9 None
4 1 0 4 1 0 4 1 4 1 2 4 1 2 3 4 4 7 5 4 7 5 6 4 7 4 7 8 4 7 8 9 4 7 8 9
In this algoritm the numbers have to be pre-sorted. A question: how could we manage this list if we can insert or remove some elements.
Let us write a class for storing the elements in a sorted list. The class will keep the order of the elements by inserting a new element to its correct place. (There is no problem when we remove an element.)
class OrderedList:
def __init__(self, elements=[]):
self.l = sorted(elements)
def search(self, x):
a = 0
b = len(self.l)-1
while a <= b:
i = (a + b)//2
if self.l[i] < x:
a = i+1
elif self.l[i] > x:
b = i-1
else:
return i
return None
def insert(self, x):
a = 0
b = len(self.l)-1
while a <= b:
i = (a + b)//2
if self.l[i] < x:
a = i+1
elif self.l[i] > x:
b = i-1
else:
return
self.l.insert(a, x)
def erase(self, index):
if index >= 0 and index < len(self.l):
del self.l[index]
def __iter__(self):
return iter(self.l)
def __repr__(self):
return str(self.l)
l = OrderedList([1, -1, 3])
print(l)
l.insert(5)
print(l)
l.erase(2)
print(l)
l.insert(3.14)
l.insert(0)
print(l)
print(l.search(1))
[-1, 1, 3] [-1, 1, 3, 5] [-1, 1, 5] [-1, 0, 1, 3.14, 5] 2
A binary tree is a directed graph which has a root (a vertex with 0 indegree) and where every node have at most two outgoing edges (that is the outdegree is 0, 1 or 2 for every vertex). These 1 or 2 neighbours sometimes called child of the node.
We will implement a tree for storing numbers keeping the order in a similar way than in the previous ordered list.
class Tree:
def __init__(self, root):
self.root = root
self.left = None
self.right = None
def insert(self, data):
if self.root > data:
if self.left is None:
self.left = Tree(data)
else:
self.left.insert(data)
elif self.root < data:
if self.right is None:
self.right = Tree(data)
else:
self.right.insert(data)
This implementation keeps the tree sorted in a way that smaller numbers always go to the left and greater numbers to the right.
When we initialize a new tree, we add two children to it with the value None
.
If you want to insert an elements then do the following.
None
then insert a new node there.insert
method to that node recursively.No we add a to_list()
method which retrieves the elements in order.
to_list
on the left subtree (recursive)to_list
on the right subtree (recursive)The result is the list of ordered elements.
class Tree:
def __init__(self, root):
self.root = root
self.left = None
self.right = None
def insert(self, data):
if self.root > data:
if self.left is None:
self.left = Tree(data)
else:
self.left.insert(data)
elif self.root < data:
if self.right is None:
self.right = Tree(data)
else:
self.right.insert(data)
def to_list(self):
s = []
if self.left is not None:
s += self.left.to_list()
s.append(self.root)
if self.right is not None:
s += self.right.to_list()
return s
tree = Tree(5)
tree.insert(4)
tree.insert(1)
tree.insert(7)
print(tree.to_list())
[1, 4, 5, 7]
5
/ \
4 7
/
1
This is good for storing the elements ordered and also you can find and add elements with $O(\log(n))$ time complexity.
Although this technique is slow if the tree is very uneven (unbalanced). The actual time complexity is depending on the depth (height) of the tree. The tree will be very unbalanced if the element come in order.
tree = Tree(5)
tree.insert(4)
tree.insert(3)
tree.insert(2)
tree.insert(1)
print(tree.to_list())
[1, 2, 3, 4, 5]
5
/
4
/
3
/
2
/
1
There are self-balancing binary trees (AVL trees) and you will learn about them on Algorithm theory class.
Now we implement the find(data)
method, which finds the subtree with data
as a root. It is quite straight forward.
class Tree:
def __init__(self, root):
self.root = root
self.left = None
self.right = None
def insert(self, data):
if self.root > data:
if self.left is None:
self.left = Tree(data)
else:
self.left.insert(data)
elif self.root < data:
if self.right is None:
self.right = Tree(data)
else:
self.right.insert(data)
def to_list(self):
s = []
if self.left is not None:
s += self.left.to_list()
s.append(self.root)
if self.right is not None:
s += self.right.to_list()
return s
def find(self, data):
if data == self.root:
return self
elif data < self.root and self.left is not None:
return self.left.find(data)
elif data > self.root and self.right is not None:
return self.right.find(data)
else:
return None
tree = Tree(5)
tree.insert(3)
tree.insert(6)
tree.insert(4)
tree.insert(1)
tree.insert(2)
print(tree.find(3).root)
print(tree.find(3).to_list())
3 [1, 2, 3, 4]
5
/ \
3 6
/ \
1 4
\
2
How could we delete a node? It is simple if the node is a leaf (has no child), or if it has one child only: delete it and substitute the child if it has at all. If it has two children, we search the smallest value greater than this node. Delete that smallest value from the right subtree, and write this value into the place of the node.
class Tree:
def __init__(self, root):
self.root = root
self.left = None
self.right = None
def insert(self, data):
if self.root > data:
if self.left is None:
self.left = Tree(data)
else:
self.left.insert(data)
elif self.root < data:
if self.right is None:
self.right = Tree(data)
else:
self.right.insert(data)
def to_list(self):
s = []
if self.left is not None:
s += self.left.to_list()
s.append(self.root)
if self.right is not None:
s += self.right.to_list()
return s
def minValue(self):
current = self
while current.left is not None:
current = current.left
return current.root
def deleteNode(tree, node):
if node < tree.root:
tree.left = deleteNode(tree.left, node)
elif node > tree.root:
tree.right = deleteNode(tree.right, node)
else:
if tree.left is None:
temp = tree.right
tree = None
return temp
elif tree.right is None:
temp = tree.left
tree = None
return temp
minv = tree.right.minValue()
tree.root = minv
tree.right = deleteNode(tree.right, minv)
return tree
tree = Tree(5)
tree.insert(3)
tree.insert(1)
tree.insert(4)
print(tree.minValue())
tree = deleteNode(tree, 3)
print(tree.to_list())
1 [1, 4, 5]
5 5
/ /
3 => 4
/ \ /
1 4 1
tree = Tree(5); tree.insert(4); tree.insert(1)
tree.insert(8); tree.insert(6); tree.insert(9)
print(tree.root, tree.to_list())
tree = deleteNode(tree, 5)
print(tree.root, tree.to_list())
tree = deleteNode(tree, 6)
tree = deleteNode(tree, 9)
print(tree.root, tree.to_list())
tree = deleteNode(tree, 1)
tree = deleteNode(tree, 8)
print(tree.root, tree.to_list())
5 [1, 4, 5, 6, 8, 9] 6 [1, 4, 6, 8, 9] 8 [1, 4, 8] 4 [4]
5 6 8 8 8 4
/ \ / \ / \ / /
4 8 => 4 8 => 4 9 => 4 => 4 =>
/ / \ / \ / /
1 6 9 1 9 1 1
A very useful application of binary trees are expression trees where the nodes are operations and the leaves are numbers.
If a node is an operation, then store the operation in self.data
and the two branches are the two operands.
The leaves are numbers without outgoing edges.
class Node:
def __init__(self, data):
self.data = data
def calculate(self):
if self.data == "+":
return self.left.calculate() + self.right.calculate()
elif self.data == "*":
return self.left.calculate() * self.right.calculate()
else:
return self.data
def __str__(self):
if self.data == "+" or self.data == "*":
return f"({self.left.__str__()}){self.data}({self.right.__str__()})"
else:
return str(self.data)
def __repr__(self):
return self.__str__()
(7*(2+3))+(-3)
+ x0
/ \ / \
* -3 x1 x4
/ \ / \
7 + x2 x3
/ \ / \
2 3 x5 x6
x0 = Node("+"); x1 = Node("*"); x3 = Node("+")
x2 = Node(7); x4 = Node(-3); x5 = Node(2); x6 = Node(3)
x0.left = x1; x0.right = x4
x1.left = x2; x1.right = x3
x3.left = x5; x3.right = x6
print(x0.calculate())
x0
32
((7)*((2)+(3)))+(-3)
x1
(7)*((2)+(3))
x5
2