Binary trees

Let us consider a list of $n$ different real numbers and we want to find a given number in this list. If we iterate over the elements to find a particular one then it takes as much comparisons as the length of the list at most. So the necessary number of operations is a linear function of $n$, that is it can be estimated above by a constant times $n$. We express it with saying that the time complexity of this algorithm is linear, or $O(n)$ (this is the so called big O notation)

But you can do better with the binary search if the list is already sorted. The times complexity of it is $O(\log(n))$.

Binary search

  • The list is sorted increasingly. Look at the middle term of the list and compare it with the searched number.
    • If the middle is smaller, then the searched number must be in the second half of the list.
      • In this case continue your search in this half of the list.
    • If the middle is greater, then the searched number must be in the first half.
      • In this case continue your search in this half of the list.
    • If they are equal, then you found it.
In [1]:
import sys

def binary_search(l, x):
    a = 0
    b = len(l)-1
    while a <= b:
        i = (a + b)//2
        print(i, end=' ', file=sys.stderr)
        if l[i] < x:
            a = i+1
        elif l[i] > x:
            b = i-1
        else:
            print(file=sys.stderr)
            return i
    print(file=sys.stderr)
    return None

stderr (short for standard error) is the default file descriptor where a process can write error messages. In jupyter we can separate the stderr messages from the stdout (short for standard output) messages. The background color of stdout messages is different. We will use it to trace midle term indexes. Writing to stdout we import the sys module.

In [2]:
l = [3, 10, -1, 3.14, 20, 2, 11, 12, 1]
l.sort()
print(l)
b = binary_search(l, 10)
print(b, l[b])
[-1, 1, 2, 3, 3.14, 10, 11, 12, 20]
5 10
4 6 5 
In [3]:
b = binary_search(l, 13)
print(b)
None
4 6 7 8 

This search takes at most ceiling (upper integer part) of $\log_2(n)$ steps, since the list get halved at every step.

In [4]:
for i in range(-1, 11):
    print(binary_search(list(range(10)), i), end=" ")
None 0 1 2 3 4 5 6 7 8 9 None 
4 1 0 
4 1 0 
4 1 
4 1 2 
4 1 2 3 
4 
4 7 5 
4 7 5 6 
4 7 
4 7 8 
4 7 8 9 
4 7 8 9 

In this algoritm the numbers have to be pre-sorted. A question: how could we manage this list if we can insert or remove some elements.

Let us write a class for storing the elements in a sorted list. The class will keep the order of the elements by inserting a new element to its correct place. (There is no problem when we remove an element.)

In [5]:
class OrderedList:

    def __init__(self, elements=[]):
        self.l = sorted(elements)

    def search(self, x):
        a = 0
        b = len(self.l)-1
        while a <= b:
            i = (a + b)//2
            if self.l[i] < x:
                a = i+1
            elif self.l[i] > x:
                b = i-1
            else:
                return i
        return None

    def insert(self, x):
        a = 0
        b = len(self.l)-1
        while a <= b:
            i = (a + b)//2
            if self.l[i] < x:
                a = i+1
            elif self.l[i] > x:
                b = i-1
            else:
                return
        self.l.insert(a, x)

    def erase(self, index):
        if index >= 0 and index < len(self.l):
            del self.l[index]

    def __iter__(self):
        return iter(self.l)

    def __repr__(self):
        return str(self.l)
In [6]:
l = OrderedList([1, -1, 3])
print(l)
l.insert(5)
print(l)
l.erase(2)
print(l)
l.insert(3.14)
l.insert(0)
print(l)
print(l.search(1))
[-1, 1, 3]
[-1, 1, 3, 5]
[-1, 1, 5]
[-1, 0, 1, 3.14, 5]
2

Binary trees

A binary tree is a directed graph which has a root (a vertex with 0 indegree) and where every node have at most two outgoing edges (that is the outdegree is 0, 1 or 2 for every vertex). These 1 or 2 neighbours sometimes called child of the node.

We will implement a tree for storing numbers keeping the order in a similar way than in the previous ordered list.

In [7]:
class Tree:
    def __init__(self, root):
        self.root = root
        self.left = None
        self.right = None

    def insert(self, data):
        if self.root > data:
            if self.left is None:
                self.left = Tree(data)
            else:
                self.left.insert(data)
        elif self.root < data:
            if self.right is None:
                self.right = Tree(data)
            else:
                self.right.insert(data)

This implementation keeps the tree sorted in a way that smaller numbers always go to the left and greater numbers to the right.

When we initialize a new node, we add two children to it with the value None. If you want to insert an elements then do the following.

  • If the new number is smaller then the current node then you should insert it to the left
    • If the left child is None then insert a new node there.
    • If the left subtree is not empty, then call the insert method to that node recursively.
  • If the new number is greater than the current node, then do the same for the right subtree.
  • If they are equal then you don't have to do anything because the number was already in the tree.

No we add a to_list() method which retrieves the elements in order.

  • First start with an empty list
  • Concatenate the result of to_list on the left subtree (recursive)
  • Append the current element
  • Concatenate the result of to_list on the right subtree (recursive)

The result is the list of ordered elements.

Python Tutor link

In [8]:
class Tree:
    def __init__(self, root):
        self.root = root
        self.left = None
        self.right = None

    def insert(self, data):
        if self.root > data:
            if self.left is None:
                self.left = Tree(data)
            else:
                self.left.insert(data)
        elif self.root < data:
            if self.right is None:
                self.right = Tree(data)
            else:
                self.right.insert(data)

    def to_list(self):
        s = []
        if self.left is not None:
            s += self.left.to_list()
        s.append(self.root)
        if self.right is not None:
            s += self.right.to_list()
        return s
In [9]:
tree = Tree(5)
tree.insert(4)
tree.insert(1)
tree.insert(7)
print(tree.to_list())
[1, 4, 5, 7]
       5
      / \
     4   7
    /
   1

This is good for storing the elements ordered and also you can find and add elements with $O(\log(n))$ time complexity.

Although this technique is slow if the tree is very uneven (unbalanced). The actual time complexity is depending on the depth (height) of the tree. The tree will be very unbalanced if the element come in order.

Python tutor link

In [10]:
tree = Tree(5)
tree.insert(4)
tree.insert(3)
tree.insert(2)
tree.insert(1)
        5
       /
      4
     /
    3
   /
  2
 /
1

There are self-balancing binary trees (AVL trees) and you will learn about them on Algorithm theory class.

Now we implement the find(data) method, which finds the subtree with data as a root. It is quite straight forward.

Python Tutor link

In [11]:
class Tree:
    def __init__(self, root):
        self.root = root
        self.left = None
        self.right = None

    def insert(self, data):
        if self.root > data:
            if self.left is None:
                self.left = Tree(data)
            else:
                self.left.insert(data)
        elif self.root < data:
            if self.right is None:
                self.right = Tree(data)
            else:
                self.right.insert(data)

    def to_list(self):
        s = []
        if self.left is not None:
            s += self.left.to_list()
        s.append(self.root)
        if self.right is not None:
            s += self.right.to_list()
        return s

    def find(self, data):
        if data == self.root:
            return self
        elif data < self.root and self.left is not None:
            return self.left.find(data)
        elif data > self.root and self.right is not None:
            return self.right.find(data)
        else:
            return None
In [12]:
tree = Tree(5)
tree.insert(3)
tree.insert(6)
tree.insert(4)
tree.insert(1)
tree.insert(2)
print(tree.find(3).root)
print(tree.find(3).to_list())
3
[1, 2, 3, 4]
       5
      / \
     3   6
    / \
   1   4
    \
     2

How could we delete a node? It is simple if the node is a leaf (has no child), or if it has one child only: delete it and substitute the child if it has at all. If it has two children, we search the smallest value greater than this node. Delete that smallest value from the right subtree, and write this value into the place of the node.

Python tutor

In [13]:
class Tree:
    def __init__(self, root):
        self.root = root
        self.left = None
        self.right = None

    def insert(self, data):
        if self.root > data:
            if self.left is None:
                self.left = Tree(data)
            else:
                self.left.insert(data)
        elif self.root < data:
            if self.right is None:
                self.right = Tree(data)
            else:
                self.right.insert(data)

    def to_list(self):
        s = []
        if self.left is not None:
            s += self.left.to_list()
        s.append(self.root)
        if self.right is not None:
            s += self.right.to_list()
        return s

    def minValue(self):
        current = self
        while(current.left is not None):
            current = current.left
        return current


def deleteNode(tree, node):
    if node < tree.root:
        tree.left = deleteNode(tree.left, node)
    elif node > tree.root:
        tree.right = deleteNode(tree.right, node)
    else:
        if tree.left is None:
            temp = tree.right
            tree = None
            return temp
        elif tree.right is None:
            temp = tree.left
            tree = None
            return temp
        temp = tree.right.minValue()
        tree.root = temp.root
        tree.right = deleteNode(tree.right, temp.root)
    return tree
In [14]:
tree = Tree(5)
tree.insert(3)
tree.insert(1)
tree.insert(4)
tree = deleteNode(tree, 3)
print(tree.to_list())
[1, 4, 5]
       5               5
      /               /
     3        =>     4
    / \             /
   1   4           1
In [15]:
tree = Tree(5)
tree.insert(4)
tree.insert(1)
tree.insert(8)
tree.insert(6)
tree.insert(9)
print(tree.root, tree.to_list())
tree = deleteNode(tree, 5)
print(tree.root, tree.to_list())
tree = deleteNode(tree, 6)
tree = deleteNode(tree, 9)
print(tree.root, tree.to_list())
tree = deleteNode(tree, 1)
tree = deleteNode(tree, 8)
print(tree.root, tree.to_list())
5 [1, 4, 5, 6, 8, 9]
6 [1, 4, 6, 8, 9]
8 [1, 4, 8]
4 [4]
    5            6           8          8       8      4
   / \          / \         / \        /       / 
  4   8   =>   4   8   =>  4   9  =>  4   =>  4   => 
 /   / \      /     \     /          /
1   6   9    1       9   1          1

An application

A very useful application of binary trees are expression trees where the nodes are operations and the leaves are numbers.

If a node is an operation, then store the operation in self.data and the two branches are the two operands. The leaves are numbers without outgoing edges.

In [16]:
class Node:
    def __init__(self, data):
        self.data = data

def calculate(node):
    if node.data == "+":
        return calculate(node.left) + calculate(node.right)
    elif node.data == "*":
        return calculate(node.left) * calculate(node.right)
    else:
        return node.data

def write(node):
    if node.data == "+" or node.data == "*":
        return "({}){}({})"\
            .format(write(node.left),node.data,write(node.right))
    else:
        return node.data
In [17]:
x0 = Node("+")
x1 = Node("*")
x2 = Node(7)
x3 = Node(5)
x1.left = x2
x1.right = x3
x4 = Node(-3)
x0.left = x1
x0.right = x4
print(calculate(x0))
print(write(x0))
32
((7)*(5))+(-3)
             +          x0
            / \         / \
           *  -3      x1  x4
          / \         / \
         7   5      x2  x3