# Algorithms

## 1 Introduction to algorithms

### 1.1 Computing complexity

Under the uniform cost model, all primitive operations are considered to take one unit of time. We don't consider things like the number of bits of an input, even if it's actually the number of bits that determines the amount of time taken; we just consider the number of operations performed.

### 1.2 Runtimes of simple algorithms

- $O(1)$: accessing an element in an array, parity-bit checking, comparisons, basic arithmetic operations
- $O(\log n)$ (the base doesn't matter, since changing it only contributes a constant factor): binary search, the ternary-search coin-weighing problem, any search that cuts the search space by a constant fraction at every step
- $O(n)$: accessing the $n$th element in a linked list, finding the maximum or minimum value in an unsorted array
- $O(n \log n)$: mergesort; finding the closest pair of points (achievable, though the straightforward divide-and-conquer version is $O(n \log^2 n)$)
- $O(n^2)$: naive integer multiplication (where $n$ is the number of digits), traversing a two-dimensional array, selection sort
- $O(a^n)$: exponential time
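
As a concrete illustration of the $O(\log n)$ entry above, a minimal binary search sketch in Python (the function name and list-based interface are illustrative, not from the notes):

```python
def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent.

    Each iteration halves the search space, giving O(log n) time.
    """
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1    # target can only be in the right half
        else:
            hi = mid - 1    # target can only be in the left half
    return -1
```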

## 2 Binary search trees

### 2.1 Standard BST operations

min(), max(), search(), insert(), delete(), next() and prev() all have a running time of $O(h)$, where $h$ is the height of the tree.

### 2.2 Random BSTs

A binary search tree built by inserting the keys in a uniformly random order has an expected height of $O(\log n)$.

### 2.3 Two-dimensional binary search

Given a binary search tree in which each node contains a point and a specification of whether it divides its subtrees horizontally or vertically, find all the points that fall within a specified range.

Implementation: A recursive binary search, with the condition that the search doesn't stop until all of the potentially matching nodes have been investigated. At each node, check whether that node's point falls within the specified range; then recurse left, right, or both, depending on the node's coordinates, its dividing direction, and the range, until every candidate subtree has been searched.

Runtime: $O(n)$ where $n$ is the number of nodes in the tree, as no node needs to be visited more than once (but up to the entire tree may need to be visited).

From assignment 1, question 4.
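
The search described above can be sketched as follows, assuming a simple node representation (the `Node` class and `range_search` helper are hypothetical illustrations, not the assignment's actual code):

```python
class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point   # (x, y)
        self.axis = axis     # 0: divides by x-coordinate, 1: divides by y-coordinate
        self.left = left
        self.right = right

def range_search(node, lo, hi, out):
    """Append to out every point p in the subtree with lo[i] <= p[i] <= hi[i] on both axes."""
    if node is None:
        return
    x, y = node.point
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        out.append(node.point)
    a = node.axis
    # Recurse into a subtree only if the query range can overlap it.
    if lo[a] <= node.point[a]:
        range_search(node.left, lo, hi, out)
    if node.point[a] <= hi[a]:
        range_search(node.right, lo, hi, out)
```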

### 2.4 Red-black trees

Properties

A red-black tree is a type of self-balancing binary search tree with the following properties:

1. Every node is either red or black.
2. The root node is black.
3. Every leaf is a black nil node (wherever a child could be but is not, a black nil leaf is attached).
4. If a node is red, both of its children are black.
5. Every path from a given node down to a nil leaf contains the same number of black nodes.

Operations

Red-black trees have height $O(\log n)$, so all of the standard operations take $O(\log n)$, as before.

Rotation takes $O(1)$ time, as it just involves changing three pointers.
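
A minimal sketch of a left rotation, assuming a plain node class with `left`/`right` pointers (the names are illustrative; the third pointer change, the parent's link to the subtree, is expressed here by returning the new subtree root):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(root):
    """Left-rotate around root; returns the new subtree root.

    Only a constant number of pointers change, so this is O(1).
    """
    pivot = root.right        # pivot moves up to become the subtree root
    root.right = pivot.left   # pivot's left subtree becomes root's right subtree
    pivot.left = root         # old root becomes pivot's left child
    return pivot
```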

Colouring a red-black tree takes $O(n)$ time.

Inserting a new element as the root can be done in $O(\log n)$ time, by inserting the new element as a leaf the standard way and then rotating the tree until the new element becomes the root. Since the tree has a height of $\log n$, there are at most $\log n$ rotations needed, so the total runtime is $O(2 \log n) = O(\log n)$. From assignment 1, question 2.

Lethargic delete

Implement a "lethargic delete" operation: delete a node by finding it and marking it as deleted, and rebuild the red-black tree from scratch using only the non-deleted nodes once more than 50% of the nodes have been deleted.

Implementation: The lethargic delete operation first finds the element to delete using a binary search ($O(\log n)$), then marks it as deleted, which takes constant time. Once 50% of the nodes have been marked as deleted, we pull out the non-deleted nodes in sorted order with an in-order traversal of the tree ($O(n)$). The tree can then be rebuilt from this sorted array of nodes, using the following recursive tree-building algorithm:

    def build_tree(array, lo=0, hi=None):
        # Build a balanced BST from a sorted array of nodes; returns the root
        if hi is None:
            hi = len(array)
        # If the range is empty, there is no subtree
        if lo >= hi:
            return None
        # Get the "middle" element; it becomes the root of this subtree
        mid = (lo + hi) // 2
        root = array[mid]
        # Build left subtree using the elements to the left of the "middle"
        root.left = build_tree(array, lo, mid)
        # Build right subtree using the elements to the right of the "middle"
        root.right = build_tree(array, mid + 1, hi)
        return root

This algorithm runs in $O(n)$ time because each node only needs to be visited once, with a constant number of operations (passing index bounds rather than copying subarrays) done at every node. We can then colour the tree using an $O(n)$ breadth-first search and the following instructions:

1. Save the number of nodes in the tree (obtained after the initial in-order tree traversal) as $n$.
2. Number the root node i = 1.

### 4.6 Dijkstra's algorithm

Find the shortest path between a single node and every other node in the graph, assuming non-negative edge weights.

Implementation: Create a list of distances from the source to each vertex, each of which is initialised to infinity (except the distance from the source to itself, which is 0). This list is kept in a heap so that the minimum element can be extracted in $O(\log n)$ time. We can also create a list of the vertex preceding each vertex on its optimal path, so that we can later generate the paths themselves by backtracking.

For every iteration, do:

1. Remove the vertex with the lowest distance to the source from the heap.
2. For each of its adjacent vertices, if the distance to the removed vertex plus the cost of the connecting edge is lower than the adjacent vertex's current distance, update that distance and record the removed vertex as its predecessor.
3. Restore the heap property, since some distances to the source have changed.
4. Repeat from step 1; the algorithm terminates when the heap is empty or the lowest remaining distance is infinity.
5. To recover a shortest path, backtrack from the destination through the predecessor list until the source is reached.

Implemented in assignment 2, question 5.
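
A minimal sketch of the procedure using Python's `heapq` (lazy deletion of stale heap entries stands in for an explicit decrease-key; this is an implementation choice, not necessarily the assignment's):

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source in a graph with non-negative edge weights.

    adj maps each vertex to a list of (neighbour, weight) pairs.
    Returns (dist, prev); follow prev back from any vertex to recover a path.
    """
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    prev = {v: None for v in adj}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue          # stale entry: a shorter path was already found
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w   # relax the edge (u, v)
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev
```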

Runtime: $O((n + m) \log n)$ with a binary heap, where $n$ is the number of vertices and $m$ the number of edges (each of the at most $n$ extractions and $m$ distance updates costs $O(\log n)$); $O(n^2)$ without one.

### 4.7 Huffman coding

Design an optimal binary, variable-length encoding algorithm (specifically, a prefix code) that uses the frequency of each character to determine its corresponding code.

Implementation: We start with a priority queue (min-heap) of nodes, one per character, keyed by the frequency of occurrence of the character. We can then build the code tree as follows:

1. Remove the two nodes with the lowest frequencies.
2. Create a new node whose frequency is the sum of these two frequencies and whose two children are these nodes, and put the new node back into the heap.
3. Repeat from step 1 until only one node remains.

The structure produced at the end should be a binary tree that determines the Huffman code. To find the codeword for each character, simply traverse the tree, taking the left edge of any node as a 0, and the right as a 1 (or vice versa).
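The merging procedure above can be sketched as follows (the tuple-based tree representation and tie-breaking counter are implementation choices, not part of the notes):

```python
import heapq

def huffman_codes(freq):
    """Build a binary Huffman code from a {symbol: frequency} map.

    Repeatedly merges the two lowest-frequency trees; reading left edges
    as '0' and right edges as '1' then gives each symbol's codeword.
    """
    # Each heap entry is (frequency, tiebreaker, tree); a tree is either
    # a symbol or a (left, right) pair. The tiebreaker keeps comparisons
    # from ever reaching the (incomparable) tree element.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + "0")   # left edge: 0
            walk(tree[1], code + "1")   # right edge: 1
        else:
            codes[tree] = code or "0"   # a lone symbol still needs one bit
        return codes
    _, _, root = heap[0]
    return walk(root, "")
```

Rarer symbols sit deeper in the tree, so they receive longer codewords.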

We could also create a ternary Huffman code using a ternary tree. The algorithm is similar, only instead of removing two nodes with the lowest frequency, we remove three, and create a new node with three children. This algorithm then ends when there is only one node left in the queue, as before, although if the number of characters to encode is even, it is necessary to create an empty node initially and add it to the queue (as a full ternary tree can only have an odd number of leaves).

Runtime: $O(n\log n)$ ($O(n\log n)$ to sort the nodes at the beginning, then at most $n$ iterations, each of which takes $O(\log n)$ time to properly insert the new node into the heap)

Example of a ternary Huffman code in assignment 2, question 4.

## 5 Divide and conquer

Divide-and-conquer algorithms work by taking a problem and breaking it down into smaller, more manageable subproblems. Once all the subproblems have been solved, their solutions can be amalgamated to obtain the solution for the entire problem.

### 5.1 Mergesort

Implementation: We can create a sorted list in $O(n)$ time by merging two sorted sublists, each of half the length. These two sorted sublists can be created in the same way, until the sublists are of length 1 or 0, in which case they are already sorted.

Two sorted sublists can be merged with the use of a temporary array and a linear number of comparisons. That is, the first element of the fully sorted list is the smaller of the two sublists' first elements; remove it from its sublist, and repeat for each subsequent position.
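
The recursive splitting and linear-time merge described above can be sketched as:

```python
def merge_sort(arr):
    """Sort arr by recursively sorting its halves and merging them."""
    if len(arr) <= 1:
        return arr              # length 0 or 1: already sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge: repeatedly take the smaller front element of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])     # one of these is empty; the other is appended whole
    merged.extend(right[j:])
    return merged
```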

Runtime: $O(n\log n)$ (from the recurrence $T(n) = 2T(\frac{n}{2}) + n$)

### 5.2 Exponentiation

Implementation: The naive brute-force method is to multiply the number by itself $n - 1$ times, which takes $O(n)$. Using a divide-and-conquer algorithm, however, we can define exponentiation in a way that scales much better: if $n$ is even, $x^n = x^{n/2} \times x^{n/2}$, and if $n$ is odd, $x^n = x^{\lfloor n/2 \rfloor} \times x^{\lfloor n/2 \rfloor} \times x$. Then $x^{\lfloor n/2 \rfloor}$ can be calculated the same way, and so on.
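
The even/odd scheme above can be sketched as (illustrative function name; note that $x^{\lfloor n/2 \rfloor}$ is computed only once and reused):

```python
def power(x, n):
    """Compute x**n for integer n >= 0 using O(log n) multiplications."""
    if n == 0:
        return 1
    half = power(x, n // 2)    # computed once, used twice
    if n % 2 == 0:
        return half * half
    return half * half * x
```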

Runtime: since $x^{\lfloor n/2 \rfloor}$ is computed only once and then squared, the recurrence is $T(n) = T(\frac{n}{2}) + O(1)$, which gives $T(n) = O(\log n)$.

### 5.3 Chip testing

Two chips are put in the tester at a time. The chip in slot 1 reports the condition of the chip in slot 2. A good chip always reports correctly, while a bad chip gives a random result. We know that in our pile there are more good chips than bad chips. How can we accurately determine the quality of all of the chips?