1 Stable Matching
With $n$ men and $n$ women, find a pairing with no instabilities.
- Instability:
    - Man $m$ prefers woman $w$ over his assigned woman $x$
    - $w$ prefers man $m$ over her assigned partner
The Gale-Shapley algorithm results in a male-optimal and female-pessimal solution.
1.1 Gale-Shapley Algorithm

    all people are free
    while (any man m is free and hasn't proposed to all women):
        w = first woman on m's list who hasn't been proposed to by m yet
        if (w is free):
            assign w to m
        else if (w prefers m over her assigned partner a):
            assign w to m
            make a free
        else:
            w rejects m
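The pseudocode above can be sketched as runnable Python (a minimal illustration; the dict-based preference format is an assumption, not from the notes):

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Men propose; returns a stable matching as {man: woman}.

    men_prefs[m] is m's list of women, most preferred first;
    women_prefs[w] is w's list of men, most preferred first.
    """
    # rank[w][m] = position of m in w's list (lower = more preferred)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # index of next woman to try
    engaged_to = {}                             # woman -> man
    free_men = deque(men_prefs)

    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_proposal[m]]      # 1st woman m hasn't proposed to
        next_proposal[m] += 1
        if w not in engaged_to:                 # w is free
            engaged_to[w] = m
        elif rank[w][m] < rank[w][engaged_to[w]]:  # w prefers m over partner
            free_men.append(engaged_to[w])      # old partner becomes free
            engaged_to[w] = m
        else:                                   # w rejects m
            free_men.append(m)

    return {m: w for w, m in engaged_to.items()}
```

For example, if both men rank X first but X prefers B, then A is bumped and ends up with Y.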
1.1.1 Stability Proof
- Suppose $A$-$Z$ is an unstable pair in the GS matching $S^*$
    - $A$ is matched to $B$, $Z$ is matched to $X$
- Case: $A$ never proposed to $Z$
    - $A$ goes down his list in order of preference
    - if $A$ never proposed to $Z$, he prefers $B$ over $Z$, so $A$-$Z$ is stable
- Case: $A$ proposed to $Z$
    - $A$ and $Z$ were matched for a time
    - they can only have become unmatched if $Z$ preferred some other suitor over $A$, so $Z$ prefers her final partner over $A$
1.1.2 Male-Optimality Proof
- Suppose man $M$ is matched to someone other than his best stable partner in $S$
- Men propose in decreasing order of preference
- If $M$ was rejected by a stable partner, she must have preferred a man who appears with her in some stable matching, contradicting that she was $M$'s best stable partner
1.2 Bipartiteness
Exactly one of the following holds for a connected graph $G$ with BFS layering $L$:
- No edge joins two nodes in the same layer $\to$ $G$ is bipartite
- An edge joins two nodes in the same layer $\to$ $G$ contains an odd-length cycle $\to$ $G$ is not bipartite
2 Divide and Conquer
- Divide problem into several parts
- Solve parts recursively
- Combine solutions from sub-problems into final answer
2.1 Mergesort
- Divide array: $O(1)$
- Recursively sort halves: $2T(\frac{n}{2})$
- Merge: $O(n)$
2.1.1 Merging
- Keep one pointer into each (sorted) array
- Compare the pointed-to values, and append the lower to the output array
- Advance that array's pointer
- Repeat until both arrays are exhausted
2.1.2 Pseudocode

    def mergesort(var a as array)
        if (n == 1) return a
        var l1 as array = a[0] ... a[n/2]
        var l2 as array = a[n/2+1] ... a[n]
        l1 = mergesort(l1)
        l2 = mergesort(l2)
        return merge(l1, l2)

    def merge(var a as array, var b as array)
        var c as array
        while (a and b have elements)
            if (a[0] > b[0])
                add b[0] to the end of c
                remove b[0] from b
            else
                add a[0] to the end of c
                remove a[0] from a
        while (a has elements)
            add a[0] to the end of c
            remove a[0] from a
        while (b has elements)
            add b[0] to the end of c
            remove b[0] from b
        return c
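A runnable Python version of the pseudocode above (a minimal sketch; slicing copies are used for clarity over in-place merging):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    c = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:          # take the smaller head; <= keeps the sort stable
            c.append(a[i]); i += 1
        else:
            c.append(b[j]); j += 1
    c.extend(a[i:])               # at most one of these tails is non-empty
    c.extend(b[j:])
    return c

def mergesort(a):
    """Sort a list in O(n log n) by splitting, recursing, and merging."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```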
2.2 Closest Pair of Points
2.2.1 Algorithm
- Draw vertical line to roughly cut points in half
- Recursively find closest points in each half
- Find any potentially closer pairs after merging halves
- Return closest pair of points
2.2.2 Analysis
This algorithm takes $O(n\log^2 n)$, but can be improved to $O(n\log n)$ by pre-sorting the list of points. Each recursive call returns a list of points sorted by y-coordinate, and we then merge the lists.
2.3 Integer Multiplication
2.3.1 Brute Force
- Brute force requires $O(n^2)$ operations
- An $n$-digit number can be represented by two $\frac{n}{2}$-digit numbers:
    - $x = 2^{\frac{n}{2}}x_1+x_0$, $y = 2^{\frac{n}{2}}y_1+y_0$
    - $xy = 2^n(x_1y_1)+2^{\frac{n}{2}}(x_1y_0+x_0y_1)+x_0y_0$
- Multiply four $\frac{n}{2}$-digit numbers
- Add two $\frac{n}{2}$-digit integers and shift
- $T(n) = 4T(\frac{n}{2})+\Theta(n) \to T(n) \in \Theta(n^2)$
2.3.2 Karatsuba Multiplication
- Instead, trade one multiplication for a few extra additions
- Multiply only three $\frac{n}{2}$-digit numbers
- Same as above, but:
$$xy = 2^n(x_1y_1) + 2^{\frac{n}{2}}((x_1+x_0)(y_1+y_0)-x_1y_1-x_0y_0)+x_0y_0$$
- Runtime
    - $T(n)=3T(\frac{n}{2})+\Theta(n)$
    - $T(n) \in O(n^{\log_2 3}) \approx O(n^{1.585})$
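A base-10 variant of the same three-multiplication scheme, as runnable Python (a sketch; real implementations split in binary and switch to grade-school multiplication below a threshold):

```python
def karatsuba(x, y):
    """Multiply non-negative integers with 3 recursive products instead of 4."""
    if x < 10 or y < 10:                      # base case: a single-digit factor
        return x * y
    n = max(len(str(x)), len(str(y)))
    half = n // 2
    p = 10 ** half                            # split around 10^half
    x1, x0 = divmod(x, p)                     # x = x1*p + x0
    y1, y0 = divmod(y, p)                     # y = y1*p + y0
    a = karatsuba(x1, y1)                     # high parts
    c = karatsuba(x0, y0)                     # low parts
    b = karatsuba(x1 + x0, y1 + y0) - a - c   # cross terms via one product
    return a * p * p + b * p + c
```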
2.4 Matrix Multiplication
2.4.1 Brute Force
- $O(n^3)$ time
- $n$ multiplications for each of the $n^2$ output entries
2.4.2 Divide and Conquer
- Divide each matrix into four ${\frac{n}{2}} \times {\frac{n}{2}}$ blocks
- Compute the block products with 8 multiplications of $\frac{n}{2} \times \frac{n}{2}$ matrices
- Combine with a constant number of matrix additions
- Return the assembled result
$T(n) = 8T(\frac{n}{2}) + \Theta(n^2)\to T(n)\in\Theta(n^3)$
No improvement over the naive algorithm.
BUT Strassen's method multiplies two $2\times2$ block matrices with only 7 multiplications,
leading to $T(n) \in \Theta(n^{\log_2 7}) \approx \Theta(n^{2.81})$
2.5 Master Theorem
$T(n) = \alpha T(\frac{n}{\beta}) + f(n),\ \alpha \ge 1,\ \beta \gt 1$
- $n$ is the size of the problem
- $\alpha$ is the number of subproblems
- $\frac{n}{\beta}$ is the size of each subproblem
- $f(n)$ is the additional work required
- Case 1
    - $f(n) \in O(n^c),\ st.\ c \lt \log_{\beta}\alpha$
    - $T(n) \in \Theta(n^{\log_{\beta}\alpha})$
- Case 2
    - $f(n) \in \Theta(n^c \log^k n),\ st.\ c = \log_{\beta}\alpha$
    - $T(n) \in \Theta\left( n^{c} \log^{k+1} n \right)$
- Case 3
    - $f(n) \in \Omega(n^c),\ st.\ c \gt \log_{\beta}\alpha$, and
    - $\alpha f\left( \frac{n}{\beta} \right) \le k f(n)$ for some constant $k < 1$ and sufficiently large $n$
    - $T(n) \in \Theta(f(n))$
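As worked examples drawn from the recurrences earlier in these notes, mergesort lands in Case 2 and Karatsuba in Case 1:

```latex
% Mergesort: T(n) = 2T(n/2) + \Theta(n)
\alpha = 2,\ \beta = 2,\ \log_{\beta}\alpha = 1,\ f(n) \in \Theta(n^{1}\log^{0} n)
\ \Rightarrow\ \text{Case 2 } (c = 1,\ k = 0)
\ \Rightarrow\ T(n) \in \Theta(n \log n)

% Karatsuba: T(n) = 3T(n/2) + \Theta(n)
\alpha = 3,\ \beta = 2,\ \log_{\beta}\alpha = \log_2 3 \approx 1.585,\ f(n) \in O(n^{1}),\ 1 < \log_2 3
\ \Rightarrow\ \text{Case 1}
\ \Rightarrow\ T(n) \in \Theta(n^{\log_2 3})
```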
3 Graphs
- Captures pairwise relationships between objects
- Node set $V$ and edge set $E$
- Size parameters: $n=|V|$, $m=|E|$
- Degree
    - the number of edges incident to a node
3.1 Representations
3.1.1 Adjacency Matrix
- $n\times n$ matrix where $A_{uv}$ is 1 if $(u, v)$ is an edge
- Requires $n^2$ space
- Checking if $(u, v)$ is an edge takes $\Theta(1)$ time
- Identifying all edges takes $\Theta(n^2)$ time
3.1.2 Adjacency List
- Length-$n$ array of nodes; each index holds a list of adjacent nodes
- Requires $\Theta(m+n)$ space
- Checking if $(u, v)$ is an edge takes $O(\deg(u))$ time
- Identifying all edges takes $\Theta(m+n)$ time
3.2 Paths and Connectivity
- Path
    - a sequence of nodes $\{v_1, ..., v_k\}$ st. each consecutive pair $(v_i, v_{i+1})$ is joined by an edge
- Simple Path
    - all nodes in the path are distinct
- Connected Graph
    - there is a path between every pair of nodes
- Cycle
    - a path $\{v_1, ..., v_k\}$ where $v_1=v_k$, $k > 2$, and the first $k-1$ nodes are distinct
- Tree
    - an undirected graph which is connected and doesn't contain a cycle
- Connected Component
    - all nodes reachable from a node $s$
3.3 Traversal
- Connectivity: is there a path between nodes $s$ and $t$?
- Shortest Path: what is the length of the shortest path between nodes $s$ and $t$?
3.3.1 Breadth First Search
Explore outwards from a starting node $s$ in all directions, adding nodes one layer at a time.
- Output
    - $L_0 = \{s\}$
    - $L_1 =$ all neighbors of $L_0$
    - $L_2 =$ all nodes not in $L_0$ or $L_1$ which have an edge to a node in $L_1$
There is a path from $s$ to a node $t$ if and only if $t$ appears in some layer of a BFS from $s$.
    Graph G
    s = some node in G
    Discovered[s] = true
    l = 0  // layer counter
    Tree = null
    L[0] = {s}
    while (L[l] is not empty):
        initialize L[l+1]
        for each node u in L[l]:
            for each edge e connecting u to v:
                if (Discovered[v] == false):
                    Discovered[v] = true
                    add edge e to Tree
                    add v to L[l+1]
        l++
Runs in $O(m+n)$ time:
- each node $u$ has $\deg(u)$ incident edges
- total time processing edges is $\sum_{u \in G} \deg(u) = 2m$, since each edge is counted twice
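The layer-by-layer BFS above can be sketched in Python (a minimal illustration; the adjacency-list dict format is an assumption):

```python
def bfs_layers(adj, s):
    """Return BFS layers L[0], L[1], ... from s, given an adjacency-list dict."""
    discovered = {s}
    layers = [[s]]                          # L[0] = {s}
    while layers[-1]:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in discovered:     # first time v is seen
                    discovered.add(v)
                    next_layer.append(v)
        layers.append(next_layer)
    return layers[:-1]                      # drop the final empty layer
```

Each node and each edge is touched a constant number of times, giving the $O(m+n)$ bound.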
3.3.2 Depth First Search
A BFS checks all nodes within a certain distance of the root. A DFS instead goes all the way down one branch of the tree, then backtracks and explores the next unexplored branch, until it has visited every node. DFS can require less memory, since you don't need to store every node of a layer at once.
You would choose an algorithm based on the use case: for example, a target far down the tree would be found quicker with DFS, while something close to the root would be found faster with BFS.
3.3.3 Connected Component
- Flood Fill
    - recolor all connected pixels of a given color
    - e.g. given a pixel of color lime, change the blob of neighboring lime pixels to blue
- Definitions
    - Node: pixel
    - Edge: connects neighboring lime pixels
    - Blob: connected component of lime pixels

    get_connected_component():
        R = {s}
        while (there is an edge (u, v) st. u is in R and v is not in R):
            add v to R

For flood fill: ensure the edge $(u, v)$ connects like-colored pixels.
3.3.4 Bipartite Graphs
- Bipartite
    - an undirected graph whose nodes can be colored red and blue such that no two neighboring nodes share a color
    - such a graph can't have an odd-length cycle
Let $G$ be a connected graph with layers from a BFS. Exactly one of the following holds:
- No edge of $G$ joins two nodes of the same layer, and $G$ is bipartite
- An edge of $G$ joins two nodes of the same layer, and $G$ contains an odd-length cycle (and hence is not bipartite)
3.3.5 Connectivity in Directed Graphs
An edge $(u, v)$ goes from node $u$ to node $v$.
- Mutually Reachable
    - $\exists$ a path from $u$ to $v$ and a path from $v$ to $u$
- Strong Connectivity
    - every pair of nodes is mutually reachable
3.3.5.1 Strong Connectivity
- pick a node $s$
- run BFS in $G$
- run BFS in $G^{-1}$, where every edge in $G$ has been reversed
- return true if all nodes are reachable in both BFS executions
3.4 Directed Acyclic Graph
- DAG
    - a directed graph with no cycles
- Topological Order
    - an ordering of nodes as $v_1, v_2,...,v_n$ st. for every edge $(v_i, v_j)$, $i < j$
- If G has a topological order, G is a DAG
- If G is a DAG, then there is a node with no incoming edges
To compute a topological order:

    def topological(G):
        find a node v with no incoming edges
        G.delete(v)
        return v + topological(G)
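An iterative version of the recursion above (Kahn's algorithm, swapped in for the recursive deletion; the node/edge-list format is an assumption):

```python
from collections import deque

def topological_order(nodes, edges):
    """Repeatedly remove a node with no incoming edges; error if no DAG."""
    indegree = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1
    ready = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:               # "delete" u by dropping its out-edges
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):       # leftover nodes lie on a cycle
        raise ValueError("graph is not a DAG")
    return order
```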
<div id="pagebreak"></div>
4 Greedy Algorithms
- Make a locally optimal choice for the current position
- Keep making the same kind of decision on the remaining input
4.1 Strategies
- Stays Ahead
    - show that the greedy algorithm is at least as good as any other solution after each step
- Exchange
    - show any optimal solution can be transformed into the greedy output without making it worse
- Structural
    - find a bound every possible solution must meet, then show the greedy algorithm achieves this bound
4.2 Interval Scheduling
- Sort by earliest finish time
4.2.1 Proof of Optimality
- Assume the greedy algorithm is suboptimal
- Let $\{i_1, i_2, ... , i_k\}$ be the greedy set
- Let $\{j_1, j_2, ..., j_m\}$ be an optimal set with $i_a = j_a$ for the largest possible $a$
- Greedy picks $i_{a+1}$ to finish no later than $j_{a+1}$, so swapping $j_{a+1}$ for $i_{a+1}$ keeps the solution optimal while agreeing on one more job, a contradiction unless $a = m$
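The earliest-finish-time rule can be sketched in a few lines of Python (a minimal illustration; jobs as `(start, finish)` tuples is an assumption):

```python
def interval_schedule(jobs):
    """Earliest-finish-time greedy: jobs are (start, finish) pairs."""
    selected = []
    last_finish = float('-inf')
    for start, finish in sorted(jobs, key=lambda j: j[1]):  # by finish time
        if start >= last_finish:      # compatible with everything chosen so far
            selected.append((start, finish))
            last_finish = finish
    return selected
```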
4.3 Interval Partitioning
- Assign a group of lectures to lecture halls, such that there is no overlap, using the minimum number of rooms
- A lecture $l_j$ starts at $s_j$ and finishes at $f_j$
4.3.1 Algorithm

    sort lectures by (increasing) starting time
    classrooms = {priority_queue}
    for lecture j in schedule:
        if (j can be added to some room):
            add j to that room
        else:
            allocate a new room
            add j to the new room
4.3.2 Runtime
It takes $O(n\log n)$ to sort the lectures, and then $O(\log n)$ per lecture to place it using a priority queue keyed on each room's latest finish time.
4.3.3 Analysis
A new classroom is only created if lecture $j$ conflicts with a lecture in every current room. Since we sorted by starting time, all of those conflicting lectures start no later than $s_j$ and end after it, so that many rooms are unavoidable.
4.4 Minimize Lateness
You want to perform a set of jobs, each with a known processing time and due time, such that the maximum lateness of any job is minimized. Sort by earliest deadline first.
4.5 Dijkstra's Shortest Path
- Maintain a set of explored nodes $S$ for which the shortest-path distance from $s$ is known
- Initialize $S = \{s\}$
- Repeatedly and greedily add the unexplored node with the shortest tentative distance, until all nodes are explored
- $O(E + V\log V)$ with a Fibonacci heap
4.5.1 Implementation
- Maintain $d(v) = \min_{e=(u,v):\ u \in S}(d(u) + l_e)$, where $l_e$ is the length of edge $e = (u, v)$
- Choose the unexplored node with the smallest $d(v)$ using a priority queue
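A binary-heap sketch of the implementation above (Python's `heapq` has no decrease-key, so stale queue entries are skipped instead; the adjacency format is an assumption):

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s; adj[u] is a list of (v, weight) pairs."""
    dist = {s: 0}
    pq = [(0, s)]                        # (tentative distance, node)
    explored = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in explored:                # stale entry for an explored node
            continue
        explored.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd             # relax edge (u, v)
                heapq.heappush(pq, (nd, v))
    return dist
```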
4.6 Minimum Spanning Tree
Given a connected graph with real-valued edge weights, the MST is the subset of edges touching all vertices such that the sum of edge weights is minimized.
- Where $n$ is the number of vertices in the graph:
    - any MST has exactly $n-1$ edges
    - there are up to $n^{n-2}$ spanning trees (for the complete graph)
4.6.1 Kruskal's Algorithm
4.6.1.1 Algorithm
- Sort edges by increasing weight; start with an empty tree $T = \{\}$
- Try inserting the edge $e$ with the lowest weight
    - if $e$ creates a cycle, discard it
- Repeat until $T$ has $n-1$ edges
4.6.1.2 Runtime
$O(E\log E)$
1. Comparison sort for edges
2. Use a Union-Find data structure to detect cycles, requiring $O(E)$ nearly-constant-time operations
4.6.2 Reverse-Delete Algorithm
4.6.2.1 Algorithm
- Sort edges by decreasing weight; start with $T = E$ (all edges)
- Try deleting the edge $e$ with the highest weight
    - if removing $e$ disconnects the graph, add it back
- Repeat until $n-1$ edges remain
4.6.2.2 Runtime
$O(E \log V (\log \log V)^3)$
- Comparison sort for edges
- $E$ iterations of the loop
    - delete in $O(1)$
    - connectivity check takes $O(\log V (\log \log V)^3)$
4.6.3 Prim's Algorithm
4.6.3.1 Algorithm
- Start with $T = \{v\}$ (some arbitrary vertex)
- Repeatedly add the lowest-weight edge with exactly one endpoint in $T$
4.6.3.2 Runtime
| Minimum edge weight data structure | Time complexity (total) |
|---|---|
| adjacency matrix, searching | $O(V^2)$ |
| binary heap and adjacency list | $O((V+E)\log V)$ |
| Fibonacci heap and adjacency list | $O(E + V\log V)$ |
Time is bottlenecked by the minimum-edge lookup.
4.7 Union Find / Disjoint Set
- Functions
    - MakeSet(x): creates a set containing only element x
    - Find(x): returns a number representing the set containing x (its representative)
    - Union(x, y): merges the sets containing x and y into a new set with one representative
4.7.1 Example
Let's start with $\{ \{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$.
Calling union(1, 2) results in $\{ \{1, 2\}, \{3\}, \{4\}, \{5\}\}$. In this state, find(2) => 1.
Calling union(2, 3), then union(4, 5), results in $\{ \{1, 2, 3\}, \{4, 5\}\}$.
4.7.2 Implementation
Imagine an array of linked lists. Initially, each element sits in its own index. As union() is called, an element can be attached under another set's representative, and its parent pointer will point to the new location.
4.7.3 Iterative Find(x)

    Find(x):
        r = x
        while r != r.parent:
            r = r.parent          // r is now the root
        while x != x.parent:      // second pass: path compression
            next = x.parent
            x.parent = r
            x = next
        return r
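The structure above, with path compression plus union by rank, as runnable Python (a minimal sketch mirroring the example in 4.7.1):

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by rank."""
    def __init__(self, elements):
        self.parent = {x: x for x in elements}   # each element starts alone
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        root = x
        while root != self.parent[root]:
            root = self.parent[root]             # walk up to the root
        while x != self.parent[x]:               # path compression pass
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:        # attach shorter tree to taller
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```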
<div id="pagebreak"></div>
5 Dynamic Programming
- Break problem into a set of overlapping subproblems
- Use a recursive strategy with memoization
Recipe:
- Characterize the structure of the problem
- Recursively define the value of an optimal solution
- Compute the value of an optimal solution
- Use the computed values to reconstruct the solution
Techniques:
- Binary choice (weighted interval scheduling)
- Multi-way choice (segmented least squares)
- Intervals (RNA secondary structure)
5.1 Weighted Interval Scheduling
- Maximize the total weight of a set of compatible jobs
    - the greedy algorithm is the special case with all weights equal to 1
- Sort jobs by finish time st. $f_1 \le f_2 \le ... \le f_n$
- p(j) returns the largest index $i<j$ st. jobs $i$ and $j$ are compatible
    - p(j) can be computed in linear time (after sorting)
    - it tells us all jobs $[1, p(j)]$ are compatible with $j$
- Let OPT(j) be the value of the maximum-weight schedule over jobs $1..j$
    - Case 1: job $j$ is selected
        - jobs $[p(j)+1, j-1]$ are not in the solution
        - so OPT(j) = w_j + OPT(p(j))
    - Case 2: job $j$ isn't selected
        - must be the optimal solution over jobs $[1, j-1]$, i.e. OPT(j-1)
5.1.1 Brute Force
Returns the maximum weight:

    OPT(j):
        if (j == 0)
            return 0
        else
            return max(w_j + OPT(p(j)), OPT(j-1))

But this is $O(2^n)$ because of redundant recursive calls.
5.1.2 Memoization
Instead of repeated calculation, store each result in an array:

    M = [empty for i in range(0, n+1)]
    M-OPT(j):
        if (j == 0)
            return 0
        else if (M[j] is empty)
            M[j] = max(w_j + M-OPT(p(j)), M-OPT(j-1))
        return M[j]

M-OPT takes $O(n)$ total time: each call either returns in constant time or fills in a new entry of M, and each of the $n$ entries triggers at most two further calls.
So the whole algorithm is $O(n)$ if jobs are presorted by start and finish times (sorting requires $O(n\log n)$).
But this only returns the maximum weight of the schedule, not the actual schedule; the schedule can be recovered by tracing back through M.
Can also unwind the recursion into an iterative algorithm.
5.2 Shortest Path
- An alternative to Dijkstra's algorithm
- Allows for negative-weight edges
    - e.g. profits from decisions
    - Dijkstra's algorithm doesn't work with negative edges
- No shortest path exists if there is a negative-cost cycle
    - the cost can be reduced indefinitely by going around the cycle
- Otherwise, a shortest path needn't contain a cycle
    - if positive weight, there is no point
    - if negative weight, you would go around it repeatedly
5.2.1 Dynamic - Bellman-Ford
OPT(i, v) returns the length of the shortest path $v \to t$ using at most $i$ edges.
- Case 1: the path uses at most $i-1$ edges
    - OPT(i, v) = OPT(i-1, v)
- Case 2: the path uses $i$ edges
    - say $(v, w)$ is the first edge; then OPT uses $(v,w)$ plus the best path $w \to t$ with at most $i-1$ edges
$$ OPT(i, v) = \begin{cases} 0 & i=0,\ v = t\\ \infty & i=0,\ v \ne t\\ \min\left( OPT(i-1, v),\ \min_{(v, w) \in E}\left( OPT(i-1, w) + c_{vw} \right) \right) & i \ne 0 \end{cases} $$
5.2.2 Implementation

    Shortest-Path(G, t) {
        foreach node v ∈ V
            M[0, v] ← ∞
        M[0, t] ← 0
        for i = 1 to n-1
            foreach node v ∈ V
                M[i, v] ← M[i-1, v]
            foreach edge (u, w) ∈ E
                M[i, u] ← min { M[i, u], M[i-1, w] + cuw }
    }

- Worst-case running time: $O(nm)$
- But usually substantially faster in practice
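The table above, rolled into a single array per the practical improvements that follow, as runnable Python (the edge-triple format is an assumption):

```python
def bellman_ford_to(nodes, edges, t):
    """Shortest-path cost from every node to t; edges are (u, v, cost) triples."""
    INF = float('inf')
    dist = {v: INF for v in nodes}
    dist[t] = 0
    for _ in range(len(nodes) - 1):          # a shortest path has <= n-1 edges
        updated = False
        for u, v, cost in edges:
            if dist[v] + cost < dist[u]:     # relax edge (u, v) toward t
                dist[u] = dist[v] + cost
                updated = True
        if not updated:                      # early exit: nothing changed
            break
    return dist
```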
5.2.3 Practical Improvements
- Use just one array, maintaining the current shortest-path cost from $v \to t$
- Only update through edges whose endpoint distance changed in the previous iteration
5.2.4 Detect Negative Cycles
- Add a node $t$, connected from all nodes with a zero-weight edge
- If OPT(n, v) = OPT(n-1, v) for every node $v$, there are no negative cycles; any $v$ with OPT(n, v) < OPT(n-1, v) lies on a path through a negative cycle
5.3 Segmented Least Squares
- Points roughly lie on a sequence of line segments
- Order the points by x-coordinate, and choose segments to:
    - minimize the squared errors of the line segments ($e$)
    - minimize the number of lines ($L$)
    - trade off between lines and errors via a cost $c$ per line
5.3.1 Dynamic Algorithm
$$ OPT(j) = \begin{cases} 0 & j=0\\ \min_{1 \le i \le j}\{e(i, j) + c + OPT(i-1)\} & j \ne 0 \end{cases} $$
where $e(i, j)$ is the minimum squared error of a single line through points $p_i, ..., p_j$.
5.4 String Similarity
Given two strings, X = [x1, ..., xm] and Y = [y1, ..., yn], find the alignment that makes them match the most, where 'matching' means minimizing the cost. A pair is mismatched if x[i] is aligned with an unequal y[j], and a character is unmatched if it is aligned with a gap.
$$cost = \sum_{mismatched\ (i, j)} \alpha_{x_i, y_j} + \sum_{unmatched\ x_i}\delta + \sum_{unmatched\ y_j}\delta$$
Calculating the optimal matching, we take the cheapest of three choices:
- Match: align x[i] with y[j]
    - cost to align x[i], y[j] + min cost of the remainder
- Leave x[i] unmatched
    - cost of a gap + min cost of the remainder
- Leave y[j] unmatched
    - cost of a gap + min cost of the remainder

<div id="pagebreak"></div>
6 Network Flow
Find the maximum flow through a graph.
Closely tied to the minimum cut of the graph.
6.1 Min Cut and Max Flow
6.1.1 Minimum Cut
- Assume
    - directed edges
    - each edge has a capacity c(e)
    - a source node $s$ and a sink node $t$
- Model: net material flowing through pipes
- The capacity of a cut is the sum of the capacities of the edges leaving the cut
- An $st$-cut partitions the nodes into two sets, one containing $s$ and the other containing $t$
6.1.2 Flows
Assign a flow $f(e)$ to each edge such that:
- $0 \le f(e) \le c(e)$ for each edge
- $\sum_{e\ into\ v} f(e) = \sum_{e\ out\ of\ v} f(e)$ for each node other than $s$ and $t$
But how do you find a maximum flow?
6.1.3 Relating Flow and Cut
- The net flow across any $st$-cut equals the value of the flow
- At a minimum cut, a maximum flow:
    - has incoming edges with flow 0
    - and outgoing edges at full capacity
6.2 Algorithm
6.2.1 Residual Graph
For each edge $e = (u, v) \in E$ we have Flow(e) and Capacity(e).
6.2.1.1 Residual Edge
- Lets us undo flow already sent
- So for $e = (u, v)$, the residual edge is $e^R = (v, u)$
- $c_f(e)$ is the residual capacity:
$$c_f(e) = \begin{cases}c(e) - f(e) & e\in E\\ f(e) & e^R \in E \end{cases}$$
The residual graph consists of all residual edges with positive capacity.
6.2.2 Ford-Fulkerson Algorithm
- While $\exists$ an augmenting path from $s$ to $t$ in the residual graph:
    - find a valid path with DFS
    - find the lowest-capacity (bottleneck) edge in the path
    - increase the flow along the path by the bottleneck capacity
    - add residual edges going in the opposite direction
    - add the bottleneck amount to the total flow
- Return the total flow
DFS takes $O(|E|)$ and, with integer capacities, we execute it at worst $f$ times, so the runtime is $O(f|E|)$.
At each iteration, the residual edges ensure that flow already sent can be rerouted if needed.
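The loop above can be sketched as runnable Python (a minimal illustration; the nested-dict capacity format is an assumption, and the DFS recursion limits this to small graphs):

```python
def max_flow(capacity, s, t):
    """Ford-Fulkerson on a capacity dict {u: {v: c}}; returns the max flow value."""
    # residual[u][v] holds remaining capacity, including reverse (undo) edges
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def dfs_path(u, target, visited):
        """Return a list of edges from u to target with positive residual capacity."""
        if u == target:
            return []
        visited.add(u)
        for v, cap in residual[u].items():
            if cap > 0 and v not in visited:
                rest = dfs_path(v, target, visited)
                if rest is not None:
                    return [(u, v)] + rest
        return None

    flow = 0
    path = dfs_path(s, t, set())
    while path is not None:
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck     # push flow forward
            residual[v][u] += bottleneck     # add undo capacity backward
        flow += bottleneck
        path = dfs_path(s, t, set())
    return flow
```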
6.2.3 Capacity Scaling
Choosing the path with the highest bottleneck increases the flow by the maximum possible amount, but finding that path is difficult. Instead, we use a scaling parameter $\Delta$ and a subgraph $G_f(\Delta)$ containing only arcs with residual capacity at least $\Delta$. Keep augmenting along paths in $G_f(\Delta)$, updating $G_f$ as we go; when no paths remain, halve $\Delta$.
6.2.3.1 Capacity Scaling Algorithm
- Set the flow of all edges to 0
- Find the largest $\Delta = 2^k$ no larger than the maximum capacity
- Make a residual graph $G_f$
- While $\Delta \ge 1$:
    - find $G_f(\Delta)$
    - while there is an augmenting path P in $G_f(\Delta)$:
        - f = augment(f, c, P)
        - update $G_f(\Delta)$
    - halve $\Delta$
- Return f
6.2.3.2 Runtime
- After finishing a $\Delta$-phase, the maximum flow exceeds the current $f$ by at most $m\Delta$
- There are at most $2m$ augmentations per phase
- The scaling max-flow algorithm performs at most $O(m\log C)$ augmentations and finds the overall flow in $O(m^2 \log C)$ time
7 Applications of Max Flow / Min Cut
7.1 Bipartite Matching
Can be solved as network flow.
- Matching
    - $M \subseteq E$ st. each node appears in at most one edge of $M$
Bipartite matching is matching in a bipartite graph:
- Add nodes $s$ and $t$ to the bipartite graph
- Add edges from $s$ to all nodes on one side of the graph
- Add edges from all nodes on the other side to $t$
- Give all edges capacity 1, and find the max flow
7.2 Edge Disjoint Paths
Paths between $s, t$ such that no edge is shared.
Give all edges a capacity of 1.
The maximum number of edge-disjoint paths equals the maximum flow value.
7.3 Baseball Elimination
- Starting node $s$
- Game nodes $(team\ 1, team\ 2), ..., (team\ x, team\ y)$
- Team nodes
- End node $t$
Can team 3 finish with the most wins?
- Assume team 3 wins all remaining games
- This implies all other teams lose their games against team 3
- Remove all game nodes involving team 3, and remove team node 3
- Set the capacity from $s$ to each game node to the number of games left between that pair of teams
The total number of games left to be played is the total capacity leaving the source. All these games must be won by someone other than team 3.
We want the max flow to equal this value, with no team collecting more wins than team 3 can have in the best case. So team 3 can finish first iff all edges leaving the source are saturated.
8 Randomization
8.1 Symmetry Breaking
- Say we have $n$ processes, $\{P_1, ..., P_n\}$, all trying to access a database
- Only 1 can access the database at a time
- We want fairness in how the database is shared
- Processes can't communicate
8.1.1 Implementation
- Each process requests access with probability $p = \frac{1}{n}$
- Let S(i, t) be the event that $P_i$ alone accesses the database at time $t$
    - $\Pr[S(i, t)] = p(1-p)^{n-1} = \frac{1}{n}(1-\frac{1}{n})^{n-1}$
    - $(1-\frac{1}{n})^{n-1}$ lies between $\frac{1}{2}$ (at $n = 2$) and $\frac{1}{e}$ (as $n \to \infty$)
    - so we have bounds on the probability: $\frac{1}{en} \le \Pr[S(i, t)] \le \frac{1}{2n}$
The probability that process $i$ fails to access the database in $en$ rounds is at most $\frac{1}{e}$; after $en(c\ln n)$ rounds the probability is at most $n^{-c}$.
The probability that all processes succeed within $2e n \ln n$ rounds is at least $1-\frac{1}{n}$.
8.2 Global Min Cut
- Find the smallest cut $(A,B)$ in a graph, i.e. the minimum number of edges connecting $A$ and $B$
- You'd assume it's harder than the minimum $s$-$t$ cut, but it isn't
8.2.1 Contraction Algorithm
- Pick an edge $e =(u,v)$ uniformly at random
- Contract edge $e$
    - replace $u, v$ with a super-node $w$
    - preserve edges, but update endpoints to be $w$
    - remove self-loop edges, but keep parallel edges
- Repeat until there are just two nodes, $w_1, w_2$
- The cut is all the nodes contracted to form $w_1$
8.2.2 Runtime
A randomized algorithm only has some probability of returning a min cut; here a single run succeeds with probability at least $\frac{2}{n^2}$. To increase the chance of an answer, repeat the algorithm many times.
Since one run succeeds with probability $\frac{2}{n^2}$, a run fails with probability $1-\frac{2}{n^2}$. After $\frac{n^2}{2}\ln n$ independent runs, the probability of never finding the global min cut is at most $(1-\frac{2}{n^2})^{\frac{n^2}{2}\ln n} \le e^{-\ln n} = \frac{1}{n}$.
8.3 Linearity of Expectation
Useful property: if $X$ is a $0/1$ random variable, then $E[X] = \Pr[X=1]$.
8.3.1 Guessing Cards
- Without memory, the expected number of correct guesses is 1
- With memory, the expected number of correct guesses is $\Theta(\log n)$
8.3.2 Coupon Collector
- The expected number of steps is $\Theta(n \log n)$ for $n$ different coupon types under a uniform distribution
8.4 Randomized Divide and Conquer (Quicksort)
- Randomization protects against the worst case by choosing the pivot at random
Expected number of comparisons:
$$q_n = 2n\ln n - (4 - 2\gamma)n + 2\ln n + O(1),\quad \gamma \approx 0.577$$
8.5 Hashing
- Use a prime number of about the same order as the size of the table
- Identify each element with a number
8.5.1 Universal Hashing
$\newcommand{\H}{\mathscr{H}}$
We want to map a subset of a large universe $U$; $U$ is large enough that we can't maintain a direct array.
We want a function $h: U \to \{0, 1, ..., n-1\}$, i.e. a function which spreads values across an array of size $n$. By the birthday problem, some collisions are expected fairly often. We want some degree of randomness, to prevent malicious input from forcing many collisions.
A universal family $\H$ is defined such that for all $u \ne v$, a function $h \in \H$ chosen at random satisfies $\Pr_{h}[h(u)=h(v)]\le \frac{1}{n}$. You also want to compute $h$ efficiently and select it at random efficiently.
To encode an integer, we can use:
$$
h_a(x) = \left(\sum_{i=1}^{r} a_i x_i\right) \bmod p
$$
with $p$ a prime, $x_1, ..., x_r$ the digits of $x$ written base $p$, and each $0 \le a_i \lt p$ chosen at random. Then $\H = \{h_a: a \in A\}$ is a universal family.
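A sketch of drawing one $h_a$ from this family (table size taken to be the prime $p$ itself, per the earlier note that the table size should be a prime of about the right order):

```python
import random

def make_universal_hash(p, r):
    """Draw h_a at random: keys are written as r digits base p, output in 0..p-1."""
    a = [random.randrange(p) for _ in range(r)]   # random coefficients a_i
    def h(x):
        total = 0
        for i in range(r):                        # peel off the i-th digit base p
            x, d = divmod(x, p)
            total += a[i] * d
        return total % p
    return h
```

Each call to `make_universal_hash` picks a fresh member of the family; the same member always hashes a key to the same slot.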
8.5.2 K-Universal Hashing
You want to ensure that when hashing any $k$ distinct keys, $\Pr[h(x_1) = y_1, ..., h(x_k) = y_k] = \frac{1}{n^k}$, in addition to the requirements for a universal family. Basically, you want two different keys to have a low chance of colliding (condition 1) and any fixed $k$-tuple of outputs to be equally likely (condition 2).
In principle you can verify this by enumerating all $h \in \H$, writing down the outputs, and checking that each $k$-tuple of outputs occurs with probability $\frac{1}{n^k}$.
8.6 Monte Carlo vs Las Vegas Algorithms
- Monte Carlo algorithm: guaranteed to run in poly-time, likely to find the correct answer
    - e.g. contraction algorithm for global min cut
- Las Vegas algorithm: guaranteed to find the correct answer, likely to run in poly-time
    - e.g. randomized quicksort
Note: you can always convert a Las Vegas algorithm into a Monte Carlo one, but no general method is known for the reverse.
9 Linear Sorting
- Numbers bounded by $k$
    - Counting sort: $O(n)$ if $k$ is $O(n)$
    - Radix sort: $O(n)$ if $k$ is $O(n^c)$
- Stable sorts preserve the input order of equal keys
Counting sort:
"Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items."

    # variables:
    # input -- the array of items to be sorted; key(x) returns the key for item x
    # n -- the length of the input
    # k -- a number such that all keys are in the range 0..k-1
    # count -- an array of numbers, with indexes 0..k-1, initially all zero
    # output -- an array of items, with indexes 0..n-1

    # calculate the histogram of key frequencies:
    for x in input:
        count[key(x)] += 1
    # calculate the starting index for each key:
    total = 0
    for i in range(k):  # i = 0, 1, ... k-1
        oldCount = count[i]
        count[i] = total
        total += oldCount
    # copy to output array, preserving order of inputs with equal keys:
    for x in input:
        output[count[key(x)]] = x
        count[key(x)] += 1
    return output

Radix sort: applies counting sort digit by digit, from least to most significant.
10 Red-Black Trees
- A self-balancing binary search tree
    - every node is either red or black
    - the root is always black
    - the bottom row is all black leaves with value Null
    - a red node must have two black children
- Height of a red-black tree
    - the height of a node is the number of edges in the longest path to a leaf
    - the black-height of a node $x$, $bh(x)$, is the number of black nodes (including Null) on the path from $x$ to a leaf, not counting $x$; black-height is well defined
    - any node with height $h$ has black-height $\geq \frac{h}{2}$
    - the subtree rooted at any node $x$ contains $\geq 2^{bh(x)} -1$ internal nodes
    - a red-black tree with $n$ nodes has height $\leq 2\lg(n+1)$
- Insertion
    - on insertion you want to maintain the balance of the tree
    - involves right and left rotations
    - it takes $O(1)$ time to rearrange pointers via rotations, and recoloring propagates at most $O(\log n)$ steps up the tree
11 Overview
- 2 questions about algorithms
    - analyze runtime
    - presumably Dynamic and Max Flow / Min Cut
- Divide and Conquer
    - no matrix multiplication
    - effect of parameters on runtime
    - Master Method
- Graphs
    - Breadth First Search
    - no DAG
- Greedy Algorithms
    - Dijkstra's Algorithm
    - Minimum Spanning Tree
- Dynamic
    - can't use dynamic for most recursive algorithms
    - often easier to iteratively solve the problem
    - expect a question
- Flow
    - Ford-Fulkerson
    - Capacity Scaling
    - runtime and comparison
- Randomized Algorithms
    - Randomized Quicksort
    - Hashing and Expected Collisions