Wednesday, December 31, 2014 CC-BY-NC
COMP 251 F2014

Maintainer: admin

1Stable Matching

With $n$ men and $n$ women, find a pairing with no instabilities

Instability:
Man $m$ prefers woman $w$ over assigned woman $x$
$w$ prefers man $m$ over her assigned partner

Results in a male-optimal and female-pessimal solution

1.1Gale-Shapley Algorithm

all people are free
while( any man is free & hasn't proposed to all women):
    Choose such a man m
    w = 1st on m's list who hasn't been proposed to yet
    if (w is free):
        assign w to m
    else:
        if (w prefers m over assigned partner, a):
            assign w to m
            make a free
        else
            w rejects m

1.1.1Stability Proof

  • Suppose $A-Z$ is an unstable pair in the GS Matching $S^*$
  • $A$ is matched to $B$, $Z$ is matched to $X$
  • Case $A$ never proposed to $Z$
    • $A$ goes down his list in order of favorites.
    • If $A$ never proposed to$Z$, he prefers $B$ over $Z$
  • Case $A$ proposed to $Z$
    • $A$ and $Z$ were matched for a time.
    • They can only become unmatched if $Z$ prefers some other suitor over $A$

1.1.2Male-Optimality Proof

  • Suppose man $M$ is matched to someone other than his best stable partner in $S$
  • Men propose in decreasing order of preference
    • If $M$ was rejected by a partner, she would have to have stably preferred someone

1.2Bipartiteness

only one of the following holds for a graph $G$ and BFS $B$

  • No edge joins two nodes inside $B$, $G$ is bipartite
  • An edge joins two nodes in the same layer $\to$ $G$ contains an odd length cycle $\to$ $G$ is not bipartite

2Divide and Conquer

  1. Divide problem into several parts
  2. Solve parts recursively
  3. Combine solutions from sub-problems into final answer

2.1Mergesort

  1. divide array $O(1)$
  2. recursively sort $2 \times T(\frac{n}{2})$
  3. merge $O(n)$

2.1.1Merging

  • Two pointers for each (sorted) array
  • Compare values, and add lowest to output array
  • left shift pointer
  • repeat

2.1.2Pseudocode

def mergesort( var a as array )
     if ( n == 1 ) 
         return a
     var l1 as array = a[0] ... a[n/2]
     var l2 as array = a[n/2+1] ... a[n]

     l1 = mergesort( l1 )
     l2 = mergesort( l2 )

     return merge( l1, l2 )

def merge( var a as array, var b as array )
     var c as array

     while ( a and b have elements )
          if ( a[0] > b[0] )
               add b[0] to the end of c
               remove b[0] from b
          else
               add a[0] to the end of c
               remove a[0] from a
     while ( a has elements )
          add a[0] to the end of c
          remove a[0] from a
     while ( b has elements )
          add b[0] to the end of c
          remove b[0] from b
     return c

2.2Closest Pair of Points

2.2.1Algorithm

  1. Draw vertical line to roughly cut points in half
  2. Recursively find closest points in each half
  3. Find any potentially closer pairs after merging halves
  4. Return closest pair of points

2.2.2Analysis

This algorithm takes $O(\log^2(n))$, but can be improved to $O(\log(n)$ via pre-sorting the list of points. Each recursion returns a list, with points sorted by y-coordinate, and we then merge the lists.

2.3Integer Multiplication

2.3.1Brute Force

  • brute force requires $O(n^2)$ operations
  • an $n$ digit number can be represented by 2 $\frac{n}{2}$ digit numbers
  • $x = 2^{\frac{n}{2}}x_1+x_0$, $y = 2^{\frac{n}{2}}y_1+y_0$
  • $xy = 2^n(x_1y_1)+2^{\frac{n}{2}}(x_1y_0+x_0y_1)+x_0y_0$

  • multiply 4 $\frac{n}{2}$ digit numbers

  • add two $\frac{n}{2}$ digit integers and shift
  • $T(n) = 4T(\frac{n}{2})+\Theta(n) \to T(n) \in \Theta(n^2)$

2.3.2Karatsuba Multiplication

  • instead, you can add 2 $\frac{n}{2}$ digit numbers
  • multiply three $\frac{n}{2}$
  • same as above, but:

$$xy = 2^n(x_1y_1) + 2^{\frac{n}{2}}((x_1+x_0)(y_1+y_0)-x_1y_1-x_0y_0)+x_0y_0$$

Runtime
$T(n)=3T(\frac{n}{2})+\Theta(n)$
$T(n) \in O(n^{\log_2 3}) \approx n^{1.5}$

2.4Matrix Multiplication

2.4.1Brute Force

  • $O(n^3)$ time
  • $n$ multiplications for $n \times n$ elements

2.4.2Divide and Conquer

  1. Divide matrix into 4 ${\frac{n}{2}} \times {\frac{n}{2}}$ parts
  2. Find product of quarters, with 8 multiplications (of 2x2)
  3. Add appropriate products with 4 matrix additions
  4. Return appropriate values
    $T(n) = 8T(\frac{n}{2}) + \Theta(n^2)\to T(n)\in\Theta(n^3)$

No improvement over naive algorithm
BUT you can multiply two 2x2 matrices with 7 multiplications
leads to $T(n) \in \Theta(n^{\log_2 7})$

2.5Master Theorem

$T(n) = \alpha T(\frac{n}{\beta}) + f(n), \alpha \ge 1, \beta \gt 1 $

  • $n$ is size of problem
  • $\alpha$ is number of sub problems
  • $\frac{n}{\beta}$ is size of sub problems
  • $f(n)$ is additional work required
Case 1
$f(n) \in O(n^c),\ st.\ c \lt \log_{\beta}\alpha $
$T(n) \in \Theta(n^{\log_{\beta}\alpha})$
Case 2
$f(n) \in \Theta(n^c \log^k n),\ st.\ c \lt \log_{\beta}\alpha $
$T(n) \in \Theta\left( n^{c} \log^{k+1} n \right) $
Case 3
$f(n) \in \Omega(n^c),\ st.\ c \lt \log_{\beta}\alpha $ and
$a f\left( \frac{n}{b} \right) \le k f(n)$ for some constant $k < 1$ and sufficiently large $n$
$T(n) \in \Theta(f(n))$

3Graphs

  • captures pairwise relationship between objects
  • V nodes and E edges
  • Size Parameters: $n=V$, $m=E$
  • degree is number of neighboring edges

3.1Representations

3.1.1Adjacency Matrix

  • $n\times n$ matrix where $A_{uv}$ is 1 if $(u, v)$ is an edge
  • Requires $n^2$ space
  • Checking if $(u, v)$ is an edge takes $\Theta(1)$ time
  • Identifying all edges takes $\Theta(n^2)$ time

3.1.2Adjacency List

  • $n$ length array of nodes, each index is list of connected nodes
  • requires $m+n$ space
  • Checking if $(u, v)$ is an edge takes $O(deg(u))$ time
  • Identifying all edges takes $\Theta(m+n)$ time

3.2Paths and Connectivity

Path
a sequence of nodes $\{v_1, ..., v_k\}$ st. any pair $(v_i, v_{i+1})$ is connected
Simple Path
all nodes in path are distinct
Connected Path
There is a path between any pair of nodes
Cycle
A path $\{v_1, ..., v_k\}$ where $v_1=v_k$, k > 2, and the first $k-1$ nodes are distinct
Tree
An undirected path which is connected and doesn't contain a cycle
Connected Component
all nodes reachable from s

3.3Traversal

  1. Connectivity: is there a path between nodes $s$ and $t$
  2. Shortest Path: the length of the shortest path between nodes $s$ and $t$
3.3.1Breadth First Search

Explore outwards from a starting node s, in all directions. Add nodes a layer at a time

Output
$L_0 = {s}$
$L_1 = $ all neighbors of $L_0$
$L_2 =$ all nodes that are not in $L_0, L_1$ which have an edge to a node in $L_1$

There is a path from $s$ to a node $t$ in $L$ if and only if $t$ appears in a BFS with $s$

Graph G
s = some node in G
Discovered[s] = true;
l = 0 //layers
Tree = null
L[0] = s

while(L[l] is not empty):
    initialize L[l+1]
    for each node u in L[l]:
        for each edge e connecting u to v:
            if Discovered[v] == false
                Discovered[v] = true
                add edge e to tree
                add v to L[l+1]
    l++

runs in $O(m+n)$ time

  • for each node, there are deg(u) incident edges
  • total time processing edges if $\sum_{u \in G} deg(u) = 2m$, since each edge is counted twice

3.3.2Depth First Search

A BFS checks all nodes within a certain distance of the root. Instead, a DFS goes all the way done a branch of the tree, and then goes down any visible branch, until it has explored every node. The DFS requires less memory, since you don't need to store each node's pointers at each level.
You would chose an algorithm based of the use case, for example a search for something far down the tree would be found quicker with a DFS search, while something closer would be easier to find with a BFS.

3.3.3Connected Component

Flood Fill
Recolor all neighboring pixels of a color l

Given a pixel of color lime, change the blob of neighboring pixels to blue

Definitions
Node: Pixel
Edge: neighboring lime pixels
Blob: connected component of lime pixels
get_connected_component() {
    R = {s}
    while (there is an edge (u, v) st. u is in R and v is not in R )
        add v to R
}

For flood fill: ensure the edge $(u, v)$ is connecting to like colored edges

3.3.4Bipartite Graphs

Bipartite
an undirected graph st. nodes can be colored red and blue such that no neighboring nodes are like colored
graph can't have an odd length cycle

Let G be a connected graph with L layers from a BFS

  1. No edge of G joins two nodes of the same layer, and G is bipartite
  2. An edge of G joins two nodes of the same layer, and G contains an odd-length cycle (and hence is not bipartite)

3.3.5Connectivity in Directed Graphs

An edge $(u, v)$ goes from node $u$ to node $v$

Mutually Reachable
$\exists$ a path between $u$ and $v$ a path between $v$ and $u$
Strong Connectivity
Every pair of nodes is mutually reachable
3.3.5.1Strong Connectivity
  • pick a node $s$
  • run BFS in $G$
  • run BFS in $G^{-1}$, where every edge in $G$ has been reversed
  • return true if all nodes are reachable in both BFS executions

3.4Directed Acyclic Graph

DAG
A directed graph with no cycles
Topological Order
Ordering of nodes as $v_1, v_2,...,v_n$ st for every edge $(v_i, v_j), i < j$
  • If G has a topological order, G is a DAG.
  • If G is a DAG, then there is a node with no incoming edges

To computer topological order:

def topological(G):
    Find a node v with no incoming edges
    G.delete(v)
    return v + topological(G)   

<div id="pagebreak"></div>

4Greedy Algorithms

  • make an optimal/plausible choice for current position
  • keep making the same decision for the remaining set

4.1Strategies

Stays Ahead
Show that greedy algorithm is optimal after each step
Exchange
Show the greedy algorithm's output is equivalent to another solution
Structural
Show every possible solution matches a condition, and the greedy algorithm matches this

4.2Interval Scheduling

  • sort by earliest finish time

4.2.1Proof of Optimality

  1. Assume greedy algorithm is suboptimal
  2. let $\{i_1, i_2, ... , i_k\}$ be the greedy set
  3. let $\{j_1, j_2, ..., j_m\}$ be the optimal set with $i_a = j_a$ for the largest possible $a$
  4. $a = m$ since the job $j_{a+1}$ must start after $j_a$ finish meaning it can be added to the set

4.3Interval Partitioning

  • assign a grouping of lectures to various lecture halls, such that there is no overlap, with the minimum number of rooms
  • A lecture $l_i$ starts and finishes at $s_i, f_i$

4.3.1Algorithm

sort lectures by (increasing) starting time
classrooms = {priority_queue}

for lecture j in schedule:
    if (j can be added to some room):
        classrooms.addClass()
    else:
        classrooms.addRoom()
        classrooms.addClass()

4.3.2Runtime

It takes $O(n\log n)$ to sort the lectures, and then $O(n)$ time to add them to the schedule

4.3.3Analysis

A new classroom is only created if there is a conflict with a current lecture. Since we have sorted by starting time, any conflict starts after $s_j$

4.4Minimize Lateness

You want to perform a set of jobs with a known processing time and due time. They should be performed such that the maximum overdue time of any job is minimized. Use a sorting of 'earliest deadline first'

4.5Djikstra's Shortest Path

  • Maintain a set of explored nodes $S$, which have a shortest path from $s$ to all nodes $u$
  • Initialize $S = \{s\}$
  • Repeatedly greedily choose the shortest available path to an unexplored node, until all nodes are explored
  • $O(E + V\log|V|)$

4.5.1Implementation

  • maintain $d(v) = min(d(u) + l_e)$, st $l_e$ is an edge $(u, v)$
  • sort new nodes by $d(v)$

4.6Minimum Spanning Tree

Given a connected graph with real value edge waits, the MST is the subset of edges touching all vertices such that the sum of edge weights is minimized.

  • where $n$ is the number of vertices in the
    • Any MST has to have $n-1$ edges
    • There are $n^{n-2}$ spanning trees

4.6.1Kruskal's Algorithm

4.6.1.1Algorithm
  1. Sort edges by increasing weight, start with MST $T = \{0\}$
  2. Try inserting edge $e$ with lowest weight
    1. if $e$ creates a cycle, remove
    2. repeat until $n-1$ edges in graph
4.6.1.2Runtime

$O(E\log E)$
1. Comparison Sort for edges
2. Use Union Find Data Strucutre to create tree, requires $O(v)$ operations

4.6.2Reverse-Delete Algorithm

4.6.2.1Algorithm
  1. Sort edges by decreasing weight, Start with MST $T = \{E\}$ (all edges)
  2. Try deleting edge $e$ with highest weight
    1. if removing $e$ disconnects graph, add it back
    2. repeat until $n-1$ edges in graph
4.6.2.2Runtime

$ O(E \log V (\log \log V)^3) $
Comparison Sort for edges
E iterations of loop
* delete in $O(1)$
* connectivity takes $O(\log V (\log \log V)^3) $

4.6.3Prim's Algorithm

4.6.3.1Algorithm
  1. Start with $T = \{v\}$ (some random vertex)
  2. Add the edge with lowest weight which doesn't have an end point in $T$
4.6.3.2Runtime
Minimum edge weight data structure Time complexity (total)
adjacency matrix, searching $O(
binary heap and adjacency list $O((
Fibonacci heap and adjacency list $O(

Time is bottle necked by lookup time

4.7Union Find / Disjoint Dataset

Functions
MakeSet(x) creates a set with only element x
Find(x) returns a number representing an elements location in the data structure
Union(x, y) merges the sets containing x and y into a new set with one representative number

4.7.1Example

Let's start with $\{ \{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$.
Calling union(1, 2) results in $\{ \{1, 2\}, \{3\}, \{4\}, \{5\}\}$. In this state, find(2) => 1
Calling union(2, 3), union(4, 5) results in $\{ \{1, 2, 3\}, \{4, 5\}\}$.

4.7.2Implementation

Imagine an array of Linked Lists. Initially, each element is in it's own index. But as union() is called, an element can be moved to another linked list, and its array index will point to the new array location.

4.7.3Iterative Find(x)

Find(x):
r = x
while r =! r.parent:
    r = r.parent
while x =! x.parent:
    x = x.parent
    x.parent = r.parent
return x

<div id="pagebreak"></div>

5Dynamic Programming

  • Break problem into set of overlapping subproblems
  • Use recursive strategy with memoization

Recipe:

  1. Characterize the structure of a problem
  2. Recursively determine the value of an optimal solution
  3. Compute value of optimal solution
  4. Use the value to find the solution

Techniques

  1. Binary Choice (weighted interval)
  2. Multi-way choice (segmented squares)
  3. Intervals (RNA Secondary Structure)

5.1Weighted Interval Scheduling

  • Maximize weight of set of jobs
    • Greedy algorithm has all weights equal to 1
  • sort jobs by finish time st $f_1 \lt f_2 \lt ... \lt f_n$
  • p(j) = returns largest index $i<j$ st $i$ and $j$ are compatible
    • p(j) can be done in linear time
    • lets us know all jobs $[1, i]$ are compatible with j
  • lets make OPT(j) which returns the maximum weighted schedule
Case 1
job $j$ is selected
jobs $[p(j)+1, j-1]$ are not in the solution
Case 2
jobs $j$ isn't selected
must include optimal solution of jobs $[1, j-1]$, or it's the same as OPT(j-1)

5.1.1Brute Force

returns maximum weight:

OPT(j)
    if (j == 0)
        return 0
    else
        return max(j + Opt(p(j)), Opt(j-1))

But this is $O(2^n)$ because of redundant recursive calls

5.1.2Memoization

instead of repeated calculation, store the result in an array

M = [0 for i in range (0, n)]

M-OPT(j)
    if (j == 0)
        return 0
    else if (M[j] is empty) 
        M[j] = max(j + M-OPT(p(j)), M-OPT(j-1)) 
    return M[j]

M-OPT must take $n$ time since it either takes constant time, or recursively calls itself. If every potential recursive call is made, the algorithm will require $2n$ calls

So $O(n)$ is jobs are presorted by start and finish times (requires $O(n\log n)$

But this only returns the maximum weight of the schedule, not the actual schedule

Can also unwind the recursion, and make an iterative algorithm

5.2Shortest Path

  • An alternative to Djikstra's Algorithm
  • Allows for negative weighted edges
    • eg. profits from decisions
  • Djikstra's algorithm doesn't work with negative edges
  • No shortest path if negative cost cycle
    • allows for the cost to reduced by going around cycle n times
    • involves a cycle with a cost less than zero
  • Shortest path can't contain cycle
    • If positive weight, there is no point
    • if negative weight, you would go around it repeatedly

5.2.1Dynamic - Bellman-Ford

OPT(i, v) returns shortest path $(v, t)$ using at most $i$ edges

Case 1
P uses at most $i-1$ edges
OPT(i, v) = OPT(i-1, v)
Case 2
P uses $i$ edges
say $(v, w)$ is the first edge, OPT uses $(v,w)$ and selects best path from $w \to t$

$$ OPT(i, v) = \begin{cases} 0, i=0\\ \min (OPT(i, v),\ OPT(i-1, w)),i\ne0 \end{cases} $$

5.2.2Implementation

Shortest-Path(G, t) {
 foreach node v ∈ V
 M[0,v] ← ∞
 M[0,t] ← 0
 for i = 1 to n-1
     foreach node v ∈ V
 M[i, v] ← M[i-1,v]
     foreach edge (u, w) ∈ E
 M[i, u] ← min { M[i,u], M[i-1,w] + cuw }
}
  • Worst Case running time : $$O(nm)$$
  • But usually is substantially faster in practice.

5.2.3Practical Improvements

  • just use one array, maintaining shortest path from $v \to t$
  • only update edges if distance has changed in a previous iteration

5.2.4Detect Negative Cycles

  • add a node $t$, connected to all nodes with weight one
  • if a path of length OPT(n) = OPT(n-1), there are no negative cycles

5.3Segmented Least Squares

  • points roughly lie on line segments
  • order by x st:
    • minimizes squared errors in line segments ($e$)
    • minimize the number of lines ($L$)
  • must trade-off between lines and errors ($c$)

5.3.1Dynamic Algorithm

$$ OPT(j) = \begin{cases} 0, j=0\\ \min\{e(i, j) + c + OPT(i-1)\}, j \ne 0 \end{cases} $$

min for all $i\in [1, j]$

5.4String Similarity

Given two strings, X = [x1, ..., xn] and Y = [y1, ..., yn], find the allignment such that they match the most. How much they 'match' is determined by minimizing the cost.

Here, an index is mismatched if x[i] != y[i], and is unmatched if x[i] = NULL

$$\mathscr{cost} = \sum_{\mathscr{mismatched}} a_{x_i, y_i}+\sum_{\mathscr{unmatched}}x_i+\sum_{\mathscr{unmatched}}y_i$$

Calculating the optimal matching:

  1. Match (x[i] = y[j])
    • cost to align x[i], y[j] + min cost of remainder
  2. No Match (x[i] != y[j])
    1. Leave x[i] unmatched
      • cost of gap + min cost of remainder
    2. Leave y[i] unmatched
      • cost of gap + min cost of remainder
        <div id="pagebreak"></div>

6Network Flow

Find maximum flow of graph
Very similar to smallest cut in a graph

6.1Min Cut and Max Flow

6.1.1Minimum Cut

Assume
unidirectional edges
each edge has a capacity c(e)
nodes trying to relate $s$ and $t$
  • net material flowing through pipes
  • capacity of a cut is the sum of capacities of edges exiting the cut
  • $st$ cut, such that each one cut has $s$ and the other has $t$

6.1.2Flows

add parameters to a node such that:

  • $0 \le f(e) \le c(e)$ for each edge
  • $\sum_{e_{in}\in v} f(e) = \sum_{e_{out}\in v} f(e) $ for each node

but how do you find a maximum flow?

6.1.3Relating Flow and Cut

  • flow out of a cut equals the flow out of a node
  • a minimum cut:
    • has incoming edges with flow 0
    • and outgoing edges with a flow at full capacity

6.2Algorithm

6.2.1Residual Graph

edge $ e = (u, v) \in E$
functions Flow(e), Capacity(e)

6.2.1.1Residual Edge
  • undo flow sent
  • so $e = (u, v)$ and $e^R = (v, u)$
    *$c_f(e)$ is residual capacity
  • $c_f(e) = \begin{cases}c(e) - f(e), if e\in E\\ f(e), if e^R \in E \end{cases}$

So Residual graph is all residual edges with positive capacity

6.2.2Ford-Fulkerson Algorithm

  1. While $\exists$ a path from $s$ to $t$
    1. Find a valid path with DFS
    2. Find lowest capacity edge in the path and augemnt that path
      • Lower flow in the path by the lowest capacity
      • Add residual edges going in the opposite direction
    3. Add the ammount to the total flow
  2. return total flow

DFS takes $O(|E|)$ and we have to execute it at worst case $f$ times, so runtime of $O(f|E|)$

In each step of the iteration, the residual edge is a valid path for teh flow to take.

6.2.3Capacity Scaling

Choosing a path with the highest bottleneck increases the flow by the maximum possible amount. However, finding the path with highest bottleneck is difficult. Instead, we use a scaling parameter $\Delta$ and use a subgraph $G_f(\Delta)$ which contains arcs with capacity of atleast $\Delta$. Then, keep adding such a path $\Delta$, and then update $G_f$. After there are no more paths, halve $\Delta$.

6.2.3.1Capacity Scaling Algorithm
  1. set flow of all edges to 0
  2. find the largest $\Delta = 2^k$ with a existent path
  3. Make a residual graph $G_f$
  4. while $\Delta \ge 1$:
    1. Find $G_f(\Delta)$
    2. while there is a path in $G_f$
      1. f = augment(f, k, P)
      2. update $G_f$
    3. halve $\Delta$
  5. return f
6.2.3.2Runtime
  1. After processing a value at $\Delta$ , the maximum flow $f$ is off by at most $m\Delta$
  2. There can be at most $2m$ augmentations per phasing
  3. The scaling max-flow algorithm finds a flow in at most $O(m\log C)$ and the overall flow can be found in $O(m^2 \log C)$.

7Applications of Max Flow / Min Cut

7.1Bipartite Matching

can be done as Network Flow

Matching
$M \subseteq E$ if each node appears at the end of at most one edge in $M$

Bipartite Matching is matching in a Bipartite Graph

  1. add node $s$ and $t$ to a bipartite graph
  2. add edges from $s$ to all nodes on one side of Bipartite Graph
  3. add edges from $t$ to all nodes on the other side of the Graph
  4. Give all edges flow 1, and find max flow

7.2Edge Disjoint Paths

Paths between $s, t$ such that there are no shared edges
Give all edges a weight of 1
Max Edge Disjoint paths equals the maximum flow value

7.3Baseball Elimination

  • starting node $s$
  • game nodes $(team\ 1, team\ 2), ..., (team\ x, team\ y )$
  • team nodes
  • end node $t$

Can team 3 finish with the most wins?

  1. Assume team 3 wins all remaining games
  2. This implies all other teams have lost all games with team 3
  3. Remove all game nodes involving team 3
  4. Remove team node 3
  5. Set initial flow from $s$ to e the number of games left between teams

The total number of games left to be played, is the total capacity of the system. All these games must be won by
someone other than team 3.

We want the max flow to equal this value, but no team has more wins than team 3 can have in the best case. So all edges leaving the source must be saturated

8Randomization

8.1Symmetry Breaking

  • Say we have $n$ processes, $\{P_1, ..., P_n\}$, all trying to access a database
  • Only 1 can enter the database at a time
  • want fairness in how the database is shared
  • Processes can't communicate

8.1.1Implementation

  • Each process request access with probability $\frac{1}{n}$
  • Let S(i, t) be the probability of $P_i$ accessing the database at time t
  • Pr[S(i, t)] = $p \times (1-p)^{n-1} = \frac{1}{n} \times (1-\frac{1}{n})^{n-1}$
  • $ \lim_{n \to 2} = \frac{1}{2}$ and $\lim_{n \to \infty} = \frac{1}{e}$
  • so we have the bounds of the probability

The probability that process $i$ fails to access the database in $en$ rounds is at most $\frac{1}{e}$ and after $en(c\ln n)$ rounds the probability is at most $n^{-c}$

The probability that all process succeed within $2e n \ln n$ rounds is at least $1-\frac{1}{n}$.

8.2Global Min Cut

  • find the smallest Min Cut in a graph i,e.cut $(A,B)$ (Minimum number of edges connecting $A$ & $B$).
  • you'd assume it's harder than the minimum $s-t$ but, but it isn't

8.2.1Contradiction Algorithm

  1. Pick an edge $e =(u,v)$ uniformly at random
  2. Contract edge $e$
    • replace $u, v$ with a super-node $w$
    • preserve edges, but update endpoints to be $w$
    • remove internal edges, but combine parallel edges
  3. Repeat until there are just two nodes, $w_1, w_2$
  4. The cut is all the nodes contracted to form $w_1$

8.2.2Runtime

A randomized algorithm just has a high probability of returning a mincut. In this case it returns a mincut with $\frac{2}{n^2}$. To increase the chance of an answer, repeat the algorithm many times.
Since the probability of finding a solution is $\frac{2}{n^2}$, the probability of not finding it is $1-\frac{2}{n^2}$. This results in a failure to find the global min cut after $n^2\ln n$ iterations of $\frac{1}{n^2}$.

8.3Linearity of Expectation

Useful property: if $X$ is a $0/1$ random variable the expectation of $X$ is $E[X] = Pr[X=1]$.

8.3.1Guessing Cards

  • Without memory the expected number of correct guesses is 1.
  • With memory the the number of correct guesses is $\Theta(\log n)$

8.3.2Coupon Collector

  • The expected number of steps is $\Theta(n \log n)$ for $n$ different types of coupons and uniform distribution.

8.4Randomized Divide and Conquer (Quicksort)

  • Randomization protects against worst case by choosing pivot at random.

Expected Number of Comparisons:
$$q_n = 2n\ln n - (4- 2(.577))n + 2\ln n + O(1)$$

8.5Hashing

  • use a prime number with an order about equal to the size of the table
  • identify each element with a number

8.5.1Universal Hashing

$\newcommand{\H}{\mathscr{H}}$
We want to map a subset of a large universe $U$. It is large enough that we can't maintain an array.

We want a function $h: u \in U \to \{0, 1, ..., n-1\}$, or a funtion which randomly maps values to an array of size $n$. From the birthday problem, a collision is expected fairly often. We want some degree of randomness, to prevent malicious input from harming computer.

A Universal family is defined such that a $v \ne u$ and function $h \in \H$ st $\Pr_{h}[h(u)=h(v)]\le \frac{1}{n}$. You want to compute $h$ efficiently and select it at random efficiently.
To encode an integer, we can use:
$$ h_a(x) = \sum_{i=1}^{r} a_i x_i \mod p $$
with $p$ being a prime, $r$ the length of the integer, and $0 \le a_i \lt p$. This means we can say $\H = \{h_a: a \in A\}$ is a universal family.

8.5.2K-Universal Hashing

You want to ensure that when hashing $k$ keys, the chances of $h(x_1) = y_1, ..., h(x_k) = y_k = \frac{1}{n^k}$, in addition to the qualifications for a universal hashing function. Basically, you want to ensure that two different keys have a low chance of colliding (condition 1) and repeatedly hashing to the same value (condition 2).

You can try to use all $h \in \H$ and write down the outputs. Now, you must make sure that the chances of $h(x_1) = y_1, ..., h(x_k) = y_k = \frac{1}{n^k}$.

8.6Monte Carlo vs Las Vegas Algorithms

Monte Carlo Algorithm: Guaranteed to run in poly-time, likely to find correct answer.
* Ex) Contraction Algorithm for global min cut

Las Vegas Algorithm:
Guaranteed to find correct answer, likely yo run in poly-time.
* Ex) Randomized Quicksort

Note:
Can always convert a Las Vegas Algo into Monte Carlo, but no known method for reverse.

9Linear Sorting

  • Numbers bounded by $k$
  • Counting sort, $O(n)$ if $k$ is $O(n)$
  • Stable sorts -Preserve order
  • Radix Sort, $O(n)$ if $k$ is $O(n^c)$.

Counting sort:
" Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items."

# variables:
#   input -- the array of items to be sorted; key(x) returns the key for item x
#    n -- the length of the input
#    k -- a number such that all keys are in the range 0..k-1
#    count -- an array of numbers, with indexes 0..k-1, initially all zero
#    output -- an array of items, with indexes 0..n-1
#    x -- an individual input item, used within the algorithm
#    total, oldCount, i -- numbers used within the algorithm

# calculate the histogram of key frequencies:
for x in input:
    count[key(x)] += 1

# calculate the starting index for each key:
total = 0
for i in range(k):   # i = 0, 1, ... k-1
    oldCount = count[i]
    count[i] = total
    total += oldCount

# copy to output array, preserving order of inputs with equal keys:
for x in input:
    output[count[key(x)]] = x
    count[key(x)] += 1

return output

Radix sort: Implements a version of counting sort.

10Red-Black Trees

  1. A self-balancing binary search tree
    • Every node is either red or black
    • The root is always black
    • The botom row is all black leafs with value Null
    • A red node must have two black children
  2. Height of a red-black tree
    • Height of a node is the number of edges in a longest path to a leaf.
    • Black-height of a node $x: bh(x)$ is the number of black nodes (including Null) on the path from $x$ to leaf, not counting $x$. Black-height is well defined.
    • Any node with height $h$ has black-height $\geq \frac{h}{2}$.
    • The subtree rooted at any node $x$ contains $\geq 2^{bh(x)} -1$ internal nodes.
    • A red-black tree with $n$ nodes has height $\leq 2\lg(n+1)$.
  3. Insertion
    • on insertion you want to maintain the balance of the tree
    • involves right and left rotation
    • it takes $O(1)$ time to rearrange the pointers via rotations, but $O(n)$ time to recolor the tree

11Overview

  1. 2 Questions about Algorithms
    1. Analyze Runtime
    2. Presumably Dynamic and Max Flow / Min Cut
  2. Divide and Conquer
    1. No Matrix Multiplication
    2. Effect of Parameters to Runtime
    3. Master Method
  3. Graphs
    1. Breadth First Search
    2. No DAG
  4. Greedy Algorithms
    1. Djikstra's Algorithm
    2. Minimum Spanning Tree
  5. Dynamic
    1. Can't use dynamic for most recursive algorithms
      1. Often easier to iteratively solve problem
    2. Expect a question
  6. Flow
    1. Ford Fulkerson
    2. Capacity Scaling
    3. Runtime and Comparison
  7. Randomized Algorithm
    1. Randomized Quicksort
    2. Hashing and Expected Collisions