1 Stable Matching
With $n$ men and $n$ women, find a pairing with no instabilities.
- Instability:
    - Man $m$ prefers woman $w$ over his assigned woman $x$
    - $w$ prefers man $m$ over her assigned partner
The Gale-Shapley algorithm results in a male-optimal and female-pessimal solution.
1.1 Gale-Shapley Algorithm

    all people are free
    while (any man m is free and hasn't proposed to all women):
        w = first woman on m's list who hasn't been proposed to by m yet
        if (w is free):
            assign w to m
        else if (w prefers m over her assigned partner a):
            assign w to m
            make a free
        else:
            w rejects m
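The pseudocode above can be sketched as runnable Python (a minimal illustration; the dict-based preference format is an assumption, not from the notes):

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Men propose; returns a stable matching as {man: woman}.

    men_prefs[m] is m's list of women, most preferred first;
    women_prefs[w] is w's list of men, most preferred first.
    """
    # rank[w][m] = position of m in w's list (lower = more preferred)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # index of next woman to try
    engaged_to = {}                             # woman -> man
    free_men = deque(men_prefs)

    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_proposal[m]]      # 1st woman m hasn't proposed to
        next_proposal[m] += 1
        if w not in engaged_to:                 # w is free
            engaged_to[w] = m
        elif rank[w][m] < rank[w][engaged_to[w]]:  # w prefers m over partner
            free_men.append(engaged_to[w])      # old partner becomes free
            engaged_to[w] = m
        else:                                   # w rejects m
            free_men.append(m)

    return {m: w for w, m in engaged_to.items()}
```

For example, if both men rank X first but X prefers B, then A is bumped and ends up with Y.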
1.1.1 Stability Proof
- Suppose $A$-$Z$ is an unstable pair in the GS matching $S^*$
    - $A$ is matched to $B$, $Z$ is matched to $X$
- Case: $A$ never proposed to $Z$
    - $A$ goes down his list in order of preference
    - if $A$ never proposed to $Z$, he prefers $B$ over $Z$, so $A$-$Z$ is stable
- Case: $A$ proposed to $Z$
    - $A$ and $Z$ were matched for a time
    - they can only have become unmatched if $Z$ preferred some other suitor over $A$, so $Z$ prefers her final partner over $A$
1.1.2 Male-Optimality Proof
- Suppose man $M$ is matched to someone other than his best stable partner in $S$
- Men propose in decreasing order of preference
- If $M$ was rejected by a stable partner, she must have preferred a man who appears with her in some stable matching, contradicting that she was $M$'s best stable partner
1.2 Bipartiteness
Exactly one of the following holds for a connected graph $G$ with BFS layering $L$:
- No edge joins two nodes in the same layer $\to$ $G$ is bipartite
- An edge joins two nodes in the same layer $\to$ $G$ contains an odd-length cycle $\to$ $G$ is not bipartite
2 Divide and Conquer
- Divide problem into several parts
- Solve parts recursively
- Combine solutions from sub-problems into final answer
2.1 Mergesort
- Divide array: $O(1)$
- Recursively sort halves: $2T(\frac{n}{2})$
- Merge: $O(n)$
2.1.1 Merging
- Keep one pointer into each (sorted) array
- Compare the pointed-to values, and append the lower to the output array
- Advance that array's pointer
- Repeat until both arrays are exhausted
2.1.2 Pseudocode

    def mergesort(var a as array)
        if (n == 1) return a
        var l1 as array = a[0] ... a[n/2]
        var l2 as array = a[n/2+1] ... a[n]
        l1 = mergesort(l1)
        l2 = mergesort(l2)
        return merge(l1, l2)

    def merge(var a as array, var b as array)
        var c as array
        while (a and b have elements)
            if (a[0] > b[0])
                add b[0] to the end of c
                remove b[0] from b
            else
                add a[0] to the end of c
                remove a[0] from a
        while (a has elements)
            add a[0] to the end of c
            remove a[0] from a
        while (b has elements)
            add b[0] to the end of c
            remove b[0] from b
        return c
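A runnable Python version of the pseudocode above (a minimal sketch; slicing copies are used for clarity over in-place merging):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    c = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:          # take the smaller head; <= keeps the sort stable
            c.append(a[i]); i += 1
        else:
            c.append(b[j]); j += 1
    c.extend(a[i:])               # at most one of these tails is non-empty
    c.extend(b[j:])
    return c

def mergesort(a):
    """Sort a list in O(n log n) by splitting, recursing, and merging."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))
```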
2.2 Closest Pair of Points
2.2.1 Algorithm
- Draw vertical line to roughly cut points in half
- Recursively find closest points in each half
- Find any potentially closer pairs after merging halves
- Return closest pair of points
2.2.2 Analysis
This algorithm takes $O(n\log^2 n)$, but can be improved to $O(n\log n)$ by pre-sorting the list of points. Each recursive call returns a list of points sorted by y-coordinate, and we then merge the lists.
2.3 Integer Multiplication
2.3.1 Brute Force
- Brute force requires $O(n^2)$ operations
- An $n$-digit number can be represented by two $\frac{n}{2}$-digit numbers:
    - $x = 2^{\frac{n}{2}}x_1+x_0$, $y = 2^{\frac{n}{2}}y_1+y_0$
    - $xy = 2^n(x_1y_1)+2^{\frac{n}{2}}(x_1y_0+x_0y_1)+x_0y_0$
- Multiply four $\frac{n}{2}$-digit numbers
- Add two $\frac{n}{2}$-digit integers and shift
- $T(n) = 4T(\frac{n}{2})+\Theta(n) \to T(n) \in \Theta(n^2)$
2.3.2 Karatsuba Multiplication
- Instead, trade one multiplication for a few extra additions
- Multiply only three $\frac{n}{2}$-digit numbers
- Same as above, but:
$$xy = 2^n(x_1y_1) + 2^{\frac{n}{2}}((x_1+x_0)(y_1+y_0)-x_1y_1-x_0y_0)+x_0y_0$$
- Runtime
    - $T(n)=3T(\frac{n}{2})+\Theta(n)$
    - $T(n) \in O(n^{\log_2 3}) \approx O(n^{1.585})$
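A base-10 variant of the same three-multiplication scheme, as runnable Python (a sketch; real implementations split in binary and switch to grade-school multiplication below a threshold):

```python
def karatsuba(x, y):
    """Multiply non-negative integers with 3 recursive products instead of 4."""
    if x < 10 or y < 10:                      # base case: a single-digit factor
        return x * y
    n = max(len(str(x)), len(str(y)))
    half = n // 2
    p = 10 ** half                            # split around 10^half
    x1, x0 = divmod(x, p)                     # x = x1*p + x0
    y1, y0 = divmod(y, p)                     # y = y1*p + y0
    a = karatsuba(x1, y1)                     # high parts
    c = karatsuba(x0, y0)                     # low parts
    b = karatsuba(x1 + x0, y1 + y0) - a - c   # cross terms via one product
    return a * p * p + b * p + c
```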
2.4 Matrix Multiplication
2.4.1 Brute Force
- $O(n^3)$ time
- $n$ multiplications for each of the $n^2$ output entries
2.4.2 Divide and Conquer
- Divide each matrix into four ${\frac{n}{2}} \times {\frac{n}{2}}$ blocks
- Compute the block products with 8 multiplications of $\frac{n}{2} \times \frac{n}{2}$ matrices
- Combine with a constant number of matrix additions
- Return the assembled result
$T(n) = 8T(\frac{n}{2}) + \Theta(n^2)\to T(n)\in\Theta(n^3)$
No improvement over the naive algorithm.
BUT Strassen's method multiplies two $2\times2$ block matrices with only 7 multiplications,
leading to $T(n) \in \Theta(n^{\log_2 7}) \approx \Theta(n^{2.81})$
2.5 Master Theorem
$T(n) = \alpha T(\frac{n}{\beta}) + f(n),\ \alpha \ge 1,\ \beta \gt 1$
- $n$ is the size of the problem
- $\alpha$ is the number of subproblems
- $\frac{n}{\beta}$ is the size of each subproblem
- $f(n)$ is the additional work required
- Case 1
    - $f(n) \in O(n^c),\ st.\ c \lt \log_{\beta}\alpha$
    - $T(n) \in \Theta(n^{\log_{\beta}\alpha})$
- Case 2
    - $f(n) \in \Theta(n^c \log^k n),\ st.\ c = \log_{\beta}\alpha$
    - $T(n) \in \Theta\left( n^{c} \log^{k+1} n \right)$
- Case 3
    - $f(n) \in \Omega(n^c),\ st.\ c \gt \log_{\beta}\alpha$, and
    - $\alpha f\left( \frac{n}{\beta} \right) \le k f(n)$ for some constant $k < 1$ and sufficiently large $n$
    - $T(n) \in \Theta(f(n))$
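As worked examples drawn from the recurrences earlier in these notes, mergesort lands in Case 2 and Karatsuba in Case 1:

```latex
% Mergesort: T(n) = 2T(n/2) + \Theta(n)
\alpha = 2,\ \beta = 2,\ \log_{\beta}\alpha = 1,\ f(n) \in \Theta(n^{1}\log^{0} n)
\ \Rightarrow\ \text{Case 2 } (c = 1,\ k = 0)
\ \Rightarrow\ T(n) \in \Theta(n \log n)

% Karatsuba: T(n) = 3T(n/2) + \Theta(n)
\alpha = 3,\ \beta = 2,\ \log_{\beta}\alpha = \log_2 3 \approx 1.585,\ f(n) \in O(n^{1}),\ 1 < \log_2 3
\ \Rightarrow\ \text{Case 1}
\ \Rightarrow\ T(n) \in \Theta(n^{\log_2 3})
```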
3 Graphs
- Captures pairwise relationships between objects
- Node set $V$ and edge set $E$
- Size parameters: $n=|V|$, $m=|E|$
- Degree
    - the number of edges incident to a node
3.1 Representations
3.1.1 Adjacency Matrix
- $n\times n$ matrix where $A_{uv}$ is 1 if $(u, v)$ is an edge
- Requires $n^2$ space
- Checking if $(u, v)$ is an edge takes $\Theta(1)$ time
- Identifying all edges takes $\Theta(n^2)$ time
3.1.2 Adjacency List
- Length-$n$ array of nodes; each index holds a list of adjacent nodes
- Requires $\Theta(m+n)$ space
- Checking if $(u, v)$ is an edge takes $O(\deg(u))$ time
- Identifying all edges takes $\Theta(m+n)$ time
3.2 Paths and Connectivity
- Path
    - a sequence of nodes $\{v_1, ..., v_k\}$ st. each consecutive pair $(v_i, v_{i+1})$ is joined by an edge
- Simple Path
    - all nodes in the path are distinct
- Connected Graph
    - there is a path between every pair of nodes
- Cycle
    - a path $\{v_1, ..., v_k\}$ where $v_1=v_k$, $k > 2$, and the first $k-1$ nodes are distinct
- Tree
    - an undirected graph which is connected and doesn't contain a cycle
- Connected Component
    - all nodes reachable from a node $s$
3.3 Traversal
- Connectivity: is there a path between nodes $s$ and $t$?
- Shortest Path: what is the length of the shortest path between nodes $s$ and $t$?
3.3.1 Breadth First Search
Explore outwards from a starting node $s$ in all directions, adding nodes one layer at a time.
- Output
    - $L_0 = \{s\}$
    - $L_1 =$ all neighbors of $L_0$
    - $L_2 =$ all nodes not in $L_0$ or $L_1$ which have an edge to a node in $L_1$
There is a path from $s$ to a node $t$ if and only if $t$ appears in some layer of a BFS from $s$.
    Graph G
    s = some node in G
    Discovered[s] = true
    l = 0  // layer counter
    Tree = null
    L[0] = {s}
    while (L[l] is not empty):
        initialize L[l+1]
        for each node u in L[l]:
            for each edge e connecting u to v:
                if (Discovered[v] == false):
                    Discovered[v] = true
                    add edge e to Tree
                    add v to L[l+1]
        l++
Runs in $O(m+n)$ time:
- each node $u$ has $\deg(u)$ incident edges
- total time processing edges is $\sum_{u \in G} \deg(u) = 2m$, since each edge is counted twice
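The layer-by-layer BFS above can be sketched in Python (a minimal illustration; the adjacency-list dict format is an assumption):

```python
def bfs_layers(adj, s):
    """Return BFS layers L[0], L[1], ... from s, given an adjacency-list dict."""
    discovered = {s}
    layers = [[s]]                          # L[0] = {s}
    while layers[-1]:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in discovered:     # first time v is seen
                    discovered.add(v)
                    next_layer.append(v)
        layers.append(next_layer)
    return layers[:-1]                      # drop the final empty layer
```

Each node and each edge is touched a constant number of times, giving the $O(m+n)$ bound.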
3.3.2 Depth First Search
A BFS checks all nodes within a certain distance of the root. A DFS instead goes all the way down one branch of the tree, then backtracks and explores the next unexplored branch, until it has visited every node. DFS can require less memory, since you don't need to store every node of a layer at once.
You would choose an algorithm based on the use case: for example, a target far down the tree would be found quicker with DFS, while something close to the root would be found faster with BFS.
3.3.3 Connected Component
- Flood Fill
    - recolor all connected pixels of a given color
    - e.g. given a pixel of color lime, change the blob of neighboring lime pixels to blue
- Definitions
    - Node: pixel
    - Edge: connects neighboring lime pixels
    - Blob: connected component of lime pixels

    get_connected_component():
        R = {s}
        while (there is an edge (u, v) st. u is in R and v is not in R):
            add v to R

For flood fill: ensure the edge $(u, v)$ connects like-colored pixels.
3.3.4 Bipartite Graphs
- Bipartite
    - an undirected graph whose nodes can be colored red and blue such that no two neighboring nodes share a color
    - such a graph can't have an odd-length cycle
Let $G$ be a connected graph with layers from a BFS. Exactly one of the following holds:
- No edge of $G$ joins two nodes of the same layer, and $G$ is bipartite
- An edge of $G$ joins two nodes of the same layer, and $G$ contains an odd-length cycle (and hence is not bipartite)
3.3.5 Connectivity in Directed Graphs
An edge $(u, v)$ goes from node $u$ to node $v$.
- Mutually Reachable
    - $\exists$ a path from $u$ to $v$ and a path from $v$ to $u$
- Strong Connectivity
    - every pair of nodes is mutually reachable
3.3.5.1 Strong Connectivity
- pick a node $s$
- run BFS in $G$
- run BFS in $G^{-1}$, where every edge in $G$ has been reversed
- return true if all nodes are reachable in both BFS executions
3.4 Directed Acyclic Graph
- DAG
    - a directed graph with no cycles
- Topological Order
    - an ordering of nodes as $v_1, v_2,...,v_n$ st. for every edge $(v_i, v_j)$, $i < j$
- If G has a topological order, G is a DAG
- If G is a DAG, then there is a node with no incoming edges
To compute a topological order:

    def topological(G):
        find a node v with no incoming edges
        G.delete(v)
        return v + topological(G)
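An iterative version of the recursion above (Kahn's algorithm, swapped in for the recursive deletion; the node/edge-list format is an assumption):

```python
from collections import deque

def topological_order(nodes, edges):
    """Repeatedly remove a node with no incoming edges; error if no DAG."""
    indegree = {v: 0 for v in nodes}
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1
    ready = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:               # "delete" u by dropping its out-edges
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):       # leftover nodes lie on a cycle
        raise ValueError("graph is not a DAG")
    return order
```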
<div id="pagebreak"></div>
4 Greedy Algorithms
- Make a locally optimal choice for the current position
- Keep making the same kind of decision on the remaining input
4.1 Strategies
- Stays Ahead
    - show that the greedy algorithm is at least as good as any other solution after each step
- Exchange
    - show any optimal solution can be transformed into the greedy output without making it worse
- Structural
    - find a bound every possible solution must meet, then show the greedy algorithm achieves this bound
4.2 Interval Scheduling
- Sort by earliest finish time
4.2.1 Proof of Optimality
- Assume the greedy algorithm is suboptimal
- Let $\{i_1, i_2, ... , i_k\}$ be the greedy set
- Let $\{j_1, j_2, ..., j_m\}$ be an optimal set with $i_a = j_a$ for the largest possible $a$
- Greedy picks $i_{a+1}$ to finish no later than $j_{a+1}$, so swapping $j_{a+1}$ for $i_{a+1}$ keeps the solution optimal while agreeing on one more job, a contradiction unless $a = m$
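The earliest-finish-time rule can be sketched in a few lines of Python (a minimal illustration; jobs as `(start, finish)` tuples is an assumption):

```python
def interval_schedule(jobs):
    """Earliest-finish-time greedy: jobs are (start, finish) pairs."""
    selected = []
    last_finish = float('-inf')
    for start, finish in sorted(jobs, key=lambda j: j[1]):  # by finish time
        if start >= last_finish:      # compatible with everything chosen so far
            selected.append((start, finish))
            last_finish = finish
    return selected
```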
4.3 Interval Partitioning
- Assign a group of lectures to lecture halls, such that there is no overlap, using the minimum number of rooms
- A lecture $l_j$ starts at $s_j$ and finishes at $f_j$
4.3.1 Algorithm

    sort lectures by (increasing) starting time
    classrooms = {priority_queue}
    for lecture j in schedule:
        if (j can be added to some room):
            add j to that room
        else:
            allocate a new room
            add j to the new room
4.3.2 Runtime
It takes $O(n\log n)$ to sort the lectures, and then $O(\log n)$ per lecture to place it using a priority queue keyed on each room's latest finish time.
4.3.3 Analysis
A new classroom is only created if lecture $j$ conflicts with a lecture in every current room. Since we sorted by starting time, all of those conflicting lectures start no later than $s_j$ and end after it, so that many rooms are unavoidable.
4.4 Minimize Lateness
You want to perform a set of jobs, each with a known processing time and due time, such that the maximum lateness of any job is minimized. Sort by earliest deadline first.
4.5 Dijkstra's Shortest Path
- Maintain a set of explored nodes $S$ for which the shortest-path distance from $s$ is known
- Initialize $S = \{s\}$
- Repeatedly and greedily add the unexplored node with the shortest tentative distance, until all nodes are explored
- $O(E + V\log V)$ with a Fibonacci heap
4.5.1 Implementation
- Maintain $d(v) = \min_{e=(u,v):\ u \in S}(d(u) + l_e)$, where $l_e$ is the length of edge $e = (u, v)$
- Choose the unexplored node with the smallest $d(v)$ using a priority queue
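A binary-heap sketch of the implementation above (Python's `heapq` has no decrease-key, so stale queue entries are skipped instead; the adjacency format is an assumption):

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s; adj[u] is a list of (v, weight) pairs."""
    dist = {s: 0}
    pq = [(0, s)]                        # (tentative distance, node)
    explored = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in explored:                # stale entry for an explored node
            continue
        explored.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd             # relax edge (u, v)
                heapq.heappush(pq, (nd, v))
    return dist
```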
4.6 Minimum Spanning Tree
Given a connected graph with real-valued edge weights, the MST is the subset of edges touching all vertices such that the sum of edge weights is minimized.
- Where $n$ is the number of vertices in the graph:
    - any MST has exactly $n-1$ edges
    - there are up to $n^{n-2}$ spanning trees (for the complete graph)
4.6.1 Kruskal's Algorithm
4.6.1.1 Algorithm
- Sort edges by increasing weight; start with an empty tree $T = \{\}$
- Try inserting the edge $e$ with the lowest weight
    - if $e$ creates a cycle, discard it
- Repeat until $T$ has $n-1$ edges
4.6.1.2 Runtime
$O(E\log E)$
1. Comparison sort for edges
2. Use a Union-Find data structure to detect cycles, requiring $O(E)$ nearly-constant-time operations
4.6.2 Reverse-Delete Algorithm
4.6.2.1 Algorithm
- Sort edges by decreasing weight; start with $T = E$ (all edges)
- Try deleting the edge $e$ with the highest weight
    - if removing $e$ disconnects the graph, add it back
- Repeat until $n-1$ edges remain
4.6.2.2 Runtime
$O(E \log V (\log \log V)^3)$
- Comparison sort for edges
- $E$ iterations of the loop
    - delete in $O(1)$
    - connectivity check takes $O(\log V (\log \log V)^3)$
4.6.3 Prim's Algorithm
4.6.3.1 Algorithm
- Start with $T = \{v\}$ (some arbitrary vertex)
- Repeatedly add the lowest-weight edge with exactly one endpoint in $T$
4.6.3.2 Runtime
| Minimum edge weight data structure | Time complexity (total) |
|---|---|
| adjacency matrix, searching | $O(V^2)$ |
| binary heap and adjacency list | $O((V+E)\log V)$ |
| Fibonacci heap and adjacency list | $O(E + V\log V)$ |
Time is bottlenecked by the minimum-edge lookup.
4.7 Union Find / Disjoint Set
- Functions
    - MakeSet(x): creates a set containing only element x
    - Find(x): returns a number representing the set containing x (its representative)
    - Union(x, y): merges the sets containing x and y into a new set with one representative
4.7.1 Example
Let's start with $\{ \{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$.
Calling union(1, 2) results in $\{ \{1, 2\}, \{3\}, \{4\}, \{5\}\}$. In this state, find(2) => 1.
Calling union(2, 3), then union(4, 5), results in $\{ \{1, 2, 3\}, \{4, 5\}\}$.
4.7.2 Implementation
Imagine an array of linked lists. Initially, each element sits in its own index. As union() is called, an element can be attached under another set's representative, and its parent pointer will point to the new location.
4.7.3 Iterative Find(x)

    Find(x):
        r = x
        while r != r.parent:
            r = r.parent          // r is now the root
        while x != x.parent:      // second pass: path compression
            next = x.parent
            x.parent = r
            x = next
        return r
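The structure above, with path compression plus union by rank, as runnable Python (a minimal sketch mirroring the example in 4.7.1):

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by rank."""
    def __init__(self, elements):
        self.parent = {x: x for x in elements}   # each element starts alone
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        root = x
        while root != self.parent[root]:
            root = self.parent[root]             # walk up to the root
        while x != self.parent[x]:               # path compression pass
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:        # attach shorter tree to taller
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```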
<div id="pagebreak"></div>
5 Dynamic Programming
- Break problem into a set of overlapping subproblems
- Use a recursive strategy with memoization
Recipe:
- Characterize the structure of the problem
- Recursively define the value of an optimal solution
- Compute the value of an optimal solution
- Use the computed values to reconstruct the solution
Techniques:
- Binary choice (weighted interval scheduling)
- Multi-way choice (segmented least squares)
- Intervals (RNA secondary structure)
5.1 Weighted Interval Scheduling
- Maximize the total weight of a set of compatible jobs
    - the greedy algorithm is the special case with all weights equal to 1
- Sort jobs by finish time st. $f_1 \le f_2 \le ... \le f_n$
- p(j) returns the largest index $i<j$ st. jobs $i$ and $j$ are compatible
    - p(j) can be computed in linear time (after sorting)
    - it tells us all jobs $[1, p(j)]$ are compatible with $j$
- Let OPT(j) be the value of the maximum-weight schedule over jobs $1..j$
    - Case 1: job $j$ is selected
        - jobs $[p(j)+1, j-1]$ are not in the solution
        - so OPT(j) = w_j + OPT(p(j))
    - Case 2: job $j$ isn't selected
        - must be the optimal solution over jobs $[1, j-1]$, i.e. OPT(j-1)
5.1.1 Brute Force
Returns the maximum weight:

    OPT(j):
        if (j == 0)
            return 0
        else
            return max(w_j + OPT(p(j)), OPT(j-1))

But this is $O(2^n)$ because of redundant recursive calls.
5.1.2 Memoization
Instead of repeated calculation, store each result in an array:

    M = [empty for i in range(0, n+1)]
    M-OPT(j):
        if (j == 0)
            return 0
        else if (M[j] is empty)
            M[j] = max(w_j + M-OPT(p(j)), M-OPT(j-1))
        return M[j]

M-OPT takes $O(n)$ total time: each call either returns in constant time or fills in a new entry of M, and each of the $n$ entries triggers at most two further calls.
So the whole algorithm is $O(n)$ if jobs are presorted by start and finish times (sorting requires $O(n\log n)$).
But this only returns the maximum weight of the schedule, not the actual schedule; the schedule can be recovered by tracing back through M.
Can also unwind the recursion into an iterative algorithm.
5.2 Shortest Path
- An alternative to Dijkstra's algorithm
- Allows for negative-weight edges
    - e.g. profits from decisions
    - Dijkstra's algorithm doesn't work with negative edges
- No shortest path exists if there is a negative-cost cycle
    - the cost can be reduced indefinitely by going around the cycle
- Otherwise, a shortest path needn't contain a cycle
    - if positive weight, there is no point
    - if negative weight, you would go around it repeatedly
5.2.1 Dynamic - Bellman-Ford
OPT(i, v) returns the length of the shortest path $v \to t$ using at most $i$ edges.
- Case 1: the path uses at most $i-1$ edges
    - OPT(i, v) = OPT(i-1, v)
- Case 2: the path uses $i$ edges
    - say $(v, w)$ is the first edge; then OPT uses $(v,w)$ plus the best path $w \to t$ with at most $i-1$ edges
$$ OPT(i, v) = \begin{cases} 0 & i=0,\ v = t\\ \infty & i=0,\ v \ne t\\ \min\left( OPT(i-1, v),\ \min_{(v, w) \in E}\left( OPT(i-1, w) + c_{vw} \right) \right) & i \ne 0 \end{cases} $$
5.2.2 Implementation

    Shortest-Path(G, t) {
        foreach node v ∈ V
            M[0, v] ← ∞
        M[0, t] ← 0
        for i = 1 to n-1
            foreach node v ∈ V
                M[i, v] ← M[i-1, v]
            foreach edge (u, w) ∈ E
                M[i, u] ← min { M[i, u], M[i-1, w] + cuw }
    }

- Worst-case running time: $O(nm)$
- But usually substantially faster in practice
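The table above, rolled into a single array per the practical improvements that follow, as runnable Python (the edge-triple format is an assumption):

```python
def bellman_ford_to(nodes, edges, t):
    """Shortest-path cost from every node to t; edges are (u, v, cost) triples."""
    INF = float('inf')
    dist = {v: INF for v in nodes}
    dist[t] = 0
    for _ in range(len(nodes) - 1):          # a shortest path has <= n-1 edges
        updated = False
        for u, v, cost in edges:
            if dist[v] + cost < dist[u]:     # relax edge (u, v) toward t
                dist[u] = dist[v] + cost
                updated = True
        if not updated:                      # early exit: nothing changed
            break
    return dist
```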
5.2.3 Practical Improvements
- Use just one array, maintaining the current shortest-path cost from $v \to t$
- Only update through edges whose endpoint distance changed in the previous iteration
5.2.4 Detect Negative Cycles
- Add a node $t$, connected from all nodes with a zero-weight edge
- If OPT(n, v) = OPT(n-1, v) for every node $v$, there are no negative cycles; any $v$ with OPT(n, v) < OPT(n-1, v) lies on a path through a negative cycle
5.3 Segmented Least Squares
- Points roughly lie on a sequence of line segments
- Order the points by x-coordinate, and choose segments to:
    - minimize the squared errors of the line segments ($e$)
    - minimize the number of lines ($L$)
    - trade off between lines and errors via a cost $c$ per line
5.3.1 Dynamic Algorithm
$$ OPT(j) = \begin{cases} 0 & j=0\\ \min_{1 \le i \le j}\{e(i, j) + c + OPT(i-1)\} & j \ne 0 \end{cases} $$
where $e(i, j)$ is the minimum squared error of a single line through points $p_i, ..., p_j$.
5.4 String Similarity
Given two strings, X = [x1, ..., xm] and Y = [y1, ..., yn], find the alignment that makes them match the most, where 'matching' means minimizing the cost. A pair is mismatched if x[i] is aligned with an unequal y[j], and a character is unmatched if it is aligned with a gap.
$$cost = \sum_{mismatched\ (i, j)} \alpha_{x_i, y_j} + \sum_{unmatched\ x_i}\delta + \sum_{unmatched\ y_j}\delta$$
Calculating the optimal matching, we take the cheapest of three choices:
- Match: align x[i] with y[j]
    - cost to align x[i], y[j] + min cost of the remainder
- Leave x[i] unmatched
    - cost of a gap + min cost of the remainder
- Leave y[j] unmatched
    - cost of a gap + min cost of the remainder

<div id="pagebreak"></div>
6 Network Flow
Find the maximum flow through a graph.
Closely tied to the minimum cut of the graph.
6.1 Min Cut and Max Flow
6.1.1 Minimum Cut
- Assume
    - directed edges
    - each edge has a capacity c(e)
    - a source node $s$ and a sink node $t$
- Model: net material flowing through pipes
- The capacity of a cut is the sum of the capacities of the edges leaving the cut
- An $st$-cut partitions the nodes into two sets, one containing $s$ and the other containing $t$
6.1.2 Flows
Assign a flow $f(e)$ to each edge such that:
- $0 \le f(e) \le c(e)$ for each edge
- $\sum_{e\ into\ v} f(e) = \sum_{e\ out\ of\ v} f(e)$ for each node other than $s$ and $t$
But how do you find a maximum flow?
6.1.3 Relating Flow and Cut
- The net flow across any $st$-cut equals the value of the flow
- At a minimum cut, a maximum flow:
    - has incoming edges with flow 0
    - and outgoing edges at full capacity
6.2 Algorithm
6.2.1 Residual Graph
For each edge $e = (u, v) \in E$ we have Flow(e) and Capacity(e).
6.2.1.1 Residual Edge
- Lets us undo flow already sent
- So for $e = (u, v)$, the residual edge is $e^R = (v, u)$
- $c_f(e)$ is the residual capacity:
$$c_f(e) = \begin{cases}c(e) - f(e) & e\in E\\ f(e) & e^R \in E \end{cases}$$
The residual graph consists of all residual edges with positive capacity.
6.2.2 Ford-Fulkerson Algorithm
- While $\exists$ an augmenting path from $s$ to $t$ in the residual graph:
    - find a valid path with DFS
    - find the lowest-capacity (bottleneck) edge in the path
    - increase the flow along the path by the bottleneck capacity
    - add residual edges going in the opposite direction
    - add the bottleneck amount to the total flow
- Return the total flow
DFS takes $O(|E|)$ and, with integer capacities, we execute it at worst $f$ times, so the runtime is $O(f|E|)$.
At each iteration, the residual edges ensure that flow already sent can be rerouted if needed.
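The loop above can be sketched as runnable Python (a minimal illustration; the nested-dict capacity format is an assumption, and the DFS recursion limits this to small graphs):

```python
def max_flow(capacity, s, t):
    """Ford-Fulkerson on a capacity dict {u: {v: c}}; returns the max flow value."""
    # residual[u][v] holds remaining capacity, including reverse (undo) edges
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def dfs_path(u, target, visited):
        """Return a list of edges from u to target with positive residual capacity."""
        if u == target:
            return []
        visited.add(u)
        for v, cap in residual[u].items():
            if cap > 0 and v not in visited:
                rest = dfs_path(v, target, visited)
                if rest is not None:
                    return [(u, v)] + rest
        return None

    flow = 0
    path = dfs_path(s, t, set())
    while path is not None:
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck     # push flow forward
            residual[v][u] += bottleneck     # add undo capacity backward
        flow += bottleneck
        path = dfs_path(s, t, set())
    return flow
```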
6.2.3 Capacity Scaling
Choosing the path with the highest bottleneck increases the flow by the maximum possible amount, but finding that path is difficult. Instead, we use a scaling parameter $\Delta$ and a subgraph $G_f(\Delta)$ containing only arcs with residual capacity at least $\Delta$. Keep augmenting along paths in $G_f(\Delta)$, updating $G_f$ as we go; when no paths remain, halve $\Delta$.
6.2.3.1 Capacity Scaling Algorithm
- Set the flow of all edges to 0
- Find the largest $\Delta = 2^k$ no larger than the maximum capacity
- Make a residual graph $G_f$
- While $\Delta \ge 1$:
    - find $G_f(\Delta)$
    - while there is an augmenting path P in $G_f(\Delta)$:
        - f = augment(f, c, P)
        - update $G_f(\Delta)$
    - halve $\Delta$
- Return f
6.2.3.2 Runtime
- After finishing a $\Delta$-phase, the maximum flow exceeds the current $f$ by at most $m\Delta$
- There are at most $2m$ augmentations per phase
- The scaling max-flow algorithm performs at most $O(m\log C)$ augmentations and finds the overall flow in $O(m^2 \log C)$ time
7 Applications of Max Flow / Min Cut
7.1 Bipartite Matching
Can be solved as network flow.
- Matching
    - $M \subseteq E$ st. each node appears in at most one edge of $M$
Bipartite matching is matching in a bipartite graph:
- Add nodes $s$ and $t$ to the bipartite graph
- Add edges from $s$ to all nodes on one side of the graph
- Add edges from all nodes on the other side to $t$
- Give all edges capacity 1, and find the max flow
7.2 Edge Disjoint Paths
Paths between $s, t$ such that no edge is shared.
Give all edges a capacity of 1.
The maximum number of edge-disjoint paths equals the maximum flow value.
7.3 Baseball Elimination
- Starting node $s$
- Game nodes $(team\ 1, team\ 2), ..., (team\ x, team\ y)$
- Team nodes
- End node $t$
Can team 3 finish with the most wins?
- Assume team 3 wins all remaining games
- This implies all other teams lose their games against team 3
- Remove all game nodes involving team 3, and remove team node 3
- Set the capacity from $s$ to each game node to the number of games left between that pair of teams
The total number of games left to be played is the total capacity leaving the source. All these games must be won by someone other than team 3.
We want the max flow to equal this value, with no team collecting more wins than team 3 can have in the best case. So team 3 can finish first iff all edges leaving the source are saturated.
8 Randomization
8.1 Symmetry Breaking
- Say we have $n$ processes, $\{P_1, ..., P_n\}$, all trying to access a database
- Only 1 can access the database at a time
- We want fairness in how the database is shared
- Processes can't communicate
8.1.1 Implementation
- Each process requests access with probability $p = \frac{1}{n}$
- Let S(i, t) be the event that $P_i$ alone accesses the database at time $t$
    - $\Pr[S(i, t)] = p(1-p)^{n-1} = \frac{1}{n}(1-\frac{1}{n})^{n-1}$
    - $(1-\frac{1}{n})^{n-1}$ lies between $\frac{1}{2}$ (at $n = 2$) and $\frac{1}{e}$ (as $n \to \infty$)
    - so we have bounds on the probability: $\frac{1}{en} \le \Pr[S(i, t)] \le \frac{1}{2n}$
The probability that process $i$ fails to access the database in $en$ rounds is at most $\frac{1}{e}$; after $en(c\ln n)$ rounds the probability is at most $n^{-c}$.
The probability that all processes succeed within $2e n \ln n$ rounds is at least $1-\frac{1}{n}$.
8.2 Global Min Cut
- Find the smallest cut $(A,B)$ in a graph, i.e. the minimum number of edges connecting $A$ and $B$
- You'd assume it's harder than the minimum $s$-$t$ cut, but it isn't
8.2.1 Contraction Algorithm
- Pick an edge $e =(u,v)$ uniformly at random
- Contract edge $e$
    - replace $u, v$ with a super-node $w$
    - preserve edges, but update endpoints to be $w$
    - remove self-loop edges, but keep parallel edges
- Repeat until there are just two nodes, $w_1, w_2$
- The cut is all the nodes contracted to form $w_1$
8.2.2 Runtime
A randomized algorithm only has some probability of returning a min cut; here a single run succeeds with probability at least $\frac{2}{n^2}$. To increase the chance of an answer, repeat the algorithm many times.
Since one run succeeds with probability $\frac{2}{n^2}$, a run fails with probability $1-\frac{2}{n^2}$. After $\frac{n^2}{2}\ln n$ independent runs, the probability of never finding the global min cut is at most $(1-\frac{2}{n^2})^{\frac{n^2}{2}\ln n} \le e^{-\ln n} = \frac{1}{n}$.
8.3 Linearity of Expectation
Useful property: if $X$ is a $0/1$ random variable, then $E[X] = \Pr[X=1]$.
8.3.1 Guessing Cards
- Without memory, the expected number of correct guesses is 1
- With memory, the expected number of correct guesses is $\Theta(\log n)$
8.3.2 Coupon Collector
- The expected number of steps is $\Theta(n \log n)$ for $n$ different coupon types under a uniform distribution
8.4 Randomized Divide and Conquer (Quicksort)
- Randomization protects against the worst case by choosing the pivot at random
Expected number of comparisons:
$$q_n = 2n\ln n - (4 - 2\gamma)n + 2\ln n + O(1),\quad \gamma \approx 0.577$$
8.5 Hashing
- Use a prime number of about the same order as the size of the table
- Identify each element with a number
8.5.1 Universal Hashing
$\newcommand{\H}{\mathscr{H}}$
We want to map a subset of a large universe $U$; $U$ is large enough that we can't maintain a direct array.
We want a function $h: U \to \{0, 1, ..., n-1\}$, i.e. a function which spreads values across an array of size $n$. By the birthday problem, some collisions are expected fairly often. We want some degree of randomness, to prevent malicious input from forcing many collisions.
A universal family $\H$ is defined such that for all $u \ne v$, a function $h \in \H$ chosen at random satisfies $\Pr_{h}[h(u)=h(v)]\le \frac{1}{n}$. You also want to compute $h$ efficiently and select it at random efficiently.
To encode an integer, we can use:
$$
h_a(x) = \left(\sum_{i=1}^{r} a_i x_i\right) \bmod p
$$
with $p$ a prime, $x_1, ..., x_r$ the digits of $x$ written base $p$, and each $0 \le a_i \lt p$ chosen at random. Then $\H = \{h_a: a \in A\}$ is a universal family.
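A sketch of drawing one $h_a$ from this family (table size taken to be the prime $p$ itself, per the earlier note that the table size should be a prime of about the right order):

```python
import random

def make_universal_hash(p, r):
    """Draw h_a at random: keys are written as r digits base p, output in 0..p-1."""
    a = [random.randrange(p) for _ in range(r)]   # random coefficients a_i
    def h(x):
        total = 0
        for i in range(r):                        # peel off the i-th digit base p
            x, d = divmod(x, p)
            total += a[i] * d
        return total % p
    return h
```

Each call to `make_universal_hash` picks a fresh member of the family; the same member always hashes a key to the same slot.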
8.5.2 K-Universal Hashing
You want to ensure that when hashing any $k$ distinct keys, $\Pr[h(x_1) = y_1, ..., h(x_k) = y_k] = \frac{1}{n^k}$, in addition to the requirements for a universal family. Basically, you want two different keys to have a low chance of colliding (condition 1) and any fixed $k$-tuple of outputs to be equally likely (condition 2).
In principle you can verify this by enumerating all $h \in \H$, writing down the outputs, and checking that each $k$-tuple of outputs occurs with probability $\frac{1}{n^k}$.
8.6 Monte Carlo vs Las Vegas Algorithms
- Monte Carlo algorithm: guaranteed to run in poly-time, likely to find the correct answer
    - e.g. contraction algorithm for global min cut
- Las Vegas algorithm: guaranteed to find the correct answer, likely to run in poly-time
    - e.g. randomized quicksort
Note: you can always convert a Las Vegas algorithm into a Monte Carlo one, but no general method is known for the reverse.
9 Linear Sorting
- Numbers bounded by $k$
    - Counting sort: $O(n)$ if $k$ is $O(n)$
    - Radix sort: $O(n)$ if $k$ is $O(n^c)$
- Stable sorts preserve the input order of equal keys
Counting sort:
"Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items."

    # variables:
    # input -- the array of items to be sorted; key(x) returns the key for item x
    # n -- the length of the input
    # k -- a number such that all keys are in the range 0..k-1
    # count -- an array of numbers, with indexes 0..k-1, initially all zero
    # output -- an array of items, with indexes 0..n-1

    # calculate the histogram of key frequencies:
    for x in input:
        count[key(x)] += 1
    # calculate the starting index for each key:
    total = 0
    for i in range(k):  # i = 0, 1, ... k-1
        oldCount = count[i]
        count[i] = total
        total += oldCount
    # copy to output array, preserving order of inputs with equal keys:
    for x in input:
        output[count[key(x)]] = x
        count[key(x)] += 1
    return output

Radix sort: applies counting sort digit by digit, from least to most significant.
10 Red-Black Trees
- A self-balancing binary search tree
    - every node is either red or black
    - the root is always black
    - the bottom row is all black leaves with value Null
    - a red node must have two black children
- Height of a red-black tree
    - the height of a node is the number of edges in the longest path to a leaf
    - the black-height of a node $x$, $bh(x)$, is the number of black nodes (including Null) on the path from $x$ to a leaf, not counting $x$; black-height is well defined
    - any node with height $h$ has black-height $\geq \frac{h}{2}$
    - the subtree rooted at any node $x$ contains $\geq 2^{bh(x)} -1$ internal nodes
    - a red-black tree with $n$ nodes has height $\leq 2\lg(n+1)$
- Insertion
    - on insertion you want to maintain the balance of the tree
    - involves right and left rotations
    - it takes $O(1)$ time to rearrange pointers via rotations, and recoloring propagates at most $O(\log n)$ steps up the tree
11 Overview
- 2 questions about algorithms
    - analyze runtime
    - presumably Dynamic and Max Flow / Min Cut
- Divide and Conquer
    - no matrix multiplication
    - effect of parameters on runtime
    - Master Method
- Graphs
    - Breadth First Search
    - no DAG
- Greedy Algorithms
    - Dijkstra's Algorithm
    - Minimum Spanning Tree
- Dynamic
    - can't use dynamic for most recursive algorithms
    - often easier to iteratively solve the problem
    - expect a question
- Flow
    - Ford-Fulkerson
    - Capacity Scaling
    - runtime and comparison
- Randomized Algorithms
    - Randomized Quicksort
    - Hashing and Expected Collisions