1 Stable Matching

With $n$ men and $n$ women, find a pairing with no instabilities.

* Instability:
  * Man $m$ prefers woman $w$ over his assigned woman $x$
  * $w$ prefers man $m$ over her assigned partner
* Results in a male-optimal and female-pessimal solution

1.1 Gale-Shapley Algorithm
```
all people are free
while (any man m is free and hasn't proposed to all women):
    w = first woman on m's list who hasn't been proposed to by m yet
    if (w is free):
        assign w to m
    else if (w prefers m over her assigned partner a):
        assign w to m
        make a free
    else:
        w rejects m
```
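The pseudocode above can be sketched in runnable Python. The dict-based preference representation and the function name are illustrative choices, not from the notes:

```python
def gale_shapley(men_prefs, women_prefs):
    """Stable matching via Gale-Shapley. Prefs map each person to a
    list of partners, most preferred first. Returns {man: woman}."""
    # rank[w][m] = position of m in w's list, for O(1) comparisons
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}  # index of next woman to try
    engaged = {}                               # woman -> man
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free_men.append(engaged[w])        # current partner becomes free
            engaged[w] = m
        else:
            free_men.append(m)                 # w rejects m; he stays free
    return {m: w for w, m in engaged.items()}
```

Each man proposes to each woman at most once, giving at most $n^2$ proposals total.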
1.1.1 Stability Proof

* Suppose $(A, Z)$ is an unstable pair in the GS matching $S^*$
  * $A$ is matched to $B$, $Z$ is matched to $X$
* Case: $A$ never proposed to $Z$
  * $A$ goes down his list in order of preference
  * If $A$ never proposed to $Z$, he prefers $B$ over $Z$
* Case: $A$ proposed to $Z$
  * $A$ and $Z$ were matched for a time
  * They can only become unmatched if $Z$ prefers some other suitor over $A$
1.1.2 Male-Optimality Proof

* Suppose man $M$ is matched to someone other than his best stable partner in $S$
* Men propose in decreasing order of preference
* If $M$ was rejected by a valid partner, she would have to have stably preferred someone else
1.2 Bipartiteness

Exactly one of the following holds for a graph $G$ and BFS layering $B$:

* No edge joins two nodes in the same layer of $B$ $\to$ $G$ is bipartite
* An edge joins two nodes in the same layer $\to$ $G$ contains an odd-length cycle $\to$ $G$ is not bipartite
2 Divide and Conquer

* Divide problem into several parts
* Solve parts recursively
* Combine solutions from subproblems into final answer
2.1 Mergesort

* divide array: $O(1)$
* recursively sort: $2 \times T(\frac{n}{2})$
* merge: $O(n)$
2.1.1 Merging

* One pointer for each (sorted) array
* Compare the pointed-to values, and add the lower one to the output array
* Advance that array's pointer
* Repeat until both arrays are exhausted
2.1.2 Pseudocode

```
def mergesort(a):
    if n == 1:
        return a
    l1 = a[0 ... n/2]
    l2 = a[n/2+1 ... n]
    l1 = mergesort(l1)
    l2 = mergesort(l2)
    return merge(l1, l2)

def merge(a, b):
    c = []
    while (a and b have elements):
        if (a[0] > b[0]):
            add b[0] to the end of c; remove b[0] from b
        else:
            add a[0] to the end of c; remove a[0] from a
    while (a has elements):
        add a[0] to the end of c; remove a[0] from a
    while (b has elements):
        add b[0] to the end of c; remove b[0] from b
    return c
```
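A runnable version of the pseudocode, merging with index pointers instead of removing from the front (removal from the front of a list is $O(n)$ in Python):

```python
def mergesort(a):
    # Divide: O(1); two recursive calls on the halves; merge: O(n)
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    # Merge with one pointer per array, always taking the smaller head
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these two
    out.extend(right[j:])  # extends is non-empty
    return out
```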
2.2 Closest Pair of Points

2.2.1 Algorithm

* Draw a vertical line to roughly cut the points in half
* Recursively find the closest pair in each half
* Find any potentially closer pairs straddling the dividing line when merging the halves
* Return the closest pair of points
2.2.2 Analysis

This algorithm takes $O(n\log^2 n)$, but can be improved to $O(n\log n)$ by presorting the list of points. Each recursive call returns its points sorted by y-coordinate, and we then merge the sorted lists.
2.3 Integer Multiplication

2.3.1 Brute Force

* brute force requires $O(n^2)$ operations
* an $n$-digit number can be represented by two $\frac{n}{2}$-digit numbers
* $x = 2^{\frac{n}{2}}x_1+x_0$, $y = 2^{\frac{n}{2}}y_1+y_0$

$$xy = 2^n(x_1y_1)+2^{\frac{n}{2}}(x_1y_0+x_0y_1)+x_0y_0$$

* multiply four $\frac{n}{2}$-digit numbers
* add two $\frac{n}{2}$-digit integers and shift
* $T(n) = 4T(\frac{n}{2})+\Theta(n) \to T(n) \in \Theta(n^2)$
2.3.2 Karatsuba Multiplication

* instead, you can use a few extra additions of $\frac{n}{2}$-digit numbers
* and only three $\frac{n}{2}$-digit multiplications
* same as above, but:

$$xy = 2^n(x_1y_1) + 2^{\frac{n}{2}}((x_1+x_0)(y_1+y_0)-x_1y_1-x_0y_0)+x_0y_0$$

* Runtime
  * $T(n)=3T(\frac{n}{2})+\Theta(n)$
  * $T(n) \in O(n^{\log_2 3}) \approx O(n^{1.585})$
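A minimal sketch of the three-multiplication recursion, splitting in base 10 rather than the base-2 split used in the formula above (the identity is the same):

```python
def karatsuba(x, y):
    """Multiply nonnegative ints with three half-size multiplications."""
    if x < 10 or y < 10:
        return x * y                    # base case: single digit
    half = max(len(str(x)), len(str(y))) // 2
    b = 10 ** half
    x1, x0 = divmod(x, b)               # x = x1*b + x0
    y1, y0 = divmod(y, b)
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    # (x1+x0)(y1+y0) - z2 - z0 = x1*y0 + x0*y1, with one multiplication
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0
    return z2 * b * b + z1 * b + z0
```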
2.4 Matrix Multiplication

2.4.1 Brute Force

* $O(n^3)$ time
* $n$ multiplications for each of the $n \times n$ output elements
2.4.2 Divide and Conquer

* Divide each matrix into 4 ${\frac{n}{2}} \times {\frac{n}{2}}$ parts
* Find the products of the quarters with 8 recursive multiplications
* Combine the products with 4 matrix additions
* Return the appropriate values

$T(n) = 8T(\frac{n}{2}) + \Theta(n^2)\to T(n)\in\Theta(n^3)$

No improvement over the naive algorithm, BUT you can multiply two $2\times2$ matrices with 7 multiplications (Strassen), which leads to $T(n) \in \Theta(n^{\log_2 7})$.
2.5 Master Theorem

$T(n) = \alpha T(\frac{n}{\beta}) + f(n),\ \alpha \ge 1,\ \beta \gt 1$

* $n$ is the size of the problem
* $\alpha$ is the number of subproblems
* $\frac{n}{\beta}$ is the size of each subproblem
* $f(n)$ is the additional work required
* Case 1
  * $f(n) \in O(n^c)$ st. $c \lt \log_{\beta}\alpha$
  * $T(n) \in \Theta(n^{\log_{\beta}\alpha})$
* Case 2
  * $f(n) \in \Theta(n^c \log^k n)$ st. $c = \log_{\beta}\alpha$
  * $T(n) \in \Theta\left( n^{c} \log^{k+1} n \right)$
* Case 3
  * $f(n) \in \Omega(n^c)$ st. $c \gt \log_{\beta}\alpha$, and
  * $\alpha f\left( \frac{n}{\beta} \right) \le k f(n)$ for some constant $k < 1$ and sufficiently large $n$
  * $T(n) \in \Theta(f(n))$
3 Graphs

* captures pairwise relationships between objects
* $V$ nodes and $E$ edges
* Size parameters: $n=|V|$, $m=|E|$
* degree: the number of edges incident to a node
3.1 Representations

3.1.1 Adjacency Matrix

* $n\times n$ matrix where $A_{uv}$ is 1 if $(u, v)$ is an edge
* Requires $n^2$ space
* Checking if $(u, v)$ is an edge takes $\Theta(1)$ time
* Identifying all edges takes $\Theta(n^2)$ time

3.1.2 Adjacency List

* $n$-length array of nodes, where each index holds a list of connected nodes
* requires $m+n$ space
* Checking if $(u, v)$ is an edge takes $O(\deg(u))$ time
* Identifying all edges takes $\Theta(m+n)$ time
3.2 Paths and Connectivity

* Path
  * a sequence of nodes $\{v_1, ..., v_k\}$ st. each pair $(v_i, v_{i+1})$ is joined by an edge
* Simple Path
  * all nodes in the path are distinct
* Connected Graph
  * there is a path between every pair of nodes
* Cycle
  * a path $\{v_1, ..., v_k\}$ where $v_1=v_k$, $k > 2$, and the first $k-1$ nodes are distinct
* Tree
  * an undirected graph which is connected and doesn't contain a cycle
* Connected Component
  * all nodes reachable from $s$
3.3 Traversal

* Connectivity: is there a path between nodes $s$ and $t$?
* Shortest Path: what is the length of the shortest path between nodes $s$ and $t$?

3.3.1 Breadth First Search

Explore outwards from a starting node $s$ in all directions, adding nodes one layer at a time.

* Output
  * $L_0 = \{s\}$
  * $L_1 =$ all neighbors of $L_0$
  * $L_2 =$ all nodes not in $L_0, L_1$ which have an edge to a node in $L_1$

There is a path from $s$ to a node $t$ if and only if $t$ appears in some layer of a BFS from $s$.
```
Graph G, s = some node in G
Discovered[s] = true
l = 0  // layer counter
Tree = null
L[0] = {s}
while (L[l] is not empty):
    initialize L[l+1]
    for each node u in L[l]:
        for each edge e connecting u to v:
            if (Discovered[v] == false):
                Discovered[v] = true
                add edge e to Tree
                add v to L[l+1]
    l++
```
* runs in $O(m+n)$ time
  * each node $u$ has $\deg(u)$ incident edges
  * total time processing edges is $O(m)$, since $\sum_{u \in G} \deg(u) = 2m$ (each edge is counted twice)
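A layer-by-layer BFS in Python; the adjacency-dict representation is an assumption for the sketch:

```python
def bfs_layers(adj, s):
    """Return BFS layers from s; adj maps node -> list of neighbors."""
    discovered = {s}
    layers = [[s]]
    while layers[-1]:
        nxt = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in discovered:   # each edge examined twice total
                    discovered.add(v)
                    nxt.append(v)
        layers.append(nxt)
    return layers[:-1]  # drop the trailing empty layer
```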
3.3.2 Depth First Search

A BFS checks all nodes within a certain distance of the root. A DFS instead goes all the way down one branch of the tree, then down the next visible branch, until it has explored every node. DFS requires less memory, since you don't need to store a whole level of nodes at once.

You would choose an algorithm based on the use case: something far down the tree is found quicker with DFS, while something closer to the root is easier to find with BFS.
3.3.3 Connected Component

* Flood Fill
  * Recolor all connected pixels of a color
  * e.g. given a pixel of color lime, change the blob of neighboring lime pixels to blue
* Definitions
  * Node: pixel
  * Edge: between neighboring lime pixels
  * Blob: connected component of lime pixels

```
get_connected_component(s):
    R = {s}
    while (there is an edge (u, v) st. u is in R and v is not in R):
        add v to R
```

For flood fill: ensure the edge $(u, v)$ only connects like-colored pixels.
3.3.4 Bipartite Graphs

* Bipartite
  * an undirected graph whose nodes can be colored red and blue such that no neighboring nodes share a color
  * the graph can't have an odd-length cycle

Let $G$ be a connected graph with $L$ layers from a BFS. Exactly one of the following holds:

* No edge of $G$ joins two nodes of the same layer, and $G$ is bipartite
* An edge of $G$ joins two nodes of the same layer, and $G$ contains an odd-length cycle (and hence is not bipartite)
3.3.5 Connectivity in Directed Graphs

An edge $(u, v)$ goes from node $u$ to node $v$.

* Mutually Reachable
  * $\exists$ a path from $u$ to $v$ and a path from $v$ to $u$
* Strong Connectivity
  * every pair of nodes is mutually reachable

3.3.5.1 Strong Connectivity

* pick a node $s$
* run BFS in $G$
* run BFS in $G^{-1}$, where every edge in $G$ has been reversed
* return true if all nodes are reachable in both BFS executions
3.4 Directed Acyclic Graph

* DAG
  * a directed graph with no cycles
* Topological Order
  * an ordering of nodes $v_1, v_2,...,v_n$ st. for every edge $(v_i, v_j)$, $i < j$
* If $G$ has a topological order, $G$ is a DAG
* If $G$ is a DAG, then it has a node with no incoming edges

To compute a topological order:

```
def topological(G):
    find a node v with no incoming edges
    G.delete(v)
    return v + topological(G)
```
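An iterative version of the idea above (repeatedly strip a node with no incoming edges), tracking in-degrees explicitly; the adjacency-dict representation is an assumption:

```python
def topological(adj):
    """Topological order of a DAG. adj maps node -> list of successors."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    ready = [v for v in adj if indeg[v] == 0]  # no incoming edges
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for w in adj[v]:                       # "delete" v from the graph
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if len(order) != len(adj):
        raise ValueError("graph has a cycle, no topological order")
    return order
```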
<div id="pagebreak"></div>
4 Greedy Algorithms

* make an optimal/plausible choice for the current position
* keep making the same kind of decision on the remaining set

4.1 Strategies

* Stays Ahead
  * show the greedy algorithm is at least as good as any other solution after each step
* Exchange
  * show the greedy algorithm's output can be transformed into any optimal solution without loss
* Structural
  * show every possible solution must satisfy some bound, and the greedy algorithm meets that bound
4.2 Interval Scheduling

* sort by earliest finish time, repeatedly taking the next compatible job

4.2.1 Proof of Optimality

* Assume the greedy algorithm is suboptimal
* let $\{i_1, i_2, ... , i_k\}$ be the greedy set
* let $\{j_1, j_2, ..., j_m\}$ be an optimal set with $i_a = j_a$ for the largest possible $a$
* greedy stays ahead: $i_{a+1}$ finishes no later than $j_{a+1}$, so $i_{a+1}$ could replace $j_{a+1}$ in the optimal set, contradicting the maximality of $a$; hence $a = m$ and $k = m$
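The earliest-finish-time rule as code; the `(start, finish)` tuple representation is an illustrative choice:

```python
def interval_scheduling(jobs):
    """Greedy earliest-finish-time. jobs is a list of (start, finish)."""
    selected = []
    last_finish = float('-inf')
    for s, f in sorted(jobs, key=lambda j: j[1]):  # by finish time
        if s >= last_finish:       # compatible with everything chosen so far
            selected.append((s, f))
            last_finish = f
    return selected
```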
4.3 Interval Partitioning

* assign a group of lectures to lecture halls, such that there is no overlap, using the minimum number of rooms
* a lecture $l_j$ starts and finishes at $s_j, f_j$

4.3.1 Algorithm

```
sort lectures by (increasing) starting time
classrooms = priority queue of rooms
for lecture j in schedule:
    if (j can be added to some room):
        add j to that room
    else:
        open a new room
        add j to it
```
4.3.2 Runtime

It takes $O(n\log n)$ to sort the lectures, and then $O(n)$ priority-queue operations to add them to the schedule.

4.3.3 Analysis

A new classroom is only created if the new lecture conflicts with a lecture in every current room. Since we have sorted by starting time, each of those conflicting lectures starts no later than $s_j$ and ends after it, so the number of rooms opened never exceeds the maximum depth of the schedule, which any assignment needs.
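A sketch of the algorithm where the priority queue is a min-heap keyed on each room's current finish time (an assumed concretization of the pseudocode's `classrooms`):

```python
import heapq

def interval_partitioning(lectures):
    """Return the minimum number of rooms for lectures (start, finish)."""
    rooms = []  # min-heap of finish times, one entry per open room
    for s, f in sorted(lectures):          # increasing start time
        if rooms and rooms[0] <= s:        # earliest-freeing room is free
            heapq.heapreplace(rooms, f)    # reuse it with new finish time
        else:
            heapq.heappush(rooms, f)       # conflict everywhere: new room
    return len(rooms)
```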
4.4 Minimize Lateness

You want to perform a set of jobs, each with a known processing time and due time, such that the maximum lateness of any job is minimized. Use an 'earliest deadline first' ordering.
4.5 Dijkstra's Shortest Path

* Maintain a set of explored nodes $S$ for which the shortest path from $s$ is known
* Initialize $S = \{s\}$
* Repeatedly greedily choose the unexplored node with the shortest available path, until all nodes are explored
* $O(E + V\log V)$ with a Fibonacci heap

4.5.1 Implementation

* maintain $d(v) = \min_{e = (u, v),\ u \in S}(d(u) + l_e)$, where $l_e$ is the length of edge $(u, v)$
* repeatedly pick the unexplored node with the smallest $d(v)$
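A sketch using Python's binary heap with lazy deletion (stale heap entries are skipped), which gives $O(E \log V)$ rather than the Fibonacci-heap bound; the adjacency representation is an assumption:

```python
import heapq

def dijkstra(adj, s):
    """adj maps u -> list of (v, weight); returns shortest distances from s."""
    dist = {s: 0}
    pq = [(0, s)]                          # (d(v), v), smallest d first
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                       # stale entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```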
4.6 Minimum Spanning Tree

Given a connected graph with real-valued edge weights, the MST is the subset of edges touching all vertices such that the sum of edge weights is minimized.

* where $n$ is the number of vertices in the graph:
  * any MST has exactly $n-1$ edges
  * there are up to $n^{n-2}$ spanning trees
4.6.1 Kruskal's Algorithm

4.6.1.1 Algorithm

* Sort edges by increasing weight; start with an empty MST $T = \{\}$
* Try inserting the edge $e$ with the lowest remaining weight
* if $e$ would create a cycle, discard it
* repeat until $n-1$ edges are in the tree

4.6.1.2 Runtime

$O(E\log E)$

1. Comparison sort for the edges
2. Use a Union-Find data structure to detect cycles, requiring $O(E)$ near-constant-time operations
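A compact sketch with an inline union-find (path compression only), assuming nodes are numbered `0..n-1` and edges are `(weight, u, v)` tuples:

```python
def kruskal(n, edges):
    """Return the total weight of an MST. edges: list of (weight, u, v)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    total, used = 0, 0
    for w, u, v in sorted(edges):           # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge does not create a cycle
            parent[ru] = rv
            total += w
            used += 1
            if used == n - 1:
                break
    return total
```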
4.6.2 Reverse-Delete Algorithm

4.6.2.1 Algorithm

* Sort edges by decreasing weight; start with $T = E$ (all edges)
* Try deleting the edge $e$ with the highest weight
* if removing $e$ disconnects the graph, add it back
* repeat until $n-1$ edges remain

4.6.2.2 Runtime

$O(E \log V (\log \log V)^3)$

1. Comparison sort for the edges
2. $E$ iterations of the loop
   * delete in $O(1)$
   * connectivity check takes $O(\log V (\log \log V)^3)$
4.6.3 Prim's Algorithm

4.6.3.1 Algorithm

* Start with $T = \{v\}$ (some arbitrary vertex)
* Repeatedly add the lowest-weight edge with exactly one endpoint in $T$

4.6.3.2 Runtime

| Minimum edge weight data structure | Time complexity (total) |
| --- | --- |
| adjacency matrix, searching | $O(V^2)$ |
| binary heap and adjacency list | $O((V+E)\log V)$ |
| Fibonacci heap and adjacency list | $O(E + V\log V)$ |

Time is bottlenecked by the lookup time for the minimum-weight edge.
4.7 Union-Find / Disjoint Set

* Functions
  * MakeSet(x): creates a set with the single element x
  * Find(x): returns a representative identifying the set x belongs to
  * Union(x, y): merges the sets containing x and y into a new set with one representative
4.7.1 Example

Let's start with $\{ \{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}$.

Calling union(1, 2) results in $\{ \{1, 2\}, \{3\}, \{4\}, \{5\}\}$. In this state, find(2) => 1.

Calling union(2, 3), union(4, 5) results in $\{ \{1, 2, 3\}, \{4, 5\}\}$.

4.7.2 Implementation

Imagine a forest stored in an array: initially each element is its own root. As union() is called, one root is made the parent of the other, so an element's chain of parent pointers leads up to its set's representative.
4.7.3 Iterative Find(x)

```
Find(x):
    r = x
    while r != r.parent:    // walk up to the root
        r = r.parent
    while x != r:           // path compression: repoint chain to the root
        next = x.parent
        x.parent = r
        x = next
    return r
```
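The structure as a small Python class, adding union by rank alongside the iterative path-compressing find; the class and method names are illustrative:

```python
class UnionFind:
    """Disjoint sets with path compression and union by rank."""
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        r = x
        while r != self.parent[r]:      # walk up to the root
            r = self.parent[r]
        while x != r:                   # compress the path onto the root
            self.parent[x], x = r, self.parent[x]
        return r

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx             # attach shorter tree under taller
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```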
<div id="pagebreak"></div>
5 Dynamic Programming

* Break the problem into a set of overlapping subproblems
* Use a recursive strategy with memoization

Recipe:

* Characterize the structure of a problem
* Recursively define the value of an optimal solution
* Compute the value of an optimal solution
* Use the value to reconstruct the solution

Techniques:

* Binary choice (weighted interval scheduling)
* Multiway choice (segmented least squares)
* Intervals (RNA secondary structure)
5.1 Weighted Interval Scheduling

* Maximize the total weight of a set of compatible jobs
* The unweighted greedy algorithm is the special case with all weights equal to 1
* sort jobs by finish time st. $f_1 \lt f_2 \lt ... \lt f_n$
* p(j) returns the largest index $i<j$ st. jobs $i$ and $j$ are compatible
  * all p(j) can be computed in linear time after sorting, and p(j) tells us all jobs $[1, i]$ are compatible with $j$
* let OPT(j) be the maximum weight of a schedule using jobs $[1, j]$
* Case 1: job $j$ is selected
  * jobs $[p(j)+1, j-1]$ are not in the solution
  * must include the optimal solution of jobs $[1, p(j)]$
* Case 2: job $j$ isn't selected
  * must include the optimal solution of jobs $[1, j-1]$, i.e. it's the same as OPT(j-1)
5.1.1 Brute Force

Returns the maximum weight:

```
OPT(j):
    if (j == 0):
        return 0
    else:
        return max(w_j + OPT(p(j)), OPT(j-1))
```

But this is $O(2^n)$ because of redundant recursive calls.
5.1.2 Memoization

Instead of repeating calculations, store each result in an array:

```
M = [empty for i in range(0, n+1)]
MOPT(j):
    if (j == 0):
        return 0
    else if (M[j] is empty):
        M[j] = max(w_j + MOPT(p(j)), MOPT(j-1))
    return M[j]
```

MOPT takes $O(n)$ total time: each call either returns in constant time or fills a new entry of M, so at most $2n$ recursive calls are ever made.

So the whole algorithm is $O(n)$ if jobs are presorted by start and finish times (the sorting itself requires $O(n\log n)$).

But this only returns the maximum weight of the schedule, not the actual schedule (which can be recovered by tracing back through M). You can also unwind the recursion into an iterative algorithm.
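The unwound iterative DP as runnable code, computing p(j) with binary search over the sorted finish times; the `(start, finish, weight)` tuple format is an assumption:

```python
import bisect

def weighted_interval(jobs):
    """jobs: list of (start, finish, weight). Returns the max total weight."""
    jobs = sorted(jobs, key=lambda j: j[1])        # by finish time
    finishes = [f for _, f, _ in jobs]
    n = len(jobs)
    M = [0] * (n + 1)                              # M[j] = OPT over jobs 1..j
    for j in range(1, n + 1):
        s, f, w = jobs[j - 1]
        # p(j): number of earlier jobs finishing no later than s
        p = bisect.bisect_right(finishes, s, 0, j - 1)
        M[j] = max(w + M[p], M[j - 1])             # take job j, or don't
    return M[n]
```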
5.2 Shortest Path

* An alternative to Dijkstra's algorithm
* Allows for negative-weight edges
  * e.g. profits from decisions
  * Dijkstra's algorithm doesn't work with negative edges
* No shortest path exists if there is a negative-cost cycle
  * the cost can be reduced indefinitely by going around the cycle
  * i.e. a cycle with total cost less than zero
* A shortest path needn't contain a cycle
  * if positive weight, there is no point
  * if negative weight, you would go around it repeatedly
5.2.1 Dynamic - Bellman-Ford

OPT(i, v) returns the length of the shortest path $v \to t$ using at most $i$ edges.

* Case 1: the path $P$ uses at most $i-1$ edges
  * OPT(i, v) = OPT(i-1, v)
* Case 2: $P$ uses exactly $i$ edges
  * if $(v, w)$ is the first edge, OPT uses $(v, w)$ and then selects the best path from $w \to t$

$$ OPT(i, v) = \begin{cases} 0 & i=0,\ v = t\\ \infty & i=0,\ v \ne t\\ \min\left( OPT(i-1, v),\ \min_{(v, w)\in E} \left( OPT(i-1, w) + c_{vw} \right) \right) & i \gt 0 \end{cases} $$
5.2.2 Implementation

```
ShortestPath(G, t):
    foreach node v ∈ V:
        M[0, v] ← ∞
    M[0, t] ← 0
    for i = 1 to n-1:
        foreach node v ∈ V:
            M[i, v] ← M[i-1, v]
            foreach edge (v, w) ∈ E:
                M[i, v] ← min { M[i, v], M[i-1, w] + c_vw }
```

* Worst-case running time: $O(nm)$
* But it is usually substantially faster in practice.
5.2.3 Practical Improvements

* just use one array, maintaining the current shortest-path estimate from each $v \to t$
* only reconsider edges into nodes whose distance changed in the previous iteration
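The single-array variant as code, with the early exit when no distance changes; node numbering and the edge-list format are assumptions:

```python
def bellman_ford(n, edges, t):
    """Shortest distance from every node to t. edges: list of (u, v, cost),
    nodes 0..n-1. Single-array Bellman-Ford with an early exit."""
    INF = float('inf')
    M = [INF] * n
    M[t] = 0
    for _ in range(n - 1):            # at most n-1 relaxation passes
        updated = False
        for u, v, c in edges:
            if M[v] + c < M[u]:       # relax edge (u, v) toward t
                M[u] = M[v] + c
                updated = True
        if not updated:               # nothing changed: distances are final
            break
    return M
```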
5.2.4 Detect Negative Cycles

* add a node $t$ with an edge from every other node, and run one extra iteration
* if OPT(n, v) = OPT(n-1, v) for every node $v$, there are no negative cycles
5.3 Segmented Least Squares

* points roughly lie on a sequence of line segments
* order points by x-coordinate, and choose segments to:
  * minimize the squared errors of the line segments ($e$)
  * minimize the number of lines ($L$)
  * trade off between lines and errors via a cost $c$ per line

5.3.1 Dynamic Algorithm

$$ OPT(j) = \begin{cases} 0 & j=0\\ \min_{i\in [1, j]}\{e(i, j) + c + OPT(i-1)\} & j \ne 0 \end{cases} $$

where $e(i, j)$ is the minimum squared error of a single line through points $i..j$.
5.4String Similarity¶
Given two strings, X = [x1, ..., xn]
and Y = [y1, ..., yn]
, find the allignment such that they match the most. How much they 'match' is determined by minimizing the cost.
Here, an index is mismatched if x[i] != y[i]
, and is unmatched if x[i] = NULL
$$\mathscr{cost} = \sum_{\mathscr{mismatched}} a_{x_i, y_i}+\sum_{\mathscr{unmatched}}x_i+\sum_{\mathscr{unmatched}}y_i$$
Calculating the optimal matching:
 Match (
x[i] = y[j]
) cost to align
x[i], y[j]
+ min cost of remainder
 cost to align
 No Match (
x[i] != y[j]
) Leave
x[i]
unmatched cost of gap + min cost of remainder
 Leave
y[i]
unmatched cost of gap + min cost of remainder
<div id="pagebreak"></div>
 cost of gap + min cost of remainder
 Leave
6 Network Flow

Find the maximum flow through a graph. Very closely related to the smallest cut in a graph.

6.1 Min Cut and Max Flow

6.1.1 Minimum Cut

* Assume
  * directed edges, each with a capacity c(e)
  * a source node $s$ and sink node $t$ the flow tries to relate
  * net material flowing through pipes
* an $s$-$t$ cut partitions the nodes into two sets, one containing $s$ and the other containing $t$
* the capacity of a cut is the sum of capacities of the edges exiting the $s$ side of the cut
6.1.2 Flows

Assign a value f(e) to each edge such that:

* $0 \le f(e) \le c(e)$ for each edge
* $\sum_{e\ \mathrm{into}\ v} f(e) = \sum_{e\ \mathrm{out\ of}\ v} f(e)$ for each internal node

But how do you find a maximum flow?

6.1.3 Relating Flow and Cut

* the net flow across any $s$-$t$ cut equals the value of the flow
* a minimum cut matching the max flow:
  * has incoming edges with flow 0
  * and outgoing edges with flow at full capacity
6.2 Algorithm

6.2.1 Residual Graph

For each edge $e = (u, v) \in E$ we know Flow(e) and Capacity(e).

6.2.1.1 Residual Edge

* lets us undo flow already sent
* so $e = (u, v)$ has a reverse edge $e^R = (v, u)$
* $c_f(e)$ is the residual capacity:
$$c_f(e) = \begin{cases}c(e) - f(e) & \text{if } e\in E\\ f(e) & \text{if } e^R \in E \end{cases}$$

The residual graph is all residual edges with positive capacity.
6.2.2 Ford-Fulkerson Algorithm

* While $\exists$ a path from $s$ to $t$ in the residual graph:
  * Find a valid path with DFS
  * Find the lowest-capacity (bottleneck) edge in the path and augment along that path
  * Lower the residual capacity of each path edge by the bottleneck amount
  * Add residual edges going in the opposite direction
  * Add the bottleneck amount to the total flow
* return total flow

DFS takes $O(E)$ and we have to execute it in the worst case $f$ times, so the runtime is $O(fE)$, where $f$ is the max flow value.

In each step of the iteration, the residual edges keep every routing reversible, so the current flow always remains valid.
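A compact sketch of Ford-Fulkerson with DFS augmenting paths, storing the residual graph as a dict of `(u, v) -> capacity` (a representation chosen for brevity, not from the notes; assumes integer capacities):

```python
def max_flow(capacity, s, t):
    """Ford-Fulkerson. capacity: dict (u, v) -> c. Returns max flow value."""
    residual = dict(capacity)
    for (u, v) in list(capacity):
        residual.setdefault((v, u), 0)     # ensure reverse arcs exist

    def dfs(u, limit, seen):
        if u == t:
            return limit                   # bottleneck along this path
        seen.add(u)
        for (a, b), c in residual.items():
            if a == u and c > 0 and b not in seen:
                pushed = dfs(b, min(limit, c), seen)
                if pushed:
                    residual[(a, b)] -= pushed   # send flow forward
                    residual[(b, a)] += pushed   # allow undoing it later
                    return pushed
        return 0

    flow = 0
    while True:
        pushed = dfs(s, float('inf'), set())
        if not pushed:                     # no augmenting path remains
            return flow
        flow += pushed
```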
6.2.3 Capacity Scaling

Choosing a path with the highest bottleneck increases the flow by the maximum possible amount. However, finding the path with the highest bottleneck is difficult. Instead, we use a scaling parameter $\Delta$ and the subgraph $G_f(\Delta)$ containing only arcs with residual capacity of at least $\Delta$. Keep augmenting along such paths and updating $G_f$; once no paths remain, halve $\Delta$.
6.2.3.1 Capacity Scaling Algorithm

* set the flow of all edges to 0
* find the largest $\Delta = 2^k$ not exceeding the largest capacity out of $s$
* make the residual graph $G_f$
* while $\Delta \ge 1$:
  * find $G_f(\Delta)$
  * while there is an augmenting path P in $G_f(\Delta)$:
    * f = augment(f, c, P)
    * update $G_f(\Delta)$
  * halve $\Delta$
* return f

6.2.3.2 Runtime

* After finishing a $\Delta$-phase, the maximum flow exceeds $f$ by at most $m\Delta$
* There can be at most $2m$ augmentations per phase
* The scaling max-flow algorithm uses at most $O(m\log C)$ augmentations, and the overall flow can be found in $O(m^2 \log C)$ time.
7 Applications of Max Flow / Min Cut

7.1 Bipartite Matching

Can be solved as network flow.

* Matching
  * $M \subseteq E$ such that each node appears at the end of at most one edge in $M$
* Bipartite matching is matching in a bipartite graph:
  * add nodes $s$ and $t$ to the bipartite graph
  * add edges from $s$ to all nodes on one side of the graph
  * add edges from all nodes on the other side of the graph to $t$
  * give all edges capacity 1, and find the max flow
7.2Edge Disjoint Paths¶
Paths between $s, t$ such that there are no shared edges
Give all edges a weight of 1
Max Edge Disjoint paths equals the maximum flow value
7.3Baseball Elimination¶
 starting node $s$
 game nodes $(team\ 1, team\ 2), ..., (team\ x, team\ y )$
 team nodes
 end node $t$
Can team 3 finish with the most wins?
 Assume team 3 wins all remaining games
 This implies all other teams have lost all games with team 3
 Remove all game nodes involving team 3
 Remove team node 3
 Set initial flow from $s$ to e the number of games left between teams
The total number of games left to be played, is the total capacity of the system. All these games must be won by
someone other than team 3.
We want the max flow to equal this value, but no team has more wins than team 3 can have in the best case. So all edges leaving the source must be saturated
8 Randomization

8.1 Symmetry Breaking

* Say we have $n$ processes, $\{P_1, ..., P_n\}$, all trying to access a database
* Only 1 can enter the database at a time
* we want fairness in how the database is shared
* processes can't communicate

8.1.1 Implementation

* Each process requests access with probability $p = \frac{1}{n}$
* Let S(i, t) be the event that $P_i$ alone accesses the database at time $t$
  * $\Pr[S(i, t)] = p (1-p)^{n-1} = \frac{1}{n} (1-\frac{1}{n})^{n-1}$
  * the factor $(1-\frac{1}{n})^{n-1}$ equals $\frac{1}{2}$ at $n=2$ and tends to $\frac{1}{e}$ as $n \to \infty$
  * so we have bounds on the probability: $\frac{1}{en} \le \Pr[S(i, t)] \le \frac{1}{2n}$

The probability that process $i$ fails to access the database in $en$ rounds is at most $\frac{1}{e}$, and after $en(c\ln n)$ rounds the probability is at most $n^{-c}$.

The probability that all processes succeed within $2e n \ln n$ rounds is at least $1-\frac{1}{n}$.
8.2 Global Min Cut

* find the smallest cut $(A, B)$ in a graph, i.e. the minimum number of edges connecting $A$ and $B$
* you'd assume it's harder than the minimum $s$-$t$ cut, but it isn't

8.2.1 Contraction Algorithm

* Pick an edge $e = (u, v)$ uniformly at random
* Contract edge $e$:
  * replace $u, v$ with a supernode $w$
  * preserve the other edges, updating endpoints to $w$
  * remove internal self-loops, but keep parallel edges
* Repeat until there are just two nodes, $w_1, w_2$
* The cut is all the original nodes contracted to form $w_1$
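A sketch of the contraction algorithm with repeated trials. Contractions are tracked with a small union-find instead of literally rebuilding the graph, and edges are contracted in a uniformly random order, which is equivalent to picking a uniformly random remaining edge at each step; names and the edge-list format are illustrative:

```python
import random

def karger_min_cut(edges, n_nodes, trials=200):
    """Repeat the contraction algorithm; edges is a list of (u, v) pairs
    over nodes 0..n_nodes-1. Returns the smallest cut size seen."""
    best = float('inf')
    for _ in range(trials):
        parent = list(range(n_nodes))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        remaining = n_nodes
        pool = edges[:]
        random.shuffle(pool)                  # random contraction order
        for u, v in pool:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:                      # not a self-loop: contract
                parent[ru] = rv
                remaining -= 1
        # edges still crossing the two supernodes form the cut
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best
```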
8.2.2Runtime¶
A randomized algorithm just has a high probability of returning a mincut. In this case it returns a mincut with $\frac{2}{n^2}$. To increase the chance of an answer, repeat the algorithm many times.
Since the probability of finding a solution is $\frac{2}{n^2}$, the probability of not finding it is $1\frac{2}{n^2}$. This results in a failure to find the global min cut after $n^2\ln n$ iterations of $\frac{1}{n^2}$.
8.3 Linearity of Expectation

Useful property: if $X$ is a $0/1$ random variable, the expectation of $X$ is $E[X] = \Pr[X=1]$.

8.3.1 Guessing Cards

* Without memory, the expected number of correct guesses is 1.
* With memory, the expected number of correct guesses is $\Theta(\log n)$.

8.3.2 Coupon Collector

* The expected number of steps is $\Theta(n \log n)$ for $n$ different types of coupons under a uniform distribution.
8.4 Randomized Divide and Conquer (Quicksort)

* Randomization protects against the worst case by choosing the pivot at random.

Expected number of comparisons:

$$q_n = 2n\ln n - (4 - 2\gamma)n + 2\ln n + O(1), \quad \gamma \approx 0.577$$
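A minimal sketch of quicksort with a random pivot (the out-of-place three-way partition is chosen for brevity, not the in-place scheme):

```python
import random

def quicksort(a):
    """Randomized-pivot quicksort; returns a new sorted list."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)               # random pivot defeats bad inputs
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```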
8.5 Hashing

* use a prime table size on the order of the number of elements stored
* identify each element with a number
8.5.1Universal Hashing¶
$\newcommand{\H}{\mathscr{H}}$
We want to map a subset of a large universe $U$. It is large enough that we can't maintain an array.
We want a function $h: u \in U \to \{0, 1, ..., n1\}$, or a funtion which randomly maps values to an array of size $n$. From the birthday problem, a collision is expected fairly often. We want some degree of randomness, to prevent malicious input from harming computer.
A Universal family is defined such that a $v \ne u$ and function $h \in \H$ st $\Pr_{h}[h(u)=h(v)]\le \frac{1}{n}$. You want to compute $h$ efficiently and select it at random efficiently.
To encode an integer, we can use:
$$
h_a(x) = \sum_{i=1}^{r} a_i x_i \mod p
$$
with $p$ being a prime, $r$ the length of the integer, and $0 \le a_i \lt p$. This means we can say $\H = \{h_a: a \in A\}$ is a universal family.
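The family above as code, with the table size taken to be the prime $p$ itself (an assumption for the sketch; the function name is illustrative):

```python
import random

def make_universal_hash(p, r):
    """Draw h_a(x) = (sum_i a_i x_i) mod p over the base-p digits x_i of x,
    with each a_i uniform in [0, p). p should be prime, >= the table size."""
    a = [random.randrange(p) for _ in range(r)]
    def h(x):
        total = 0
        for i in range(r):
            x, digit = divmod(x, p)        # peel off the next base-p digit
            total = (total + a[i] * digit) % p
        return total
    return h
```

A fresh `h` is drawn once per table, then reused for every lookup.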
8.5.2 K-Universal Hashing

You want to ensure that when hashing any $k$ distinct keys, $\Pr[h(x_1) = y_1, ..., h(x_k) = y_k] = \frac{1}{n^k}$, in addition to the qualifications for a universal hash family. Basically, you want any two different keys to have a low chance of colliding (condition 1), and any $k$ hash values to behave as if independent and uniform (condition 2).
8.6Monte Carlo vs Las Vegas Algorithms¶
Monte Carlo Algorithm: Guaranteed to run in polytime, likely to find correct answer.
* Ex) Contraction Algorithm for global min cut
Las Vegas Algorithm:
Guaranteed to find correct answer, likely yo run in polytime.
* Ex) Randomized Quicksort
Note:
Can always convert a Las Vegas Algo into Monte Carlo, but no known method for reverse.
9 Linear Sorting

* Numbers bounded by $k$
  * Counting sort: $O(n)$ if $k$ is $O(n)$
  * Stable sorts preserve the input order of equal keys
  * Radix sort: $O(n)$ if $k$ is $O(n^c)$

Counting sort:

"Its running time is linear in the number of items and the difference between the maximum and minimum key values, so it is only suitable for direct use in situations where the variation in keys is not significantly greater than the number of items."

```
# variables:
#   input  - the array of items to be sorted; key(x) returns the key for item x
#   n      - the length of the input
#   k      - a number such that all keys are in the range 0..k-1
#   count  - an array of numbers, with indexes 0..k-1, initially all zero
#   output - an array of items, with indexes 0..n-1

# calculate the histogram of key frequencies:
for x in input:
    count[key(x)] += 1

# calculate the starting index for each key:
total = 0
for i in range(k):  # i = 0, 1, ..., k-1
    oldCount = count[i]
    count[i] = total
    total += oldCount

# copy to output array, preserving order of inputs with equal keys:
for x in input:
    output[count[key(x)]] = x
    count[key(x)] += 1
return output
```

Radix sort: implemented as repeated passes of a stable counting sort, one digit at a time.
10 Red-Black Trees

* A self-balancing binary search tree
* Every node is either red or black
* The root is always black
* The bottom row is all black leaves with value Null
* A red node must have two black children
* Height of a red-black tree
  * Height of a node is the number of edges in a longest path to a leaf.
  * Black-height of a node $x$, $bh(x)$, is the number of black nodes (including Null) on the path from $x$ to a leaf, not counting $x$. Black-height is well defined.
  * Any node with height $h$ has black-height $\geq \frac{h}{2}$.
  * The subtree rooted at any node $x$ contains $\geq 2^{bh(x)} - 1$ internal nodes.
  * A red-black tree with $n$ nodes has height $\leq 2\lg(n+1)$.
* Insertion
  * on insertion you want to maintain the balance of the tree
  * involves right and left rotations
  * it takes $O(1)$ time to rearrange the pointers via rotations, while recoloring can propagate up the tree in $O(\log n)$ time
11 Overview

* 2 questions about algorithms
* Analyze runtime
  * Presumably dynamic and max flow / min cut
* Divide and conquer
  * No matrix multiplication
  * Effect of parameters on runtime
  * Master method
* Graphs
  * Breadth-first search
  * No DAG
* Greedy algorithms
  * Dijkstra's algorithm
  * Minimum spanning tree
* Dynamic
  * Can't use dynamic for most recursive algorithms
  * Often easier to solve the problem iteratively
  * Expect a question
* Flow
  * Ford-Fulkerson
  * Capacity scaling
  * Runtime and comparison
* Randomized algorithms
  * Randomized quicksort
  * Hashing and expected collisions