Under construction. The first two sections are copied from the midterm review and possibly updated.
- 1 Network flow
- 2 Linear programming
- 3 P vs NP vs CoNP
- 4 NP-completeness
- 5 PSPACE
- 6 Approximation algorithms
- 7 Randomised algorithms
1Network flow¶
1.1Introduction to flow problems¶
You have a graph with edges and vertices. Each edge has a capacity. There is a source vertex from which flow magically pours out and a sink vertex into which flow disappears. You want to maximise the amount of flow through the edges coming out of the source (or into the sink, same thing). Note that the flow going into any edge must be equal to the flow coming out of it.
1.2Ford-Fulkerson¶
The runtime is $O(nf)$ where $n$ is the number of edges and $f$ is the maximum flow.
The algorithm will always terminate if the capacities are rational numbers.
There is another version of Ford-Fulkerson that does run in polynomial time. It involves augmenting using a scaling max flow:
Define $\delta$ = maximum edge capacity rounded down to a power of 2 (e.g. 18 would be rounded to 16)
While $\delta \geq 1$:
While $\exists$ an s-t path P in $G_f(\delta)$:
Augment P
Update $G_f(\delta)$
end while
$\delta := \delta / 2
end while
output f
1.2.1Residual graphs¶
Residual graphs are defined as follows:
Given a flow network ($G$, $s$, $t$, {$C_e$}) and a flow $f$:E$\rightarrow \mathbb{R}^{+}$ on $G$, the corresponding residual graph $G_f$ is defined as:
-
The vertices of $G_f$ are the same as those of $G$.
-
For each edge $e$ = $(u,v)$ with $C_e > f(e)$, we add the edge $(u,v)$ to $G_f$ with capacity $C_e-f(e)$ = unused capacity to $G_f$.
-
For each edge $e$ = $(u,v)$ with $f(e) > 0$, we add edge $(v,u)$ with capacity $f(e)$ to $G_f$.
1.2.2Min-cut max-flow¶
The value of any flow is bounded above by the value of any cut.
The maximum flow is equal to the minimum cut.
1.3Applications¶
1.3.1Bipartite matching¶
A matching is a set of edges in a bipartite graph (with the vertices partitioned into disjoint sets $A$ and $B$) such that no vertex is connected to more than one edge. A perfect matching is a matching that includes all vertices.
In a bipartite matching problem, the input is a bipartite graph; the output, the largest possible matching.
First, we convert the graph into a flow network, such that the maximum flow in the network corresponds to a maximum matching in the graph.
Add a source vertex, $s$, and connect it with a directed edge of capacity 1 to every vertex in set $A$. Add a sink vertex, $t$, and connect every vertex in set $B$ to it with a directed edge of capacity 1. Assign a capacity of 1 to every edge between $A$ and $B$. Now, run FF to find the maximum flow. A flow of 1 through any edge corresponds to that edge being part of a maximum matching.
1.3.2Vertex cover¶
Choose a set of vertices such that every edge is connected to a vertex in the set.
This is actually a special case of BMP (above), just make the capacities of the existing edges infinite instead of 1 so that it's easier to find the min cut. The min vertex cover is given by the min cut. This is also given by the maximum matching.
1.3.3Disjoint path problem¶
Find the maximum number of edge-disjoint paths from $s$ to $t$ (with no vertex or edge visited more than once)
1.3.4Networks with multiple sources or sinks¶
Add a "super source" or "super sink". QED
1.3.5Circulation with demands¶
No one source or sink - instead, every vertex is either a source or a sink. Directed graph, non-negative capacities, every vertex has a demand. If this demand is positive, the vertex "demands" this much flow. If it is negative, the vertex supplies this much flow. A valid flow (i.e. circulation) must satisfy the capacity condition for the edges, as well as a new demand condition (flow in - flow out = demand).
If it is, we can check whether or not circulation is possible by converting it into a flow network problem. Add a super source, $s$, and connect it to all the vertices with a supply (i.e. negative demand), with capacity equal to the magnitude of the demand. Add a super sink, $t$, and connect all the vertices with a positive demand to it, with capacity equal to the magnitude of the demand. Then, find the max flow.
We note immediately that a valid circulation is only feasible when the sum of all the demands is 0. That is, the sum of all the "demands" must be equal in magnitude to the sum of all the "supplies". Moreover, if the maximum flow is equal to sum of all the "demands", then a valid circulation is feasible.
1.3.5.1Circulation with demands and lower bounds¶
To add the lower bounds in, just subtract the capacity of each edge by the lower bound, etc.
1.3.5.2Survey design¶
$n$ customers, $m$ products. Each customer knows about a particular subset of products. We want to ask each customer at most $c$ questions, and we want to ask between $p$ and $q$ questions per product. Each customer can be asked 0 or 1 questions about any given product. We can convert this bipartite into a circulation with demands and lower bounds problem
1.3.5.3Airline scheduling¶
Given a schedule consisting of $k$ airplanes, $m$ flights (given departure/arrival time, origin, destination), and we know how long it takes to get from one city to another. Is a given schedule feasible? (Note that we can initially put airplanes anywhere we want. We can also fly empty planes if we need to.)
We can convert this into a circulation with demands and lower bounds problem. For every flight, we add two nodes with an edge between them, with lower bound and capacity 1 (signifying the departure and the destination airport respectively). We then add a directed edge from a destination airport to a source airport if there is sufficient time to fly a plane from the destination airport to the source airport, with a lower bound of 0 and capacity 1. Every node has a demand and supply of 0. We connect $s$ to all origin airport notes, and $t$ to all destination airport nodes, low/cap of (0, 1).
If there is a possible schedule, then there will be a valid circulation. We can convert a circulation into a schedule by taking every edge that has 1 unit of flow through it as a flight taking place (so we trace the edge-disjoint paths from $s$ to $t$).
1.3.6Project selection¶
There is a set of projects, each of which has a certain revenue (positive: gain money, negative: lose money). Certain projects are prerequisites for others. The goal is to pick a set of projects to maximise revenue, with the condition that there is no edge from this set to outside this set (the prereq condition).
We can convert this into a flow network problem by adding an edge from any project to its prereq (feels a bit unintuitive), with capacity infinity. We then connect $s$ to every project with a positive revenue, with a capacity equal to that revenue; same for $t$ (negative revenue). Then, we just need to find a min cut.
1.3.7Baseball elimination¶
Given certain baseball teams, and a count of how many wins each team has had, and a network illustrating what games are yet to be played: can we tell if a particular team can finish in first place?
Yes: flow network. Create a node for each game, and a node for each team. Connect each game node to both team nodes (in that game), infinite capacity. Source: capacity equal to the number of games left, connected to all the games. Connect each team node to a sink, capacity equal to the number of games that team is allowed to win without beating the team that should come in first (or just the number of games it has left, for the team we're interested in). If the max flow is not equal to the sum of the capacities of the edges leaving the source then the team we're looking at cannot place first.
2Linear programming¶
2.1Concepts of linear programming¶
Assign values to a set of variables to maximise or minimise an objective function such that a set of linear constraints is satisfied.
The cost of an assignment is the value of the objective function given this assignment.
2.2Sample linear programming problems¶
Maximise $x_1 + x_2 + x_3$, subject to the constraints:
$$\begin{cases} & x_1 + 2x_2 \leq 1 \\ & 2x_1 + x_2 \leq 1 \\ & 3x_3 = 1 \\ & x_1, x_2, x_3 \geq 0 \end{cases}$$
In this case, the maximum is 1 (when $x_1 = x_2 = x_3 = 1/3$).
2.3Relation to network flow problems¶
We can convert a maximum flow problem into a linear programming problem as follows:
- Variables: the amount of flow through the edges
- Cost: maximise the flow through the edges coming out of the source
- Constraints: $f(e) \leq c_e$ (capacity constraint), conservation constraint
If there are $n$ vertices and $m$ edges, then there are $n + 2m$ constraints.
For example, if we had a graph that looked like this:
a > \ / \ 2 1 / \ / > s --- 3 ---> t I'll replace this diagram with a better one eventually, don't mock me
We could convert it into a linear programming problem:
- Objective function: $f(sa) + f(st)$
- Constraints: $\begin{cases} & f(sa) \leq 2 \\ & f(st) \leq 3 \\ & f(at) \leq 1 \\ & f(sa) = f(at) \\ & f(sa), f(st), f(at) \geq 0 \end{cases}$
2.3.1Flow networks with lower bounds¶
If the flow network had lower bounds, we could replace the last $\geq 0$ constraint with the true lower bound, etc.
2.3.2Flow networks with costs¶
Passing $f(e)$ through edge $e$ costs $f(e)P_e$.
The goal is to minimise the cost given a flow value $\alpha$.
- Objective function: $\displaystyle \sum_{e \in G} f(e) \cdot P_e$
- Constraint: flow leaving the source is equal to $\alpha$ (plus the other usual constraints)
The example given in class had to do with a city, and placing food in only some cities. An example that feels more logical to me involves access to banks since food is a depletable resource, but whatever.
2.4The simplex algorithm¶
Using a geometrical interpretation of the constraints, check all the vertices of the polygon (or polyhedron, depending on the number of constraints) and take the maximum value of the objective function that results.
If there are 3 variables, then there are 3 dimensions and each inequality corresponds to a plane that splits the space into two. We need at least 4 inequalities in order to form a polyhedron (a bounded solid). Each vertex corresponds to the point at which three inequalities meet.
If there are 2 variables, then there are 2 dimensions and each inequality corresponds to a line that splits the plane into two. We need at least 3 inequalities in order to form a (bounded) polygon. Each vertex corresponds to the point at which two inequalities meet.
In the average case, this algorithm runs in polynomial time, although in the worst case it runs in exponential time.
2.5Canonical form¶
Any linear program can be converted into canonical form, which looks like this (for a maximising problem):
$$ \text{Constraints: } \begin{cases} & a_{1,1}x_{1,1} + \ldots a_{1,n} x_{1,n} \leq b_1 \\ & \ldots \\ & a_{m,1} x_{m,1} + \ldots a_{m, n} x_{m, n} \leq b_m \end{cases} $$
Objective function (the thing to maximise): $c_1 x_1 + \ldots c_n x_n$
Other constraints: $x_1, \ldots, x_n \geq 0$
For a minimising problem, flip the $\leq$ signs in the constraints.
2.5.1Strategies for converting into canonical form¶
- If the $\leq$ sign is in the wrong direction, multiply both sides by -1.
- Replace equalities with two inequalities (in opposite directions)
- If we have $x \leq 0$, replace it by $x'$ which is equal to $-x$
- If we have $x \geq a$ where $a \neq 0$, replace $x$ by $x' = x-a$
- If we don't have $x \geq 0$ at all, then we replace it by $x' - x''$, where $x', x'' \geq 0$ (so their difference spans all of $\mathbb{Z}$, which is what the original $x$ needs)
2.6Duality of linear programs¶
To prove that a feasible solution for a linear program is indeed the optimal solution, we can basically combine the inequalities to get another linear system of inequalities - its dual (so a linear program within a linear program, only it becomes a maximising problem instead of a minimising one, or vice versa). The solution for this is equal to the solution to the original linear program. This is reminiscent of the max-flow min-cut theorem.
Also, if one linear program is unbounded, so is its dual; if one is feasible and bounded, same thing.
2.7Complementary slackness¶
If $y_i \geq 0$ then $\sum_j a_{ij}x_{ij} = b_i$, if $\sum_j a_{ij}x_{ij} \leq b_i$ then $y_i = 0$
Primary bounded and feasible, then dual is bounded and feasible
Primary unbounded and feasible, then dual is unfeasible
Primary unfeasible, then dual is unbounded and feasible, or the dual can be unfeasible as well.
3P vs NP vs CoNP¶
A problem is P if it can be solved in polynomial time.
A problem is in NP if there exists a verifier for a 'Yes' answer that runs in polynomial time. (Give examples of proglems NOT in NP?)
All problems in P are NP.
A problem is in CoNP if the complement of the problem is in NP. It is not known whether or not NP = CoNP
4NP-completeness¶
To prove that something is NP-complete, first show that it is in $NP$ (i.e., give a certifier for it), then reduce some problem that is known to be NP-complete to it (i.e., show that it is NP-hard).
Known NP-complete problems: SAT, 3-SAT, independent set, the clique problem, vertex cover, $k$-colorability for $k \geq 3$, Hamiltonian cycle, Hamilton path, subset sum, and the travelling salesman problem. Same as what we were allowed to use in assignment 4. These will be the only ones we should be using for the polytime reduction, unless otherwise stated on the exam.
5PSPACE¶
We spent about half a class on this in total (on Tuesday, November 13).
Definition: the set of all decision problems which can be solved by a Turing machine using a polynomial amount of space (in terms of memory).
Theorem: P $\subseteq$ PSPACE. A polynomial-time algorithm can't write a superpolynomial amount of data to memory.
Theorem: SAT $\in$ PSPACE. Using the exponential-time solution, we can generate every possible assignment one at a time and try it. This might not be polytime but it is polyspace, since we can just reuse the same space in memory.
Theorem: NP $\in$ PSPACE. Reduce them all to SAT. Or, generate all the possible certificates one by one and try them.
So PSPACE is a superset of NP and CoNP, and of course is a superset of P (since P is the intersection of NP and CoNP).
6Approximation algorithms¶
We can approximate the solution to any $NP$-complete problem using an approximation algorithm.
Start with the ILP formulation, do the LP relaxation, round things up or down, etc