Thursday, September 6, 2012 CC-BY-NC
Introduction to the course

Maintainer: admin

Lecture notes for the first COMP 360 class, taught by Hamed Hatami. These lecture notes are student-generated and any errors or omissions should be assumed to be the fault of the notetaker and not of the lecturer. To correct an error, you have to be registered and logged-in; alternatively, you can contact @dellsystem directly.

1Overview of the course

The required textbook is Algorithm Design by Jon Kleinberg and Éva Tardos (likely the same as the one used in COMP 251).

In COMP 250 and COMP 251, we mainly covered basic algorithms: sorting, searching, graph traversal, and the like. In this course, we will cover more advanced algorithms, particularly optimisation algorithms. An approximate overview of the topics that will be covered is:

  • Network flow
  • NP-completeness
  • Linear programming
  • Approximation algorithms
  • Randomisation algorithms

2Network flow

2.1The max-flow problem

The first topic to be covered is network flow, specifically, the max-flow problem. We define a flow network as a directed graph, $G = (V, E)$ (i.e. a set of vertices and edges), such that:

  1. Every edge $e$ has a capacity $c_e$.
  2. There is a source $s \in V$ from which the flow starts.
  3. There is a sink $t \in V$ where the flow ends.

Some assumptions:

  • The capacities are all natural numbers (positive integers).
  • The source has no incoming edges, and at least one outgoing edge.
  • The sink has no outgoing edges, and at least one incoming edge.
  • Every other node is adjacent to (connected with) at least one incoming and one outgoing edge.

We define an s-t flow as a function, $f: E \to \mathbb{Z}^+ \cup \{0\}$, which assigns a non-negative integer to every edge (which is the flow through that edge). This function must satisfy the following conditions:

  1. Capacity condition. The flow through any edge must not be greater than the capacity of that edge. Mathematically: $\forall e \in E, f(e) \leq c_e$.
  2. Conservation condition. For all nodes other than the source and the sink, the flow through all incoming edges should be equal to the flow through all outgoing edges. Mathematically: $\forall v \in V / \{s, t\}$, $\displaystyle\sum_{\text{incoming edges}} f(e) = \sum_{\text{outgoing edges}} f(e)$.

Next, we define a $v(f)$: the total value of a flow $f$ leaving the source. This is the particular quantity that we wish to maximise. Specifically, the problem is: given a flow network $(G, \{c_e\}, s, t)$, what is the maximum $v(f)$ among all valid flows $f$?

2.2Ford-Fulkerson

One algorithm for finding the maximum flow in a flow network is known as Ford-Fulkerson. The basic process is:

  1. Start with the flow for every edge being set to 0.
  2. At every step, push more flow through until we can add no more.

(A better, more detailed explanation of the process can be found below.)

2.2.1Definitions

At this point, we need to define the concept of a residual graph. Given $(G, \{c_e\}, s, t)$, and an $s-t$ flow $f$, the residual graph $G_f$ has the same vertices as $G$, and its edges are specified as follows:

  1. If $f(e) < c_e$, then $G_f$ has the same edge $e$, but with a capacity of $c_e - f(e)$. In other words, if the flow in the original graph has not reached capacity, the flow in the residual graph is the remaining capacity.
  2. If $f(e) > 0$, then $G_f$ has an edge in the opposite direction (possibly in addition to an edge in the same direction) with capacity $f(e)$.

Here's an example, taken from the lecture1:

Residual graph example

In other words, for any edge, its equivalent edge in the same direction in the residual graph has a capacity of whatever remains of the original edge's capacity (that has not been "filled" with flow). It also has an edge in the opposite direction with capacity being whatever the flow of the original edge was.

Now we define an augmented path $P$, which is a simple path2 in a residual graph $G_f$ that starts at the source and ends at the sink. We define $\text{Bottleneck}(P, f)$3 to be the minimum residual capacity of any edge in $P$. So the bottleneck (or cut) is the maximum amount of flow that can be pushed through $P$.

2.2.2Algorithm steps

Now we can redefine the steps of this algorithm in terms of the residual graph and augmented paths:

  1. Set $f(e)$ to 0 for every edge in $G$.
  2. While there is an $s-t$ path in the residual graph $G_f$:
    • $f' = \text{augment}(f,P)$ (i.e. increase the flow by $\text{Bottleneck}(P, f)$ through $P$)
    • Update $f$ to $f'$
    • Reconstruct the residual graph.

2.2.3Analysis of the algorithm

2.2.3.1Correctness

The augmented flow satisfies the two conditions of a valid flow - the capacity constraint, and conservation - as follows:

  • Capacity: the residual capacities are always less than the original capacities.
  • Conservation: there are three possible cases, and in each case the conservation condition is preserved.
    • -- a --> V -- a -->
    • -- a --> V <- a ---
    • <- a --- V -- a -->
2.2.3.2Termination

A valid flow cannot be larger than the total capacity of the edges leaving the source. Every time we agument the flow, we increase the value by at least 1. So, there are at most $C$ iterations where $C$ is the sum of the capacities of the edges leaving the source.

2.2.3.3Running time

Every iteration is $O(m) + O(m) = O(m)$. The total running time is $O(mC)$ where $m$ is the number of edges in the graph.

Is this, then, a polynomial-time algorithm? No, it is not. If you make bad initial choices when augmenting the flow, it could take much longer to complete the algorithm.4

2.2.4Implementation in Python

class Edge:
    def __init__(self, u, v, c):
        self.source = u
        self.sink = v
        self.capacity = c
        # flow only determined when working out the Ford-Fulkerson algorithm 

class FlowGraph:
    def __init__(self):
        self.neigb = {} # list of nodes and the edges that connect the said node to its neighbours
        self.flow= {} # list of flow in edges

    def add_vertex(self, v):
        self.neigb[v] = [] # initiate each node with a list of edge coming from this node

    def get_edges(self, v):
        return self.neigb[v]

    def add_edge(self, u, v, w=0):
        if u == v:
             raise ValueError("u==v")

        # directly add the edges of the residual graph     
        edge = Edge(u, v, w)
        redge = Edge(v, u, 0)
        edge.redge = redge
        redge.redge = edge
        self.neigb[u].append(edge)
        self.neigb[v].append(redge)
        self.flow[edge] = 0
        self.flow[redge] = 0

    def find_path(self, source, sink, path):
        if source == sink:
            return path

        for edge in self.get_edges(source):
            residual = edge.capacity - self.flow[edge]

            # path is a list of set (edge, residual capacity)
            if residual > 0 and not (edge, residual) in path:
                result = self.find_path(edge.sink, sink, path + [(edge, residual)])

                if result is not None:
                    return result

    def max_flow(self, source, sink):
        path = self.find_path(source, sink, [])

        while path is not None:
            # bottleneck is calculated as the minimal flow in the current augmented path 
            flow = min(res for edge, res in path)

            for edge, res in path:
                 self.flow[edge] += flow
                 self.flow[edge.redge] -= flow

            path = self.find_path(source, sink, [])

        return sum(self.flow[edge] for edge in self.get_edges(source))

# Remark: instead of creating residual graph, each node has residual edges
# residual edge is edge of of another node

# Algorithm test
g = FlowGraph()
map(g.add_vertex, ['s','o','p','q','r','t'])
g.add_edge('s','o',3)
g.add_edge('s','p',3)
g.add_edge('o','p',2)
g.add_edge('o','q',3)
g.add_edge('p','r',2)
g.add_edge('r','t',3)
g.add_edge('q','r',4)
g.add_edge('q','t',2)
print g.max_flow('s','t')
  1. The diagram that was drawn on the board didn't, as far as I could tell, have the opposite-direction arrows in cases where the original flow was equal to the capacity. I'm not sure if that was deliberate, accidental, or a mistake on my part, but the above diagram should be correct. See also these notes from the CS department at Urbana-Champaign. 

  2. A path in which no vertex is visited more than once 

  3. I think the notation is meant to indicate a function that takes in two arguments? 

  4. The professor also mentioned something about $\lceil \log_2 n \rceil$ bits being needed to store an integer $n$, but that's a fairly trivial comment valid for many algorithms and doesn't really shed additional light on Ford-Fulkerson in particular. I could have misunderstood what he was saying, though.