Tuesday, March 11, 2014 CC-BY-NC
Random walks, Chernoff bounds

Maintainer: admin

Random walks on graphs and Markov chains. Chernoff bounds.

1Random walks

A random walk on a graph $G = (V, E)$ is a walk which, at any vertex, chooses its next edge uniformly at random. So if the degree of a vertex $v$ is $d_v$, then the probability of any particular vertex adjacent to $v$ being chosen is $1/d_v$.

This naturally gives rise to a transition matrix, $P$, which has $n = |V|$ rows and $n$ columns, with the entry in row $i$ and column $j$ representing the probability of transitioning from vertex $i$ to vertex $j$. If there is no edge between $i$ and $j$, then the probability is 0. Otherwise, the probability depends on $\deg(i)$.

1.1Markov chain model

If $\underline{x}^N$ is a vector containing the probability of being at any state (i.e., vertex) at time $N$, then we can calculate the probability vector for the next time step as follows:

$$\underline{x}^{N+1}= \underline{x}^N \cdot P \tag{it's just matrix multiplication}$$

This is known as a Markov chain model of a random walk.

1.1.1Example

Suppose we have a graph with 4 vertices, and the following transition matrix $P$:

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\ 0 & \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}$$

(Incidentally, from this matrix, we can derive exactly how the graph is connected - vertex 1 is connected only to vertex 2; vertex 2 is connected to 1, 3, and 4; vertex 3 is connected to 2 and 4; and vertex 4 is connected to 2 and 3.)

Suppose we start off our random walk at vertex 2. This gives us the following probability vector at time step $N=0$:

$$\underline{x}^0 = \begin{pmatrix} 0 & 1 & 0 & 0 \end{pmatrix}$$

Then, using the formula shown above, we can calculate the next probability vector:

$$\underline{x}^1 = \underline{x}^0 \cdot P = \begin{pmatrix} \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \end{pmatrix}$$

Then, we could compute $\underline{x}^2$, $\underline{x}^3$, etc.

1.2Stationary distributions

In the long run (as $N \to \infty$), then, under certain conditions, we reach a stationary distribution, such that

$$\underline{x}^* = \underline{x}^* \cdot P$$

where, as usual, the sum of the entries of the vector $x^*$ is 1. Now, if $G$ is connected and not bipartite, then $\underline{x}^*$ is unique and satisfies

$$\underline{x}^*_v = \frac{d_v}{2|E|} = \frac{d_v}{2m} \tag{$\star$}$$

Note that this only applies for connected graphs; in a graph that is not connected, $\underline{x}^*$ won't be unique - it will depend on which component you start at. Similarly, in a bipartite graph, $\underline{x}^*$ will depend on which bipartition you start at, since you alternate bipartitions with each transition.

To see that $(\star)$ is true, let $\displaystyle x_v^N = \frac{d_v}{2m}$ (so the $v$th entry in the probability vector at time step $N$, which corresponds to the probability of being at vertex $v$ at this time step). Then:

$$x_v^{N+1} = \underline{x}_v^N \cdot \text{(the $v^{\text{th}}$ column of $P$)} = \sum_{u \in \Gamma(v)} \frac{d_u}{2m} \cdot \underbrace{P_{uv}}_{=1/d_u} = \frac{1}{2m} \sum_{u \in \Gamma(v)} 1 = \frac{d_v}{2m} = x_v^N \, \checkmark$$

1.2.1Application: PageRank

Here's a standard application of random walks. Google's PageRank algorithm, in its most basic form, makes use of a random walk on a directed graph, where vertices are webpages and there is a directed edge from vertex $i$ to vertex $j$ if the corresponding webpage for $i$ has a link to the webpage for $j$. Start crawling at a random vertex, and find the stationary distribution; then, the score of a page is given by

$$x_i^* = \sum_{j \in \Gamma(i)} \underline{x}_j^* \cdot \frac{1}{\text{outdegree}(j)} \tag{$\forall i \in V$}$$

which makes it less vulnerable to malicious SEO.

2Chernoff bounds

Theorem: Let $X_1, x_2, \ldots, X_k$ be independent Bernoulli trials such that $P(X_i = 1) = p_i$. If $\displaystyle X = \sum_i X_i$, then

$$P(X > (1+\delta)\mu) \leq \left ( \frac{e^\delta}{(1+\delta)^{(1+\delta)}} \right)^\mu$$

where $\displaystyle \mu = E(X) = \sum_i P_i$.

This proof is a bit involved, and my notes from this lecture aren't amazing, so the following proof is mostly from my answer for question 5 on the assignment (switching the $<$ for $>$, etc, where necessary), plus a bit from my notes. Should be roughly equivalent to what was done in class.

We define a function $f(x) = e^{tx}$ where $t$ is some positive constant, as of yet unknown. Then, we apply $f(x)$ to both sides of the inequality in the probability:

$$P(X > (1+\delta)\mu) = P(e^{Xt} > e^{t(1+\delta)\mu}) \tag{1}$$

Recall that Markov's inequality states that

$$P[Z > c\cdot E(Z)] \leq \frac{1}{c}$$

We can apply this inequality to the RHS of (1), resulting in:

$$P(X > (1+\delta)\mu) \leq \frac{E(e^{Xt})}{e^{t(1+\delta)\mu)}} \tag{2}$$

Recall that $X$ is the sum of independent random variables $X_i$. To find the expected value of $e^{tX}$, we use the fact that $\displaystyle X = \sum_{i}X_i$
$$\begin{align} E(e^{tX}) & = E(e^{t\sum X_i}) = E\left (\prod_i e^{tX_i} \right ) \\ & = \prod_i E(e^{tX_i}) \tag{by independence, and the linearity of expectation} \\ & = \prod_i (\underbrace{p_ie^{t\cdot 1}}_{\text{success}} + \underbrace{(1-p_i)e^{t\cdot 0}}_{\text{failure}}) \tag{since they're Bernoulli trials} \\ & = \prod_i (p_ie^{t} + (1-p_i)) \\ & = \prod_i (1 + p_i(e^t - 1)) \end{align}$$

We can simplify this by making use of the following inequality: $\forall x \in \mathbb R$, we have that $1+x \leq e^{x}$. If we let $x = p_i(e^t-1)$, then we have the following inequality:

$$\begin{align} E(e^{tX}) & \leq \prod_i e^{p_i(e^t-1)} \\ & = e^{\sum_i p_i (e^{t}-1)} \tag{by exponential laws} \\ & = e^{(e^{t}-1)\sum_i p_i} \\ & = e^{(e^{t}-1)\mu} \tag{by the definition of $\mu$} \end{align}$$

Using this upper bound for $E(e^{tX})$, we can simplify the inequality in (2):

$$P(X > (1+\delta)\mu) \leq \frac{e^{(e^{t}-1)\mu}}{e^{t(1+\delta)\mu}} = e^{(e^{t}-1)\mu -t(1+\delta)\mu} = e^{\mu(e^{t}-1-t-t\delta)} \tag{3}$$

Now we need to choose $t$ to make the bound as tight as possible, which means minimising the RHS of (3) with respect to $t$. We can do this by taking the derivative of $(e^{t}-1-t-t\delta)$ (with respect to $t$) and setting it equal to 0:

$$\begin{align} e^t - 1 - \delta &= 0 \\ \therefore \; e^{t} & = 1 + \delta \\ \therefore \; \ln(e^{t}) & = \ln(1+\delta) \\ \therefore \; t &= \ln(1+\delta) \end{align}$$

Next, we substitute that into (3) and simplify:

$$\begin{align} P(X > (1+\delta)) & \leq e^{\mu(e^{\ln(1+\delta)} - 1 - \ln(1+\delta) - \ln(1+\delta)\delta)} \\ & = e^{\mu ((1+\delta) -1 - (1+\delta)\ln(1+\delta)} \tag{as $e^{\ln(1+\delta)} = (1+\delta)$} \\ & = \left ( e^{\delta - (1+\delta)\ln(1+\delta)} \right )^\mu \tag{taking out the $\mu$} \\ & = \left ( e^{\delta} \cdot e^{-(1+\delta)\ln(1+\delta)}\right )^\mu \\ & = \left ( e^{\delta} \cdot e^{\ln(1+\delta)^{-(1+\delta)}} \right )^\mu \\ & = \left ( e^{\delta} \cdot (1+\delta)^{-(1+\delta)} \right )^\mu \\ & = \left ( \frac{e^{\delta}}{(1+\delta)^{(1+\delta)}} \right )^\mu \end{align}$$

Wow. $\blacksquare$.