Tuesday, March 22, 2012 CC-BY-NC
Approximation Algorithms

Maintainer: admin

1 Set Cover

Given $n$ elements $V = \{v_1, \ldots, v_n\}$ and $m$ sets $S_1, \ldots, S_m \subseteq V$ with $\mathrm{cost}(S_i) = c_i$.

Set cover: find a minimum-cost collection of sets that covers every element.

An unusual application: computer viruses. The elements are known viruses (around 5000 of them). The sets are "bad" substrings that are found in viruses but not in normal code. You want to use as few strings as possible while covering all the viruses.

1.1 Greedy Solution

  • $\mathcal{G} = \emptyset$
  • Pick the set $S_i$ that covers uncovered elements at the lowest average cost
  • Set $\mathcal{G} = \mathcal{G} \cup \{S_i\}$
  • Repeat until $\mathcal{G}$ covers all elements

That is, if $U_t$ is the set of elements still uncovered at time $t$, we pick the set minimizing $\frac{c_i}{|S_i \cap U_t|}$.
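As a concrete sketch, here is the greedy loop in Python (illustrative code; the function name and data representation are my own, assuming each set is given as a Python `set` and that a cover exists):

```python
def greedy_set_cover(elements, sets, costs):
    """Greedy set cover: repeatedly pick the set that covers
    still-uncovered elements at the lowest average cost."""
    uncovered = set(elements)
    picked = []
    while uncovered:
        # choose S_i minimizing c_i / |S_i ∩ uncovered|
        # (raises ValueError if no remaining set covers anything new)
        i = min(
            (j for j in range(len(sets)) if sets[j] & uncovered),
            key=lambda j: costs[j] / len(sets[j] & uncovered),
        )
        picked.append(i)
        uncovered -= sets[i]
    return picked
```

Each iteration scans all $m$ sets, so this naive form runs in polynomial time; the theorem that follows bounds the quality of the answer, not the running time.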

Theorem: Greedy is an $O(\log n)$-approximation algorithm.

Proof: Let the optimal solution be $\{S^*_1, \ldots, S^*_k\}$ and the greedy solution be $\{S_1, \ldots, S_\ell\}$.

Label the elements $v_1, \ldots, v_n$ according to the order they were covered by the greedy algorithm.

Consider $v_i$. At the time it was covered there were at least $n - i + 1$ uncovered elements, and the optimal solution covers all of $\{v_i, \ldots, v_n\}$. The optimal solution has cost $\mathrm{OPT} = \sum_j c(S^*_j)$.

So one of the optimal sets $S^*_j$ covers these uncovered elements at an average cost of at most $\frac{\mathrm{OPT}}{n-i+1}$, and greedy's chosen set is at least as cheap per newly covered element.

Therefore $\mathrm{cost(Greedy)} = \sum_j c(S_j) = \sum^n_{i=1} (\text{average cost to cover } v_i) \leq \sum^n_{i=1} \frac{\mathrm{OPT}}{n-i+1} = \mathrm{OPT} \sum^n_{i=1} \frac{1}{i} = \mathrm{OPT}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right) = \mathrm{OPT} \cdot H_n$ (the $n$-th harmonic number).

Now $\ln n \leq H_n \leq 1 + \ln n$, so greedy is an $O(\log n)$-approximation. QED

This analysis is basically tight.

Theorem: Unless P = NP, no approximation algorithm for set cover can do better than $\alpha = c \log n$ for some constant $c > 0$.

2 Knapsack Problem

We have $n$ objects with values $v_1, \ldots, v_n$ and weights $w_1, \ldots, w_n$, and a bag of capacity $W$. Objective: maximize the total value of what we can carry.

The optimization version is NP-hard, but it has an $O(nW)$ pseudo-polytime dynamic programming algorithm. In fact, knapsack is one of the easiest NP-hard problems to approximate.
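For reference, a minimal sketch of that $O(nW)$ dynamic program (names are my own; `best[w]` holds the best value achievable with capacity `w`):

```python
def knapsack_max_value(values, weights, W):
    """O(nW) pseudo-polytime DP: best[w] = max value within capacity w."""
    best = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # scan capacities downward so each item is used at most once
        for w in range(W, wt - 1, -1):
            best[w] = max(best[w], best[w - wt] + v)
    return best[W]
```

The table has $W + 1$ entries and is updated once per item, giving $O(nW)$ time; this is only pseudo-polynomial because $W$ can be exponential in the number of bits needed to write the input.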

2.1 Approximation Schemes

An algorithm $\mathcal{A}$ is a polytime approximation scheme (PTAS) for a problem if for any instance $I$ and for any fixed $\epsilon > 0$:

  • $\mathcal{A}$ gives at least $(1 - \epsilon)$ of optimal
  • $\mathcal{A}$ runs in time polynomial in $|I|$

An algorithm $\mathcal{A}$ is a fully polytime approximation scheme (FPTAS) for a problem if for any instance $I$ and for any fixed $\epsilon > 0$:

  • $\mathcal{A}$ gives at least $(1 - \epsilon)$ of optimal
  • $\mathcal{A}$ runs in time polynomial in $|I|$ and polynomial in $\frac{1}{\epsilon}$

There is an FPTAS for knapsack. To do this we use a different dynamic program.

Let $w(i,V)$ = the minimum weight of a subset of items $\{1, \ldots, i\}$ with value $\geq V$; if no such subset exists, $w(i,V) = \infty$. These are calculated recursively: $w(i,V) = \min[w(i-1, V),\ w_i + w(i-1, V - v_i)]$, with base cases $w(0,V) = 0$ for $V \leq 0$ and $w(0,V) = \infty$ for $V > 0$.

The runtime is $O(n^2 V_{max})$, where $V_{max} = \max_i v_i$: the table has $n$ rows and at most $n V_{max}$ value columns, since any achievable value satisfies $V \leq n \cdot V_{max}$.
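A sketch of this value-indexed DP in Python, collapsing the table to one dimension (illustrative names; it returns the best achievable value, assuming positive integer values and weights):

```python
import math

def knapsack_min_weight(values, weights, W):
    """Value-indexed DP: minw[V] = minimum weight of a subset with
    total value >= V.  Returns the largest value achievable within W."""
    total = sum(values)                  # any achievable value is <= n * V_max
    minw = [0] + [math.inf] * total      # minw[0] = 0 (empty subset)
    for v, wt in zip(values, weights):
        # scan values downward so each item is used at most once
        for V in range(total, 0, -1):
            cand = minw[max(V - v, 0)] + wt
            if cand < minw[V]:
                minw[V] = cand
    return max(V for V in range(total + 1) if minw[V] <= W)
```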

How does this dynamic program help? If the $v_i$ are small, it is polytime. If the $v_i$ are big, maybe we can scale them down without losing too much accuracy. So we set $\hat{v}_i = \lfloor \frac{v_i}{k} \rfloor$ for some scaling factor $k$.

Approximation algorithm: scale the values, run the dynamic program on the $\hat{v}_i$, and output its optimal solution $S$.

  1. $S$ is a feasible solution, as the weights are the same as before.
  2. We set $k = \frac{\epsilon V_{max}}{n}$. This makes the algorithm polytime: $\mathcal{A}$ runs in time $O(n^2 \hat{V}_{max}) = O\left(n^2 \cdot \frac{V_{max}}{\epsilon V_{max}/n}\right) = O\left(\frac{n^3}{\epsilon}\right)$.

Why does it get within $\epsilon$ of OPT? I.e., why is $v(S) \geq (1 - \epsilon)\, v(S^*)$, where $S^* \subseteq [n] = \{1, \ldots, n\}$ is the optimal solution for the original values?

Since $S$ is optimal for the scaled values $\hat{v}$:

  1. $\hat{v}(S) \geq \hat{v}(S^*)$, because $S$ is optimal for the DP with values $\hat{v}$.
  2. $\hat{v}(S^*) = \sum_{i \in S^*} \hat{v}_i = \sum_{i \in S^*} \left\lfloor \frac{n v_i}{\epsilon V_{max}} \right\rfloor \geq \sum_{i \in S^*} \left( \frac{n v_i}{\epsilon V_{max}} - 1 \right) = \frac{n}{\epsilon V_{max}} \sum_{i \in S^*} v_i - \sum_{i \in S^*} 1 = \frac{n}{\epsilon V_{max}} \mathrm{OPT} - |S^*| \geq \frac{n}{\epsilon V_{max}} \mathrm{OPT} - n$
  3. So $\hat{v}(S) \geq \frac{n}{\epsilon V_{max}} \mathrm{OPT} - n$.
  4. $v(S) = \sum_{i \in S} v_i \geq \sum_{i \in S} \hat{v}_i \cdot \frac{\epsilon V_{max}}{n} = \frac{\epsilon V_{max}}{n} \hat{v}(S) \geq \frac{\epsilon V_{max}}{n} \left( \frac{n}{\epsilon V_{max}} \mathrm{OPT} - n \right) = \mathrm{OPT} - \epsilon V_{max} \geq \mathrm{OPT} - \epsilon \cdot \mathrm{OPT} = (1 - \epsilon)\, \mathrm{OPT}$, using $V_{max} \leq \mathrm{OPT}$ (each item fits in the bag on its own).
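Putting the pieces together, a sketch of the full FPTAS (hypothetical function names; it tracks the chosen subset alongside the minimum weight so the original value of $S$ can be reported, assuming positive values and weights and that each item fits on its own):

```python
import math

def fptas_knapsack(values, weights, W, eps):
    """FPTAS sketch: scale values by k = eps * V_max / n, run the
    value-indexed DP on the scaled values, return (items, value)."""
    n = len(values)
    k = eps * max(values) / n
    vhat = [math.floor(v / k) for v in values]
    total = sum(vhat)
    # best[Vh] = (min weight achieving scaled value >= Vh, chosen items)
    best = [(math.inf, ())] * (total + 1)
    best[0] = (0, ())
    for i in range(n):
        # scan scaled values downward so each item is used at most once
        for Vh in range(total, 0, -1):
            pw, ps = best[max(Vh - vhat[i], 0)]
            if pw + weights[i] < best[Vh][0]:
                best[Vh] = (pw + weights[i], ps + (i,))
    # largest scaled value reachable within capacity W
    Vh = max(v for v in range(total + 1) if best[v][0] <= W)
    items = list(best[Vh][1])
    return items, sum(values[i] for i in items)
```

The DP runs on values bounded by $n/\epsilon$ per item, matching the $O(n^3/\epsilon)$ bound above, and the returned value is guaranteed to be at least $(1 - \epsilon)\,\mathrm{OPT}$.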