1Set Cover¶
Given n elements $V= \{v_1, \ldots, v_n\}$ and m sets $S_1, \ldots, S_m \subset V$ with cost$(S_i) = c_i$
Set cover: Find a min cost collection of sets that cover every element.
Unusual one: Computer Virus. Elements are known virus (5000 or so). Set are "bad" substrings that are found in viruses but not in normal code. You want to use as few strings as possible but cover all the viruses.
1.1Greedy Solution¶
- $\mathcal{G} = \emptyset$
- Pick the set $S_i$ that covers uncovered elements at the lowest average cost
- Set $\mathcal{G} = \mathcal{G} \cup S_i$
- Repeat until $\mathcal{G} covers all elements
So want to pick $\min_{S_i} c_i$/uncovered at time t $\cap S_i$
Theorem: Greedy is $O(\log n)$ - approximation algorithm
Proof: Let Optimal solution be $\{S*_1 , \ldots, S*_i \}$ and Greedy solution be $\{S_1 , \ldots, S_n\}$
Label the elements $v_1, \ldots, v_n$ according to the order they were covered by the greedy algorithm.
Consider $v_i$ At the time it was covered there were at least n-i+1 be uncovered elements. The optimal solution covers $\{v_i, \ldots, v_n\}$. The optimal solution has cost = OPT = $\sum^n_{i=1} c(S*_i)$
So one of $S*_1 \ldots S*_i$ can be used at an average cost at most $\frac{opt}{n-i+1}$
The cost(Greedy) = $\sum^n_{j=1} c(S_j) = \sum^n_{i=1}(\mathrm{average cost to cover element v_i} \leq \sum^n_{i=1} \frac{opt}{n-i+1} = OPT \sum^n_{i=1} \frac{1}{i} = opt(1 + 1/2 + 1/3 + \ldots + 1/n) = opt * Hn$ (harmonic number)
Now $\ln n \leq Hn \leq 1 + \ln n$ QEDB
This analysis is basically tight.
Theorem: Unless P = NP, no approximation algorithm for set cover can do better than $\alpha = c \log n$ for some constant $c$.
2Knapsack Problem¶
We have $n$ objects with values $v_1 \ldots v_n$, weights $w_1 \ldots w_n$ with bag of capacity $W$. Objective: maximize the value of what we can carry
Optimization version is NP-hard but it has a $O(nW)$ pseudo-polytime Dynamic Programming algorithm. In fact, Knapsack is one of the easiest NP-Hard problems to approximate
2.1Approximation Schemes¶
An algorithm $\mathcal{A}$ is polytime approximation scheme PTAS for a problem if for any instance $I$ and for any fixed $\Sigma > 0$
- $\mathcal{A}$ gives at least $(1 - \Sigma)$ of optimal
- $\mathcal{A}$ runs in time polynomial in $|I|$
An algorithm $\mathcal{A}$ is ''fully'' polytime approximation scheme FPTAS for a problem if for any instance $I$ and for any fixed $\Sigma > 0$
- $\mathcal{A}$ gives at least $(1 - \Sigma)$ of optimal
- $\mathcal{A}$ runs in time polynomial in $|I|$ and poly in $\frac{1}{\Sigma}$
There is a FPTAS for Knapsack. TO do this we use a different Dynamic Program
Let $w(i,V)$ = minimum of weight of a subset of items $\{1, \ldots, i\}$ that give value $\geq V$ If no subset exists than $= \infty$. These are recursively calculated $w(i,V) = \min [w(i-1, V), w_i + w(i-1, V-v_i)] with base case $w(i,V) = 0$
Runtime is $O(n^2 V_{max}$ where $V_{max}$ = max $V_i$. That is $V \leq n * V_{max}$
How does this Dynamic Program help? If the $v_i$ are small it is polytime. If the $v_i$ are big maybe we can scale them down without losing too much accuracy. So we set $\hat{v}_i = \lfloor \frac{v_i}{k} \rfloor$
Approximation algorithm: scale them!. Output the solution S to this "new" dynamic program.
- This is a feasible solution as the weights are the same as before.
- We set $k = \frac{\Sigma * V_{max}}{n}$. This is polytime. $\mathcal{A} in time $O(n2 \hat{V}{max} = O(n^2 \frac{V_max}{\frac{\Sigma V{max}}{n}}) = O(n3 \frac{1}{\Sigma}$
Why does it get within $\Sigma$ of OPT? ie: $v(S) \geq (1- \Sigma) v(S*)$ where $S* \leq [n] = \{1, \ldots n\}$ was OPT for original DP
S is optimal for $\hat{V}$
- $\hat{v}(S) \geq \hat{V}(S*)$ or S optimal for DP with $\hat{V}$
- $\hat{V*} = \sum_{i \in S*} \hat{v_i} = \sum_{i \in S*} \lfloor \frac{v_i}{\frac{\Sigma V_{max}}{n}} \rfloor = \sum_{i \in S*} \lfloor \frac{n v_i}{\Sigma V_{max}} \geq \sum_{i \in S*} (\frac{n v_i}{\Sigma v_{max}} -1) = \frac{n}{\Sigma V_{max}} * \sum_{i \in S*} v_i 0 \sum_{i \in S*} 1 = \frac{n}{\Sigma V_{max}} OPT - \sum_{i \in S*} 1 \geq \frac{n}{\Sigma V_{max}} OPT - n$$
- So $\hat{V}(S) \geq \frac{n}{\Sigma V_{max}} * OPT - n$
- $v(S) = \sum_{i \in S} v_i \geq \sum_{i \in S} \hat{v}_i \frac{\Sigma V_{max}}{n} = \frac{\Sigma V_{max}}{n} \sum_{i \in S} \hat{v}_i = \frac{\Sigma V_{max}}{n} \hat{v}(S) \geq \frac{\Sigma V_{max}}{n}(\frac{n}{\Sigma V_{max} v(S*)} -n) = OPT - \Sigma*V_{max} \geq OPT - \Sigma* OPT = (1- \Sigma) OPT$
QEDB