The exam will take place on Friday, April 25, in Studio 2 of the gym. Bring a faculty standard calculator.
There will be 6 questions, 3 on theory, 3 on applications.
1Integer programming¶
1.1General solution¶
1.2Examples¶
From the lecture on April 8:
Maximise
$$W = 0.5x^2 + 3y + 2z$$
such that
$$3x + 4y+3z \leq 10 \quad \text{and} \quad 2x+3y+3z \leq 10$$
where $x, y, z$ are all integers $\geq 0$.
Use D.P. formulation, subject to the first constraint.
Let's look at our options. $x$ can be 0, 1, 2, or 3 (any more and you'd have $3x > 10$, which wouldn't satisfy the first constraint even if the others were 0). $y$ can be 0, 1 or 2. $z$ can be 0, 1, 2, or 3.
First, we calculate $f_1(x') = 0.5x^2$ for each value of $x' \leq 10$ such that $3x \leq x'$. We have:
- $f_1(0) = 0$ ($x = 0$)
- $f_1(1) = 0$ ($x = 0$)
- $f_1(2) = 0$ ($x = 0$)
- $f_1(3) = 0.5$ ($x = 1$)
- $f_1(4) = 0.5$ ($x = 1$)
- $f_1(5) = 0.5$ ($x = 1$)
- $f_1(6) = 2$ ($x = 2$)
- $f_1(7) = 2$ ($x = 2$)
- $f_1(8) = 2$ ($x = 2$)
- $f_1(9) = 4.5$ ($x = 3$)
- $f_1(10) = 4.5$ ($x = 3$)
Next, we calculate values of $\displaystyle f_2(x) = \max_{y=0,1,2} \bigg [ 3y + f_1(x-4y) \bigg ]$, marking with a $\bigstar$ the maximum for each $x$-value1:
- $f_2(0) = 0$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(0) = 0$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-4)$ which isn't defined
- when $y=2$, also undefined
- $f_2(1) = 0$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(1) = 0$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-3)$ which isn't defined
- when $y=2$, also undefined
- $f_2(2) = 0$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(2) = 0$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-2)$ which isn't defined
- when $y=2$, also undefined
- $f_2(3) = 0.5$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(3) = 0.5$ thus $3y + f_1(x-4y) = 0.5$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-1)$ which isn't defined
- when $y=2$, also undefined
- $f_2(4) = 3$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(4) = 0.5$, thus $3y + f_1(x-4y) = 0.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(0) = 0$, thus $3y + f_1(x-4y) = 3$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-4)$ which isn't defined
- $f_2(5) = 3$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(5) = 0.5$, thus $3y + f_1(x-4y) = 0.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(1) = 0$, thus $3y + f_1(x-4y) = 3$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-3)$ which isn't defined
- $f_2(6) = 3$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(6) = 2$, thus $3y + f_1(x-4y) = 2$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(2) = 0$, thus $3y + f_1(x-4y) = 3$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-23)$ which isn't defined
- $f_2(7) = 3.5$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(7) = 2$, thus $3y + f_1(x-4y) = 2$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(3) = 0.5$, thus $3y + f_1(x-4y) = 3.5$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-1)$ which isn't defined
- $f_2(8) = 6$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(8) = 2$, thus $3y + f_1(x-4y) = 2$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(4) = 0.5$, thus $3y + f_1(x-4y) = 3.5$
- when $y=2$, $3y = 6$ and $f_1(x-4y) = f_1(0) = 0$ thus $3y + f_1(x-4y) = 6$ $\bigstar$
- $f_2(9) = 6$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(9) = 4.5$, thus $3y + f_1(x-4y) = 4.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(5) = 0.5$, thus $3y + f_1(x-4y) = 3.5$
- when $y=2$, $3y = 6$ and $f_1(x-4y) = f_1(1) = 0$ thus $3y + f_1(x-4y) = 6$ $\bigstar$
- $f_2(10) = 6$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(10) = 4.5$, thus $3y + f_1(x-4y) = 4.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(6) = 2$, thus 3y + f_1(x-4y) = 5$
- when $y=2$, $3y = 6$ and $f_1(x-4y) = f_1(2) = 0$ thus $3y + f_1(x-4y) = 6$ $\bigstar$
Finally, we calculate $f_3(10)$. Note that we don't need to calculate $f_3(x)$ for values of $x$ other than 10 - this is the last one, because there are only 3 variables.
$$f_3(10) = \max_{z \in 0, 1, 2, 3} [ 2z + f_2(10 - 3z)] = \begin{cases} z=0: & 0 + f_2(10) = 0+6 = 6 \\ z=1: & 2 + f_2(7) = 2 + 3.5 = 5.5 \\ z=2: & 4 + f_2(4) = 4 + 3 = 7 \, \bigstar \\ z=3: & 6 + f_2(1) = 6 + 0 = 6 \end{cases} $$
Thus the maximum value for $W$ is attained when $z=2$. To find what the values of $x$ and $y$ should be, look at how $f_2(4)$ was obtained. We see that $y=1$ and that $f_1(0)$ was evaluated, which tells us that $x=0$. Thus $W = 0.5x^2 + 3y + 2z = 0 + 3 + 4 = 7$, as we calculated.
So we have $x=0$, $y=1$, $z=2$.
For a sanity check, let's make sure that this satisfies both constraints. We have $3x+4y+3z = 0 + 4 + 6 = 10 \leq 10$ $\checkmark$. Also, we have $2x + 3y + 3z = 0 + 3 + 6 = 9 \leq 10$ $\checkmark$. So both constraints are satisfied. $\blacksquare$
2The traveling salesman problem¶
2.1General solution¶
The formula should be given. Just do sets of size $k$ for $k=0, 1, 2, 3, \ldots$ until you've covered all the possibilities. Once you get to the end, backtrack - the first arg of the last $f$-value is the last node, the first arg of the second to last $f$-value is the second to last node, etc.
2.2Examples¶
3The transportation problem¶
3.1General solution¶
You're given a table. Hopefully the formula to use is given, but it should be
$$f_N(x_i) = \min_{0 \leq x_{1N} \leq x_i} \bigg [ g_{1N}(x_{1N}) + g_{2N} (r_N- x_{1N}) + f_{N-1}(x_i-x_{1N} ) \bigg ]$$
where $g_{ij}(x_{ij})$ is the total cost of sending $x_{ij}$ units from depot $i$ to demand point $j$ (so the setup cost $c_{ij}$ if necessary, plus the coefficient $a_{ij}$ times $x_{ij}$), and $r_N$ is the demand needed at demand point $N$.
You iterate through the demand points. For each demand point, iterate through all the possible ways of allocating the $r_N$ resources provided by the depots.
3.2Example¶
Assignment 5, question 1.
4Markov decision processes¶
4.1Examples¶
4.1.1With discounting¶
Given in class (Thursday, February 13):
State $i$ | Policy $k$ | $p_{i1}^k$ | $p_{i2}^k$ | $q_i^k$ (expected reward) |
---|---|---|---|---|
1 | 1 | 0.6 | 0.4 | 4.6 |
1 | 2 | 0.7 | 0.3 | 3.2 |
2 | 1 | 0.4 | 0.6 | 2.4 |
2 | 2 | 0.5 | 0.5 | 1.5 |
$\alpha = 0.9$
At time step 1, we see that policy 1 is best for both states.
Fuck it, I'll do it later if there's time
4.1.2Without discounting¶
Given in class (Tuesday April 8):
State $i$ | Policy $k$ | $p_{i1}^k$ | $p_{i2}^k$ | $q_i^k$ (expected reward) |
---|---|---|---|---|
1 | 1 | 0.7 | 0.3 | 6 |
1 | 2 | 0.2 | 0.8 | 4 |
2 | 1 | 0.5 | 0.5 | 3 |
2 | 2 | 0.6 | 0.4 | 2 |
Evaluate $f_n(i)$ and the optimal policy $n=1,2,3,\ldots$, where
$$f_n(i) = \max_k \left [ q_i^k + \sum_{j=1}^N p_{ij}^k f_{n-1}(j) \right ] \tag{with $f_0=0$}$$
At time step 1:
- If we're at state 1:
- policy 1 reward: 6 $\bigstar$
- policy 2 reward: 4
- If we're at state 2:
- policy 1 reward: 3 $\bigstar$
- policy 2 reward: 2
Thus $f_1(1) = 6$ (policy 1) and $f_1(2) = 3$ (policy 1).
Now, time step 2:
- If we're at state 1:
- Policy 1: $0.7 \cdot f_1(1) + 0.3 \cdot f_1(2) + 6 = 0.7\cdot 6 + 0.3 \cdot 3 + 6 = 11.1$ $\bigstar$
- Policy 2: $0.2 \cdot f_1(1) + 0.8 \cdot f_1(2) + 4 = 0.2 \cdot 6 + 0.8 \cdot 3 + 4 = 7.6$
- If we're at state 2:
- Policy 1: $0.5 \cdot f_1(1) + 0.5 \cdot f_1(2) + 3 = 0.5 \cdot 6 + 0.5 \cdot 3 + 3 = 7.5$ $\bigstar$
- Policy 2: $0.6 \cdot f_1(1) + 0.4 \cdot f_1(2) + 2 = 0.6\cdot 6 + 0.4 \cdot 3 + 2= 6.8$
So $f_2(1) = 11.1$ (policy 1) and $f_2(2) = 7.5$ (policy 1).
Next, time step 3:
- State 1:
- Policy 1: $0.7 \cdot f_2(1) + 0.3 \cdot f_2(2) + 6 = 0.7\cdot 11.1 + 0.3 \cdot 7.5 + 6 = 16.02$ $\bigstar$
- Policy 2: $0.2 \cdot f_2(1) + 0.8 \cdot f_2(2) + 4 = 0.2 \cdot 11.1 + 0.8 \cdot 7.5 + 4 = 12.22$
- If we're at state 2:
- Policy 1: $0.5 \cdot f_2(1) + 0.5 \cdot f_2(2) + 3 = 0.5 \cdot 11.1 + 0.5 \cdot 7.5 + 3 = 12.3$ $\bigstar$
- Policy 2: $0.6 \cdot f_2(1) + 0.4 \cdot f_2(2) + 2 = 0.6\cdot 11.1 + 0.4 \cdot 7.5 + 2= 11.66$
So $f_3(1) = 16.02$ (policy 1) and $f_3(2) = 12.3$ (policy 1).
Now, let's verify that the optimal policy for $n=3$ is the optimal policy as $n \to \infty$. You're basically solving some recurrence relations. Here's are the actual recurrence relations (derivation omitted):
$$\underbrace{g + c_1 = 6 + 0.7c_1 + 0.3c_2}_{\text{for state 1}} \quad \text{and} \quad \underbrace{g + c_2 = 3 + 0.5c_1 + 0.5c_2}_{\text{for state 2}} \tag{equations (a) and (b)}$$
where $g$ is the gain derived from choosing the optimal policy at $n=3$ (which happens to be policy 1 for both states) at each state as $n \to \infty$, and $c_i$ is $\displaystyle \lim_{n\to\infty} f_n(i)$ (so the expected reward, over an infinite horizon).
Then, to solve for $c_1$ and $c_2$, we subtract the two equations:
$$c_1 - c_2 = 6 + 0.7c_1 + 0.3c_2 - 3 - 0.5c_1 - 0.5c_2 = 3 + 0.2c_1 -0.2c_2$$
Thus we have $0.8c_1 -0.8c_2 = 3$ and so $c_1-c_2 = 30/8 = 3.75$. So the expected gain in choosing policy 1 (over policy 2) for each time step at each state is 3.75. Next, we add (a) and (b):
$$2g + c_1 + c_2 = 9 + 1.2c_1 + 0.8c_2$$
Rearranging, we get $2g = 9 + 0.2c_1-0.2c_2 = 9+0.2(c_1-c_2) = 9+0.2(3.75) = 9.75$. So $g=9.75/2 = 4.875$. I don't know what we do with this though.
Then, we use the PIR to find the optimal policy ???
$$PIR = \max \left [ q_i^k + \sum_{j=1}^n P_{ij}^k c_j \right ]$$
At state 1, you have $6 + 0.7(3.75) = 8.625$ (so sub in $c_1-c_2$ for $c_1$, maybe $c_2=0$?) for policy 1, and $3+0.2(3.75) = 3.75$ for policy 2. So obviously policy 1 is optimal here.
At state 2, you have $3 + 0.5(3.75) = 4.875$ for policy 1, and $2 + 0.6(3.75)= 4.25$ for policy 2. Again, policy 1 is the optimal policy.
This confirms that policy 1 at both states is optimal over an infinite horizon.
5Semi-Markov decision processes¶
6Queueing theory¶
See the notes.
7Calculus of variations¶
See the notes.
7.1Example¶
7.1.1Unconstrained¶
Maximise:
$$\int_0^a x^3(y')^2 \, dx \tag{$y(0) = 0$, $y(a) = b$}$$
Recall Euler's equation:
$$F_y = -\frac{d}{dx} F_{y'}$$
$F(x, y, y') = x^3(y')^2$. $F_y = 0$, $F_{y'}$ = $2x^3y'$. Then
$$\frac{d}{dx} F_{y'} = \frac{d}{dx} 2x^3y' = 6x^2y' + 2x^3y'' \tag{product rule}$$
which is equal to 0 because $F_y$ is. So we have the ODE
$$6x^2y' = -2x^3y''$$
We can rearrange this to get
$$xy'' + 3y' = 0$$
This is a linear ODE! Apparently the solution is $y = c_1 + c_2x^{-2}$ but I'm not sure how to derive that. How do I ODE?
-
Note that the $x$-values here don't actually correspond to the values of $x$ the variable that we're trying to find a value for (in $W$). It should really be called something other than $x$, to avoid confusion, but Sancho uses $x$ and it's probably even more confusing to change it now. ↩