Wikinotes

Maintainer: admin

The exam will take place on Friday, April 25, in Studio 2 of the gym. Bring a faculty standard calculator.

There will be 6 questions, 3 on theory, 3 on applications.

1 Integer programming
- 1.1 General solution
- 1.2 Examples
2 The traveling salesman problem
- 2.1 General solution
- 2.2 Examples
3 The transportation problem
- 3.1 General solution
- 3.2 Example
4 Markov decision processes
- 4.1 Examples
  - 4.1.1 With discounting
  - 4.1.2 Without discounting
5 Semi-Markov decision processes
6 Queueing theory
7 Calculus of variations
- 7.1 Example
  - 7.1.1 Unconstrained

Winter 2010 final, question 2

From the lecture on April 8:

Maximise

$$W = 0.5x^2 + 3y + 2z$$

such that

$$3x + 4y+3z \leq 10 \quad \text{and} \quad 2x+3y+3z \leq 10$$

where $x, y, z$ are all integers $\geq 0$.

Use D.P. formulation, subject to the first constraint.

Let's look at our options. $x$ can be 0, 1, 2, or 3 (any more and you'd have $3x > 10$, which wouldn't satisfy the first constraint even if the others were 0). $y$ can be 0, 1 or 2. $z$ can be 0, 1, 2, or 3.

First, we calculate $f_1(x') = 0.5x^2$ for each value of $x' \leq 10$ such that $3x \leq x'$. We have:

$f_1(0) = 0$ ($x = 0$)
$f_1(1) = 0$ ($x = 0$)
$f_1(2) = 0$ ($x = 0$)
$f_1(3) = 0.5$ ($x = 1$)
$f_1(4) = 0.5$ ($x = 1$)
$f_1(5) = 0.5$ ($x = 1$)
$f_1(6) = 2$ ($x = 2$)
$f_1(7) = 2$ ($x = 2$)
$f_1(8) = 2$ ($x = 2$)
$f_1(9) = 4.5$ ($x = 3$)
$f_1(10) = 4.5$ ($x = 3$)

Next, we calculate values of $\displaystyle f_2(x) = \max_{y=0,1,2} \bigg [ 3y + f_1(x-4y) \bigg ]$, marking with a $\bigstar$ the maximum for each $x$-value¹:

$f_2(0) = 0$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(0) = 0$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-4)$ which isn't defined
- when $y=2$, also undefined
$f_2(1) = 0$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(1) = 0$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-3)$ which isn't defined
- when $y=2$, also undefined
$f_2(2) = 0$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(2) = 0$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-2)$ which isn't defined
- when $y=2$, also undefined
$f_2(3) = 0.5$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(3) = 0.5$ thus $3y + f_1(x-4y) = 0.5$ $\bigstar$
- when $y=1$, $f_1(x-4y) = f_1(-1)$ which isn't defined
- when $y=2$, also undefined
$f_2(4) = 3$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(4) = 0.5$, thus $3y + f_1(x-4y) = 0.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(0) = 0$, thus $3y + f_1(x-4y) = 3$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-4)$ which isn't defined
$f_2(5) = 3$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(5) = 0.5$, thus $3y + f_1(x-4y) = 0.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(1) = 0$, thus $3y + f_1(x-4y) = 3$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-3)$ which isn't defined
$f_2(6) = 3$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(6) = 2$, thus $3y + f_1(x-4y) = 2$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(2) = 0$, thus $3y + f_1(x-4y) = 3$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-23)$ which isn't defined
$f_2(7) = 3.5$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(7) = 2$, thus $3y + f_1(x-4y) = 2$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(3) = 0.5$, thus $3y + f_1(x-4y) = 3.5$ $\bigstar$
- when $y=2$, $f_1(x-4y) = f_1(-1)$ which isn't defined
$f_2(8) = 6$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(8) = 2$, thus $3y + f_1(x-4y) = 2$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(4) = 0.5$, thus $3y + f_1(x-4y) = 3.5$
- when $y=2$, $3y = 6$ and $f_1(x-4y) = f_1(0) = 0$ thus $3y + f_1(x-4y) = 6$ $\bigstar$
$f_2(9) = 6$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(9) = 4.5$, thus $3y + f_1(x-4y) = 4.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(5) = 0.5$, thus $3y + f_1(x-4y) = 3.5$
- when $y=2$, $3y = 6$ and $f_1(x-4y) = f_1(1) = 0$ thus $3y + f_1(x-4y) = 6$ $\bigstar$
$f_2(10) = 6$
- when $y=0$, $3y=0$ and $f_1(x-4y) = f_1(10) = 4.5$, thus $3y + f_1(x-4y) = 4.5$
- when $y=1$, $3y=3$ and $f_1(x-4y) = f_1(6) = 2$, thus 3y + f_1(x-4y) = 5$
- when $y=2$, $3y = 6$ and $f_1(x-4y) = f_1(2) = 0$ thus $3y + f_1(x-4y) = 6$ $\bigstar$

Finally, we calculate $f_3(10)$. Note that we don't need to calculate $f_3(x)$ for values of $x$ other than 10 - this is the last one, because there are only 3 variables.

$$f_3(10) = \max_{z \in 0, 1, 2, 3} [ 2z + f_2(10 - 3z)] = \begin{cases} z=0: & 0 + f_2(10) = 0+6 = 6 \\ z=1: & 2 + f_2(7) = 2 + 3.5 = 5.5 \\ z=2: & 4 + f_2(4) = 4 + 3 = 7 \, \bigstar \\ z=3: & 6 + f_2(1) = 6 + 0 = 6 \end{cases} $$

Thus the maximum value for $W$ is attained when $z=2$. To find what the values of $x$ and $y$ should be, look at how $f_2(4)$ was obtained. We see that $y=1$ and that $f_1(0)$ was evaluated, which tells us that $x=0$. Thus $W = 0.5x^2 + 3y + 2z = 0 + 3 + 4 = 7$, as we calculated.

So we have $x=0$, $y=1$, $z=2$.

For a sanity check, let's make sure that this satisfies both constraints. We have $3x+4y+3z = 0 + 4 + 6 = 10 \leq 10$ $\checkmark$. Also, we have $2x + 3y + 3z = 0 + 3 + 6 = 9 \leq 10$ $\checkmark$. So both constraints are satisfied. $\blacksquare$

The formula should be given. Just do sets of size $k$ for $k=0, 1, 2, 3, \ldots$ until you've covered all the possibilities. Once you get to the end, backtrack - the first arg of the last $f$-value is the last node, the first arg of the second to last $f$-value is the second to last node, etc.

Winter 2010 final, question 3

You're given a table. Hopefully the formula to use is given, but it should be

$$f_N(x_i) = \min_{0 \leq x_{1N} \leq x_i} \bigg [ g_{1N}(x_{1N}) + g_{2N} (r_N- x_{1N}) + f_{N-1}(x_i-x_{1N} ) \bigg ]$$

where $g_{ij}(x_{ij})$ is the total cost of sending $x_{ij}$ units from depot $i$ to demand point $j$ (so the setup cost $c_{ij}$ if necessary, plus the coefficient $a_{ij}$ times $x_{ij}$), and $r_N$ is the demand needed at demand point $N$.

You iterate through the demand points. For each demand point, iterate through all the possible ways of allocating the $r_N$ resources provided by the depots.

Assignment 5, question 1.

Given in class (Thursday, February 13):

State $i$	Policy $k$	$p_{i1}^k$	$p_{i2}^k$	$q_i^k$ (expected reward)
1	1	0.6	0.4	4.6
1	2	0.7	0.3	3.2
2	1	0.4	0.6	2.4
2	2	0.5	0.5	1.5

$\alpha = 0.9$

At time step 1, we see that policy 1 is best for both states.

Fuck it, I'll do it later if there's time

Given in class (Tuesday April 8):

State $i$	Policy $k$	$p_{i1}^k$	$p_{i2}^k$	$q_i^k$ (expected reward)
1	1	0.7	0.3	6
1	2	0.2	0.8	4
2	1	0.5	0.5	3
2	2	0.6	0.4	2

Evaluate $f_n(i)$ and the optimal policy $n=1,2,3,\ldots$, where

$$f_n(i) = \max_k \left [ q_i^k + \sum_{j=1}^N p_{ij}^k f_{n-1}(j) \right ] \tag{with $f_0=0$}$$

At time step 1:

If we're at state 1:
- policy 1 reward: 6 $\bigstar$
- policy 2 reward: 4
If we're at state 2:
- policy 1 reward: 3 $\bigstar$
- policy 2 reward: 2

Thus $f_1(1) = 6$ (policy 1) and $f_1(2) = 3$ (policy 1).

Now, time step 2:

If we're at state 1:
- Policy 1: $0.7 \cdot f_1(1) + 0.3 \cdot f_1(2) + 6 = 0.7\cdot 6 + 0.3 \cdot 3 + 6 = 11.1$ $\bigstar$
- Policy 2: $0.2 \cdot f_1(1) + 0.8 \cdot f_1(2) + 4 = 0.2 \cdot 6 + 0.8 \cdot 3 + 4 = 7.6$
If we're at state 2:
- Policy 1: $0.5 \cdot f_1(1) + 0.5 \cdot f_1(2) + 3 = 0.5 \cdot 6 + 0.5 \cdot 3 + 3 = 7.5$ $\bigstar$
- Policy 2: $0.6 \cdot f_1(1) + 0.4 \cdot f_1(2) + 2 = 0.6\cdot 6 + 0.4 \cdot 3 + 2= 6.8$

So $f_2(1) = 11.1$ (policy 1) and $f_2(2) = 7.5$ (policy 1).

Next, time step 3:

State 1:
- Policy 1: $0.7 \cdot f_2(1) + 0.3 \cdot f_2(2) + 6 = 0.7\cdot 11.1 + 0.3 \cdot 7.5 + 6 = 16.02$ $\bigstar$
- Policy 2: $0.2 \cdot f_2(1) + 0.8 \cdot f_2(2) + 4 = 0.2 \cdot 11.1 + 0.8 \cdot 7.5 + 4 = 12.22$
If we're at state 2:
- Policy 1: $0.5 \cdot f_2(1) + 0.5 \cdot f_2(2) + 3 = 0.5 \cdot 11.1 + 0.5 \cdot 7.5 + 3 = 12.3$ $\bigstar$
- Policy 2: $0.6 \cdot f_2(1) + 0.4 \cdot f_2(2) + 2 = 0.6\cdot 11.1 + 0.4 \cdot 7.5 + 2= 11.66$

So $f_3(1) = 16.02$ (policy 1) and $f_3(2) = 12.3$ (policy 1).

Now, let's verify that the optimal policy for $n=3$ is the optimal policy as $n \to \infty$. You're basically solving some recurrence relations. Here's are the actual recurrence relations (derivation omitted):

$$\underbrace{g + c_1 = 6 + 0.7c_1 + 0.3c_2}_{\text{for state 1}} \quad \text{and} \quad \underbrace{g + c_2 = 3 + 0.5c_1 + 0.5c_2}_{\text{for state 2}} \tag{equations (a) and (b)}$$

where $g$ is the gain derived from choosing the optimal policy at $n=3$ (which happens to be policy 1 for both states) at each state as $n \to \infty$, and $c_i$ is $\displaystyle \lim_{n\to\infty} f_n(i)$ (so the expected reward, over an infinite horizon).

Then, to solve for $c_1$ and $c_2$, we subtract the two equations:

$$c_1 - c_2 = 6 + 0.7c_1 + 0.3c_2 - 3 - 0.5c_1 - 0.5c_2 = 3 + 0.2c_1 -0.2c_2$$

Thus we have $0.8c_1 -0.8c_2 = 3$ and so $c_1-c_2 = 30/8 = 3.75$. So the expected gain in choosing policy 1 (over policy 2) for each time step at each state is 3.75. Next, we add (a) and (b):

$$2g + c_1 + c_2 = 9 + 1.2c_1 + 0.8c_2$$

Rearranging, we get $2g = 9 + 0.2c_1-0.2c_2 = 9+0.2(c_1-c_2) = 9+0.2(3.75) = 9.75$. So $g=9.75/2 = 4.875$. I don't know what we do with this though.

Then, we use the PIR to find the optimal policy ???

$$PIR = \max \left [ q_i^k + \sum_{j=1}^n P_{ij}^k c_j \right ]$$

At state 1, you have $6 + 0.7(3.75) = 8.625$ (so sub in $c_1-c_2$ for $c_1$, maybe $c_2=0$?) for policy 1, and $3+0.2(3.75) = 3.75$ for policy 2. So obviously policy 1 is optimal here.

At state 2, you have $3 + 0.5(3.75) = 4.875$ for policy 1, and $2 + 0.6(3.75)= 4.25$ for policy 2. Again, policy 1 is the optimal policy.

This confirms that policy 1 at both states is optimal over an infinite horizon.

See the notes.

Maximise:

$$\int_0^a x^3(y')^2 \, dx \tag{$y(0) = 0$, $y(a) = b$}$$

Recall Euler's equation:

$$F_y = -\frac{d}{dx} F_{y'}$$

$F(x, y, y') = x^3(y')^2$. $F_y = 0$, $F_{y'}$ = $2x^3y'$. Then

$$\frac{d}{dx} F_{y'} = \frac{d}{dx} 2x^3y' = 6x^2y' + 2x^3y'' \tag{product rule}$$

which is equal to 0 because $F_y$ is. So we have the ODE

$$6x^2y' = -2x^3y''$$

We can rearrange this to get

$$xy'' + 3y' = 0$$

This is a linear ODE! Apparently the solution is $y = c_1 + c_2x^{-2}$ but I'm not sure how to derive that. How do I ODE?

Note that the $x$-values here don't actually correspond to the values of $x$ the variable that we're trying to find a value for (in $W$). It should really be called something other than $x$, to avoid confusion, but Sancho uses $x$ and it's probably even more confusing to change it now. ↩

HTSEFP: How to solve every problem

1Integer programming¶

1.1General solution¶

1.2Examples¶

2The traveling salesman problem¶

2.1General solution¶

2.2Examples¶

3The transportation problem¶

3.1General solution¶

3.2Example¶

4Markov decision processes¶

4.1Examples¶

4.1.1With discounting¶

4.1.2Without discounting¶

5Semi-Markov decision processes¶

6Queueing theory¶

7Calculus of variations¶

7.1Example¶

7.1.1Unconstrained¶