Because the file he posted is literally a .doc (which is pretty appalling for a math course tbh), I'm going to type up all the questions on this page (may be edited for clarity).
The following formulas are provided:
- TSP: $\displaystyle f_i(j, S) = \min_{k \in S} \bigg [ f_{i-1} (k, S-\{k\} + d_{kj} \bigg ]$ with b.c. $f_0(j, -) = d_{1j}$ and $S \subseteq N$
- MDP: $\displaystyle f_n(i) = \max_k \left [ q_i^k + \sum_{j=1}^N p_{ij}^k f_{n-1}(j) \right ]$, and $\displaystyle g+c_i = q_i^k + \sum_{j=1}^N p_{ij}^k c_j$; PIR: $\displaystyle \max \left [ q_i^k + \sum_{i=1}^N p_{ij}^kc_j \right ]$
- Semi-markov decision processes: For a particular policy, $\displaystyle a_i + g\overline{\tau_i} = r_i + \sum_{i=1}^n p_{ij}a_j$. PIR: $\displaystyle \frac{1}{\overline{\tau}^k_i} \left [ r_i + \sum_{i=1}^n p_{ij}a_j-a_i \right ]$
- Queueing theory
- $\displaystyle P_n = C_nP_0$
- $\displaystyle C_n = \frac{\lambda_{n-1} \lambda_{n-2} \cdots \lambda_0}{\mu_n\mu_{n-1} \cdots \mu_0} \, n = 1, 2, \ldots$
- $\displaystyle L = \sum nP_n$
- $\displaystyle L_q = \sum(n-s)P_n$
- $W = L/\overline{\lambda}$
- $L_s = L - L_q$
- $W_q = L_q / \overline{\lambda}$
1Question 1: Shortest path¶
Give the D.P. formulation with appropriate boundary conditions for finding the minimum time to go from any node $i$ ($i = 1, 2, \ldots, N-1$) to the terminal node $N$. Assume $t_{ij}$ is the time needed to go from node $i$ to node $j$. Describe an algorithm that can be used to solve this problem.
1.1Solution¶
Dijkstra's algorithm
1.2Accuracy and discussion¶
2Question 2: Integer programming¶
Give the D.P. formulation for solving the following problem: maximise $z = x_1x_2^2x_3^3x_4^{1/2}$ such that $x_1 + 2x_2 + 3x_3 + x_4 \leq 10$, where the $x_i$'s are all integers with $x_i \geq 1$. Using this formulation, solve the problem.
2.1Solution¶
Let's consider our options, using the fact that $x_i \geq 1$ for all $i$.
- For $x_1$: $x_1 \leq 10 - 2x_2 -3x_3 - x_4 \leq 10 - 2 - 3 - 1 = 4$. Thus $x_1 = 1, 2, 3, 4$.
- For $x_2$: $2x_2 \leq 10 - 1-3-1 = 5$, so $x_2 =1, 2$.
- For $x_3$: $3x_3 \leq 10 - 1 - 2 - 1 = 6$, so $x_3 = 1, 2$.
- For $x_4$: $x_4 \leq 10 - 1 - 2 - 3 = 4$, so $x_4 = 1, 2, 3, 4$.
Next, we calculate $f_1(x) = x_1$ (attempting to maximise it) for each value of $x \leq 10$ such that $x_1 \leq x$. In this question, it happens to be a very straightforward calculation. (It won't always be the case, though.)
- $f_1(0) = 0$ ($x=0$)
- $f_1(1) = 1$ ($x=1$)
- $f_1(2) = 2$ ($x=2$)
- $f_1(3) = 3$ ($x=3$)
- $f_1(4) = 4$ ($x=4$) (don't really need anything after this I think)
- $f_1(5) = 5$ ($x=5$)
- $f_1(6) = 6$ ($x=6$)
- $f_1(7) = 7$ ($x=7$)
- $f_1(8) = 8$ ($x=8$)
- $f_1(9) = 9$ ($x=9$)
- $f_1(10) = 10$ ($x=10$)
Next, we calculate the values of $\displaystyle f_2(x) = \max_{x_2 =1,2} \bigg [x_2^2 \cdot f_1(x-2x_2) \bigg ]$.
- $f_2(0) = $ undefined
- when $x_2 = 1$, $f(x-2x_2) = f(-2)$ which is undefined
- also undefined
- $f_2(1) = $ undefined
- undefined
- what is this like javascript or something
- $f_2(2) = 0$
- when $x_2 = 1$, $f_1(2-2) = f(0) = 0$ so it's just 0
- undefined
- $f_2(3) = 1$
- when $x_2 = 1$, $f_1(3-2) = f_1(1) = 1$ so it's just $x_2^2 \cdot 1= 1$
- when $x_2=2$, undefined
- $f_2(4) = 2$
- when $x_2=1$, $f_1(4-2) = f(2) = 2$ so it's $1^2 \cdot 2 = 2$
- when $x_2 = 2$, $f_1(4-4) = f(0) = 0$ so it's just 0
- $f_2(5) = 4$
- when $x_2=1$, $f_1(5-2) = f(3) = 3$ so it's just $1^2 \cdot 3 = 3$
- when $x_2 = 2$, $f_1(5-4) = f(1) = 1$ so it's $2^2 \cdot 1 = 4$
- $f_2(6) = 8$
- when $x_2=1$, $f_1(6-2) = f(4) = 4$ so it's 4
- when $x_2=2$, $f_1(2) = 2$ so it's $2^2 \cdot 2 = 8$
- $f_2(7) = 12$
- 1: 5
- 2: 12
- probably don't need the rest
Next, we calculate the values of $\displaystyle f_3(x) = \max_{x_3 =1,2} \bigg [x_3^3 \cdot f_2(x-3x_3) \bigg ]$. As before, a lot of these values will be undefined.
- $f_3(0) = $ undefined
- $f_3(1) = $ undefined
- $f_3(2) = $ undefined
- $f_3(3) = $ undefined
- 1: $f_2(0) = $ undefined
- undefined
- $f_3(4) = $ undefined
- 1: $f_2(1) = $ undefined
- undefined
- $f_3(5) = 0$
- 1: $f_2(2) = 0$ so 0
- 2: undefined
- $f_3(6) = 1$
- 1: $f_2(3) = 1$ so $x_3^3 \cdot 1= 1^3 \cdot 1 = 1$
- 2: undefined
- $f_3(7) = 2$
- 1: $f_2(4) = 2$ so $x_3^3 \cdot 2 = 2$
- 2: $f_2(1) =$ undefined
- $f_3(8) = 4$
- 1: $f_2(5) = 4$ so 4
- 2: $f_2(2) = 0$ so 0
- $f_3(9) = 8$
- 1: $f_2(6) = 8$ so 8
- 2: $f_2(3) = 1$ so $2^3 \cdot 1= 8$
- not gonna bother
Finally, we calculate $f_4(10)$:
$$f_4(10) = \max_{x_4 = 1, 2, 3, 4} [ x_4^{1/2} \cdot f_3(10 - x_4)] = \begin{cases} x_4=1: & 1 \cdot f_3(9) = 8 \, \bigstar \\ x_4=2: & \sqrt{2} \cdot f_3(8) = 4\sqrt{2} \\ x_4=3: & \sqrt{3} \cdot f_3(7) = 2\sqrt{3} \\ x_4=4: & 2 \cdot f_3(6) = 2 \end{cases} $$
So $x_4=1$, using $f_3(9)$. There, we can have either $x_3 = 1$ (which uses $f_2(6)$) or $x_3=2$ (which uses $f_2(3)$). For $f_2(6)$, we use $x_2=2$ and $f_1(2)$, which has $x_1=2$; this gives us $x_1=2, x_2=2, x_3=1, x_4=1$. For $f_2(3)$, we use $x_2=1$ and $f_1(1)$, which has $x_1=1$; this gives us $x_1=1$, $x_2=1$, $x_3=2$, $x_4=1$. Both assignments result in $z=8$.
2.2Accuracy and discussion¶
Accurate. Wrote a script to test it.
3Question 3: Traveling salesman¶
In the traveling salesman problem, the distance from city $i$ to city $j$, $d_{ij}$, is given by the following distance table:
$i$ \ $j$ | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
1 | 0 | 3 | 6 | 5 | 4 |
2 | 1 | 0 | 5 | 4 | 3 |
3 | 5 | 4 | 0 | 2 | 3 |
4 | 3 | 1 | 3 | 0 | 4 |
5 | 5 | 2 | 4 | 1 | 0 |
(a) Find the solution if you start at node 1 and visit all the other nodes but don't need to return to node 1.
(b) Find the solution if you start at node 1 and return to node 1, but also need to visit node 2 before node 3 and node 3 before node 5.
Use the standard D.P. approach to solve this.
3.1Solution¶
(a) Just use the formula given. Do sets of size $k$ for each $f_k$. At the end, you have that $f_3(3, \{2, 4, 5\}) = 10$ is the lowest among all the $f_3$ values. Since 3 is the first argument to $f_3$, 3 is the last node you visit. Now, the value of 10 is obtained using $f_2(4, \{2, 5\})$. So 4 is the second-to-last node. $f_2(4, \{2, 5\})$ is obtained using $f_1(5, \{2\})$, so 5 is the third-to-last node. Then 2 is the first node (after 1). So the path is $1-2-5-4-3$.
(b) There are only 4 possibilities. Use the table to find the total cost for each route. 142351 => 19, 124351 => 18, 123451 => 19, and 123541 => 15 so we use the last route.
3.2Accuracy and discussion¶
Accurate. I wrote a script to test it.
4Question 4: Markov decision processes¶
State $i$ | Policy $k$ | $p_{i1}^k$ | $p_{i2}^k$ | $q_i^k$ (expected reward) |
---|---|---|---|---|
1 | 1 | 0.6 | 0.4 | 6 |
1 | 2 | 0.8 | 0.2 | 4 |
2 | 1 | 0.4 | 0.6 | -3 |
2 | 2 | 0.7 | 0.3 | -5 |
(a) Evaluate $f_n(i)$ and the optimal policy for $n=1,2,3$ (i.e., finite horizon in state $i$).
(b) Verify if the optimal policy for $n=3$ is also the optimal policy for $n$ large (i.e., the infinite horizon, non-discounted case).
4.1Solution¶
(a)
Time step 1:
- State 1:
- Policy 1: 6 $\bigstar$
- Policy 2: 4
- State 2:
- 1: -3 $\bigstar$
- 2: -5
$f_1(1) = 6$ (policy 1), $f_1(2) = -3$ (policy 1).
Time step 2:
- State 1:
- 1: $6 + 0.6\cdot f_1(1) + 0.4\cdot f_1(2) = 6 + 0.6*6 + 0.4 * (-3) = 8.4$ $\bigstar$
- 2: $4 + 0.8(6) - 0.2(3) = 8.2$
- State 2:
- 1: $-3 + 0.4(6) + 0.6(-3) = -2.4$
- 2: $-5 + 0.7(6) + 0.3(-3) = -1.7$ $\bigstar$
$f_2(1) = 8.4$ (policy 1), $f_2(2) = -1.7$ (policy 2).
Time step 3:
- State 1:
- 1: $6 + 0.6\cdot f_2(1) + 0.4\cdot f_2(2) = 6 + 0.6(8.4) + 0.4 * (-1.7) = 10.36$
- 2: $4 + 0.8(8.4) - 0.2(1.7) = 10.38$ $\bigstar$
- State 2:
- 1: $-3 + 0.4(8.4) + 0.6(-1.7) = -0.66$
- 2: $-5 + 0.7(8.4) + 0.3(-1.7) = 0.37$ $\bigstar$
$f_3(1) = 10.38$ (policy 2), $f_3(2) = 0.37$ (policy 2).
(b) At $n=3$, the optimal policy is to use policy 2 at either state. We have
$$g + c_1 = 4 + 0.8c_1 + 0.2c_2 \quad \text{and} \quad g+ c_2 = -5 + 0.7c_1 + 0.3c_2$$
Subtracting the two, we get
$$c_1 - c_2 = 9 + 0.1c_1 -0.1c_2$$
Thus $0.9(c_1-c_2) = 9$ and so $c_1 - c_2=10$. Next, add the equations:
$$2g + (c_1 + c_2) = -1 + 1.5c_1 + 0.5c_2$$
Solving for $g$, we get
$$g = \frac{-1 + 0.5(c_1-c_2)}{2} = -0.5 + 2.5 = 2$$
At state 1, you have $6 + 0.6(10) = 12$ for policy 1, and $4 + 0.8(10) = 12$. So either policy 1 or policy 2 works over an infinite horizon.
At state 2, you have $-3 + 0.4(10) = 1$ and $-5 + 0.7(10) = 2$. So policy 2 is best over an infinite horizon.
4.2Accuracy and discussion¶
Wrote a python script to confirm, as usual. All checks out, even if I'm not really sure what I'm doing when it comes to the infinite horizon part.
5Question 5: Semi-Markovian decision processes¶
The following values are given over an finite horizon.
State $i$ | Policy $k$ | $p_{i1}^k$ | $p_{i2}^k$ | $\overline{\tau}_i^k$ | $r_i^k$ (expected reward) |
---|---|---|---|---|---|
1 | 1 | 0.7 | 0.3 | 3.4 | 50 |
1 | 2 | 0 | 1 | 7 | 70 |
2 | 1 | 0.2 | 0.8 | 9 | 60 |
2 | 2 | 1 | 0 | 5 | 40 |
Find the optimal policy and the average gain per unit time period. $\overline{\tau}_i^k$ is the mean time for which a car is rented in state $i$ using policy $k$.
5.1Solution¶
Suppose we used policy 1 at both states. Then:
State 1: $a_1 + 3.4g = 50 + 0.7a_1 + 0.3a_2$. State 2: $a_2 + 9g = 60 + 0.2a_1 + 0.8a_2$.
Adding them together, we get
$$g = \frac{110 - 0.1(a_1-a_2)}{12.4}$$
Multiply the first by 9, and the second by 3.4, then subtract. We get
$$a_1 - a_2 = \frac{246}{3.38} \approx 72.78$$
Substituting that into $g$, we get $8.28$.
Now suppose we use policy 1 at state 1 and policy 2 at state 2. Then:
State 1: $a_1 + 3.4g = 50 + 0.7a_1 + 0.3a_2$. State 2: $a_2 + 5g = 40 + a_1$.
Adding them together, we get
$$g= \frac{90 + 0.7(a_1-a_2)}{8.4}$$
Multiply the first by 5 and the second by 3.4, then subtract. We get:
$$a_1 - a_2 = \frac{114}{4.7} \approx 23.26$$
Substituting that into $g$, we get $12.74$.
Now suppose we use policy 2 at state 1 and policy 1 at state 2. Then:
State 1: $a_1 + 7g = 70 + a_2$. State 2: $a_2 + 9g = 60 + 0.2a_1 + 0.8a_2$.
Adding them together, we get
$$g = \frac{130-0.8(a_1-a_2)}{16}$$
Multiply the first by 9 and the second by 7, then subtract. We get:
$$a_1-a_2 = \frac{210}{10.4}$$
Substituting that into $g$, we get $7.12$.
Lastly, suppose we use policy 2 at both states. We get:
State 1: $a_1 + 7g = 70 + a_2$. State 2: $a_2 + 5g = 40 + a_1$.
Adding them together, we get
$$g = \frac{110}{12} \approx 9.17$$
I guess we don't need to do the subtraction for this, because everything cancels out?
Conclusion: policy 1 at state 1, policy 2 at state 2. Average gain of 12.74 per unit time period.
5.2Accuracy and discussion¶
no idea but it feels right
6Question 6: Queueing theory¶
Suppose that one repairman has been assigned the responsibility of maintaining four machines. For each machine, the probability distribution of the running time before a breakdown is exponential, with a mean of 9 hours. The repair time also has an exponential distribution, with a mean of 2 hours.
(a) Calculate the steady-state probability distribution and the expected number of machines that are not running.
(b) A second repairman is available whenever more than one of the four machines requires repair. Calculate the steady-state probability distribution and the expected number of machines that are not running.
6.1Solution¶
(a) The state diagram looks like this:
You can figure out the rest from there. There's a lot of arithmetic involved so I'm not going to write it all out. See the notes for the step-by-step guide.
(b) The state diagram:
Same as above.
6.2Accuracy and discussion¶
idk, there are a lot of calculations