Wikinotes

Maintainer: admin

Because the file he posted is literally a .doc (which is pretty appalling for a math course tbh), I'm going to type up all the questions on this page (may be edited for clarity).

The following formulas are provided:

TSP: $\displaystyle f_i(j, S) = \min_{k \in S} \bigg [ f_{i-1} (k, S-\{k\} + d_{kj} \bigg ]$ with b.c. $f_0(j, -) = d_{1j}$ and $S \subseteq N$
MDP: $\displaystyle f_n(i) = \max_k \left [ q_i^k + \sum_{j=1}^N p_{ij}^k f_{n-1}(j) \right ]$, and $\displaystyle g+c_i = q_i^k + \sum_{j=1}^N p_{ij}^k c_j$; PIR: $\displaystyle \max \left [ q_i^k + \sum_{i=1}^N p_{ij}^kc_j \right ]$
Semi-markov decision processes: For a particular policy, $\displaystyle a_i + g\overline{\tau_i} = r_i + \sum_{i=1}^n p_{ij}a_j$. PIR: $\displaystyle \frac{1}{\overline{\tau}^k_i} \left [ r_i + \sum_{i=1}^n p_{ij}a_j-a_i \right ]$
Queueing theory
- $\displaystyle P_n = C_nP_0$
- $\displaystyle C_n = \frac{\lambda_{n-1} \lambda_{n-2} \cdots \lambda_0}{\mu_n\mu_{n-1} \cdots \mu_0} \, n = 1, 2, \ldots$
- $\displaystyle L = \sum nP_n$
- $\displaystyle L_q = \sum(n-s)P_n$
- $W = L/\overline{\lambda}$
- $L_s = L - L_q$
- $W_q = L_q / \overline{\lambda}$

1 Question 1: Shortest path
- 1.1 Solution
- 1.2 Accuracy and discussion
2 Question 2: Integer programming
- 2.1 Solution
- 2.2 Accuracy and discussion
3 Question 3: Traveling salesman
- 3.1 Solution
- 3.2 Accuracy and discussion
4 Question 4: Markov decision processes
- 4.1 Solution
- 4.2 Accuracy and discussion
5 Question 5: Semi-Markovian decision processes
- 5.1 Solution
- 5.2 Accuracy and discussion
6 Question 6: Queueing theory
- 6.1 Solution
- 6.2 Accuracy and discussion

Give the D.P. formulation with appropriate boundary conditions for finding the minimum time to go from any node $i$ ($i = 1, 2, \ldots, N-1$) to the terminal node $N$. Assume $t_{ij}$ is the time needed to go from node $i$ to node $j$. Describe an algorithm that can be used to solve this problem.

Dijkstra's algorithm

Give the D.P. formulation for solving the following problem: maximise $z = x_1x_2^2x_3^3x_4^{1/2}$ such that $x_1 + 2x_2 + 3x_3 + x_4 \leq 10$, where the $x_i$'s are all integers with $x_i \geq 1$. Using this formulation, solve the problem.

Let's consider our options, using the fact that $x_i \geq 1$ for all $i$.

For $x_1$: $x_1 \leq 10 - 2x_2 -3x_3 - x_4 \leq 10 - 2 - 3 - 1 = 4$. Thus $x_1 = 1, 2, 3, 4$.
For $x_2$: $2x_2 \leq 10 - 1-3-1 = 5$, so $x_2 =1, 2$.
For $x_3$: $3x_3 \leq 10 - 1 - 2 - 1 = 6$, so $x_3 = 1, 2$.
For $x_4$: $x_4 \leq 10 - 1 - 2 - 3 = 4$, so $x_4 = 1, 2, 3, 4$.

Next, we calculate $f_1(x) = x_1$ (attempting to maximise it) for each value of $x \leq 10$ such that $x_1 \leq x$. In this question, it happens to be a very straightforward calculation. (It won't always be the case, though.)

$f_1(0) = 0$ ($x=0$)
$f_1(1) = 1$ ($x=1$)
$f_1(2) = 2$ ($x=2$)
$f_1(3) = 3$ ($x=3$)
$f_1(4) = 4$ ($x=4$) (don't really need anything after this I think)
$f_1(5) = 5$ ($x=5$)
$f_1(6) = 6$ ($x=6$)
$f_1(7) = 7$ ($x=7$)
$f_1(8) = 8$ ($x=8$)
$f_1(9) = 9$ ($x=9$)
$f_1(10) = 10$ ($x=10$)

Next, we calculate the values of $\displaystyle f_2(x) = \max_{x_2 =1,2} \bigg [x_2^2 \cdot f_1(x-2x_2) \bigg ]$.

$f_2(0) = $ undefined
- when $x_2 = 1$, $f(x-2x_2) = f(-2)$ which is undefined
- also undefined
$f_2(1) = $ undefined
- undefined
- what is this like javascript or something
$f_2(2) = 0$
- when $x_2 = 1$, $f_1(2-2) = f(0) = 0$ so it's just 0
- undefined
$f_2(3) = 1$
- when $x_2 = 1$, $f_1(3-2) = f_1(1) = 1$ so it's just $x_2^2 \cdot 1= 1$
- when $x_2=2$, undefined
$f_2(4) = 2$
- when $x_2=1$, $f_1(4-2) = f(2) = 2$ so it's $1^2 \cdot 2 = 2$
- when $x_2 = 2$, $f_1(4-4) = f(0) = 0$ so it's just 0
$f_2(5) = 4$
- when $x_2=1$, $f_1(5-2) = f(3) = 3$ so it's just $1^2 \cdot 3 = 3$
- when $x_2 = 2$, $f_1(5-4) = f(1) = 1$ so it's $2^2 \cdot 1 = 4$
$f_2(6) = 8$
- when $x_2=1$, $f_1(6-2) = f(4) = 4$ so it's 4
- when $x_2=2$, $f_1(2) = 2$ so it's $2^2 \cdot 2 = 8$
$f_2(7) = 12$
- 1: 5
- 2: 12
probably don't need the rest

Next, we calculate the values of $\displaystyle f_3(x) = \max_{x_3 =1,2} \bigg [x_3^3 \cdot f_2(x-3x_3) \bigg ]$. As before, a lot of these values will be undefined.

$f_3(0) = $ undefined
$f_3(1) = $ undefined
$f_3(2) = $ undefined
$f_3(3) = $ undefined
- 1: $f_2(0) = $ undefined
- undefined
$f_3(4) = $ undefined
- 1: $f_2(1) = $ undefined
- undefined
$f_3(5) = 0$
- 1: $f_2(2) = 0$ so 0
- 2: undefined
$f_3(6) = 1$
- 1: $f_2(3) = 1$ so $x_3^3 \cdot 1= 1^3 \cdot 1 = 1$
- 2: undefined
$f_3(7) = 2$
- 1: $f_2(4) = 2$ so $x_3^3 \cdot 2 = 2$
- 2: $f_2(1) =$ undefined
$f_3(8) = 4$
- 1: $f_2(5) = 4$ so 4
- 2: $f_2(2) = 0$ so 0
$f_3(9) = 8$
- 1: $f_2(6) = 8$ so 8
- 2: $f_2(3) = 1$ so $2^3 \cdot 1= 8$
not gonna bother

Finally, we calculate $f_4(10)$:

$$f_4(10) = \max_{x_4 = 1, 2, 3, 4} [ x_4^{1/2} \cdot f_3(10 - x_4)] = \begin{cases} x_4=1: & 1 \cdot f_3(9) = 8 \, \bigstar \\ x_4=2: & \sqrt{2} \cdot f_3(8) = 4\sqrt{2} \\ x_4=3: & \sqrt{3} \cdot f_3(7) = 2\sqrt{3} \\ x_4=4: & 2 \cdot f_3(6) = 2 \end{cases} $$

So $x_4=1$, using $f_3(9)$. There, we can have either $x_3 = 1$ (which uses $f_2(6)$) or $x_3=2$ (which uses $f_2(3)$). For $f_2(6)$, we use $x_2=2$ and $f_1(2)$, which has $x_1=2$; this gives us $x_1=2, x_2=2, x_3=1, x_4=1$. For $f_2(3)$, we use $x_2=1$ and $f_1(1)$, which has $x_1=1$; this gives us $x_1=1$, $x_2=1$, $x_3=2$, $x_4=1$. Both assignments result in $z=8$.

Accurate. Wrote a script to test it.

In the traveling salesman problem, the distance from city $i$ to city $j$, $d_{ij}$, is given by the following distance table:

$i$ \ $j$	1	2	3	4	5
1	0	3	6	5	4
2	1	0	5	4	3
3	5	4	0	2	3
4	3	1	3	0	4
5	5	2	4	1	0

(a) Find the solution if you start at node 1 and visit all the other nodes but don't need to return to node 1.
(b) Find the solution if you start at node 1 and return to node 1, but also need to visit node 2 before node 3 and node 3 before node 5.

Use the standard D.P. approach to solve this.

(a) Just use the formula given. Do sets of size $k$ for each $f_k$. At the end, you have that $f_3(3, \{2, 4, 5\}) = 10$ is the lowest among all the $f_3$ values. Since 3 is the first argument to $f_3$, 3 is the last node you visit. Now, the value of 10 is obtained using $f_2(4, \{2, 5\})$. So 4 is the second-to-last node. $f_2(4, \{2, 5\})$ is obtained using $f_1(5, \{2\})$, so 5 is the third-to-last node. Then 2 is the first node (after 1). So the path is $1-2-5-4-3$.

(b) There are only 4 possibilities. Use the table to find the total cost for each route. 142351 => 19, 124351 => 18, 123451 => 19, and 123541 => 15 so we use the last route.

Accurate. I wrote a script to test it.

State $i$	Policy $k$	$p_{i1}^k$	$p_{i2}^k$	$q_i^k$ (expected reward)
1	1	0.6	0.4	6
1	2	0.8	0.2	4
2	1	0.4	0.6	-3
2	2	0.7	0.3	-5

(a) Evaluate $f_n(i)$ and the optimal policy for $n=1,2,3$ (i.e., finite horizon in state $i$).
(b) Verify if the optimal policy for $n=3$ is also the optimal policy for $n$ large (i.e., the infinite horizon, non-discounted case).

(a)

Time step 1:

State 1:
- Policy 1: 6 $\bigstar$
- Policy 2: 4
State 2:
- 1: -3 $\bigstar$
- 2: -5

$f_1(1) = 6$ (policy 1), $f_1(2) = -3$ (policy 1).

Time step 2:

State 1:
- 1: $6 + 0.6\cdot f_1(1) + 0.4\cdot f_1(2) = 6 + 0.6*6 + 0.4 * (-3) = 8.4$ $\bigstar$
- 2: $4 + 0.8(6) - 0.2(3) = 8.2$
State 2:
- 1: $-3 + 0.4(6) + 0.6(-3) = -2.4$
- 2: $-5 + 0.7(6) + 0.3(-3) = -1.7$ $\bigstar$

$f_2(1) = 8.4$ (policy 1), $f_2(2) = -1.7$ (policy 2).

Time step 3:

State 1:
- 1: $6 + 0.6\cdot f_2(1) + 0.4\cdot f_2(2) = 6 + 0.6(8.4) + 0.4 * (-1.7) = 10.36$
- 2: $4 + 0.8(8.4) - 0.2(1.7) = 10.38$ $\bigstar$
State 2:
- 1: $-3 + 0.4(8.4) + 0.6(-1.7) = -0.66$
- 2: $-5 + 0.7(8.4) + 0.3(-1.7) = 0.37$ $\bigstar$

$f_3(1) = 10.38$ (policy 2), $f_3(2) = 0.37$ (policy 2).

(b) At $n=3$, the optimal policy is to use policy 2 at either state. We have

$$g + c_1 = 4 + 0.8c_1 + 0.2c_2 \quad \text{and} \quad g+ c_2 = -5 + 0.7c_1 + 0.3c_2$$

Subtracting the two, we get

$$c_1 - c_2 = 9 + 0.1c_1 -0.1c_2$$

Thus $0.9(c_1-c_2) = 9$ and so $c_1 - c_2=10$. Next, add the equations:

$$2g + (c_1 + c_2) = -1 + 1.5c_1 + 0.5c_2$$

Solving for $g$, we get

$$g = \frac{-1 + 0.5(c_1-c_2)}{2} = -0.5 + 2.5 = 2$$

At state 1, you have $6 + 0.6(10) = 12$ for policy 1, and $4 + 0.8(10) = 12$. So either policy 1 or policy 2 works over an infinite horizon.

At state 2, you have $-3 + 0.4(10) = 1$ and $-5 + 0.7(10) = 2$. So policy 2 is best over an infinite horizon.

Wrote a python script to confirm, as usual. All checks out, even if I'm not really sure what I'm doing when it comes to the infinite horizon part.

The following values are given over an finite horizon.

State $i$	Policy $k$	$p_{i1}^k$	$p_{i2}^k$	$\overline{\tau}_i^k$	$r_i^k$ (expected reward)
1	1	0.7	0.3	3.4	50
1	2	0	1	7	70
2	1	0.2	0.8	9	60
2	2	1	0	5	40

Find the optimal policy and the average gain per unit time period. $\overline{\tau}_i^k$ is the mean time for which a car is rented in state $i$ using policy $k$.

Suppose we used policy 1 at both states. Then:

State 1: $a_1 + 3.4g = 50 + 0.7a_1 + 0.3a_2$. State 2: $a_2 + 9g = 60 + 0.2a_1 + 0.8a_2$.

Adding them together, we get

$$g = \frac{110 - 0.1(a_1-a_2)}{12.4}$$

Multiply the first by 9, and the second by 3.4, then subtract. We get

$$a_1 - a_2 = \frac{246}{3.38} \approx 72.78$$

Substituting that into $g$, we get $8.28$.

Now suppose we use policy 1 at state 1 and policy 2 at state 2. Then:

State 1: $a_1 + 3.4g = 50 + 0.7a_1 + 0.3a_2$. State 2: $a_2 + 5g = 40 + a_1$.

Adding them together, we get

$$g= \frac{90 + 0.7(a_1-a_2)}{8.4}$$

Multiply the first by 5 and the second by 3.4, then subtract. We get:

$$a_1 - a_2 = \frac{114}{4.7} \approx 23.26$$

Substituting that into $g$, we get $12.74$.

Now suppose we use policy 2 at state 1 and policy 1 at state 2. Then:

State 1: $a_1 + 7g = 70 + a_2$. State 2: $a_2 + 9g = 60 + 0.2a_1 + 0.8a_2$.

Adding them together, we get

$$g = \frac{130-0.8(a_1-a_2)}{16}$$

Multiply the first by 9 and the second by 7, then subtract. We get:

$$a_1-a_2 = \frac{210}{10.4}$$

Substituting that into $g$, we get $7.12$.

Lastly, suppose we use policy 2 at both states. We get:

State 1: $a_1 + 7g = 70 + a_2$. State 2: $a_2 + 5g = 40 + a_1$.

Adding them together, we get

$$g = \frac{110}{12} \approx 9.17$$

I guess we don't need to do the subtraction for this, because everything cancels out?

Conclusion: policy 1 at state 1, policy 2 at state 2. Average gain of 12.74 per unit time period.

no idea but it feels right

Suppose that one repairman has been assigned the responsibility of maintaining four machines. For each machine, the probability distribution of the running time before a breakdown is exponential, with a mean of 9 hours. The repair time also has an exponential distribution, with a mean of 2 hours.

(a) Calculate the steady-state probability distribution and the expected number of machines that are not running.
(b) A second repairman is available whenever more than one of the four machines requires repair. Calculate the steady-state probability distribution and the expected number of machines that are not running.

(a) The state diagram looks like this:

State diagram

You can figure out the rest from there. There's a lot of arithmetic involved so I'm not going to write it all out. See the notes for the step-by-step guide.

(b) The state diagram:

State diagram

Same as above.

idk, there are a lot of calculations

Winter 2010 Final

1Question 1: Shortest path¶

1.1Solution¶

1.2Accuracy and discussion¶

2Question 2: Integer programming¶

2.1Solution¶

2.2Accuracy and discussion¶

3Question 3: Traveling salesman¶

3.1Solution¶

3.2Accuracy and discussion¶

4Question 4: Markov decision processes¶

4.1Solution¶

4.2Accuracy and discussion¶

5Question 5: Semi-Markovian decision processes¶

5.1Solution¶

5.2Accuracy and discussion¶

6Question 6: Queueing theory¶

6.1Solution¶

6.2Accuracy and discussion¶