Winter 2010 Final CC-BY-NC

Maintainer: admin

Because the file he posted is literally a .doc (which is pretty appalling for a math course tbh), I'm going to type up all the questions on this page (may be edited for clarity).

The following formulas are provided:

  • TSP: $\displaystyle f_i(j, S) = \min_{k \in S} \bigg [ f_{i-1} (k, S-\{k\} + d_{kj} \bigg ]$ with b.c. $f_0(j, -) = d_{1j}$ and $S \subseteq N$
  • MDP: $\displaystyle f_n(i) = \max_k \left [ q_i^k + \sum_{j=1}^N p_{ij}^k f_{n-1}(j) \right ]$, and $\displaystyle g+c_i = q_i^k + \sum_{j=1}^N p_{ij}^k c_j$; PIR: $\displaystyle \max \left [ q_i^k + \sum_{i=1}^N p_{ij}^kc_j \right ]$
  • Semi-markov decision processes: For a particular policy, $\displaystyle a_i + g\overline{\tau_i} = r_i + \sum_{i=1}^n p_{ij}a_j$. PIR: $\displaystyle \frac{1}{\overline{\tau}^k_i} \left [ r_i + \sum_{i=1}^n p_{ij}a_j-a_i \right ]$
  • Queueing theory
    • $\displaystyle P_n = C_nP_0$
    • $\displaystyle C_n = \frac{\lambda_{n-1} \lambda_{n-2} \cdots \lambda_0}{\mu_n\mu_{n-1} \cdots \mu_0} \, n = 1, 2, \ldots$
    • $\displaystyle L = \sum nP_n$
    • $\displaystyle L_q = \sum(n-s)P_n$
    • $W = L/\overline{\lambda}$
    • $L_s = L - L_q$
    • $W_q = L_q / \overline{\lambda}$

1Question 1: Shortest path

Give the D.P. formulation with appropriate boundary conditions for finding the minimum time to go from any node $i$ ($i = 1, 2, \ldots, N-1$) to the terminal node $N$. Assume $t_{ij}$ is the time needed to go from node $i$ to node $j$. Describe an algorithm that can be used to solve this problem.

1.1Solution

Dijkstra's algorithm

1.2Accuracy and discussion

2Question 2: Integer programming

Give the D.P. formulation for solving the following problem: maximise $z = x_1x_2^2x_3^3x_4^{1/2}$ such that $x_1 + 2x_2 + 3x_3 + x_4 \leq 10$, where the $x_i$'s are all integers with $x_i \geq 1$. Using this formulation, solve the problem.

2.1Solution

Let's consider our options, using the fact that $x_i \geq 1$ for all $i$.

  • For $x_1$: $x_1 \leq 10 - 2x_2 -3x_3 - x_4 \leq 10 - 2 - 3 - 1 = 4$. Thus $x_1 = 1, 2, 3, 4$.
  • For $x_2$: $2x_2 \leq 10 - 1-3-1 = 5$, so $x_2 =1, 2$.
  • For $x_3$: $3x_3 \leq 10 - 1 - 2 - 1 = 6$, so $x_3 = 1, 2$.
  • For $x_4$: $x_4 \leq 10 - 1 - 2 - 3 = 4$, so $x_4 = 1, 2, 3, 4$.

Next, we calculate $f_1(x) = x_1$ (attempting to maximise it) for each value of $x \leq 10$ such that $x_1 \leq x$. In this question, it happens to be a very straightforward calculation. (It won't always be the case, though.)

  • $f_1(0) = 0$ ($x=0$)
  • $f_1(1) = 1$ ($x=1$)
  • $f_1(2) = 2$ ($x=2$)
  • $f_1(3) = 3$ ($x=3$)
  • $f_1(4) = 4$ ($x=4$) (don't really need anything after this I think)
  • $f_1(5) = 5$ ($x=5$)
  • $f_1(6) = 6$ ($x=6$)
  • $f_1(7) = 7$ ($x=7$)
  • $f_1(8) = 8$ ($x=8$)
  • $f_1(9) = 9$ ($x=9$)
  • $f_1(10) = 10$ ($x=10$)

Next, we calculate the values of $\displaystyle f_2(x) = \max_{x_2 =1,2} \bigg [x_2^2 \cdot f_1(x-2x_2) \bigg ]$.

  • $f_2(0) = $ undefined
    • when $x_2 = 1$, $f(x-2x_2) = f(-2)$ which is undefined
    • also undefined
  • $f_2(1) = $ undefined
    • undefined
    • what is this like javascript or something
  • $f_2(2) = 0$
    • when $x_2 = 1$, $f_1(2-2) = f(0) = 0$ so it's just 0
    • undefined
  • $f_2(3) = 1$
    • when $x_2 = 1$, $f_1(3-2) = f_1(1) = 1$ so it's just $x_2^2 \cdot 1= 1$
    • when $x_2=2$, undefined
  • $f_2(4) = 2$
    • when $x_2=1$, $f_1(4-2) = f(2) = 2$ so it's $1^2 \cdot 2 = 2$
    • when $x_2 = 2$, $f_1(4-4) = f(0) = 0$ so it's just 0
  • $f_2(5) = 4$
    • when $x_2=1$, $f_1(5-2) = f(3) = 3$ so it's just $1^2 \cdot 3 = 3$
    • when $x_2 = 2$, $f_1(5-4) = f(1) = 1$ so it's $2^2 \cdot 1 = 4$
  • $f_2(6) = 8$
    • when $x_2=1$, $f_1(6-2) = f(4) = 4$ so it's 4
    • when $x_2=2$, $f_1(2) = 2$ so it's $2^2 \cdot 2 = 8$
  • $f_2(7) = 12$
    • 1: 5
    • 2: 12
  • probably don't need the rest

Next, we calculate the values of $\displaystyle f_3(x) = \max_{x_3 =1,2} \bigg [x_3^3 \cdot f_2(x-3x_3) \bigg ]$. As before, a lot of these values will be undefined.

  • $f_3(0) = $ undefined
  • $f_3(1) = $ undefined
  • $f_3(2) = $ undefined
  • $f_3(3) = $ undefined
    • 1: $f_2(0) = $ undefined
    • undefined
  • $f_3(4) = $ undefined
    • 1: $f_2(1) = $ undefined
    • undefined
  • $f_3(5) = 0$
    • 1: $f_2(2) = 0$ so 0
    • 2: undefined
  • $f_3(6) = 1$
    • 1: $f_2(3) = 1$ so $x_3^3 \cdot 1= 1^3 \cdot 1 = 1$
    • 2: undefined
  • $f_3(7) = 2$
    • 1: $f_2(4) = 2$ so $x_3^3 \cdot 2 = 2$
    • 2: $f_2(1) =$ undefined
  • $f_3(8) = 4$
    • 1: $f_2(5) = 4$ so 4
    • 2: $f_2(2) = 0$ so 0
  • $f_3(9) = 8$
    • 1: $f_2(6) = 8$ so 8
    • 2: $f_2(3) = 1$ so $2^3 \cdot 1= 8$
  • not gonna bother

Finally, we calculate $f_4(10)$:

$$f_4(10) = \max_{x_4 = 1, 2, 3, 4} [ x_4^{1/2} \cdot f_3(10 - x_4)] = \begin{cases} x_4=1: & 1 \cdot f_3(9) = 8 \, \bigstar \\ x_4=2: & \sqrt{2} \cdot f_3(8) = 4\sqrt{2} \\ x_4=3: & \sqrt{3} \cdot f_3(7) = 2\sqrt{3} \\ x_4=4: & 2 \cdot f_3(6) = 2 \end{cases} $$

So $x_4=1$, using $f_3(9)$. There, we can have either $x_3 = 1$ (which uses $f_2(6)$) or $x_3=2$ (which uses $f_2(3)$). For $f_2(6)$, we use $x_2=2$ and $f_1(2)$, which has $x_1=2$; this gives us $x_1=2, x_2=2, x_3=1, x_4=1$. For $f_2(3)$, we use $x_2=1$ and $f_1(1)$, which has $x_1=1$; this gives us $x_1=1$, $x_2=1$, $x_3=2$, $x_4=1$. Both assignments result in $z=8$.

2.2Accuracy and discussion

Accurate. Wrote a script to test it.

3Question 3: Traveling salesman

In the traveling salesman problem, the distance from city $i$ to city $j$, $d_{ij}$, is given by the following distance table:

$i$ \ $j$ 1 2 3 4 5
1 0 3 6 5 4
2 1 0 5 4 3
3 5 4 0 2 3
4 3 1 3 0 4
5 5 2 4 1 0

(a) Find the solution if you start at node 1 and visit all the other nodes but don't need to return to node 1.
(b) Find the solution if you start at node 1 and return to node 1, but also need to visit node 2 before node 3 and node 3 before node 5.

Use the standard D.P. approach to solve this.

3.1Solution

(a) Just use the formula given. Do sets of size $k$ for each $f_k$. At the end, you have that $f_3(3, \{2, 4, 5\}) = 10$ is the lowest among all the $f_3$ values. Since 3 is the first argument to $f_3$, 3 is the last node you visit. Now, the value of 10 is obtained using $f_2(4, \{2, 5\})$. So 4 is the second-to-last node. $f_2(4, \{2, 5\})$ is obtained using $f_1(5, \{2\})$, so 5 is the third-to-last node. Then 2 is the first node (after 1). So the path is $1-2-5-4-3$.

(b) There are only 4 possibilities. Use the table to find the total cost for each route. 142351 => 19, 124351 => 18, 123451 => 19, and 123541 => 15 so we use the last route.

3.2Accuracy and discussion

Accurate. I wrote a script to test it.

4Question 4: Markov decision processes

State $i$ Policy $k$ $p_{i1}^k$ $p_{i2}^k$ $q_i^k$ (expected reward)
1 1 0.6 0.4 6
1 2 0.8 0.2 4
2 1 0.4 0.6 -3
2 2 0.7 0.3 -5

(a) Evaluate $f_n(i)$ and the optimal policy for $n=1,2,3$ (i.e., finite horizon in state $i$).
(b) Verify if the optimal policy for $n=3$ is also the optimal policy for $n$ large (i.e., the infinite horizon, non-discounted case).

4.1Solution

(a)

Time step 1:

  • State 1:
    • Policy 1: 6 $\bigstar$
    • Policy 2: 4
  • State 2:
    • 1: -3 $\bigstar$
    • 2: -5

$f_1(1) = 6$ (policy 1), $f_1(2) = -3$ (policy 1).

Time step 2:

  • State 1:
    • 1: $6 + 0.6\cdot f_1(1) + 0.4\cdot f_1(2) = 6 + 0.6*6 + 0.4 * (-3) = 8.4$ $\bigstar$
    • 2: $4 + 0.8(6) - 0.2(3) = 8.2$
  • State 2:
    • 1: $-3 + 0.4(6) + 0.6(-3) = -2.4$
    • 2: $-5 + 0.7(6) + 0.3(-3) = -1.7$ $\bigstar$

$f_2(1) = 8.4$ (policy 1), $f_2(2) = -1.7$ (policy 2).

Time step 3:

  • State 1:
    • 1: $6 + 0.6\cdot f_2(1) + 0.4\cdot f_2(2) = 6 + 0.6(8.4) + 0.4 * (-1.7) = 10.36$
    • 2: $4 + 0.8(8.4) - 0.2(1.7) = 10.38$ $\bigstar$
  • State 2:
    • 1: $-3 + 0.4(8.4) + 0.6(-1.7) = -0.66$
    • 2: $-5 + 0.7(8.4) + 0.3(-1.7) = 0.37$ $\bigstar$

$f_3(1) = 10.38$ (policy 2), $f_3(2) = 0.37$ (policy 2).

(b) At $n=3$, the optimal policy is to use policy 2 at either state. We have

$$g + c_1 = 4 + 0.8c_1 + 0.2c_2 \quad \text{and} \quad g+ c_2 = -5 + 0.7c_1 + 0.3c_2$$

Subtracting the two, we get

$$c_1 - c_2 = 9 + 0.1c_1 -0.1c_2$$

Thus $0.9(c_1-c_2) = 9$ and so $c_1 - c_2=10$. Next, add the equations:

$$2g + (c_1 + c_2) = -1 + 1.5c_1 + 0.5c_2$$

Solving for $g$, we get

$$g = \frac{-1 + 0.5(c_1-c_2)}{2} = -0.5 + 2.5 = 2$$

At state 1, you have $6 + 0.6(10) = 12$ for policy 1, and $4 + 0.8(10) = 12$. So either policy 1 or policy 2 works over an infinite horizon.

At state 2, you have $-3 + 0.4(10) = 1$ and $-5 + 0.7(10) = 2$. So policy 2 is best over an infinite horizon.

4.2Accuracy and discussion

Wrote a python script to confirm, as usual. All checks out, even if I'm not really sure what I'm doing when it comes to the infinite horizon part.

5Question 5: Semi-Markovian decision processes

The following values are given over an finite horizon.

State $i$ Policy $k$ $p_{i1}^k$ $p_{i2}^k$ $\overline{\tau}_i^k$ $r_i^k$ (expected reward)
1 1 0.7 0.3 3.4 50
1 2 0 1 7 70
2 1 0.2 0.8 9 60
2 2 1 0 5 40

Find the optimal policy and the average gain per unit time period. $\overline{\tau}_i^k$ is the mean time for which a car is rented in state $i$ using policy $k$.

5.1Solution

Suppose we used policy 1 at both states. Then:

State 1: $a_1 + 3.4g = 50 + 0.7a_1 + 0.3a_2$. State 2: $a_2 + 9g = 60 + 0.2a_1 + 0.8a_2$.

Adding them together, we get

$$g = \frac{110 - 0.1(a_1-a_2)}{12.4}$$

Multiply the first by 9, and the second by 3.4, then subtract. We get

$$a_1 - a_2 = \frac{246}{3.38} \approx 72.78$$

Substituting that into $g$, we get $8.28$.

Now suppose we use policy 1 at state 1 and policy 2 at state 2. Then:

State 1: $a_1 + 3.4g = 50 + 0.7a_1 + 0.3a_2$. State 2: $a_2 + 5g = 40 + a_1$.

Adding them together, we get

$$g= \frac{90 + 0.7(a_1-a_2)}{8.4}$$

Multiply the first by 5 and the second by 3.4, then subtract. We get:

$$a_1 - a_2 = \frac{114}{4.7} \approx 23.26$$

Substituting that into $g$, we get $12.74$.

Now suppose we use policy 2 at state 1 and policy 1 at state 2. Then:

State 1: $a_1 + 7g = 70 + a_2$. State 2: $a_2 + 9g = 60 + 0.2a_1 + 0.8a_2$.

Adding them together, we get

$$g = \frac{130-0.8(a_1-a_2)}{16}$$

Multiply the first by 9 and the second by 7, then subtract. We get:

$$a_1-a_2 = \frac{210}{10.4}$$

Substituting that into $g$, we get $7.12$.

Lastly, suppose we use policy 2 at both states. We get:

State 1: $a_1 + 7g = 70 + a_2$. State 2: $a_2 + 5g = 40 + a_1$.

Adding them together, we get

$$g = \frac{110}{12} \approx 9.17$$

I guess we don't need to do the subtraction for this, because everything cancels out?

Conclusion: policy 1 at state 1, policy 2 at state 2. Average gain of 12.74 per unit time period.

5.2Accuracy and discussion

no idea but it feels right

6Question 6: Queueing theory

Suppose that one repairman has been assigned the responsibility of maintaining four machines. For each machine, the probability distribution of the running time before a breakdown is exponential, with a mean of 9 hours. The repair time also has an exponential distribution, with a mean of 2 hours.

(a) Calculate the steady-state probability distribution and the expected number of machines that are not running.
(b) A second repairman is available whenever more than one of the four machines requires repair. Calculate the steady-state probability distribution and the expected number of machines that are not running.

6.1Solution

(a) The state diagram looks like this:

State diagram

You can figure out the rest from there. There's a lot of arithmetic involved so I'm not going to write it all out. See the notes for the step-by-step guide.

(b) The state diagram:

State diagram

Same as above.

6.2Accuracy and discussion

idk, there are a lot of calculations