**Maintainer:**admin

*1*Graham Brent theorem (continued)¶

2 kinds of steps:

- complete steps: >= p tasks available

- incomplete steps: < p tasks available

$T_p <= T_1/p + T_\infty$

$T_1/p$: complete

$T_\infty$: incomplete

“an application with p processors at most would take time that is addition of sequential time / p and critical path time”

*1.1*Proof¶

$T_p^* >= max(T_1/p, T_\infty)$ (1)

$and\ T_1/p <= max(T_1/p, T_\infty)$ (2)

$T_\infty <= max(T_1/p, t_\infty)$ (3)

$T_p <= T_1/p + t_\infty$ (0)

$<= max(T_1/p, t_\infty) + max(T_1/p, T_\infty)$ (3)

$<= 2max(T_1/p, T_\infty)$

$<= 2T_P^*$

No more than 2x the optimal time

*2*Work stealing¶

Have a task queue

If threads have their own queue, they need a way to distribute the work

*2.1*Cilk++ language¶

Queue and tasks for each processors

($T_1$ stole from $T_2$)

If empty, pop from somewhere else

If tasks live in stack frames (like C), then some code needs to grow stack and some doesn’t

e.g.

main() { A() } A() { spawn B() spawn C() -sync- } B() { spawn D() spawn E() -sync- foo() // regular function } C() { spawn F() spawn G() -sync- }

This gives a cactus stack

In a concurrent program, there may have more than one active.