# Wednesday, November 20, 2013 Transactional Memory

## 1Graham Brent theorem (continued)¶

2 kinds of steps:
- complete steps: >= p tasks available
- incomplete steps: < p tasks available

$T_p <= T_1/p + T_\infty$

$T_1/p$: complete
$T_\infty$: incomplete

“an application with p processors at most would take time that is addition of sequential time / p and critical path time”

### 1.1Proof¶

$T_p^* >= max(T_1/p, T_\infty)$ (1)
$and\ T_1/p <= max(T_1/p, T_\infty)$ (2)
$T_\infty <= max(T_1/p, t_\infty)$ (3)

$T_p <= T_1/p + t_\infty$ (0)
$<= max(T_1/p, t_\infty) + max(T_1/p, T_\infty)$ (3)
$<= 2max(T_1/p, T_\infty)$
$<= 2T_P^*$

No more than 2x the optimal time

## 2Work stealing¶

If threads have their own queue, they need a way to distribute the work

### 2.1Cilk++ language¶

Queue and tasks for each processors

($T_1$ stole from $T_2$)
If empty, pop from somewhere else
If tasks live in stack frames (like C), then some code needs to grow stack and some doesn’t
e.g.

main() {
A()
}
A() {
spawn B()
spawn C()
-sync-
}
B() {
spawn D()
spawn E()
-sync-
foo() // regular function
}
C() {
spawn F()
spawn G()
-sync-
}

This gives a cactus stack

In a concurrent program, there may have more than one active.