Wednesday, November 20, 2013 CC-BY-NC
Transactional Memory

Maintainer: admin

1Graham Brent theorem (continued)

2 kinds of steps:
- complete steps: >= p tasks available
- incomplete steps: < p tasks available

$T_p <= T_1/p + T_\infty$

$T_1/p$: complete
$T_\infty$: incomplete

“an application with p processors at most would take time that is addition of sequential time / p and critical path time”

1.1Proof

$T_p^* >= max(T_1/p, T_\infty)$ (1)
$and\ T_1/p <= max(T_1/p, T_\infty)$ (2)
$T_\infty <= max(T_1/p, t_\infty)$ (3)

$T_p <= T_1/p + t_\infty$ (0)
$<= max(T_1/p, t_\infty) + max(T_1/p, T_\infty)$ (3)
$<= 2max(T_1/p, T_\infty)$
$<= 2T_P^*$

No more than 2x the optimal time

2Work stealing

Have a task queue
If threads have their own queue, they need a way to distribute the work

2.1Cilk++ language

Queue and tasks for each processors

($T_1$ stole from $T_2$)
If empty, pop from somewhere else
If tasks live in stack frames (like C), then some code needs to grow stack and some doesn’t
e.g.

main() {
  A()
}
A() {
  spawn B()
  spawn C()
  -sync-
}
B() {
  spawn D()
  spawn E()
  -sync-
  foo() // regular function
}
C() {
  spawn F()
  spawn G()
  -sync-
}


This gives a cactus stack


In a concurrent program, there may have more than one active.