Wednesday, November 20, 2013 CC-BY-NC
Transactional Memory

Maintainer: admin

1Graham Brent theorem (continued)

2 kinds of steps:
- complete steps: >= p tasks available
- incomplete steps: < p tasks available

$T_p <= T_1/p + T_\infty$

$T_1/p$: complete
$T_\infty$: incomplete

“an application with p processors at most would take time that is addition of sequential time / p and critical path time”


$T_p^* >= max(T_1/p, T_\infty)$ (1)
$and\ T_1/p <= max(T_1/p, T_\infty)$ (2)
$T_\infty <= max(T_1/p, t_\infty)$ (3)

$T_p <= T_1/p + t_\infty$ (0)
$<= max(T_1/p, t_\infty) + max(T_1/p, T_\infty)$ (3)
$<= 2max(T_1/p, T_\infty)$
$<= 2T_P^*$

No more than 2x the optimal time

2Work stealing

Have a task queue
If threads have their own queue, they need a way to distribute the work

2.1Cilk++ language

Queue and tasks for each processors

($T_1$ stole from $T_2$)
If empty, pop from somewhere else
If tasks live in stack frames (like C), then some code needs to grow stack and some doesn’t

main() {
A() {
  spawn B()
  spawn C()
B() {
  spawn D()
  spawn E()
  foo() // regular function
C() {
  spawn F()
  spawn G()

This gives a cactus stack

In a concurrent program, there may have more than one active.