1X10 continued¶
async s; // mini thread, no handle, lightweight finish { for (i in 1...100) { async {a[i]++} } } // all fors done finish s // all async done
finish { async { finish { async {x=2} // x E {0, 2} } // x E {2} x++ // x E {3} } // x E {0, 2, 3} } // x E {3}
2Generic condition synchronization¶
when (c) -> S // wait for C to be true -> atomically do S // C should be pure (might be execed many times)
Constraints on C, S:
- sequential (no async)
- nonblocking (no finish)
- local (this place)
3Places/locations¶
There are many places. Places are (logical) locations (actual binding to machines is done by RT).
at(p) S // execute S at place p
This'll serialize the data and make a copy (expensive) on p.
N.B. not copied back: x=at(p) E
4Work distribution¶
$C=PE$
$\sum\limits_{k=0}^{n-1} a_{ix}b_{kj} $
^ can be concurrently done!
class worker extends thread { int row, col worker (row, col) {} run() { dot = 0.0 for (k from 0 to n - 1) { dot += A[row][k] * B[k][col] } C[row][col] = dot } } // { for (i from 0 to n - 1) { for (j from 0 to n - 1) { n[i][j] = new worker(i, j) n[i][j].start() } } }
lots of threads not doing much work. Not efficient, high overhead.
Partition the array + use a thread pool.
Java has thread pool (executor framework)
executor ex = { ex.execute(new Runnable) // may or may not create a new thread } class directEx implements Executor { execute(Runnable r) { r.run() } } // no thread created class asyncEx Executor{ execute(Runnable r) { new Thread(r).start() } }
Executor interface is limited. How to wait for thread? How to return value from execution?
Executor fence
Callable <V> interface
Like Runnable but can return something & throw exceptions
submit(callable) // get return value from callable // returns a Future<V> (promise) y = new Future(task) // (maybe) spawn a thread to execute task // later... x = y.get() // blocks an wait til task is done
class Future { callable c Future(callable c) {} get() { return c.call() } }
ThreadPoolExecutor: a pool of threads (capped). Queue job of all threads currently busy (empty pool).
5Fibonacci¶
fib(n) { if (n < 2) return 1 Future f = new Future(fib(n-1)) return fib(n-2) + f.get() }
(Node: computation. Edge: dependency)
This translates to:
With infinite processors, can execute nodes/computations in the dependency dag.
What’s the minimum time it’ll take?
The largest dependency chain is a lower bound. Critical path = 8 (ADGHKIFC).
Speedup: $T_1 = 17$. $T_\infty = 8$. $17/8 > 2$
Notation:
$T_n$: time taken for n processors
$T_1$: sequential
In practice, you have a fixed # of processors.
What’s the speedup for $T_2$, $T_4$? Need to know the schedule. How/when/where tasks are assigned.
We need a clever scheduler. Greedy scheduler works well.
But anything would give less than 2x speedup (Graham Brent Theorem).