Monday, November 18, 2013 CC-BY-NC
PGAS languages and transactional programming

Maintainer: admin

1X10 continued

async s; // mini thread, no handle, lightweight
finish {
  for (i in 1...100) {
    async {a[i]++}
  }
} // all fors done
finish s // all async done
finish {
  async {
    finish {
      async {x=2}
      // x E {0, 2}
    } // x E {2}
    x++
    // x E {3}
  } // x E {0, 2, 3}
} // x E {3}

2Generic condition synchronization

when (c) -> S
// wait for C to be true -> atomically do S
// C should be pure (might be execed many times)

Constraints on C, S:

  • sequential (no async)
  • nonblocking (no finish)
  • local (this place)

3Places/locations

There are many places. Places are (logical) locations (actual binding to machines is done by RT).

at(p) S // execute S at place p

This'll serialize the data and make a copy (expensive) on p.
N.B. not copied back: x=at(p) E

4Work distribution

$C=PE$
$\sum\limits_{k=0}^{n-1} a_{ix}b_{kj} $
^ can be concurrently done!

class worker extends thread {
  int row, col
  worker (row, col) {}
  run() {
    dot = 0.0
    for (k from 0 to n - 1) {
      dot += A[row][k] * B[k][col]
    }
    C[row][col] = dot
  }
}
//
{
  for (i from 0 to n - 1) {
    for (j from 0 to n - 1) {
      n[i][j] = new worker(i, j)
      n[i][j].start()
    }
  }
}

lots of threads not doing much work. Not efficient, high overhead.

Partition the array + use a thread pool.

Java has thread pool (executor framework)

executor ex = {
  ex.execute(new Runnable) // may or may not create a new thread
}

class directEx implements Executor {
  execute(Runnable r) {
    r.run()
  }
} // no thread created
class asyncEx Executor{
  execute(Runnable r) {
    new Thread(r).start()
  }
}

Executor interface is limited. How to wait for thread? How to return value from execution?

Executor fence
Callable <V> interface
Like Runnable but can return something & throw exceptions

submit(callable) // get return value from callable
// returns a Future<V> (promise)
y = new Future(task) // (maybe) spawn a thread to execute task
// later...
x = y.get() // blocks an wait til task is done
class Future {
  callable c
  Future(callable c) {}
  get() {
    return c.call()
  }
}

ThreadPoolExecutor: a pool of threads (capped). Queue job of all threads currently busy (empty pool).

5Fibonacci

fib(n) {
  if (n < 2) return 1
  Future f = new Future(fib(n-1))
  return fib(n-2) + f.get()
}

(Node: computation. Edge: dependency)

This translates to:


With infinite processors, can execute nodes/computations in the dependency dag.
What’s the minimum time it’ll take?
The largest dependency chain is a lower bound. Critical path = 8 (ADGHKIFC).

Speedup: $T_1 = 17$. $T_\infty = 8$. $17/8 > 2$
Notation:
$T_n$: time taken for n processors
$T_1$: sequential

In practice, you have a fixed # of processors.
What’s the speedup for $T_2$, $T_4$? Need to know the schedule. How/when/where tasks are assigned.
We need a clever scheduler. Greedy scheduler works well.
But anything would give less than 2x speedup (Graham Brent Theorem).