Wednesday, November 6, 2013 Elimination stack, General approach to a lock-free data structure and OpenMP

(Someone clean this note because I have no idea what it is)

Harris -> Lazy solution -> make nodes that are being deleted before deleting them (AtomicMarkableReference?)

tryAdd(Node n, Node prev)
n.next = prev.next
return CAS(prev.next, n.next, false, n, false) // n.next, false -> old reference
// n, false -> new reference

tryRemove(Node n, Node prev)
Node succ = n.next
if(CAS(n.next, succ, false, succ, true))
CAS(prev.next, n, false, succ, false) // don’t care if succeeds
return true
return false


P1: T0 trying to remove x
T1 try to insert where w between x and y
H -> x -> y -> z -> T

T0
CAS(x.next, y, false, y, false)
CAS(H.next, x, false, y, false)

T1
3) CAS(x.next, y, false, w, false)

P2: T0 remove x
T1 remove y
H -> x -> y -> z -> T

T0
CAS(x.next, y, false, y, done)
CAS(H.next, x, false, y, false)

T1
3) CAS(y.next, z, false, z, true)
4) CAS(x.next, y, false, z, false)

if 4 < 1 -> x.next = z and T0 fails
if 1 < 4 -> 4 fails

something has to clean up and delete marked nodes

find(int data)
restart: while(true)
pred = J
cur = pred.next
while(cur != T)
succ = cur.next
while(cur.marked)
if(!CAS(pred.next, cur, false, succ, false))
continue restart
cur = succ
succ = cur.next
if(cur.data == data)
return cur
pred = cur
cur = succ
return null


2OpenMP¶

OpenMP is typically regarded as a "language" for concurrent programming, although not really a separate language. Rather, it is a set of directives that sits on top of another language.

While the language could theoretically be arbitrary, in practice, C/C++ and Fortran are the ones which work with OpenMP. There is also a version of OpenMP working with Java, but considering that Java already has concurrency, it is not very useful.

In C/C++, OpenMP works by inserting pragma instructions. As they appear as comments, compiling code with such instructions on a normal compiler normally results in a sequential program. On an OpenMP compiler, the pragmas are honored and tell the compiler how certain segments of code are to be executed in parallel.

2.1Structured Parallelism¶

OpenMP allows to achieve structured parallelism. That is, starting with a for loop, each iteration is executed in parallel and joined at the end, following a fork/join model where the beginning and end are well defined.

It is not very flexible, but it is an easy way to parallelize programs.

#pragma omp parallel for
{
for () {
...
}
}
// A thread team is created to execute the for loop in parallel.


Other times, we might want all the threads to do the same instruction(s).

#pragma omp parallel
{
printf("Hello!\n");
}
// The team of threads all execute the above statement


The number of threads can be chosen by the system, but it may also be passed as a parameter at the very beginning of the program, or on a statement by statement basis.

2.2Nested Parallelism¶

We can also have nested parallelism. That is a parallel section, inside a parallel section, in which case the number of threads executing may not be obvious.

2.3Parallel sections¶

Separate parallel sections can be defined.

#pragma omp section
{
}

#pragma omp section
{
// Executed by another
}


This is a bit like unstructured parallelism, although a bit awkward. The goal of OpenMP is not to give fine-grained control over concurrency, but rather to achieve it easily.

Each thread team has a master, one privileged thread which can be specialized.

#pragma omp master
{
// Only the master thread executes this.
}

#pragma omp single
{
// Only one thread executes this
}


A task model can also be used.

2.4Data Model¶

Out of OpenMP, we get a pretty neat data model. Shared data (heap/static) is the default, but one of the things we can do is declare data to be private.

private(x, y, z);


With this, each thread has its own copy (even if x, y, z have a non-local scope). Each thread ends up using its own copy of the data. However, how is it initialized? There are several initialization models.

First private
The value is inherited from the parent, or a default value is used.

What happens to the private data once the thread is finished? We could have it disappear, but why lose the work?

Last private
Thread submits its value back to the parent scope.

With multiple threads, the last one is the one to succeeds. However, a reduction operator can also be used: (op; list).

Maybe 10 threads are used to compute some value, but we are interested in the sum. In this case, the operator is + and the values in the list are summed. The initial value is assumed to be 0. With *, the default value is 1.

2.5Synchronization¶

2.5.1Master/Single¶

For a critical section, only the master, or only one thread executes it.

2.5.2Critical¶

All the threads execute the critical section, but only one at a time.

2.5.3Barriers¶

All the threads in the pool must arrive at that point, before any leaves.

2.5.4Atomic¶

Atomic Read Atomic write Atomic update Atomic capture
... = x; x = ...; x++; ... = x++; (kind of like FA)

Nb: Variables are not atomic by default; the above must be used.

2.5.5Flush¶

flush(a, b, c); synchronizes a, b, and c with main memory. This ends up being the main mechanism for actually making things consistent between different threads.