Monday, November 4, 2013 CC-BY-NC
Finalization in the JMM, the ABA Problem and LL/SC

Maintainer: admin

1Elimination Stack

Essentially, the lock-free stack is still a sequential data structure. We can however build a concurrent stack, one where push and pop can cancel each other by having the two threads trade values instead.

In the case of a stack, the exchange is one-sided (the "pusher" gives the value to the "popper"), but the "popper" still needs to indicate to the "pusher" that a pop is occurring. Furthermore, push(); push() and pop(); pop(); need to be avoided. Only a push(); pop(); should be allowed.

The exchange occurs by having Thread0 push a value by sending to Thread1 after which Thread1 responds with a null to indicate it has popped the value. We need a general exchanger which lets threads exchange values, based on a lock-free design.

1.1Exchanger Data Structure

We will store the values to be exchanged and a state. The state shall indicate whether one thread is in the exchanger (trying to do the exchange), the other thread has shown, or two threads are busy doing it. In the latter, if a third thread shows up, then it cannot use the exchanger.

Empty state
Ready to do an exchange
Waiting state
First thread has shown up, waiting for the second
Busy
Second thread has shown up, and the exchange can occur

1.2Process

0. We start with the empty state.
1. The first thread shows up and wants to verify that the exchanger is either empty or waiting.
2. It needs to do a CAS to change the state to waiting: CAS(state, empty, waiting);.
3. Just changing the state is not enough. The value should be changed as well with a double-wide CAS, or in Java, an atomic staged reference. CAS((state, slot), (empty, slot), (waiting, myDate));
4. If that succeeds, the thread is indeed the first to enter. As the first thread, it must wait for another to come in (changing from waiting to busy). It does so with a spin-loop.
5. If the thread waits too long, the thread should give up. With a CAS, the state should be changed back to empty. If that succeeds, no other thread has shown up.
6. If the CAS failed, it means another thread came in. In this case, the exchange should rather continue. To complete the exchange, the other thread should take the data in the slot, and change the state should be changed to empty. Nb: No CAS is needed here because only one thread can change from busy to empty.

4a.1. If that fails, there are two possibilities. Either the exchanger is already waiting, or it is already busy. If the latter, 2 threads are already in the exchanger, and the thread should give up.
4a.2. Otherwise, it means the thread is the second thread to show up. The thread must grab the first thread's data from the slot. To make sure only thread complete the exchange, CAS to change the swap data to our data and the state from waiting to busy.

This was just a textual description of a reusable exchanger. The full code is in the textbook.

1.3Example

Starting up with <null, empty>.

Thread0 Thread1
CAS(<null, empty>, <A, waiting>); -
Spin and wait for the state to be busy. -
- CAS fails
- Get A
- CAS(<A, waiting>, <B, busy>);
Get B -
Set state to empty -

1.4Making an elimination stack with an exchanger

Two thread show up and they try to exchange. They should not try forever (there are many failure scenarios where the exchanger ends up not being used).

If the exchange succeeds, push(x) expects a null while pop() expects a non-null value. In this case, the push and pop never changed the stack, as intended. If it fails, a regular push and pop should be done instead.

However, we can make this even more parallel. At the moment, there is only one exchanger and as such, only two threads at a time can do an exchange. Using an array of exchangers, this restriction can be lifted. When a thread shows up, it shall use one of the exchangers at random to try to do the exchange.

Hopefully, with enough exchangers the chance to collide with a busy exchanger should be little. With few enough exchangers, it would be more likely to find a matching thread.

2General approach to a lock-free data structure

Can we make every data structure lock-free? In theory, yes, at least for a very large class of data structures. Should every data structure require some carefully crafted design tailored to this specific data structure? No, not necessarily.

There is a universal construction to use when making a data structure lock-free:

// Every data structure should implement this interface
public interface SeqObject {
    // The response is what the operation expects back
    // The operation is encoded as an Invocation object
    public Response apply(Invocation i);
}

2.1Stack example

push operation
Invocation: <"push", x>
pop operation
Invocation: <"pop">

2.2Approach

The approach is to actually not do things on the data structure; we will maintain a log of all the operations and sequentialize them. To find out in what state the data structure should be, the log should be traversed from the initial state. This is inefficient practically, but lock-free theoretically.

Since multiple threads are calling apply, an order needs to be agreed upon. Whenever apply is called, a new node to represent the invocation is created in the log. n threads are competing to add their invocation to the end of the log list. Agreement must occur in a lock-free manner.

As such, the threads must do a n-threads consensus operation to agree on which node is added next. The winner adds its node, but all the losers left must compete again.

// ith thread
Invocation i = new Invocation(...)
do {
    j = consensus(i);
} while (j != i);
s = tail; // Start at the tail of the log
r = tail.state // initial state
do {
    r = r.apply(s);
    s = s.next;
} while (s != i);
return r;

Nb: We need a reusable consensus (not trivial at all). The solution is not to reuse the consensus at all. We will store the consensus objects with each node of the log list, doing consensus once for each node in the log.

Log linked list

Each consensus object is used by n-threads at most once.

3Finalization in the JMM

One last thing about the Java Memory Model. One of the interesting things that ended up being an issue with the new memory model is the concept of finalization. Finalization is usually (wrongly) recommended to be used to insure certain things are done after an object has been garbage collected. Before using it, we should be aware of the concurrency issues.

public void finalize() {
    // code here is run after the object is garbage collected,
    // but before the memory is reused
}

Nb: The method is not actually guaranteed to run at all! Nothing in the Java specs says that finalize() is guaranteed to run. It is merely some code that will probably run, but not necessarily.

The finalizer is run in some thread, but not necessarily the current one. As such, finalize() is a concurrent method. In theory, it should be thread-safe because it is happening after the object is garbage collected and no one can access it at all.

class Foo {
    private File f;

    public Foo() {
        f = new File(...); // Initialize the file here
    }

    public void write(int x) {
        f.write(x); // Write a number
    }

    @Override
    public void finalize() {
        f.close(); // Close the file upon garbage collection
    }
}

The idea of this code is to have the file opened by a Foo object to be closed automatically when it is garbage collected.

File f = new Foo();
f.write(42);
f = null; // Not useful, actually

f is garbage collectable as soon as it can be proven to not be referenced at all anymore. Also, the Java compiler might perform some optimizations to speed up execution. One such optimization is inlining the constructor to avoid the overhead of calling a function:

File f = [allocate some space]
f.f = new File(); // inline constructor
r1 = f.f; // Hold on f.f in a register because it will be accessed soon
r1.write(42);

The above code is equivalent to the code above it. However, when is f no longer referenced here? The intention of the programmer is for garbage collection to occur after the write(42);. However, the last point f is referenced is at r1 = f.f;. This means that the finalizer can be run at that point. This would cause the file to be closed, as intended, but the next line wants to write to it!

finalize() should be treated as a concurrent method as problems even start to appear before considering concurrency at all. Practically, it should avoided altogether.