Wednesday, June 25, 2008

Back to the future

Future-based concurrent programming can be unexpectedly inefficient when some futures depend on others. Consider the following code (although C++ syntax is used the discussion is not particularly related to this programming language):

‎int constant(int x)
‎{
‎ return x;
‎}

‎int add(future<int> x,future<int> y)
‎{
‎ return x.get()+y.get(); // blocks on x and y
‎}

‎int main()
‎{
‎ // (1+2)+(3+4)
‎ future<int> g=
‎ schedule(add,
‎ schedule(add,schedule(constant,1),schedule(
constant,2)),
‎ schedule(add,schedule(
constant,3),schedule(constant,4)));
‎}

where schedule takes a function along with a set of arguments and sends the job of evaluating the function to some internal task processor running concurrently with the program's main thread of execution. A future object is provided as a token for further claim of the execution's return value. Calling get() on a future object returns the associated value, blocking if necessary in wait for the asynchronous value calculation to complete. The future values generated by the code above can be more easily seen in the following equivalent rewriting:

‎int main()
‎{
‎ future<int> a=schedule(constant,1),
‎ b=schedule(constant,2),
‎ c=schedule(constant,3),
‎ d=schedule(constant,4),
‎ e=schedule(add,a,b),
‎ f=schedule(add,c,d),
‎ g=schedule(add,e,f);
‎}

Future-based programming can achieve better performance than sequential computation because execution is translated to a thread pool taking advantage of multicore and multiprocessor architectures. The task queue can be shared among all threads of the pool (so that when a thread completes a task goes to the common queue for another) or we can equip each thread with its own task queue, so that when a new future task is generated it gets added to the least busy queue. Although from the point of view of queueing theory this second configuration is less efficient (see for instance the discussion on section 2.4 of Andreas Willig's A Short Introduction to Queueing Theory), it also have benefits generally not considered in basic queueing theory, namely reduced locking contention when accessing the task queues --queue reading by the pool threads needs no locking.

Now, let us suppose we have a pool with two threads, each with its own task queue; on a two-core machine this can in theory double the performance of a sequential computation. But consider now the case where the seven future values of the example program get assigned to the two threads of the pool in the following manner:

thread 1 ← a b c d e f,
thread 2 ← g.

This seems an unlikely distribution (one would expect a more or less balanced assignment of tasks among the threads) but is a consistent situation and could happen if, say, thread 2 was already overloaded with prior tasks. The scenario is extremely bad performancewise: thread 2 blocks processing g until the future values it depends on, e and f, are completed by thread 1, so basically we get no improvement over the sequential case.

We could remedy this contention scenario if g was rescheduled rather than allowed to block on e and f. For instance if future<T>::get exhibited the following behavior:

  • Return the future value is it has already been calculated;
  • otherwise, block in wait for the value calculation to complete if in a user thread
  • or throw a future_pending exception if in a future pool thread.

This would prevent g from blocking; thread 2 could then catch the future_pending exception and reschedule the task, becoming available for further processing. There are two important issues that have to be taken into account for this approach to be feasible:

  • Scheduled functions must be written with the knowledge that future<T>::get can throw and, when this happens, the function will be called again in the future, which could pose a problem if the computation produces side effects. No problems of this kind arise in the special case when the scheduled code is purely functional.
  • Rescheduling a task to the end of the queue might be too drastic and introduce excessive delay in the computation of the task: ideally, the task should be rescheduled right after the last future it depends on, but that is an information the scheduler does not have (it is implicitly present in the task code itself). An in-between compromise is to reschedule the task after the future that has thrown the future_pending exception. More intrusive approaches could require that the list of futures a given future depends on be explicitly given by the programmer to the scheduler.

1 comment :

  1. The work stealing implemented on the Boost.Task library should avoid this extreme behavior,as the thraed2 will steal from the queue 1 as far as he will be blocked waiting for e and f.

    ReplyDelete