[libdispatch-dev] Fully functional libdispatch port for Windows and Linux

Dave Zarzycki zarzycki at apple.com
Sat Feb 19 21:24:10 PST 2011

On Feb 19, 2011, at 4:13 PM, John Engelhart wrote:

> On Fri, Feb 11, 2011 at 1:24 PM, Kevin Van Vechten <kvv at apple.com> wrote:
> Hi Marius,
> It's great that you have enough interest in the libdispatch project to work on a port to Linux and Win32. If you weren't aware, we already have Linux and partial Win32 support in the primary libdispatch source repository. I would encourage you to contribute modifications to the primary project rather than attempt a complete rewrite.
> I reviewed some of the sources of "libxdispatch" and I can see that this does approximate the libdispatch API at a high level, but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load. I noted several places in the libxdispatch/core/queue.c implementation where thread safety was questionable and other cases where simple spin locks (waiting) were in use.
> I'd be happy to answer any specific questions you may have about the libdispatch implementation or scalability goals.
> There's one part of this message that caught my eye, and I'm particularly curious about:
> but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load.
> Specifically, the "wait-free algorithms" part.  To be clear, "wait-free" may mean different things to different people, but in high concurrency circles, "wait-free" tends to have a very specific meaning.  See the wikipedia article http://en.wikipedia.org/wiki/Non-blocking_algorithm for the definition(s) I'm using.
> Is libdispatch truly "wait-free"?  I looked at the code when it first came out, and nothing in particular struck me as rising to the level of "wait-free".  Obstruction-free, sure.  "Practical" lock-free, most likely.


Speaking in absolutely general terms, yes, libdispatch is "merely" practically wait-free.

> Wait-free?  I don't see how that's possible.  From the wikipedia article wrt/ wait-free:
> An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.
> This basically eliminates anything that uses a CAS or LL/SC primitive with "loop and retry on failure" logic - there is no guarantee that the operation will ever complete.  If I recall correctly, libdispatch uses this technique to implement its lockless data structures.

That depends on the architecture. On LL/SC architectures (everything that isn't x86), yes that is true. For whatever it may be worth, there are tricks that LL/SC architectures could do to add some determinism, but they aren't likely to because it means locking a cache line into memory with a timeout in the hope that a store-conditional will happen shortly after the load-linked/load-locked/load-with-reservation instruction.

> Even ensuring (for some very strong definition of ensuring, like provably) "lock-free" behavior with this kind of logic is extremely difficult, hence the informal term "practical" lock-free.  In practice, truly lock-free and wait-free almost requires access to a Verilog description (or equivalent) of all the components and doing very low level simulation to ensure all possible permutations are shown to "work".  As you can imagine, this just isn't practical for a modern, out of order, super scalar CPU, let alone everything it connects to.


Are you interpreting "wait free" to mean "knowably bounded latency"? If that is the case, then yes, knowing the upper bound of latency is practically impossible for a variety of reasons beyond just hardware (Verilog) descriptions. For example:

1) Are interrupts enabled? (which as a subset includes preemption)
2) Is virtual memory enabled? (TLB misses, page faults, swap, etc)
3) Does the CPU architecture itself add nondeterminism? (SMP, SMT, out-of-order execution, etc.)

To us, "wait free" means that we're not spinning in a loop waiting for any condition to be true before continuing.

(Therefore yes, any LL/SC architecture cannot, by definition, be "wait free", but Intel is a different story as discussed below.)

> To the best of my knowledge, the x86 architecture is one of the few out there that has a CAS operation that ensures "at least one succeeds" semantics (which I believe is largely due to the requirement to be "mostly" backwards compatible all the way back to 8086, particularly on bus transactions).  Most other architectures do not. In fact, many implementations of LL/SC can result in a live-lock condition in extremely rare circumstances (it's happened on IBM big iron, for example).

You're slightly misinformed. What you wrote applies to every atomic instruction on Intel except compare-and-swap (CMPXCHG). In other words, ADD, SUB, ADC, SBB, INC, DEC, AND, OR, NEG, NOT, BTS, BTR, BTC, and XCHG are wait free (to the best of my knowledge).

In fact, on all of the critical paths, libdispatch prefers the above (non-CAS) instructions. Even when libdispatch does use CAS, it rarely spin-waits around a CAS operation. Most failed CAS operations fall back to an atomic XCHG on a FIFO list.

On most, if not all, of the critical/fast paths of GCD, we use non-CAS logic, which, on Intel, has more deterministically bounded latency than CAS-based algorithms.

Is libdispatch perfectly "wait free"? No. Is it practically wait free? Yes.

Do we care about the difference? In theory, yes. In practice, no.

