Fully functional libdispatch port for Windows and Linux

Marius Zwicker

11 Feb 2011 11 Feb '11

6:06 p.m.

Hi everyone, I'm glad to announce that I finished a port of libdispatch to Linux and Windows during the last days. Except the source interfaces (which differ too much between Windows and Linux) the API should be fully operational. The majority of your original tests is passing - some of them still need to be ported to build and run on all platforms. As I did not manage to understand your code completely, I did a complete rewrite and merely reused your api and general ideas. For achieving "blocks support" on the majority of the used compilers I added lambda support from the upcoming C++0x standard. Additionally I implemented (for those who need and use it) an interface to integrate libdispatch into applications developed using the Qt framework. As my port is a bit more than a patch, I named it libxdispatch and it is currently available by going to http://opensource.mlba-team.de/xdispatch - I'm really curious about your comments. Regards, Marius Zwicker

Show replies by date

Kevin Van Vechten

11 Feb 11 Feb

6:24 p.m.

New subject: Fully functional libdispatch port for Windows and Linux

Hi Marius, It's great that you have enough interest in the libdispatch project to work on a port to Linux and Win32. If you weren't aware, we already have Linux and partial Win32 support in the primary libdispatch source repository. I would encourage you to contribute modifications to the primary project rather than attempt a complete rewrite. I reviewed some of the sources of "libxdispatch" and I can see that this does approximate the libdisptach API at a high level, but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load. I noted several places in the libxdispatch/core/queue.c implementation where thread safety was questionable and other cases where simple spin locks (waiting) were in use. I'd be happy to answer any specific questions you may have about the libdispatch implementation or scalability goals. Kevin On Feb 11, 2011, at 10:06 AM, Marius Zwicker wrote:

...

Hi everyone,

I'm glad to announce that I finished a port of libdispatch to Linux and Windows during the last days. Except the source interfaces (which differ too much between Windows and Linux) the API should be fully operational. The majority of your original tests is passing - some of them still need to be ported to build and run on all platforms. As I did not manage to understand your code completely, I did a complete rewrite and merely reused your api and general ideas.

For achieving "blocks support" on the majority of the used compilers I added lambda support from the upcoming C++0x standard.

Additionally I implemented (for those who need and use it) an interface to integrate libdispatch into applications developed using the Qt framework.

As my port is a bit more than a patch, I named it libxdispatch and it is currently available by going to http://opensource.mlba-team.de/xdispatch - I'm really curious about your comments.

Regards,

Marius Zwicker

Marius Zwicker

6:56 p.m.

New subject: Fully functional libdispatch port for Windows and Linux

Wow, thanks to Kevin and Davez for your quick reply. I am fully aware of the fact that the original implementation of libdispatch outperforms mine in many ways. Unfortunately I somehow missed the open accessable subversion repository when getting in contact with the libdispatch sources for the first time (Actually I found a gzipped file only at that time whereas it seems to have vanished right now). Thus I did not realize your efforts concerning linux and windows at that time. That's where my decision to start from scratch came from. I'd be happy to contribute to the primary sources in form of patches. First thing to talk about certainly is the support for C++0x lambdas and my Qt interface. Whereas the Qt interface certainly should be kept independently from the rest I consider the C++0x support very important. My goal is to be able to use Microsofts compilers as well which certainly will never get clang's block support. As far as I can see the Win32 support is using automake, thus it is using cygwin or mingw. As soon as I have some sparetime (unfortunately I did not do the port just for fun) I will try to build the primary sources and develop some patches. Concerning the MS support, I will try to develop some CMake files for that if you don't mind. Concerning Davez' annotations I assume that they concern my C++ API only. My main intend was to integrate the Qt Main Event loop with the dispatch main queue, quite similar to the way you guys did in Cocoa. Some of the getter methods (such as the mentioned isSuspended()) were obviously introduced without much thinking as they do not reflect the balanced need of release and suspend calls you describe within your api documentation. I will omit them as soon as possible. Concerning type() I will have a look at ways to avoid this. Thanks a lot for your comments and have a nice weekend. Marius Am 11.02.2011 19:24, schrieb Kevin Van Vechten:

...

Hi Marius,

It's great that you have enough interest in the libdispatch project to work on a port to Linux and Win32. If you weren't aware, we already have Linux and partial Win32 support in the primary libdispatch source repository. I would encourage you to contribute modifications to the primary project rather than attempt a complete rewrite.

I reviewed some of the sources of "libxdispatch" and I can see that this does approximate the libdisptach API at a high level, but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load. I noted several places in the libxdispatch/core/queue.c implementation where thread safety was questionable and other cases where simple spin locks (waiting) were in use.

I'd be happy to answer any specific questions you may have about the libdispatch implementation or scalability goals.

Kevin

On Feb 11, 2011, at 10:06 AM, Marius Zwicker wrote:

...
Hi everyone,

I'm glad to announce that I finished a port of libdispatch to Linux and Windows during the last days. Except the source interfaces (which differ too much between Windows and Linux) the API should be fully operational. The majority of your original tests is passing - some of them still need to be ported to build and run on all platforms. As I did not manage to understand your code completely, I did a complete rewrite and merely reused your api and general ideas.

For achieving "blocks support" on the majority of the used compilers I added lambda support from the upcoming C++0x standard.

Additionally I implemented (for those who need and use it) an interface to integrate libdispatch into applications developed using the Qt framework.

As my port is a bit more than a patch, I named it libxdispatch and it is currently available by going to http://opensource.mlba-team.de/xdispatch - I'm really curious about your comments.

Regards,

Marius Zwicker

John Engelhart

20 Feb 20 Feb

12:13 a.m.

New subject: Fully functional libdispatch port for Windows and Linux

On Fri, Feb 11, 2011 at 1:24 PM, Kevin Van Vechten <kvv@apple.com> wrote:

...

Hi Marius,

It's great that you have enough interest in the libdispatch project to work on a port to Linux and Win32. If you weren't aware, we already have Linux and partial Win32 support in the primary libdispatch source repository. I would encourage you to contribute modifications to the primary project rather than attempt a complete rewrite.

I reviewed some of the sources of "libxdispatch" and I can see that this does approximate the libdisptach API at a high level, but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load. I noted several places in the libxdispatch/core/queue.c implementation where thread safety was questionable and other cases where simple spin locks (waiting) were in use.

I'd be happy to answer any specific questions you may have about the libdispatch implementation or scalability goals.

There's one part of this message that caught my eye, and I'm particularly curious about: but it's important to keep in mind some of the aspects of the libdispatch

...

implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load.

Specifically, the "wait-free algorithms" part. To be clear, "wait-free" may mean different things to different people, but in high concurrency circles, "wait-free" tends to have a very specific meaning. See the wikipedia article http://en.wikipedia.org/wiki/Non-blocking_algorithm for the definition(s) I'm using. Is libdispatch truly "wait-free"? I looked at the code when it first came out, and nothing in particular struck me as rising to the level of "wait-free". Obstruction-free, sure. "Practical" lock-free, most likely. Wait-free? I don't see how that's possible. From the wikipedia article wrt/ wait-free: An algorithm is wait-free if every operation has a bound on the number of

...

steps the algorithm will take before the operation completes.

This basically eliminates anything that uses a CAS or LL/SC primitive with "loop and retry on failure" logic- there is no guarantee that the operation will ever complete. If I recall correctly, libdispatch uses this technique to implement it's lockless data structures. Even ensuring (for some very strong definition of ensuring, like provably) "lock-free" behavior with this kind of logic is extremely difficult, hence the informal term "practical" lock-free. In practice, truly lock-free and wait-free almost requires access to a Verilog description (or equivalent) of all the components and doing very low level simulation to ensure all possible permutations are shown to "work". As you can imagine, this just isn't practical for a modern, out of order, super scalar CPU, let alone everything it connects to. To the best of my knowledge, the x86 architecture is one of the few out there that has a CAS operation the ensures "at least one succeeds" semantics (which I believe is largely due to the requirement to be "mostly" backwards compatible all the way back to 8086, particularly on bus transactions). Most other architectures do not. In fact, many implementations of LL/SC can result in a live-lock condition in extremely rare circumstances (it's happened on IBM big iron, for example). Is this just a case where a better choice of words would have been more appropriate? Because a truly wait-free implementation would be quite a coup, and I'd appreciate any pointers to the details of how you did this.

Dave Zarzycki

5:24 a.m.

New subject: Fully functional libdispatch port for Windows and Linux

On Feb 19, 2011, at 4:13 PM, John Engelhart wrote:

...

On Fri, Feb 11, 2011 at 1:24 PM, Kevin Van Vechten <kvv@apple.com> wrote: Hi Marius,

It's great that you have enough interest in the libdispatch project to work on a port to Linux and Win32. If you weren't aware, we already have Linux and partial Win32 support in the primary libdispatch source repository. I would encourage you to contribute modifications to the primary project rather than attempt a complete rewrite.

I reviewed some of the sources of "libxdispatch" and I can see that this does approximate the libdisptach API at a high level, but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load. I noted several places in the libxdispatch/core/queue.c implementation where thread safety was questionable and other cases where simple spin locks (waiting) were in use.

I'd be happy to answer any specific questions you may have about the libdispatch implementation or scalability goals.

There's one part of this message that caught my eye, and I'm particularly curious about:

but it's important to keep in mind some of the aspects of the libdispatch implementation that set it apart — specifically thread safety and wait-free algorithms — which allow it to scale well with large numbers of cores and heavy load.

Specifically, the "wait-free algorithms" part. To be clear, "wait-free" may mean different things to different people, but in high concurrency circles, "wait-free" tends to have a very specific meaning. See the wikipedia article http://en.wikipedia.org/wiki/Non-blocking_algorithm for the definition(s) I'm using.

Is libdispatch truly "wait-free"? I looked at the code when it first came out, and nothing in particular struck me as rising to the level of "wait-free". Obstruction-free, sure. "Practical" lock-free, most likely.

John, Speaking in absolute general terms, yes, libdispatch is "merely" practically wait free.

...

Wait-free? I don't see how that's possible. From the wikipedia article wrt/ wait-free:

An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.

This basically eliminates anything that uses a CAS or LL/SC primitive with "loop and retry on failure" logic- there is no guarantee that the operation will ever complete. If I recall correctly, libdispatch uses this technique to implement it's lockless data structures.

That depends on the architecture. On LL/SC architectures (everything that isn't x86), yes that is true. For whatever it may be worth, there are tricks that LL/SC architectures could do to add some determinism, but they aren't likely to because it means locking a cache line into memory with a timeout in the hope that a store-conditional will happen shortly after the load-linked/load-locked/load-with-reservation instruction.

...

Even ensuring (for some very strong definition of ensuring, like provably) "lock-free" behavior with this kind of logic is extremely difficult, hence the informal term "practical" lock-free. In practice, truly lock-free and wait-free almost requires access to a Verilog description (or equivalent) of all the components and doing very low level simulation to ensure all possible permutations are shown to "work". As you can imagine, this just isn't practical for a modern, out of order, super scalar CPU, let alone everything it connects to.

Whoa. Are you interpreting "wait free" to mean "knowably bounded latency"? If that is the case, then yes, knowing the upper bound of latency is practically impossible for a variety of reasons beyond just hardware (Verilog) descriptions. For example: 1) Are interrupts enabled? (which as a subset includes preemption) 2) Is virtual memory enabled? (TLB misses, page faults, swap, etc) 3) And finally, yes, does the CPU architecture add nondeterminism (SMP, SMT, out-of-order, etc). To us, "wait free" means that we're not spinning in a loop waiting for any condition to be true before continuing. (Therefore yes, any LL/SC architecture cannot, by definition be "wait free", but Intel is a different story as discussed below.)

...

To the best of my knowledge, the x86 architecture is one of the few out there that has a CAS operation the ensures "at least one succeeds" semantics (which I believe is largely due to the requirement to be "mostly" backwards compatible all the way back to 8086, particularly on bus transactions). Most other architectures do not. In fact, many implementations of LL/SC can result in a live-lock condition in extremely rare circumstances (it's happened on IBM big iron, for example).

You're slightly misinformed. What you wrote applies to every atomic instruction on Intel except compare-and-swap (CMPXCHG). In other words, ADD, SUB, ADC, SBB, INC, DEC, AND, OR, NEG, NOT, BTS, BTR, BTC, and XCHG are wait free (to the best of my knowledge). In fact, on all of the critical paths, libdispatch prefers the above (non-CAS) instructions. Even when libdispatch does use CAS, it rarely spin-waits around a CAS operation. Most failed CAS operations fall back to an atomic XCHG on a FIFO list. On most, if not all of the critical/fast paths of GCD, we use non-CAS logic, which on Intel, has more deterministically bound latency than CAS based algorithms. Is libdispatch perfectly "wait free"? No. Is it practically wait free? Yes. Do we care about the difference? In theory, yes. In practice, no. davez

Dave Zarzycki

11 Feb 11 Feb

6:39 p.m.

New subject: Fully functional libdispatch port for Windows and Linux

Marius, Interesting work. Thanks for sharing. FYI – We _intentionally_ omitted some APIs from the design of GCD that your code adds. For example: "isSuspended()". There is no way any developer can use that API safely in a concurrent environment because the result is instantly stale upon return. Also, the "setSuspended()" API is misleading. Internally, it is a suspend count, not a boolean. That is why we vended "dispatch_suspend()" and "dispatch_resume()" so that multiple clients may safely coordinate suspension without being explicitly aware of each other. Also, we don't like vending getter APIs that allow developers to implement dubious and/or problematic RTTI like logic. For example, the "isSerialQueue()" method that your C++ wrapper adds. If one needs to submit a non-reentrant block, then use the barrier API that GCD provides and don't bother trying to query the queue type. Similarly, we intentionally didn't vend a "type()" API because we couldn't foresee any way it could be used safely. One should just submit the block with the right combination of desired behavior (dispatch_async, dispatch_sync, dispatch_barrier_async, or dispatch_barrier_sync) and move on. Thanks again for sharing. :-) davez On Feb 11, 2011, at 10:06 AM, Marius Zwicker wrote:

...

Hi everyone,

I'm glad to announce that I finished a port of libdispatch to Linux and Windows during the last days. Except the source interfaces (which differ too much between Windows and Linux) the API should be fully operational. The majority of your original tests is passing - some of them still need to be ported to build and run on all platforms. As I did not manage to understand your code completely, I did a complete rewrite and merely reused your api and general ideas.

For achieving "blocks support" on the majority of the used compilers I added lambda support from the upcoming C++0x standard.

Additionally I implemented (for those who need and use it) an interface to integrate libdispatch into applications developed using the Qt framework.

As my port is a bit more than a patch, I named it libxdispatch and it is currently available by going to http://opensource.mlba-team.de/xdispatch - I'm really curious about your comments.

Regards,

Marius Zwicker

_______________________________________________ libdispatch-dev mailing list libdispatch-dev@lists.macosforge.org http://lists.macosforge.org/mailman/listinfo.cgi/libdispatch-dev

5375

Age (days ago)

5384

Last active (days ago)

List overview

Download

5 comments

4 participants

participants (4)

Dave Zarzycki
John Engelhart
Kevin Van Vechten
Marius Zwicker