[libdispatch-dev] libdispatch worker threads may loop depending on compiler optimization

Tue Sep 20 00:48:31 PDT 2011

On 09/19/2011 06:32 PM, Joakim Johansson wrote:
> <http://libdispatch.macosforge.org/trac/ticket/35#comment:10>

Thanks.  It turns out it is _not_ a bug in GCC at all, but in libdispatch.

Compiling the .i file shows the following:

        movq    %r13, %rax
        movq    $1, 0(%r13)
        movq    %r12, 16(%r13)
#APP
# 110 "/tb/builds/thd/sbn/2.4/src/thirdparty/libdispatch/197/src/src/queue_internal.h" 1
        xchg %rax, 64(%rbx)
# 0 "" 2
#NO_APP
        testq   %rax, %rax
        movq    %rbp, 24(%r13)
        movq    $0, 8(%r13)
        je      .L172

The #APP/#NO_APP markers show that the xchg comes from an inline asm
statement rather than from the __sync builtins.  Looking at the source
code indeed reveals this...

// GCC generates suboptimal register pressure
// LLVM does better, but doesn't support tail calls
// 6248590 __sync_*() intrinsics force a gratuitous "lea" instruction, with resulting register pressure
#if 0 && defined(__i386__) || defined(__x86_64__)
#define dispatch_atomic_xchg(p, n)      ({ typeof(*(p)) _r; asm("xchg %0, %1" : "=r" (_r) : "m" (*(p)), "0" (n)); _r; })
#else
#define dispatch_atomic_xchg(p, n)      ((typeof(*(p)))__sync_lock_test_and_set((p), (n)))
#endif

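... where the #if condition is missing parentheses.  Since && binds
tighter than ||, it parses as (0 && defined(__i386__)) || defined(__x86_64__),
so the asm version is still selected on x86_64.  It should read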

#if 0 && (defined(__i386__) || defined(__x86_64__))
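
To illustrate the precedence issue in isolation, here is a minimal sketch.
DEMO_I386 and DEMO_X86_64 are hypothetical stand-ins for the
compiler-defined __i386__ and __x86_64__ macros, so that the result does
not depend on the build target; the helper names are made up as well.

```c
#include <assert.h>

/* Hypothetical stand-in for __x86_64__; DEMO_I386 is left undefined. */
#define DEMO_X86_64 1

/* Original form: parsed as (0 && defined(DEMO_I386)) || defined(DEMO_X86_64) */
#if 0 && defined(DEMO_I386) || defined(DEMO_X86_64)
#define UNPARENTHESIZED_TAKEN 1
#else
#define UNPARENTHESIZED_TAKEN 0
#endif

/* Intended form: the leading "0 &&" disables the whole condition */
#if 0 && (defined(DEMO_I386) || defined(DEMO_X86_64))
#define PARENTHESIZED_TAKEN 1
#else
#define PARENTHESIZED_TAKEN 0
#endif

/* Checked at compile time (C11 static_assert from <assert.h>) */
static_assert(UNPARENTHESIZED_TAKEN == 1, "asm branch is still enabled");
static_assert(PARENTHESIZED_TAKEN == 0, "asm branch is disabled");

/* Small accessors so the result can also be inspected at run time */
int unparenthesized_taken(void) { return UNPARENTHESIZED_TAKEN; }
int parenthesized_taken(void)   { return PARENTHESIZED_TAKEN; }
```

With only the second macro defined, the unparenthesized condition is true
and the parenthesized one is false, which matches what happens on x86_64.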

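The asm itself is also wrong, because it doesn't have a clobber for
"memory", so the compiler is free to move stores across it (note the
stores to 24(%r13) and 8(%r13) that end up after the xchg in the listing
above).  Adding the clobber fixes the testcase.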
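
For reference, a minimal sketch of what an xchg macro with the "memory"
clobber could look like.  This is not the actual libdispatch patch: the
names demo_atomic_xchg, demo_swap and demo_check are made up for the
example, and marking *(p) as "+m" (read and written) and making the asm
volatile are extra assumptions on top of the clobber.  It is only meant
for word-sized operands such as long or pointers.

```c
#include <assert.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* xchg with a memory operand is implicitly locked on x86.
 * "+m" tells the compiler that *(p) is both read and written;
 * the "memory" clobber keeps it from moving or caching other
 * memory accesses across the asm. */
#define demo_atomic_xchg(p, n)                          \
    ({ __typeof__(*(p)) _r = (n);                       \
       __asm__ __volatile__("xchg %0, %1"               \
                            : "+r" (_r), "+m" (*(p))    \
                            :                           \
                            : "memory");                \
       _r; })
#else
/* Fallback: the builtin the original #else branch already uses */
#define demo_atomic_xchg(p, n) \
    ((__typeof__(*(p)))__sync_lock_test_and_set((p), (n)))
#endif

/* Store value into *slot and return the previous contents. */
long demo_swap(long *slot, long value)
{
    return demo_atomic_xchg(slot, value);
}

/* Single-threaded sanity check; it does not exercise concurrency. */
void demo_check(void)
{
    long slot = 1;
    assert(demo_swap(&slot, 2) == 1);
    assert(slot == 2);
}
```

Alternatively, fixing the parentheses in the #if alone would route x86
builds to the __sync_lock_test_and_set branch and sidestep the asm.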

Paolo