[libdispatch-dev] lib dispatch worker threads may loop depending on compiler optimization
Paolo Bonzini
bonzini at gnu.org
Tue Sep 20 00:48:31 PDT 2011
On 09/19/2011 06:32 PM, Joakim Johansson wrote:
> <http://libdispatch.macosforge.org/trac/ticket/35#comment:10>
Thanks. It turns out it is _not_ a bug in GCC at all, but in libdispatch.
Compiling the .i file shows the following:
movq %r13, %rax
movq $1, 0(%r13)
movq %r12, 16(%r13)
#APP
# 110 "/tb/builds/thd/sbn/2.4/src/thirdparty/libdispatch/197/src/src/queue_internal.h" 1
xchg %rax, 64(%rbx)
# 0 "" 2
#NO_APP
testq %rax, %rax
movq %rbp, 24(%r13)
movq $0, 8(%r13)
je .L172
The APP/NO_APP markers signify that an asm is being used rather than sync
builtins. Looking at the source code indeed reveals this...
// GCC generates suboptimal register pressure
// LLVM does better, but doesn't support tail calls
// 6248590 __sync_*() intrinsics force a gratuitous "lea" instruction, with resulting register pressure
#if 0 && defined(__i386__) || defined(__x86_64__)
#define dispatch_atomic_xchg(p, n) ({ typeof(*(p)) _r; asm("xchg %0, %1" : "=r" (_r) : "m" (*(p)), "0" (n)); _r; })
#else
#define dispatch_atomic_xchg(p, n) ((typeof(*(p)))__sync_lock_test_and_set((p), (n)))
#endif
... which is missing parentheses, so the "0 &&" does not disable the asm on x86_64. It should read
#if 0 && (defined(__i386__) || defined(__x86_64__))
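To make the precedence issue concrete, here is a minimal sketch. The macro and function names (DEMO_TARGET_A, DEMO_TARGET_B and so on) are made up for illustration and are not from libdispatch; DEMO_TARGET_B stands in for __x86_64__. In #if expressions && binds tighter than ||, as in ordinary C, so the unparenthesized form is parsed as (0 && defined(A)) || defined(B).

```c
#include <assert.h>

/* Hypothetical stand-in for a target macro such as __x86_64__. */
#define DEMO_TARGET_B 1
/* DEMO_TARGET_A is deliberately left undefined. */

/* Unparenthesized: parsed as (0 && defined(A)) || defined(B),
   so the "0 &&" only disables the first alternative. */
#if 0 && defined(DEMO_TARGET_A) || defined(DEMO_TARGET_B)
#define UNPARENTHESIZED_TAKEN 1
#else
#define UNPARENTHESIZED_TAKEN 0
#endif

/* Parenthesized: the "0 &&" now disables the whole condition. */
#if 0 && (defined(DEMO_TARGET_A) || defined(DEMO_TARGET_B))
#define PARENTHESIZED_TAKEN 1
#else
#define PARENTHESIZED_TAKEN 0
#endif

/* Small accessors so the result can be checked at run time. */
int unparenthesized_branch_taken(void) { return UNPARENTHESIZED_TAKEN; }
int parenthesized_branch_taken(void) { return PARENTHESIZED_TAKEN; }
```

With only the second macro defined, the first condition is true and the second is false, which matches the asm branch being compiled in on x86_64.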
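The asm is wrong, because it doesn't have a clobber for "memory". Without one
the compiler is free to move loads and stores across the asm, and in the
listing above the stores to 24(%r13) and 8(%r13) do end up after the xchg.
Adding the clobber fixes the testcase.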
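For reference, a sketch of what an xchg macro with a "memory" clobber could look like. This is an illustration only, not the patch that went into libdispatch: demo_atomic_xchg and demo_swap are hypothetical names, and marking *(p) as "+m" and adding volatile are my own choices on top of the clobber. On non-x86 targets the sketch falls back to the same __sync builtin the original #else branch uses.

```c
#include <assert.h>

/* demo_atomic_xchg is a hypothetical name, not the libdispatch macro.
   "+m" tells the compiler *(p) is both read and written; the "memory"
   clobber keeps it from moving other loads and stores across the asm. */
#if defined(__i386__) || defined(__x86_64__)
#define demo_atomic_xchg(p, n) ({                                   \
    __typeof__(*(p)) _r;                                            \
    __asm__ __volatile__("xchg %0, %1"                              \
                         : "=r" (_r), "+m" (*(p))                   \
                         : "0" ((__typeof__(*(p)))(n))              \
                         : "memory");                               \
    _r; })
#else
/* Same builtin as the #else branch of the original macro. */
#define demo_atomic_xchg(p, n) \
    ((__typeof__(*(p)))__sync_lock_test_and_set((p), (n)))
#endif

/* Hypothetical helper: store value into *slot, return the old contents. */
long demo_swap(long *slot, long value)
{
    return demo_atomic_xchg(slot, value);
}
```

Calling demo_swap(&slot, 9) on a slot holding 5 should return 5 and leave 9 in the slot. This needs GCC or a compatible compiler, since it relies on statement expressions and extended asm.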
Paolo