Re: [libdispatch-dev] lib dispatch worker threads may loop depending on compiler optimization
On 09/19/2011 06:32 PM, Joakim Johansson wrote:
<http://libdispatch.macosforge.org/trac/ticket/35#comment:10>
Thanks. It turns out it is _not_ a bug in GCC at all, but in libdispatch.
Compiling the .i file shows the following:

        movq    %r13, %rax
        movq    $1, 0(%r13)
        movq    %r12, 16(%r13)
    #APP
    # 110 "/tb/builds/thd/sbn/2.4/src/thirdparty/libdispatch/197/src/src/queue_internal.h" 1
        xchg %rax, 64(%rbx)
    # 0 "" 2
    #NO_APP
        testq   %rax, %rax
        movq    %rbp, 24(%r13)
        movq    $0, 8(%r13)
        je      .L172

The APP/NO_APP markers signify that an asm is being used rather than sync builtins. Looking at the source code indeed reveals this...

    // GCC generates suboptimal register pressure
    // LLVM does better, but doesn't support tail calls
    // 6248590 __sync_*() intrinsics force a gratuitous "lea" instruction, with resulting register pressure
    #if 0 && defined(__i386__) || defined(__x86_64__)
    #define dispatch_atomic_xchg(p, n) ({ typeof(*(p)) _r; asm("xchg %0, %1" : "=r" (_r) : "m" (*(p)), "0" (n)); _r; })
    #else
    #define dispatch_atomic_xchg(p, n) ((typeof(*(p)))__sync_lock_test_and_set((p), (n)))
    #endif

... which is missing parentheses like

    #if 0 && (defined(__i386__) || defined(__x86_64__))

The asm is wrong, because it doesn't have a clobber for "memory". That fixes the testcase.

Paolo
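A minimal GNU C (GCC/Clang) sketch of the two problems above, offered as an illustration rather than the committed fix. Part 1 shows why the `#if` does not disable the asm: `&&` binds tighter than `||`, so the condition reads `(0 && defined(__i386__)) || defined(__x86_64__)`, which is true on x86_64. `FAKE_ARCH_A`/`FAKE_ARCH_B` are made-up stand-ins for the real macros so the result does not depend on the host. Part 2 is one way to write the exchange, assuming the intent is a full compiler barrier: the original lists `*(p)` only as an input (`"m"`), so GCC is never told that memory is written, and there is no `"memory"` clobber. That fits the listing, where `movq %rbp, 24(%r13)` and `movq $0, 8(%r13)` appear after the `xchg` that swaps `%r13` into `64(%rbx)`, i.e. stores to the object seem to have been sunk past the point where it is published. `example_atomic_xchg`, the node/queue types and `example_enqueue` are hypothetical names for an initialize-then-publish enqueue, not libdispatch's code; the non-x86 fallback uses the newer `__atomic_exchange_n` builtin instead of the original `__sync_lock_test_and_set`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 1. The #if precedence problem.
 * FAKE_ARCH_A / FAKE_ARCH_B are hypothetical stand-ins for __i386__ /
 * __x86_64__. We pretend to build for "arch B", i.e. the x86_64 case. */
#define FAKE_ARCH_B 1

/* As written: && binds tighter than ||, so this is
 * (0 && defined(FAKE_ARCH_A)) || defined(FAKE_ARCH_B), which is true. */
static bool asm_enabled_as_written(void)
{
#if 0 && defined(FAKE_ARCH_A) || defined(FAKE_ARCH_B)
    return true;
#else
    return false;
#endif
}

/* With the suggested parentheses the leading 0 disables every arch. */
static bool asm_enabled_parenthesized(void)
{
#if 0 && (defined(FAKE_ARCH_A) || defined(FAKE_ARCH_B))
    return true;
#else
    return false;
#endif
}

/* 2. An exchange the compiler cannot move memory accesses across.
 * example_atomic_xchg is a hypothetical name (GCC/Clang extensions). */
#if defined(__i386__) || defined(__x86_64__)
/* "+m" says *(p) is both read and written; the "memory" clobber makes the
 * asm a compiler barrier, so earlier stores are emitted before the xchg. */
#define example_atomic_xchg(p, n) __extension__ ({                  \
        __typeof__(*(p)) _r = (__typeof__(*(p)))(n);                \
        __asm__ __volatile__("xchg %0, %1"                          \
                             : "+r" (_r), "+m" (*(p))               \
                             :                                      \
                             : "memory");                           \
        _r; })
#else
/* Non-x86 fallback, a swapped-in choice: sequentially consistent builtin. */
#define example_atomic_xchg(p, n) \
        __atomic_exchange_n((p), (__typeof__(*(p)))(n), __ATOMIC_SEQ_CST)
#endif

/* 3. Initialize-then-publish, the pattern the reordering breaks.
 * This node/queue layout is hypothetical, not libdispatch's. */
struct example_node {
    struct example_node *next;
    long payload;
};

struct example_queue {
    struct example_node *tail;   /* last node; never NULL */
    struct example_node stub;    /* dummy node so tail always points somewhere */
};

static void example_queue_init(struct example_queue *q)
{
    q->stub.next = NULL;
    q->stub.payload = 0;
    q->tail = &q->stub;
}

static void example_enqueue(struct example_queue *q, struct example_node *n,
                            long payload)
{
    /* Fill in the node completely before anyone can see it ... */
    n->payload = payload;
    n->next = NULL;
    /* ... then publish it by swapping it into the tail pointer. */
    struct example_node *prev = example_atomic_xchg(&q->tail, n);
    assert(prev != NULL);
    prev->next = n;   /* link the old tail to the new node */
}
```

Enqueueing two nodes `a` and `b` on a fresh queue leaves `q.stub.next == &a`, `a.next == &b` and `q.tail == &b`. A single-threaded run like that only checks the plumbing; the reordering Paolo found only becomes visible when another thread reads the queue concurrently.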
Great find!

On Sep 20, 2011, at 12:48 AM, Paolo Bonzini wrote:
> [full quote of the message above snipped]

_______________________________________________
libdispatch-dev mailing list
libdispatch-dev@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/libdispatch-dev
participants (2)
- Kevin Van Vechten
- Paolo Bonzini