[libdispatch-dev] lib dispatch worker threads may loop depending on compiler optimization
zarzycki at apple.com
Fri Sep 9 08:37:34 PDT 2011
On Sep 9, 2011, at 1:25 AM, Jean-Daniel Dupas wrote:
> On Sep 9, 2011, at 09:39, Paolo Bonzini wrote:
>> On 09/09/2011 09:23 AM, Dave Zarzycki wrote:
>>>>> I doubt it, __sync_lock_test_and_set is a full barrier on x86.
>>>>> Compiler-wise it is always a full optimization barrier, the
>>>>> actual semantics depend on the processor.
>>> Strictly speaking, that isn't true. From the documentation:
>>> "This builtin is not a full barrier, but rather an acquire barrier.
>>> This means that references after the builtin cannot move to (or be
>>> speculated to) before the builtin, but previous memory stores may not
>>> be globally visible yet, and previous memory loads may not yet be
>>> satisfied."
>>> In practice, GCC and clang have historically treated
>>> __sync_lock_test_and_set() as a full barrier, and that is why GCD was
>>> able to get away with using it to get at the "xchg" instruction.
>> Yes, the documentation is conservative. However, if you look at the code (and this hasn't changed in recent GCC):
>> * moving references before the builtin is clearly prohibited, and so is speculating them;
>> * depending on the target, memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied;
>> * however, the compiler will *never* sink references below the builtin, which is what I meant by "compiler-wise it is always a full optimization barrier" like asm("":::"memory").
> So, how do you explain what we see in this bug report?
I'm still betting that GCC changed in this regard. Unlike the rest of the __sync_*() intrinsics, the __sync_lock_test_and_set() intrinsic does *not* promise to be a full barrier. Moving older stores after the younger __sync_lock_test_and_set() would be consistent with both the observed compiler output and the promised barrier semantics of the intrinsic.
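To make the concern concrete, here is a minimal sketch of the pattern in question. The names (payload, slot, publish_acquire_only, publish_full_barrier) are hypothetical and are not taken from libdispatch. The first function relies on the exchange alone, which the documentation only promises to be an acquire barrier. The second adds __sync_synchronize(), which is documented as a full barrier, so the older store is ordered before the exchange no matter how the compiler treats the builtin.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not libdispatch's code. */
static int payload;     /* the "older" store that must be visible first */
static intptr_t slot;   /* the word that gets exchanged */

/* Relies on the exchange alone. The builtin is documented as an
 * acquire barrier, so the store to payload is not guaranteed to be
 * ordered before the exchange. */
static intptr_t publish_acquire_only(intptr_t value)
{
    payload = 42;
    return __sync_lock_test_and_set(&slot, value);
}

/* Adds an explicit full barrier, so the store to payload is ordered
 * before the exchange regardless of how the builtin is treated. */
static intptr_t publish_full_barrier(intptr_t value)
{
    payload = 42;
    __sync_synchronize();
    return __sync_lock_test_and_set(&slot, value);
}
```

A single-threaded run cannot show the reordering itself. The sketch only shows where the ordering guarantee comes from in each variant. Both functions return the previous contents of slot.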