[libdispatch-dev] libdispatch worker threads may loop depending on compiler optimization

Fri Sep 9 01:25:20 PDT 2011

On 9 Sept. 2011, at 09:39, Paolo Bonzini wrote:

> On 09/09/2011 09:23 AM, Dave Zarzycki wrote:
>>>> I doubt it, __sync_lock_test_and_set is a full barrier on x86.
>>>> Compiler-wise it is always a full optimization barrier, the
>>>> actual semantics depend on the processor.
>> 
>> Strictly speaking, that isn't true. From the documentation:
>> 
>> http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html
>> 
>> "This builtin is not a full barrier, but rather an acquire barrier.
>> This means that references after the builtin cannot move to (or be
>> speculated to) before the builtin, but previous memory stores may not
>> be globally visible yet, and previous memory loads may not yet be
>> satisfied."
>> 
>> In practice, GCC and clang have historically treated
>> __sync_lock_test_and_set() as a full barrier, and that is why GCD was
>> able to get away with using it to get at the "xchg" instruction.
> 
> Yes, the documentation is conservative.  However, if you look at the code (and this hasn't changed in recent GCC):
> 
> * moving references before the builtin is clearly prohibited, and so is speculating them;
> 
> * depending on the target, memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied;
> 
> * however, the compiler will *never* sink references below the builtin, which is what I meant by "compiler-wise it is always a full optimization barrier" like asm("":::"memory").

So, how do you explain what we see in this bug report?

-- Jean-Daniel
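
P.S. For readers following the barrier discussion, here is a minimal sketch of the distinction in the quoted GCC documentation: __sync_lock_test_and_set() is only promised to be an acquire barrier, so code that needs full-barrier semantics can pair it with __sync_synchronize() rather than rely on what the compiler happens to emit. All names below (lock_word, shared_counter, and the sketch_* helpers) are hypothetical and not taken from libdispatch; this is an illustration under those assumptions, not a description of how GCD is implemented, and it does not explain the behaviour in the bug report.

```c
/* Hypothetical illustration only; none of these names come from libdispatch. */

static volatile int lock_word = 0;   /* 0 = unlocked, 1 = locked */
static int shared_counter = 0;       /* data protected by lock_word */

/* Acquire the lock. The builtin atomically stores 1 and returns the
 * previous value; it is documented as an acquire barrier only. */
static void sketch_lock(void)
{
    while (__sync_lock_test_and_set(&lock_word, 1)) {
        /* previous value was 1: someone else holds the lock, keep spinning */
    }
}

/* Release the lock. __sync_lock_release() stores 0 and is documented
 * as a release barrier, the counterpart of the acquire above. */
static void sketch_unlock(void)
{
    __sync_lock_release(&lock_word);
}

/* Increment the protected counter and return its new value. */
static int sketch_increment(void)
{
    int value;

    sketch_lock();
    value = ++shared_counter;
    sketch_unlock();
    return value;
}

/* Atomic exchange for callers that want earlier loads and stores ordered
 * before the exchange as well. The explicit __sync_synchronize() is a
 * conservative assumption: it avoids depending on the builtin being
 * treated as a full barrier on a given compiler or target. */
static int sketch_xchg_full(volatile int *ptr, int new_value)
{
    __sync_synchronize();
    return __sync_lock_test_and_set(ptr, new_value);
}
```

On x86 the exchange compiles to "xchg", which is already a full hardware barrier, so the extra __sync_synchronize() mostly matters on other targets or if a compiler takes the documented acquire-only semantics literally.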