[libdispatch-dev] lib dispatch worker threads may loop depending on compiler optimization
Jean-Daniel Dupas
devlists at shadowlab.org
Fri Sep 9 01:25:20 PDT 2011
On 9 Sept. 2011, at 09:39, Paolo Bonzini wrote:
> On 09/09/2011 09:23 AM, Dave Zarzycki wrote:
>>>> I doubt it, __sync_lock_test_and_set is a full barrier on x86.
>>>> Compiler-wise it is always a full optimization barrier, the
>>>> actual semantics depend on the processor.
>>
>> Strictly speaking, that isn't true. From the documentation:
>>
>> http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html
>>
>> "This builtin is not a full barrier, but rather an acquire barrier.
>> This means that references after the builtin cannot move to (or be
>> speculated to) before the builtin, but previous memory stores may not
>> be globally visible yet, and previous memory loads may not yet be
>> satisfied."
>>
>> In practice, GCC and clang have historically treated
>> __sync_lock_test_and_set() as a full barrier, and that is why GCD was
>> able to get away with using it to get at the "xchg" instruction.
>
> Yes, the documentation is conservative. However, if you look at the code (and this hasn't changed in recent GCC):
>
> * moving references before the builtin is clearly prohibited, and so is speculating them;
>
> * depending on the target, memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied;
>
> * however, the compiler will *never* sink references below the builtin, which is what I meant by "compiler-wise it is always a full optimization barrier" like asm("":::"memory").
So, how do you explain what we see in this bug report?
-- Jean-Daniel
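
For reference, here is a minimal sketch of the two kinds of barrier discussed above, assuming GCC or clang with the legacy __sync builtins. The names lock_word, shared_counter, compiler_barrier, lock_acquire, lock_release and exchange_full are hypothetical and chosen for illustration; this is not the libdispatch code.

```c
#include <stdio.h>

/* Hypothetical variables for illustration; not taken from libdispatch. */
static volatile int lock_word = 0;
static int shared_counter = 0;

/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move memory accesses across it. It does not order anything in hardware. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Spin until the lock is taken. __sync_lock_test_and_set atomically
 * stores 1 and returns the previous value; it is documented as an
 * acquire barrier only. */
static void lock_acquire(volatile int *l)
{
    while (__sync_lock_test_and_set(l, 1)) {
        /* busy-wait until the previous holder releases the lock */
    }
}

/* Store 0 with release semantics. */
static void lock_release(volatile int *l)
{
    __sync_lock_release(l);
}

/* Atomic exchange with an explicit full barrier on both sides, for code
 * that does not want to depend on how a given compiler or target treats
 * the builtin. Returns the previous value. */
static int exchange_full(volatile int *p, int v)
{
    int old;

    __sync_synchronize();
    old = __sync_lock_test_and_set(p, v);
    __sync_synchronize();
    return old;
}
```

Code that only needs lock semantics can pair lock_acquire with lock_release. Code that uses the builtin as a general atomic exchange, and so needs earlier stores to be visible first, would be safer with something like exchange_full, since the documentation only promises acquire semantics.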