[libdispatch-dev] lib dispatch worker threads may loop depending on compiler optimization
Paolo Bonzini
bonzini at gnu.org
Fri Sep 9 00:39:49 PDT 2011
On 09/09/2011 09:23 AM, Dave Zarzycki wrote:
>>> I doubt it, __sync_lock_test_and_set is a full barrier on x86.
>>> Compiler-wise it is always a full optimization barrier, the
>>> actual semantics depend on the processor.
>
> Strictly speaking, that isn't true. From the documentation:
>
> http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html
>
> "This builtin is not a full barrier, but rather an acquire barrier.
> This means that references after the builtin cannot move to (or be
> speculated to) before the builtin, but previous memory stores may not
> be globally visible yet, and previous memory loads may not yet be
> satisfied."
>
> In practice, GCC and clang have historically treated
> __sync_lock_test_and_set() as a full barrier, and that is why GCD was
> able to get away with using it to get at the "xchg" instruction.
Yes, the documentation is conservative. However, if you look at the
code (and this hasn't changed in recent GCC):
* moving references before the builtin is clearly prohibited, and so is
speculating them;
* depending on the target, memory stores may not be globally visible
yet, and previous memory loads may not yet be satisfied;
* however, the compiler will *never* sink earlier references below the
builtin, which is what I meant by "compiler-wise it is always a full
optimization barrier", like asm("":::"memory").
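To make the three points above concrete, here is a minimal sketch of the builtin used as an atomic exchange on a lock word. The names example_lock_word, example_try_lock and example_unlock are hypothetical and are not taken from libdispatch; the only real calls are the GCC builtins __sync_lock_test_and_set and __sync_lock_release.

```c
/* Hypothetical lock word, for illustration only. */
static volatile int example_lock_word;

/* Atomically store 1 and return whether the previous value was 0.
   Documented as an acquire barrier: later references may not move
   above it. On x86 this is expected to compile to xchg. */
static int example_try_lock(void)
{
    return __sync_lock_test_and_set(&example_lock_word, 1) == 0;
}

/* Store 0 with release semantics: earlier references may not move
   below it. */
static void example_unlock(void)
{
    __sync_lock_release(&example_lock_word);
}
```

Code that needs more than the documented acquire semantics on every target would, under this reading, have to add an explicit full barrier such as __sync_synchronize() rather than rely on what the compiler happens to do.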
So I find it extremely unlikely that this is the cause of the problem.
It is more likely that an optimization barrier like the above no-op asm
is missing in the source, and clang is getting away without it.
Remember that while the x86 does not need explicit read or write
barriers in the assembly (only full barriers), you do need to write the
barriers in the code and expand them to no-op asms. Otherwise the
compiler may move references across the barrier.
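As a rough sketch of what "write the barriers in the code and expand them to no-op asms" can look like, assuming GCC or clang: the macro and function names below (example_compiler_barrier, example_rmb, example_wmb, example_publish, example_consume) are hypothetical and do not come from libdispatch. The non-x86 branch simply falls back to a full barrier, which is conservative rather than optimal.

```c
/* Compiler-only barrier: emits no instruction, but the compiler will
   not move memory references across it. */
#define example_compiler_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__i386__) || defined(__x86_64__)
/* x86 needs no instruction for read or write barriers, so they expand
   to the no-op asm alone. */
#define example_rmb() example_compiler_barrier()
#define example_wmb() example_compiler_barrier()
#else
/* Assumption: use a full barrier elsewhere to stay on the safe side. */
#define example_rmb() __sync_synchronize()
#define example_wmb() __sync_synchronize()
#endif

static int example_payload;
static volatile int example_ready;

/* Writer: the payload store must stay ahead of the flag store. */
static void example_publish(int value)
{
    example_payload = value;
    example_wmb();
    example_ready = 1;
}

/* Reader: the payload load must stay behind the flag load. */
static int example_consume(void)
{
    while (!example_ready) {
        /* spin until the writer sets the flag */
    }
    example_rmb();
    return example_payload;
}
```

Without the two barrier macros, nothing stops the compiler from reordering the plain payload access relative to the flag access, even though the x86 hardware itself would not need an extra instruction there.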
(Former GCC developer here :)).
Paolo