[libdispatch-dev] libdispatch worker threads may loop depending on compiler optimization

Fri Sep 9 00:41:11 PDT 2011

On 09/09/2011 09:27 AM, Dmitri Shubin wrote:
>> In practice, GCC and clang have historically treated
>> __sync_lock_test_and_set() as a full barrier, and that is why GCD was
>> able to get away with using it to get at the "xchg" instruction.
>> However, GCC documents __sync_lock_test_and_set() only as an acquire
>> barrier, and recent GCC releases may now exploit that. This would explain
>> the bug observed in this thread, where the compiler moved a store after
>> the "xchg" instruction. That is why the recently introduced __sync_swap()
>> intrinsic (a clang builtin documented as a full barrier) is preferable
>> when available.
>
> If you need a full barrier, why not explicitly call __sync_synchronize()
> before __sync_lock_test_and_set() in dispatch_atomic_xchg() then?

Because it's expensive (60-90 cycles).
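A minimal sketch of the options discussed above, assuming GCC or clang on a
typical target. The helper names (xchg_acquire, xchg_fenced, xchg_x86,
xchg_full, publish) and the HAVE_SYNC_SWAP macro are hypothetical and are not
libdispatch's actual dispatch_atomic_xchg(). The x86-only variant is an
assumption added for illustration: since xchg with a memory operand is
already a locked, fully serializing instruction in hardware, only the
compiler has to be kept from reordering, which avoids the separate fence.

```c
#include <assert.h>

/* Hypothetical helpers for illustration, not libdispatch's real API. */

/* Acquire-only: GCC documents __sync_lock_test_and_set() as an acquire
   barrier, so the compiler may legally sink earlier stores below it. */
static inline long xchg_acquire(long *p, long v)
{
    return __sync_lock_test_and_set(p, v);
}

/* Full barrier via an explicit fence: correct on any target, but
   __sync_synchronize() usually emits a separate fence instruction
   (mfence on x86) in addition to the exchange itself. */
static inline long xchg_fenced(long *p, long v)
{
    __sync_synchronize();
    return __sync_lock_test_and_set(p, v);
}

#if defined(__x86_64__) || defined(__i386__)
/* x86 only (assumption for illustration): xchg with a memory operand is
   implicitly locked and already a full hardware barrier, so an empty asm
   with a "memory" clobber (a compiler-only barrier) is enough to stop the
   compiler from sinking earlier stores past the exchange. */
static inline long xchg_x86(long *p, long v)
{
    __asm__ __volatile__("" ::: "memory");
    return __sync_lock_test_and_set(p, v);
}
#endif

/* Detect clang's __sync_swap(); GCC does not provide it. */
#if defined(__has_builtin)
#  if __has_builtin(__sync_swap)
#    define HAVE_SYNC_SWAP 1
#  endif
#endif

/* Prefer __sync_swap() (documented by clang as a full barrier) and fall
   back to the fenced version elsewhere. */
static inline long xchg_full(long *p, long v)
{
#ifdef HAVE_SYNC_SWAP
    return __sync_swap(p, v);
#else
    return xchg_fenced(p, v);
#endif
}

/* Hypothetical producer: the payload store must not be moved after the
   flag exchange, which is exactly what an acquire-only barrier permits. */
static long payload;
static long ready;

static long publish(long value)
{
    payload = value;
    return xchg_full(&ready, 1);   /* returns the previous flag value */
}
```

The single-threaded checks below only confirm the exchange semantics (old
value returned, new value stored); the ordering guarantees themselves only
matter when another thread reads `ready` and then `payload`.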

Paolo