[libdispatch-dev] lib dispatch worker threads may loop depending on compiler optimization

Fri Sep 9 00:51:28 PDT 2011

On 09.09.2011 11:39, Paolo Bonzini wrote:
> Yes, the documentation is conservative.  However, if you look at the 
> code (and this hasn't changed in recent GCC):
>
> * moving references before the builtin is clearly prohibited, and so 
> is speculating them;
>
> * depending on the target, memory stores may not be globally visible 
> yet, and previous memory loads may not yet be satisfied;

AFAIU this is the critical point here -- the global store (tail->do_next) 
should be visible.

>
> * however, the compiler will *never* sink references below the 
> builtin, which is what I meant by "compiler-wise it is always a full 
> optimization barrier" like asm("":::"memory").
>
> So I find it extremely unlikely that this is the cause of the problem.
>
> It is more likely that an optimization barrier like the above no-op 
> asm is missing in the source, and clang is getting away without it. 
> Remember that while the x86 does not need explicit read or write 
> barriers in the assembly (only full barriers), you do need to write 
> the barriers in the code and expand them to no-op asms.  Otherwise the 
> compiler may move references across the barrier.
>
Yes, that was the fix -- adding __asm__ __volatile__("" ::: "memory") 
before the call to __sync_lock_test_and_set().
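
For illustration, here is a minimal sketch of the pattern, not the actual 
libdispatch code. Only the do_next field name and the two builtins come 
from this thread; struct node, push_tail() and compiler_barrier() are 
hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; only the field name do_next is from the thread. */
struct node {
    struct node *do_next;
    int value;
};

/* Compiler-only barrier: emits no instructions, but the compiler may not
 * move memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Append item at the tail. Returns the previous tail, or NULL if the
 * queue was empty. */
static struct node *push_tail(struct node **tailp, struct node *item)
{
    assert(tailp != NULL && item != NULL);

    item->do_next = NULL;   /* must be stored before the item is published */
    compiler_barrier();     /* keep the store above from sinking below the exchange */

    /* Atomically swap in the new tail and fetch the old one. */
    struct node *prev = __sync_lock_test_and_set(tailp, item);
    if (prev != NULL)
        prev->do_next = item;   /* link the old tail to the new item */
    return prev;
}
```

Without the barrier, nothing in the source tells the compiler that the 
store to item->do_next has to stay ahead of the exchange, because GCC 
documents __sync_lock_test_and_set() as an acquire barrier only.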

But what about other targets (not x86)? It looks like they need a true 
write memory barrier here.
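
To make the question concrete, here is a sketch of what I have in mind, 
under my own assumptions: write_barrier(), publish(), try_consume() and 
the two variables are hypothetical names, and __sync_synchronize() is 
used as a full barrier, which is stronger than a pure write barrier. On 
x86 the macro expands to the no-op asm; elsewhere it emits a real fence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: compiler-only barrier on x86, where stores are not
 * reordered with other stores; full hardware barrier on other targets. */
#if defined(__i386__) || defined(__x86_64__)
#define write_barrier() __asm__ __volatile__("" ::: "memory")
#else
#define write_barrier() __sync_synchronize()
#endif

static int payload;   /* data being handed over */
static int ready;     /* flag that publishes the data */

/* Store the data, then publish it via an atomic exchange on the flag. */
static void publish(int v)
{
    payload = v;
    write_barrier();  /* payload must be visible before the flag is set */
    (void)__sync_lock_test_and_set(&ready, 1);
}

/* Returns 1 and fills *out if data has been published, else returns 0. */
static int try_consume(int *out)
{
    assert(out != NULL);
    /* Atomic read of the flag; __sync_fetch_and_add is a full barrier. */
    if (__sync_fetch_and_add(&ready, 0) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

The full barrier may be more than a weakly ordered target strictly 
needs, but it is the only fence the __sync builtins offer.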