[libdispatch-dev] Question on race condition when using dispatch_group_notify_f
Daniel A. Steffen
dsteffen at apple.com
Mon Apr 2 14:49:07 PDT 2012
Hi Christoper,
please see the Lion branch on macosforge for the latest sources:
http://svn.macosforge.org/repository/libdispatch/branches/Lion
in particular, semaphore.c in those sources has a change in this area where the list of notify waiters is snapshot before any waiters are woken up, to support exactly this use case (new group notify added from a notify callout)
Do note however that notify blocks/functions for a group are not themselves part of the group, so care needs to taken when reusing groups like this (e.g. there is no guarantee that a group is empty by the time a group notify function is called if new work is being submitted asynchronously to the notification).
Daniel
On Apr 2, 2012, at 14:28, Christopher Jones <cdj at fnal.gov> wrote:
> To whom it may concern,
>
> I have found a race condition using the port of libdispatch from http://mark.heily.com/sites/mark.heily.com/files/libdispatch-f16-SRPMS.tgz. The race condition manifests when doing the equivalent of
>
> void run(void* iContext) {
> dispatch_group_t g = reinterpret_cast<dispatch_group_t>(iContext);
>
> dispatch_group_async_f(g, dispatch_get_global_queue(0, 0), g, do_work);
> dispatch_group_notify_f(g, dispatch_get_global_queue(0, 0), g, run);
> }
>
> The idea is to have a new 'run' task start up after the previous 'do_work' task has finished. However, with a system under heavy load I see that 'run' can be called before its instance of 'do_work' has finished. This happens because of a race condition in '_dispatch_group_wake'
>
> _dispatch_group_wake(dispatch_semaphore_t dsema)
> {
> struct dispatch_sema_notify_s *tmp;
> struct dispatch_sema_notify_s *head = (struct dispatch_sema_notify_s *)dispatch_atomic_xchg(&dsema->dsema_notify_head, NULL);
> ...
> while (head) {
> dispatch_async_f(head->dsn_queue, head->dsn_ctxt, head->dsn_func);
> _dispatch_release(head->dsn_queue);
> do {
> tmp = head->dsn_next;
> } while (!tmp && !dispatch_atomic_cmpxchg(&dsema->dsema_notify_tail, head, NULL));
> free(head);
> head = tmp;
> }
> …
> }
>
> Under heavy load, the thread processing _dispatch_group_wake can be swapped out right after the first call to 'dispatch_async_f'. The task can then call 'run(void* iContext)' which can make it all the way to the end of that function thereby adding a new task to the tail of the dispatch_group. Once the '_dispatch_group_wake' thread reawakens, it now has a new entry in 'head->dsn_next' (since in dispatch_group_notify_f the 'head' and 'tail' refer to the same memory address for this problem). This new entry is processed which causes the 'run' to go off before the associated 'do_work' finishes.
>
> You can find a small test which can, usually, exhibit the error and cause an assert to fail
> http://dl.dropbox.com/u/11356841/raceCondition.cpp
>
> This test has succeeded in exhibiting the error for me when run on a 16 core (4CPUs with each CPU having 4 cores) machine under Scientific Linux 6 [which is derived from Red Hat Enterprise 6].
>
> So my question is, is the idea of having dispatch_group_notify_f effectively call itself not a supported activity or is this a bug that should be fixed?
>
> Sincerely,
> Chris
>
> Dr Christopher Jones
> Fermi National Accelerator Laboratory
> cdj at fnal.gov
>
> _______________________________________________
> libdispatch-dev mailing list
> libdispatch-dev at lists.macosforge.org
> http://lists.macosforge.org/mailman/listinfo.cgi/libdispatch-dev
More information about the libdispatch-dev
mailing list