Clarification on default target_queue for custom serial queues
When reading about libdispatch, I get the sense that I can create serial queues en masse and it’ll just work out, because the queues will be limited to a system-determined number of threads. The reason behind this:
In fact, serial queues are scheduled using the global queues. Each serial queue has a target queue, which is initially set to the default priority concurrent queue. When a block is first added to an empty serial queue, the queue itself is added to the target queue.
-- GCD technical brief, formerly linked from http://developer.apple.com/technologies/mac/snowleopard/gcd.html but now a dead link. There’s a backup at http://www.ctestlabs.org/hughes_multicore/documents/GrandCentral_TB_brief_20...

This information is repeated in the WWDC session talks (#210, about ten and a half minutes into the video available to developers). Essentially, the global queues are limited to a handful of threads, and since serial queues run on the global queues, they are all funneled down to that same handful of threads.

But libdispatch doesn’t actually seem to do this. Example code (tested on iOS 4.3; the same behavior shows up on Snow Leopard 10.6.0, 10.6.7, and the Lion dev previews):

    NSUInteger queueCount = 50;
    dispatch_queue_t queueArray[queueCount];
    for (NSUInteger i = 0; i < queueCount; ++i)
        queueArray[i] = dispatch_queue_create("com.some.queue", DISPATCH_QUEUE_SERIAL);
    // Then enqueue expensive blocks on each of these with a dispatch group,
    // wait for the group to finish, and clean up.

The process will just go ahead and create 50 threads. On a dual-core machine, this isn’t really what I was hoping for. But two lines of adjustments…

    NSUInteger queueCount = 50;
    dispatch_queue_t queueArray[queueCount];
    dispatch_queue_t defaultGlobalQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    for (NSUInteger i = 0; i < queueCount; ++i) {
        queueArray[i] = dispatch_queue_create("com.some.queue", DISPATCH_QUEUE_SERIAL);
        dispatch_set_target_queue(queueArray[i], defaultGlobalQueue);
    }
    // Then enqueue expensive blocks on each of these with a dispatch group,
    // wait for the group to finish, and clean up.

…will limit it to a couple of threads.

According to the documentation quoted above, these two code samples should produce identical results -- serial queues should already target the default global queue. Yet the actual behaviors are very different. Is this intentional? Am I misunderstanding something?

Thanks for your time and whatever response you can give,
--Daniel Shusta

===
PS. This is specific to custom serial queues. Custom concurrent queues in iOS 4.3+ do funnel down to a couple of threads.

PPS. In libdispatch's queue.c, it does look like the target queue is supposed to be the default global queue. In _dispatch_queue_init() we see

    dq->do_targetq = _dispatch_get_root_queue(0, true);
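For reference, the elided enqueue/wait/cleanup step in both samples might look roughly like the following sketch; burn_cpu is a made-up placeholder for an "expensive block":

    // Hypothetical stand-in for the expensive work: keep a CPU busy long enough
    // that thread creation is easy to observe in Activity Monitor.
    static void burn_cpu(void)
    {
        volatile double x = 0.0;
        for (long i = 0; i < 200000000; ++i)
            x += 1.0;
    }

    // ... after creating queueArray as in either sample above ...
    dispatch_group_t group = dispatch_group_create();
    for (NSUInteger i = 0; i < queueCount; ++i)
        dispatch_group_async(group, queueArray[i], ^{ burn_cpu(); });
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);
    for (NSUInteger i = 0; i < queueCount; ++i)
        dispatch_release(queueArray[i]);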
Hi Daniel,

On Jun 25, 2011, at 4:16 PM, Daniel Shusta wrote:
[…]
It is this 'true' here that makes all the difference: the default target queue of a serial queue is the _overcommit_ default-priority global concurrent queue.

For items/queues submitted to an overcommit global queue, the current Mac OS X kernel workqueue mechanism creates threads more eagerly, e.g. even if an n-wide machine is already fully committed with n cpu-busy threads, submitting another item directly to the overcommit global queue (or indirectly to a serial queue with the default target queue) will cause another thread to be created to handle that item, potentially overcommitting the machine, hence the name.

If you wish to avoid this, simply set the target queue of your serial queues to the default-priority global queue (i.e. the non-overcommit one).

The overcommit/non-overcommit distinction is intentionally undocumented and only available in the queue_private.h header, because we hope to revise the kernel workqueue mechanism in the future to avoid the need for this distinction.

Daniel
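If you want that retargeting on every serial queue you create, it is easy to wrap up. A minimal sketch (the helper name here is made up, not an existing API):

    // Create a serial queue that targets the non-overcommit default-priority
    // global queue instead of the default (overcommit) one.
    static dispatch_queue_t create_noncommit_serial_queue(const char *label)
    {
        dispatch_queue_t q = dispatch_queue_create(label, DISPATCH_QUEUE_SERIAL);
        dispatch_set_target_queue(q,
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0));
        return q;
    }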
[…]
Do the present non-overcommit queues still do the Right Thing if all threads are currently non-busy-but-blocked? In other words, will they let me deadlock by submitting N units of work that require the N+1th to succeed (with N+1 being kept on the queue because the system won't spin up any more threads)? I know there is specific language about this situation in the code (for codepaths that use raw pthreads rather than workqueues), but I'm not sure how the workqueue situation is handled.
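To make the scenario concrete, a sketch (N and the semaphore arrangement here are purely illustrative): N blocks on the non-overcommit global queue each wait on a semaphore that only the (N+1)th block signals.

    #include <dispatch/dispatch.h>

    #define N 64   // illustrative; whether this actually hangs depends on the system

    int main(void)
    {
        dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_group_t group = dispatch_group_create();
        dispatch_semaphore_t go = dispatch_semaphore_create(0);

        // N units of work that cannot finish until the (N+1)th unit runs.
        for (int i = 0; i < N; ++i) {
            dispatch_group_async(group, q, ^{
                dispatch_semaphore_wait(go, DISPATCH_TIME_FOREVER); // blocked in a syscall
            });
        }
        // The (N+1)th unit: if no thread is ever created for it, everything above hangs.
        dispatch_group_async(group, q, ^{
            for (int i = 0; i < N; ++i)
                dispatch_semaphore_signal(go);
        });

        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
        dispatch_release(group);
        dispatch_release(go);
        return 0;
    }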
On Jun 25, 2011, at 5:08 PM, DrPizza wrote:
[…]
Yes, blocked threads are handled differently than cpu-busy ones: even on the non-overcommit workqueues, new threads will be created if there are blocked threads (and fewer than n cpu-busy threads on an n-wide machine).

The kernel imposes an overall limit on the total number of workqueue threads per process, so if N is too large you can still deadlock yourself by blocking too many threads.

Daniel
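A rough way to observe the difference (a sketch, assuming a sleeping thread counts as blocked in a syscall): submit more sleeping blocks than there are CPUs to the non-overcommit global queue, and the batch finishes in roughly one sleep interval because a thread is created per blocked item; cpu-busy blocks would instead stay limited to about one thread per CPU.

    #include <dispatch/dispatch.h>
    #include <unistd.h>

    int main(void)
    {
        dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_group_t g = dispatch_group_create();

        for (int i = 0; i < 16; ++i) {
            dispatch_group_async(g, q, ^{
                sleep(5);      // blocked in a syscall: extra workqueue threads appear
                // for (;;) ;  // cpu-busy instead: would stay near one thread per CPU
            });
        }
        dispatch_group_wait(g, DISPATCH_TIME_FOREVER); // ~5 seconds if all 16 ran in parallel
        dispatch_release(g);
        return 0;
    }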
Also for the record, this is exactly the behaviour the portable libpthread_workqueue implementation uses on other platforms as well.

Cheers,
Joakim

This email was sent from my iPhone, so it may be unusually terse.

On 26 jun 2011, at 02:14, "Daniel A. Steffen" <dsteffen@apple.com> wrote:
[…]
On Jun 25, 2011, at 5:08 PM, DrPizza wrote:
[…]
Yes, assuming blocked means blocked-in-a-syscall. It won't if you're just blocking in the thread by spinning in userspace.
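For instance (a sketch of the hazard, assuming the machine's CPUs can all be tied up by the spinners): a userspace spin-wait looks cpu-busy to the kernel, so the block that would release the spinners may never be given a thread.

    #include <dispatch/dispatch.h>
    #include <stdbool.h>
    #include <unistd.h>

    static volatile bool ready = false;

    int main(void)
    {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

        for (long i = 0; i < ncpu; ++i) {
            // Spin-waiting in userspace: the kernel sees cpu-busy threads, not blocked ones.
            dispatch_async(q, ^{ while (!ready) { /* spin */ } });
        }
        // With every non-overcommit workqueue thread spinning, this block may never
        // get a thread of its own, so the spinners are never released.
        dispatch_async(q, ^{ ready = true; });

        dispatch_main(); // never returns if the last block never runs
        return 0;
    }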
On Jun 25, 2011, at 7:32 PM, Daniel A. Steffen wrote:
[…]
Hey Daniel,

So custom serial queues default to a different global queue -- interesting. But what are the reasons for this default, and what are the downsides of non-overcommitting queues versus overcommitting ones? I mean, non-overcommitting queues still spawn new threads if a thread blocks.

Would there be a concern if I just retargeted all the queues in my apps to non-overcommitting global queues? Anecdotally, it seems to help UI responsiveness (the main thread doesn’t get crowded out?), so that’s a benefit right there.

--Daniel Shusta
On Jun 27, 2011, at 1:59 PM, Daniel Shusta wrote:
So custom serial queues default to a different global queue -- interesting. But what are the reasons for this default, and what are the downsides of non-overcommitting queues versus overcommitting ones? I mean, non-overcommitting queues still spawn new threads if a thread blocks.
The reason was essentially better compatibility for code that migrated from pthreads to serial queues (with a direct mapping of one pthread -> one serial queue). E.g. on a single-core machine, code keeping one serial queue permanently busy (with a single block) can prevent another serial queue from ever running, something that is not possible with two pthreads.
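A sketch of the kind of ported code this default protects (the queue names are illustrative), assuming a single-core machine and serial queues retargeted at the non-overcommit global queue:

    dispatch_queue_t global   = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_queue_t worker   = dispatch_queue_create("com.example.worker", DISPATCH_QUEUE_SERIAL);
    dispatch_queue_t watchdog = dispatch_queue_create("com.example.watchdog", DISPATCH_QUEUE_SERIAL);
    dispatch_set_target_queue(worker, global);
    dispatch_set_target_queue(watchdog, global);

    // Ported 1:1 from a dedicated pthread: a single block that never returns.
    dispatch_async(worker, ^{
        for (;;) { /* permanently busy */ }
    });

    // With the default (overcommit) target this still gets its own thread and runs;
    // with the non-overcommit target on a one-core machine it may never be scheduled.
    dispatch_async(watchdog, ^{
        /* ... */
    });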
Would there be a concern if I just retargeted all the queues in my apps to non-overcommitting global queues? Anecdotally, it seems to help UI responsiveness (the main thread doesn’t get crowded out?), so that’s a benefit right there.
For new code architected with GCD in mind, setting the target queue of your serial queues to non-overcommit by default seems like a very sensible idea.

Daniel
participants (5)
- Daniel A. Steffen
- Daniel Shusta
- DrPizza
- Joakim Johansson
- Matt Wright