[libdispatch-dev] libdispatch object caches
jocke at tbricks.com
Fri May 20 04:26:03 PDT 2011
I have a question about a possible optimization and would appreciate feedback on whether it would be considered a) worthwhile to investigate and b) a candidate for integration if it gives good results.
The reason for asking is that I’ve seen significantly better performance (roughly 2x) on Solaris for the lap times of e.g. ‘dispatch starfish’, which seems to be malloc-bound (as is also commented in the starfish implementation).
libdispatch today keeps dispatch continuation objects in per-thread caches stored in thread-specific data (TSD), with a heavily optimized implementation for Darwin.
This cache seems unbounded during the lifetime of a given drain of the queue, after which a forced cleanup is performed (by calling _dispatch_force_cache_cleanup()).
The proposed change would be to let _dispatch_continuation_alloc_from_heap() allocate its objects from a magazine-based object cache, and to let _dispatch_cache_cleanup2() return objects to that object cache.
This is the approach taken in libumem (see http://www.usenix.org/event/usenix01/full_papers/bonwick/bonwick.pdf for background). In fact, the easiest approach might be to depend on libumem directly (https://labs.omniti.com/trac/portableumem exists for non-Solaris platforms). This could of course be a compile-time option depending on whether libumem is available, but it would implicitly replace the default memory allocator (unless we ‘cherry pick’ just the magazine layer, for example).
It seems that this would allow us to reuse dispatch continuation objects across invocations/threads in an efficient way.
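To make the idea concrete, here is a minimal sketch of a magazine layer in the spirit of the libumem paper. It is not libdispatch or libumem code: all names (magazine_t, mag_cache_alloc, mag_cache_free), MAG_ROUNDS and OBJ_SIZE are hypothetical, and the sketch assumes C11 thread-local storage and pthreads. Each thread keeps one loaded magazine and only takes the depot lock when that magazine runs empty or full.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAG_ROUNDS 16   /* objects per magazine (hypothetical size) */
#define OBJ_SIZE   64   /* stand-in for the size of a continuation */

/* A magazine is a fixed-size stack of free objects. */
typedef struct magazine {
    struct magazine *next;      /* link in a depot list */
    int rounds;                 /* number of valid entries in objs[] */
    void *objs[MAG_ROUNDS];
} magazine_t;

/* Depot: lists of full and empty magazines shared by all threads. */
static pthread_mutex_t depot_lock = PTHREAD_MUTEX_INITIALIZER;
static magazine_t *depot_full;
static magazine_t *depot_empty;

/* Per-thread loaded magazine: the fast path takes no lock. */
static _Thread_local magazine_t *loaded;

void *mag_cache_alloc(void)
{
    magazine_t *m = loaded;
    if (m != NULL && m->rounds > 0)
        return m->objs[--m->rounds];            /* fast path */

    /* Loaded magazine is empty or missing: trade it for a full one. */
    pthread_mutex_lock(&depot_lock);
    magazine_t *full = depot_full;
    if (full != NULL) {
        depot_full = full->next;
        if (m != NULL) {                        /* park our empty magazine */
            m->next = depot_empty;
            depot_empty = m;
        }
    }
    pthread_mutex_unlock(&depot_lock);

    if (full != NULL) {
        assert(full->rounds == MAG_ROUNDS);     /* depot_full holds only full ones */
        loaded = full;
        return full->objs[--full->rounds];
    }
    return malloc(OBJ_SIZE);                    /* depot has nothing: use the heap */
}

void mag_cache_free(void *obj)
{
    magazine_t *m = loaded;
    if (m != NULL && m->rounds < MAG_ROUNDS) {
        m->objs[m->rounds++] = obj;             /* fast path */
        return;
    }

    /* Loaded magazine is full or missing: park it and fetch an empty one. */
    pthread_mutex_lock(&depot_lock);
    magazine_t *empty = depot_empty;
    if (empty != NULL)
        depot_empty = empty->next;
    if (m != NULL) {
        m->next = depot_full;
        depot_full = m;
    }
    pthread_mutex_unlock(&depot_lock);

    if (empty == NULL)
        empty = calloc(1, sizeof(*empty));      /* rounds starts at 0 */
    loaded = empty;
    if (empty == NULL) {                        /* no magazine available */
        free(obj);
        return;
    }
    empty->objs[empty->rounds++] = obj;
}
```

Full magazines parked in the depot can be picked up by any thread, which is what would allow continuations to be reused across threads. A real implementation would also bound the depot and return memory under pressure, which this sketch does not attempt.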
So there are a few questions:
- Would a magazine-based object cache be interesting for libdispatch?
- Has any analysis of typical TSD continuation cache sizes been done already? (Has anyone seen any issues here in GCD-heavy applications?)
- What would be a reasonable test for performance validation?
- Would relying on libumem be acceptable for people in general?
One other possibility would be to keep only a single item in the TSD and fall back on such an object cache if that single object is not available. That is, objects would be released back to the object pool in _dispatch_continuation_free, so the ‘hot continuation’ trick from _dispatch_continuation_pop() would still work. This would require more serious investigation, though (the magazine approach above would be a ‘safer’ change).
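A rough sketch of that single-slot variant, under one possible reading of the idea, could look as follows. The names (continuation_t, hot_slot, continuation_alloc, continuation_free, pool_alloc, pool_free) are hypothetical and are not the libdispatch internals; malloc/free stand in for the shared object cache, and C11 thread-local storage stands in for the TSD slot.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dispatch continuation. */
typedef struct continuation {
    void (*func)(void *);
    void *ctxt;
} continuation_t;

/* The single per-thread "hot" object. */
static _Thread_local continuation_t *hot_slot;

/* Stand-ins for the shared object cache (assumption: malloc/free here). */
static continuation_t *pool_alloc(void)
{
    return malloc(sizeof(continuation_t));
}

static void pool_free(continuation_t *c)
{
    free(c);
}

continuation_t *continuation_alloc(void)
{
    continuation_t *c = hot_slot;
    if (c != NULL) {            /* reuse the object most recently freed here */
        hot_slot = NULL;
        return c;
    }
    return pool_alloc();        /* slot empty: fall back on the object cache */
}

void continuation_free(continuation_t *c)
{
    if (hot_slot == NULL) {     /* keep exactly one object per thread */
        hot_slot = c;
        return;
    }
    pool_free(c);               /* slot occupied: release to the object cache */
}
```

With this shape, a thread that frees a continuation and immediately allocates another gets the same object back without touching the shared cache, while everything beyond that one object goes through the pool.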
PS Apple folks also have rdar://4944235 for libumem/OS X, as it would be a nice analogue to the GCD system-level thread management, but for the memory allocation / object caching subsystems instead.