Can someone apply this if it's a proper fix?

-------------
diff --git a/autogen.sh b/autogen.sh
old mode 100644
new mode 100755
diff --git a/testing/Makefile.am b/testing/Makefile.am
index 404ced0..103d8bf 100644
--- a/testing/Makefile.am
+++ b/testing/Makefile.am
@@ -81,7 +81,7 @@ TOOLS= \
 noinst_PROGRAMS+=$(TOOLS)
 
 INCLUDES=-I$(top_builddir) -I$(top_srcdir)
-LDADD=libtest.la ../src/libdispatch.la
+LDADD=libtest.la ../src/libdispatch.la ../src/libshims.la
 CFLAGS=-Wall $(MARCH_FLAGS) $(CBLOCKS_FLAGS)
 CXXFLAGS=-Wall $(MARCH_FLAGS) $(CXXBLOCKS_FLAGS)
---------

I also get a test failure with make check:

---------
LOW: 68 *******************************************
DEFAULT: 63 ****************************************
HIGH: 61 ***************************************
Actual: 192
Expected: 192
[PASS] blocks completed
Actual: 68
Expected: <61
[FAIL] high priority precedence (../../testing/dispatch_priority.c:83)
../../testing/dispatch_priority.c:83
PASS: dispatch_priority
==================================================
[TEST] Dispatch Priority (Set Target Queue)
[PID] 541
==================================================
---------

Is this an out-of-sync test case or a legitimate failure?

Last two questions: does anyone have ideas on Solaris or Linux non-portable TSD or semaphore optimizations?

Thanks!
./C
On 04/16/2011 11:34 AM, "C. Bergström" wrote:
Can someone apply this if it's a proper fix?

-------------
diff --git a/autogen.sh b/autogen.sh
old mode 100644
new mode 100755
diff --git a/testing/Makefile.am b/testing/Makefile.am
index 404ced0..103d8bf 100644
--- a/testing/Makefile.am
+++ b/testing/Makefile.am
@@ -81,7 +81,7 @@ TOOLS= \
 noinst_PROGRAMS+=$(TOOLS)
 
 INCLUDES=-I$(top_builddir) -I$(top_srcdir)
-LDADD=libtest.la ../src/libdispatch.la
+LDADD=libtest.la ../src/libdispatch.la ../src/libshims.la
 CFLAGS=-Wall $(MARCH_FLAGS) $(CBLOCKS_FLAGS)
 CXXFLAGS=-Wall $(MARCH_FLAGS) $(CXXBLOCKS_FLAGS)
---------
Hi,

Can you provide more details about the compilation problem you are seeing, and give a little information about your Linux distribution and the versions of Automake/Autoconf/Libtool? According to src/Makefile.am:

libdispatch_la_DEPENDENCIES=libshims.la

This has been sufficient to get the unit tests to build on Debian and Fedora. If I had to guess, I would say your libtool isn't adding the dependency on libshims.la.
I also get a test failure with make check:

---------
LOW: 68 *******************************************
DEFAULT: 63 ****************************************
HIGH: 61 ***************************************
Actual: 192
Expected: 192
[PASS] blocks completed
Actual: 68
Expected: <61
[FAIL] high priority precedence (../../testing/dispatch_priority.c:83)
../../testing/dispatch_priority.c:83
PASS: dispatch_priority
==================================================
[TEST] Dispatch Priority (Set Target Queue)
[PID] 541
==================================================
---------

Is this an out of sync test case or a legitimate failure?
Are you using the HEAD of libkqueue from Subversion? This looks like a recently introduced regression. Please try again with the libkqueue v1.0.2 tarball.

Regards,
 - Mark
On Sat, 16 Apr 2011 22:34:21 +0700 "C. Bergström" <cbergstrom@pathscale.com> wrote:
Does anyone have ideas on Solaris or Linux non-portable tsd or semaphore optimizations? Thanks! ./C
A small comment with regard to TSD optimization for Linux/Solaris (I did take a small look at this for Solaris):

I don't know how things look on Linux, but Solaris already has a fairly optimized version of pthread_getspecific()/pthread_setspecific() for the first 8 TSD keys allocated [1]. For most "normal" applications, the libdispatch keys would be allocated in that range during library initialization at startup. See the links below for details (it is not much code to read, really...).

Of course, the Apple implementation goes one more notch 'down to the metal' [2]. For Solaris I could see the following possible optimizations:

1. Use the non-portable thr_getspecific()/thr_setspecific() instead of the pthread interface. This saves one function call when setting TSD, but costs one extra load/store for the get interface.
2. Inline the appropriate logic, similar to the Apple TSD optimization, using knowledge of the libc implementation from opensolaris.org.

Option 1 seems easy and "safe", but it trades slightly worse TSD lookup performance (see the comments in the link below) for avoiding an extra function call when setting TSD. Which one wins depends on the usage pattern of libdispatch; I think it has a decent chance of being a net loss in reality.

Option 2 would in practice be difficult to do robustly (Oracle might change the implementation), but it could fairly easily be done as a proof of concept, just to test the performance difference, by peeking at the implementation referenced below. Getting the required interfaces supported directly by Oracle as the next step might be a small challenge :-)

Frankly, I don't think either option is that attractive unless someone from Oracle/Sun steps in and addresses option 2 directly, so that it could be done in a supported way.

A good performance test case would be needed before going either route, though. So far I haven't seen either function show up in any samples on Solaris, so I decided to postpone further investigation until it does.
Cheers,
Joakim

[1]
pthread_getspecific: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/...
pthread_setspecific: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/...
Getting current thread ("curthread", from thr_uberdata.h): http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/inc/t...
TSD_NFAST: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/inc/t...

[2] http://libdispatch.macosforge.org/trac/browser/trunk/src/shims/tsd.h
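A minimal C sketch of option 1, assuming a small shim layer in front of all TSD access. The macro USE_SOLARIS_THR_TSD and the tsd_* wrapper names are hypothetical, not libdispatch's actual shims. On Solaris with the macro defined, it calls the non-portable thr_keycreate()/thr_setspecific()/thr_getspecific() from <thread.h>; everywhere else it falls back to the portable pthread calls. Note that thr_getspecific() hands the value back through an out-parameter, which is the extra load/store mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* USE_SOLARIS_THR_TSD is a hypothetical opt-in build flag. */
#if defined(__sun) && defined(USE_SOLARIS_THR_TSD)
#include <thread.h>
typedef thread_key_t tsd_key_t;

static int tsd_key_create(tsd_key_t *key, void (*dtor)(void *)) {
    return thr_keycreate(key, dtor);
}
static int tsd_set(tsd_key_t key, void *value) {
    return thr_setspecific(key, value);
}
static void *tsd_get(tsd_key_t key) {
    void *value = NULL;
    /* The value comes back through an out-parameter (extra load/store). */
    (void)thr_getspecific(key, &value);
    return value;
}
#else
typedef pthread_key_t tsd_key_t;

static int tsd_key_create(tsd_key_t *key, void (*dtor)(void *)) {
    return pthread_key_create(key, dtor);
}
static int tsd_set(tsd_key_t key, void *value) {
    return pthread_setspecific(key, value);
}
static void *tsd_get(tsd_key_t key) {
    return pthread_getspecific(key);
}
#endif

static int main_value = 1;
static int worker_value = 2;

/* Worker thread: stores its own pointer under the shared key and
   returns whatever it reads back. */
static void *worker(void *arg) {
    tsd_key_t key = *(tsd_key_t *)arg;
    tsd_set(key, &worker_value);
    return tsd_get(key);
}

/* Returns 1 if each thread sees only the value it stored itself. */
static int tsd_is_per_thread(void) {
    tsd_key_t key;
    pthread_t t;
    void *seen_by_worker = NULL;

    if (tsd_key_create(&key, NULL) != 0)
        return 0;
    tsd_set(key, &main_value);
    if (pthread_create(&t, NULL, worker, &key) != 0)
        return 0;
    if (pthread_join(t, &seen_by_worker) != 0)
        return 0;
    return seen_by_worker == (void *)&worker_value &&
           tsd_get(key) == (void *)&main_value;
}
```

Because every call site goes through tsd_get()/tsd_set(), switching back ends (or later experimenting with an inlined option 2) would touch only this one shim, which also makes it easy to benchmark both paths.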
Joakim Johansson wrote:
A good performance test case would be needed before going either route though; so far I haven't seen either function in any samples on Solaris and decided to postpone further investigation until it pops up in any samples.

This will end up getting used in HPC, so we'll likely try to squeeze out every bit of performance possible to make it competitive against OpenMP. (Yes, I know it's not an Apple/Apple comparison.) Personally, I don't know much about Linux, but I have an OpenSolaris-based hobby project I work on and may be able to do this.
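Before choosing either route, a micro-benchmark of the kind asked for above could start as small as the following sketch. What a "good" test case looks like is an assumption here, and bench_tsd() is a hypothetical helper: it times set/get pairs on one pthread key with clock_gettime(CLOCK_MONOTONIC) and keeps a checksum so the compiler cannot discard the loop. On Solaris, the same loop could be pointed at thr_setspecific()/thr_getspecific() to compare the two interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Times `iters` (> 0) setspecific/getspecific pairs on a fresh key and
   returns the average nanoseconds per pair, or -1.0 on failure.
   *checksum_out receives the sum of all values read back. */
static double bench_tsd(long iters, uintptr_t *checksum_out) {
    pthread_key_t key;
    uintptr_t checksum = 0;
    double t0, t1;

    if (pthread_key_create(&key, NULL) != 0)
        return -1.0;
    t0 = now_ns();
    for (long i = 0; i < iters; i++) {
        pthread_setspecific(key, (void *)(uintptr_t)i);
        checksum += (uintptr_t)pthread_getspecific(key);
    }
    t1 = now_ns();
    pthread_key_delete(key);

    *checksum_out = checksum;
    return (t1 - t0) / (double)iters;
}
```

A single-threaded loop like this only measures the uncontended fast path; samples from a real libdispatch workload would still be needed to show whether TSD access matters at all.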
Any results or experience would be of interest, good or bad!

Joakim

This email was sent from my iPhone, so it may be unusually terse.

On 18 Apr 2011, at 09:14, "C. Bergström" <cbergstrom@pathscale.com> wrote:
This will end up getting used in HPC, so we'll likely try to squeeze out every bit of performance possible to make it competitive against OpenMP. (Yes, I know it's not an Apple/Apple comparison.) Personally, I don't know much about Linux, but I have an OpenSolaris-based hobby project I work on and may be able to do this.
Joakim Johansson wrote:
Any results/experience would be of interest, both good or bad!
Here's the problem we're currently facing. The following code and some details come from another engineer:

dispatch_queue_t q = dispatch_get_global_queue(0, 0);
int iterations = n / m;
struct args1 args[iterations];
for (int i=0 ; i<iterations ; i+=m) {
    args[i].n = &n;
    args[i].v1 = &v1;
    args[i].v2 = &v2;
    args[i].v3 = &v3;
    args[i].start = i*m;
    args[i].start = args[i].start + m;
    dispatch_async_f(q, &args[i], codeletOk_codelet1);
}
struct args1 last = { &n, &v1, &v2, &v3, iterations*m, n%m };
dispatch_sync_f(q, &last, codeletOk_codelet1);

#1 In the statement for (int i=0 ; i<iterations ; i+=m), should it be for (int i=0 ; i<iterations * m ; i+=m) for a complete loop?

#2 In
    args[i].start = i*m;
    args[i].start = args[i].start + m;
should it be:
    args[i].start = i;
    args[i].start = args[i].start + m;

#3 In struct args1 last = { &n, &v1, &v2, &v3, iterations*m, n%m };, should it be:
    struct args1 last = { &n, &v1, &v2, &v3, iterations*m, n%m + iterations*m };

#4 When I ran the code, I found that the program finished without waiting for the work submitted with dispatch_async_f(q, &args[i], codeletOk_codelet1) to complete. It runs correctly if we replace dispatch_async_f(q, &args[i], codeletOk_codelet1); with codeletOk_codelet1(&args[i]) and dispatch_sync_f(q, &last, codeletOk_codelet1); with codeletOk_codelet1(&last).

------

Any tips on how to debug the difference between async and sync?

Thanks
./C
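A hedged sketch of one way to answer questions #1 to #4, following the dispatch_apply*() approach suggested in the reply below. struct args1 and codeletOk_codelet1() are not shown, so chunk_ctx, codelet_chunk(), and the vector add inside it are hypothetical stand-ins, and the sketch assumes the sixth field of struct args1 is meant to be an exclusive end index. Chunk i covers [i*m, min(i*m + m, n)), which covers #1 to #3 at once: the loop steps by 1 over ceil(n/m) chunks (stepping i by m while indexing args[i] skips chunks, and a bound of iterations*m would index past the end of args[]), the second assignment sets the end instead of overwriting start, and the last chunk ends at n. For #4: dispatch_async_f() returns immediately, and dispatch_sync_f() on a global (concurrent) queue waits only for its own function, not for earlier async work, so the caller can return (and the stack array args can go out of scope) while codelets are still running. dispatch_apply_f() returns only after every iteration has finished. USE_LIBDISPATCH is a hypothetical build flag; without it, and off Apple platforms, the sketch runs the chunks serially with the same semantics.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__APPLE__) || defined(USE_LIBDISPATCH)
#include <dispatch/dispatch.h>
#define HAVE_DISPATCH_APPLY 1
#endif

/* Hypothetical stand-in for struct args1. */
struct chunk_ctx {
    size_t n;            /* total number of elements */
    size_t m;            /* chunk size, must be > 0 */
    const double *v1;
    const double *v2;
    double *v3;
};

/* Hypothetical stand-in for codeletOk_codelet1: v3 = v1 + v2 over the
   half-open range [start, end) owned by chunk `idx`. */
static void codelet_chunk(void *context, size_t idx) {
    const struct chunk_ctx *c = context;
    size_t start = idx * c->m;
    size_t end = start + c->m;
    if (end > c->n)          /* last chunk picks up the n % m remainder */
        end = c->n;
    for (size_t j = start; j < end; j++)
        c->v3[j] = c->v1[j] + c->v2[j];
}

/* Runs every chunk and returns only after all of them have finished. */
static void run_chunks(struct chunk_ctx *c) {
    assert(c->m > 0);
    size_t nchunks = (c->n + c->m - 1) / c->m;   /* ceil(n / m) */
#ifdef HAVE_DISPATCH_APPLY
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    /* Synchronous: no chunk is still running when this returns, so c
       and the arrays it points to may live on the caller's stack. */
    dispatch_apply_f(nchunks, q, c, codelet_chunk);
#else
    for (size_t i = 0; i < nchunks; i++)
        codelet_chunk(c, i);
#endif
}
```

If the submissions must stay asynchronous, dispatch_group_async_f() followed by dispatch_group_wait() is the usual alternative to dispatch_apply_f().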
C. Bergström,

dispatch_sync() against a global concurrent queue is meaningless as a way to wait for previously submitted work. They're always concurrent. Please consider using dispatch_apply*() for this kind of problem. You'll find that the fit is quite natural and more efficient. :-)

davez

On Apr 22, 2011, at 5:46 AM, C. Bergström wrote:
_______________________________________________ libdispatch-dev mailing list libdispatch-dev@lists.macosforge.org http://lists.macosforge.org/mailman/listinfo.cgi/libdispatch-dev
participants (4)
- "C. Bergström"
- Dave Zarzycki
- Joakim Johansson
- Mark Heily