What's with this sigsuspend stuff?
I'm trying to understand the purpose of _dispatch_sigsuspend() [1], but for the life of me, I cannot see what it is.

To the best of my knowledge, sigsuspend() blocks a single thread (not an entire process, per POSIX [2]), suspending it until one of the signals in the sigset_t is sent to that thread. sigsuspend() requires that the sigset_t be properly initialized (using either sigemptyset() or sigfillset()) and then, optionally, modified (using sigaddset() or sigdelset()). The behaviour is undefined if this does not occur. _dispatch_sigsuspend() does not properly initialize the sigset_t, instead passing it straight to sigsuspend(). So whatever the purpose of this function is, it probably isn't achieving it properly.

But then, what is that purpose? _dispatch_sigsuspend() is submitted to the normal-priority, non-overcommitting queue. This is a multithreaded queue, and the block is only submitted once. The function loops forever; even if the sigsuspend() returns, it gets called again immediately. So the net result is surely that one of the threads in the workqueue will just be stuck forever?

Clearly I'm missing something here; can anyone enlighten me?

Regards, Peter

[1]: http://libdispatch.macosforge.org/trac/browser/trunk/src/queue.c#L953
[2]: http://pubs.opengroup.org/onlinepubs/007908799/xsh/sigsuspend.html
On Jul 4, 2011, at 10:36 AM, DrPizza wrote:
I'm trying to understand the purpose of _dispatch_sigsuspend() [1], but for the life of me, I cannot see what it is.
To the best of my knowledge, sigsuspend() blocks a single thread (not an entire process, per POSIX [2])…
Peter,

The first part above is correct…
…suspending it until one of the signals in the sigset_t is sent to that thread.
…but the line above is backwards. The API changes the signal mask, for the duration of the call, to that of the passed-in parameter. Therefore, if an empty set is passed in, then all signals are unblocked for the duration of the call.
sigsuspend() requires that the sigset_t be properly initialized (using either sigemptyset() or sigfillset()) and then, optionally, modified (using sigaddset() or sigdelset()). The behaviour is undefined if this does not occur. _dispatch_sigsuspend() does not properly initialize the sigset_t, instead passing it straight to sigsuspend(). So whatever the purpose of this function is, it probably isn't doing it properly.
On all of the platforms that we know of, the result of sigemptyset() is the same as bzero(&set, sizeof(set)), and static variables are zero-initialized by default. So we're okay.
But then, what is that purpose?
To dequeue pending signals whose handlers were installed via signal()/sigaction() and that would otherwise be blocked in a pure GCD app. This is because GCD threads block all maskable signals (see _dispatch_worker_thread() in the same file). This behavior helps all code that uses GCD avoid spurious EINTR errors from Unix system calls (which are often not tested for).

davez
…but the line above is backwards. The API changes the signal mask for the duration of the call to that of the passed-in parameter. Therefore, if an empty set is passed in, then all signals are unblocked for the duration of the call.

So the uninitialized sigset_t is treated as an empty mask, allowing all signals to be delivered to the thread? OK. It still seems a little odd to me to not include the call to sigemptyset().
To dequeue pending signals installed via signal()/sigaction() that would otherwise be blocked in a pure GCD app. This is because GCD threads block all maskable signals (see _dispatch_worker_thread() in the same file).

It only blocks maskable signals when using raw pthreads. The pthread_workqueue implementation calls _dispatch_worker_thread2() [1], and that function doesn't touch the signal masks. Do pthread_workqueue threads block signals automatically?
And if the intent of _dispatch_pthread_sigmask() is to block maskable signals, instead of masking one by one, shouldn't it just be using sigfillset()?
This behavior helps all code that uses GCD avoid spurious EINTR errors from Unix system calls (which are often not tested for).

So, the (special, undocumented?) behaviour here is that if the program is determined to be "callback driven" (explicit call to dispatch_main(), or implicit use of Cocoa), then one victim workqueue thread will clear its mask and handle any signal, so that, if all other threads mask their signals, this victim thread will handle any and every signal. The value of this being that if you follow the (unwritten?) rules and mask signals from every other thread, you shouldn't then receive EINTR, because you'll never have to worry about a signal being delivered to a regular thread?
OK, so nothing Windows has to worry about then. [1]: http://libdispatch.macosforge.org/trac/browser/trunk/src/queue.c#L1247
On Jul 4, 2011, at 3:35 PM, DrPizza wrote:
To dequeue pending signals installed via signal()/sigaction() that would otherwise be blocked in a pure GCD app. This is because GCD threads block all maskable signals (see _dispatch_worker_thread() in the same file).

It only blocks maskable signals when using raw pthreads. The pthread_workqueue implementation calls _dispatch_worker_thread2() [1], and that function doesn't touch the signal masks. Do pthread_workqueue threads block signals automatically?
Peter,

Yes. That helps us avoid two system calls per callback from the pthread_workqueue.
And if the intent of _dispatch_pthread_sigmask() is to block maskable signals, instead of masking one by one, shouldn't it just be using sigfillset()?
I'm not sure I understand the question. In any case, it probably doesn't matter for two reasons. First, _dispatch_pthread_sigmask() is a static C function, simple, and only used once. It is a great candidate for inlining, and in fact is inlined. The second reason is that the sig*set() APIs on Mac OS X and iOS are C macros that often compile completely away when optimizations are enabled.
This behavior helps all code that uses GCD avoid spurious EINTR errors from Unix system calls (which are often not tested for).

So, the (special, undocumented?) behaviour here is that if the program is determined to be "callback driven" (explicit call to dispatch_main(), or implicit use of Cocoa), then one victim workqueue thread will clear its mask and handle any signal, so that, if all other threads mask their signals, this victim thread will handle any and every signal. The value of this being that if you follow the (unwritten?) rules and mask signals from every other thread, you shouldn't then receive EINTR, because you'll never have to worry about a signal being delivered to a regular thread?
In general, the intersection of different subsystems always creates undefined/undocumented behavior/assumptions. The relationship between libdispatch, POSIX, the Mac OS X kernel, and apps in general is no different.

In practice, what we found was that programs that installed signal handlers via signal()/sigaction() often made the assumption that there will always be a thread available to handle the signal. In other words, deferring traditional Unix signal handlers until a worker thread was idle could cause apps to hang. That is why GCD keeps a dedicated signal handling thread running.

However, we cannot blindly create this helper thread when libdispatch is initialized, because other programs [dubiously] assume that libraries do not leave lingering helper threads. Therefore, GCD only creates the helper thread if the main thread exits (because the main thread doesn't start out with any masked signals – technically it could, but in practice it doesn't happen).
OK, so nothing Windows has to worry about then.
I don't know how Windows supports POSIX, but that sounds reasonable. davez
Yes. That helps us avoid two system calls per callback from the pthread_workqueue.
OK, makes sense. Do you know if there are any other "special" properties of the workqueue threads that might be significant?
I'm not sure I understand the question.

Well, at the moment _dispatch_pthread_sigmask() looks like it's picking a specific limited set of signals to block off. If the intent is to block every blockable signal, doesn't sigfillset() generate a suitable mask, and more explicitly express the notion of "block everything"?
In practice, what we found was that programs that installed signal handlers via signal()/sigaction() often made the assumption that there will always be a thread available to handle the signal. In other words, deferring traditional Unix signal handlers until a worker thread was idle could cause apps to hang. That is why GCD keeps a dedicated signal handling thread running. However, we cannot blindly create this helper thread when libdispatch is initialized because other programs [dubiously] assume that libraries do not leave lingering helper threads. Therefore, GCD only creates the helper thread if the main thread exits (because the main thread doesn't start out with any masked signals – technically it could, but in practice it doesn't happen.)

Ah, I get you. The thread stuck in the loop is there to stand in for the (now exited) main thread, to ensure that signals have somewhere to drain.
Might there be situations where the main thread's mask is changed (for example, if an application wants to handle specific signals on specific threads), and if so, might a slightly better behaviour not be for the signal-draining thread to use the mask of the main thread, rather than the empty, accept-anything mask?
I don't know how Windows supports POSIX, but that sounds reasonable.

The Windows POSIX subsystem has "proper" signal handling, but it is almost entirely separate from the Win32 subsystem, and so not the target of my port. The VC++ C Runtime does include some basic signal functionality, but it's entirely fake; signals can't be delivered from outside the process, and if they're raised in-process, the signal handler is directly executed on the raising thread. There's no masking or notion of delivering signals to other threads.
The one exception is ctrl-c/ctrl-break, and the system spawns a dedicated thread within the process to handle these anyway, so again there's no masking or any equivalent concepts. Peter
On Jul 4, 2011, at 5:17 PM, DrPizza wrote:
Yes. That helps us avoid two system calls per callback from the pthread_workqueue.
OK, makes sense. Do you know if there are any other "special" properties of the workqueue threads that might be significant?
I believe that some of the "don't do that" sections of the dispatch man pages are actually enforced. For example, calling pthread_exit() on a GCD thread will probably give an error. It is worth verifying.
I'm not sure I understand the question.

Well, at the moment _dispatch_pthread_sigmask() looks like it's picking a specific limited set of signals to block off. If the intent is to block every blockable signal, doesn't sigfillset() generate a suitable mask, and more explicitly express the notion of "block everything"?
No. sigfillset() does exactly what it says: it simply fills the set. It doesn't know why the set is being created. On Mac OS X and iOS, some signals are only deliverable on the thread that they were generated from (SIGILL, SIGFPE, SIGBUS, SIGSEGV, etc), therefore, we shouldn't mask those off. Someday, if/when the Mac OS X / iOS kernel supports delivering those signals to any available thread, then we can fully mask every signal on GCD threads.
In practice, what we found was that programs that installed signal handlers via signal()/sigaction() often made the assumption that there will always be a thread available to handle the signal. In other words, deferring traditional Unix signal handlers until a worker thread was idle could cause apps to hang. That is why GCD keeps a dedicated signal handling thread running. However, we cannot blindly create this helper thread when libdispatch is initialized because other programs [dubiously] assume that libraries do not leave lingering helper threads. Therefore, GCD only creates the helper thread if the main thread exits (because the main thread doesn't start out with any masked signals – technically it could, but in practice it doesn't happen.)

Ah, I get you. The thread stuck in the loop is there to stand in for the (now exited) main thread, to ensure that signals have somewhere to drain.
Yup.
Might there be situations where the main thread's mask is changed (for example, if an application wants to handle specific signals on specific threads), and if so, might a slightly better behaviour not be for the signal-draining thread to use the mask of the main thread, rather than the empty, accept-anything mask?
In theory, yes – but in practice, no. Such a usage pattern would be incompatible with GCD (or any other OS-provided thread pool / event engine), because the design would require extreme discipline: knowing about *all* threads within a process and *carefully* controlling their respective signal masks *all* of the time, etc.

davez
On 07/05/2011 03:03 AM, Dave Zarzycki wrote:
On Mac OS X and iOS, some signals are only deliverable on the thread that they were generated from (SIGILL, SIGFPE, SIGBUS, SIGSEGV, etc), therefore, we shouldn't mask those off. Someday, if/when the Mac OS X / iOS kernel supports delivering those signals to any available thread, then we can fully mask every signal on GCD threads.
That would not be POSIX-compliant. Paolo
On Jul 4, 2011, at 11:43 PM, Paolo Bonzini wrote:
On 07/05/2011 03:03 AM, Dave Zarzycki wrote:
On Mac OS X and iOS, some signals are only deliverable on the thread that they were generated from (SIGILL, SIGFPE, SIGBUS, SIGSEGV, etc), therefore, we shouldn't mask those off. Someday, if/when the Mac OS X / iOS kernel supports delivering those signals to any available thread, then we can fully mask every signal on GCD threads.
That would not be POSIX-compliant.
Standards compliance is irrelevant to GCD threads. Anything that happens in a GCD work item is by definition outside of the standard (the execution environment was instantiated in a manner not covered by the standard) and cannot be expected to behave in a compliant fashion. GCD may actually be able to derive important performance benefits from that fact in the future (standards compliance is expensive).
On 07/05/2011 02:43 AM, Paolo Bonzini wrote:
On 07/05/2011 03:03 AM, Dave Zarzycki wrote:
On Mac OS X and iOS, some signals are only deliverable on the thread that they were generated from (SIGILL, SIGFPE, SIGBUS, SIGSEGV, etc), therefore, we shouldn't mask those off. Someday, if/when the Mac OS X / iOS kernel supports delivering those signals to any available thread, then we can fully mask every signal on GCD threads.
That would not be POSIX-compliant.
I think the idea would be that a GCD-based program could request the non-standard behavior, while all other programs would get the standard POSIX behavior by default.

On a related note, here's an interesting question about installing a signal handler for SIGSEGV in a libdispatch-based program. I haven't tried this, but does anyone know if it's possible to install a signal handler for SIGSEGV that would be propagated to all of the GCD worker threads?

Thanks,

- Mark

-------- Original Message --------
Subject: Catching SIGSEGV in a libdispatch program
Date: Thu, 07 Jul 2011 19:30:50 +0200
From: Julien BLACHE <jb@jblache.org>
To: Mark Heily <mark@heily.com>

Hi Mark,

Just a quick question about catching SIGSEGV in a libdispatch program. Here's the context: under normal circumstances, I'm using a dedicated queue for all logging operations, so logging becomes asynchronous. For debugging purposes, logging can be made synchronous, but as you can guess, that dramatically changes the execution profile, and quite a few things can become unreproducible once there's a central point of synchronization/contention.

I'm wondering if it would be possible to catch SIGSEGV and run the logging queue to flush the logs before aborting. It seems difficult, if not impossible, given SIGSEGV is a thread-directed signal, and I'm not sure there's a way to run a single queue to depletion anyway.
On Jul 7, 2011, at 5:11 PM, Mark Heily wrote:
On a related note, here's an interesting question about installing a signal handler for SIGSEGV in a libdispatch-based program. I haven't tried this, but does anyone know if it's possible to install a signal handler for SIGSEGV that would be propagated to all of the GCD worker threads?
Mark,

Yes, that is how signals work. Also, for all *practical* purposes, GCD makes POSIX signals *more* reliable than they normally are. What GCD does is route externally generated signals (SIGHUP, SIGTERM, SIGPIPE, SIGUSR1, etc) away from GCD threads (and, if need be, to a dedicated signal handling thread).

Why does GCD do this? Because signal handlers are difficult to program correctly. They cause unrelated subsystems to experience spurious system call failures (EINTR), and deadlocks are easy to trip over due to the fact that signal handlers pause a random thread in order to run on that thread's stack. In contrast, GCD has signal event sources that avoid the aforementioned problems.

Also, one should keep in mind that one can always use pthread_sigmask() to unmask the signals that GCD masks by default for GCD threads. The rules are documented in dispatch_queue_create(3) under the COMPATIBILITY section:

Applications MAY call the following interfaces from a block submitted to a dispatch queue if and only if they restore the thread to its original state before returning:

o pthread_setcancelstate()
o pthread_setcanceltype()
o pthread_setschedparam()
o pthread_sigmask()
o pthread_setugid_np()
o pthread_chdir()
o pthread_fchdir()

davez
participants (5)
-
Daniel A. Steffen
-
Dave Zarzycki
-
DrPizza
-
Mark Heily
-
Paolo Bonzini