signalfd is useless

UNIX (well, POSIX) signals are probably one of the worst parts of the UNIX API, and that’s a relatively high bar. A signal can be sent to your program at any point when it’s running, and you have to deal with it somehow. Traditionally, there are three ways of responding to signals (“dispositions”): you can let the default behavior run, which is often to kill the program; you can ignore it; or you can set up a function to be a signal handler.

If you register a signal handler, it’s called in the middle of whatever code you happen to be running. This sets up some very onerous restrictions on what a signal handler can do: it can’t assume that any locks are unlocked, any complex data structures are in a reliable state, etc. The restrictions are stronger than the restrictions on thread-safe code, since the signal handler interrupts and stops the original code from running. So, for instance, it can’t even wait on a lock, because the code that’s holding the lock is paused until the signal handler completes. This means that a lot of convenient functions, including the stdio functions, malloc, etc., are unusable from a signal handler, because they take locks internally. POSIX defines a set of “Async-Signal-Safe” functions that are guaranteed not to rely on locks or non-atomic data structures (because a signal handler could be called between two instructions that update a non-atomic data type). Linux’s signal(7) manual page documents has a list of these functions. But because of the constrained environment, one common approach is to defer the actual work until you’re no longer running in a signal handler. A standard solution is the “self-pipe trick”: open an unnamed pipe, write a byte to it in your signal handler and return, and read from it in the main loop. This makes signal handling resemble the handling of anything else that’s a file descriptor, like network sockets, which normalizes their handling a bit.

There’s another problem with signal handlers: in addition to interrupting user code, they also need to interrupt kernel code so that it gets delivered. So, if you’re in the middle of reading from a network socket, waiting for a timeout, or even closing a file descriptor, that system call gets aborted so that the signal handler can run. By default, once you exit the signal handler, your system call will appear to return early, with an error exit of errno == EINTR (“Interrupted system call”). The intention is that you can restart the call yourself if you want, but in practice, very few libraries do, which causes confusing errors for application developers. (Extra credit: explain why the linked solution is wrong.) To avoid these sorts of problems, BSD added a mechanism to mark signal handlers as restartable, which was standardized as the SA_RESTART flag to the sigaction system call. But that does not reliably cause system calls to continue instead of returning EINTR: as documented later in signal(7), Linux will return EINTR in some cases even when a system call is interrupted by an SA_RESTART handler. (The rationale for many of these makes sense—e.g., timeouts where restarting the call would start a timer from zero—but understanding the problem doesn’t make it any less of a problem.)

So, in 2007, Linux gained an API called signalfd, which lets you create a file descriptor that notifies on signals. The idea is that you can avoid the complexity of the self-pipe trick, as well as any problems with EINTR, by just asking the kernel to send you signals via a file descriptor in the first place. You don’t need to register a signal handler, so everything works perfectly… except that signalfd doesn’t actually change how signal dispositions work. If you only create a signalfd, signals still get delivered via the the signal-handling mechanism, so in order to actually get the signal to get delivered over the file descriptor, you need to suspend normal processing of the system. There’s another part of the signal API that lets you set a mask to block the signal: the intention is that you can remove the mask later, and then pending signals will get delivered. But this also means that pending signals are readable from the signalfd, and they’re removed from the queue once read.

This would work fine if signal masks applied only to your current process, but they also get inherited to children. So if you’re masking a signal like SIGINT in order to receive it via signalfd, and you run a subprocess, then that child process will start up with SIGINT masked. And while programs could reset their own signal masks when they start up, in practice, no software does. So you have to be very careful to reset any masked signals before starting a child process, and unfortunately you need to do this yourself: most ways of starting child processes, including the standard libc system(3) function, do not reset handlers or masks.

There’s another problem with masking signals, which is that standard UNIX signals are permitted to coalesce when they queue. Namely, if you have a mask that’s blocking a signal, and that signal gets delivered twice before you proces the first one from the queue, it only gets reported to you once. For something like SIGINT, this might be okay: probably you don’t need to handle multiple Ctrl-C keystrokes differently from a single one. For something like SIGCHLD, which notifies you that a child process has terminated, this is quite a bit more unfortunate. It now means that you know that at least one child process has terminated. And while signals can have associated info (via the SA_SIGINFO flag to sigaction, or by default for signalfd), and SIGCHLD’s siginfo tells you which process has terminated, if it gets coalesced, you only get one set of info. That is, you know that the specified child process has terminated, but also one or more other unspecified child processes could have terminated and the information about which one has been lost. Not only does this apply to unmasking signals and letting them get delivered to a normal signal handler, it also applies to reading signals from a signalfd, since signalfd works by dequeuing signals.

It turns out that there is, at least in theory, a better way to handle this. Just go back to the old self-pipe trick, but install the handler with the SA_NODEFER flag, which allows signal handlers to interrupt themselves. ~~This way you reliably get one handler called per signal.~~ [EDIT: This is wrong, see below.] This brings back the EINTR problem, but there’s a simple solution to that: simply dedicate a separate thread to signal handling, and make sure all signals get routed there. Then no system calls on any other thread will get interrupted. The only way to ensure this is to mask signals on every other thread, but that’s no worse than the status quo with signalfd, since we already had to mask signals in order to get them delivered to signalfd alone. As a bonus, this is portable to other operating systems that don’t have signalfd, so this is the solution I’m planning on implementing for the MIO cross-platform event-handling library for Rust.

Can signalfd be fixed? I think it can, if we explicitly tie signalfd into the signal-disposition mechanism. If signalfd could claim responsibility for signal delivery, instead of requiring that signals be masked or ignored in addition to using signalfd, this would solve both problems. For inheritance, just set the close-on-exec flag on the signalfd, and signals will go back to the default behavior in child processes, once the fd no longer exists. For multiple delivery, because signalfd no longer interacts with signal queueing, it can just be specfied to send one message per signal. Alternatively, a mechanism to mask or ignore a signal that applies to the current process only would also be an improvement. These proposals would be a change to POSIX signal semantics, but fundamentally so is signalfd, and the only way to get clean handling of signals is to avoid the POSIX API, not work within it.

EDIT: Greg Hudson pointed out to me that even traditional handlers can coalesce signals. If a signal is sent twice before the signal-handling thread is scheduled, it still gets coalesced, even if it's destined for a signal handler with SA_NODEFER. So while a signal-handling thread can get you separate siginfo most of the time, it still can’t do so reliably. I’ve written an example of this behavior that uses vfork to prevent the process from being scheduled: it starts multiple children, but only prints one notification. So my claim that there’s a better alternative to signalfd is wrong, but it’s still true that signalfd doesn’t solve any of the other troubles with signal handling: it just saves you a separate thread.

Thanks to Nelson Elhage for pointing me at the LWN article, and Alex Chernyakhovsky, David Benjamin, and Yehuda Katz for discussions about all the unfortunate parts of signal handling.

17 May 2015