Signal Handling and What Bash Does to It
If you are looking for a gentle introduction to signals on linux or a
tutorial on using trap
to handle signals in bash script, this
article is not for you. This article
signal handling in linux
serves as a good introduction. Instead, I will recount my encounter
with a rare bug when testing signal handling in C inside bash driver
script.
Signal handling is hard since it is asynchronous by nature, and it is going to be a lot messier when multithreading is involved. Luckily, I am only doing single threaded programs here. Remember, man pages are your best friends when coding in C, and signal(7) is your savior.
Signal Handling Program in C
Here is a simple C program that ignores signal 1 (SIGHUP
) and
handles signal 15 (SIGTERM
) with a custom handler using
signal(2),
then waits for signal with
pause(2).
/* sig.c */
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
void handler(int sig)
{
const char msg[] = "SIGTERM caught\n";
if (SIGTERM == sig) {
/* do not use printf inside signal handler */
write(STDERR_FILENO, msg, sizeof(msg)-1);
}
}
int main()
{
/* maybe use sigaction instead */
signal(SIGHUP, SIG_IGN);
signal(SIGTERM, handler);
pause();
return 0;
}
An important detail that many people overlook when handling signals in
C is that it is not safe to use printf
inside a signal handler! Here
is a short answer from a stackoverflow:
The primary problem is that if the signal interrupts malloc() or some similar function, the internal state may be temporarily inconsistent while it is moving blocks of memory between the free and used list, or other similar operations. If the code in the signal handler calls a function that then invokes malloc(), this may completely wreck the memory management.
I will not give more details here in this brief post. But these articles are great references that you should checkout.
Then I compile the program with gcc, and run it in bash interactively, in which case everything works perfectly.
$ gcc sig.c
$ ./a.out &
[1] 3218
$ kill -TERM 3209
SIGTERM caught
[1]+ Done ./a.out
As you can see, the program works as expected by writing to stderr
and exit normally when I send SIGTERM
to it. Also the program did
ignore SIGHUP
, and was interrupted on SIGINT
, which I did not
handle, shown as follows.
$ ./a.out &
[1] 3209
$ kill -HUP 3209
$ kill -INT 3209
[1]+ Interrupt ./a.out
Bash Driver Script
But when I put my commands in a bash script sig.sh
, weird things
happened.
#!/bin/bash
set -e
./sig &
pid=$!
echo $pid
# if we don't sleep, ./sig process might be sent SIGHUP before execve
sleep 1
echo "send SIGHUP to process $pid"
# send SIGHUP to process
kill -HUP $pid || exit 1
# signal propagation might take time
sleep 1
# check whether process is still alive
kill -0 $pid || exit 1
# process should still be alive since we ignored SIGHUP
echo "send SIGINT to process $pid"
# send SIGINT to process
kill -INT $pid || exit 1
# signal propagation might take time
sleep 1
# check whether process is still alive
kill -0 $pid && { echo "process $pid still alive"; }
Some explanation:
$!
is a special bash variable which stores the PID of last job run in background.kill -0 <PID>
is used to check for the existence of a process ID. Thiskill
invoked from bash should be a shell builtin rather than/usr/bin/kill
. See job control builtinkill
for bash(1) and kill(2) for more detail. Here is the relevant quote fromkill(2)
.
If sig is 0, then no signal is sent, but error checking is still performed; this can be used to check for the existence of a process ID or process group ID.
Now execute the script:
$ ./sig.sh
3418
send SIGHUP to process 3418
send SIGINT to process 3418
process 3418 still alive
$ pstree -s -p 3418 # -s shows parent processes
systemd(1)───sig(3418)
./sig
process is still alive after SIGINT
and it became an orphan
process after the bash script process dies.
But wait, why is the process still alive after SIGINT
? sig.c
program does not handle SIGINT
, and by default SIGINT
terminates
the process. So what happened here?
Strace to the Rescue
Now I turn to another best friend of ours,
strace(1).
strace
traces system calls and signals and print it in a quite human
readable format. It's even more readable than my source code!
$ strace -f -b execve -o sig.strace ./sig.sh
-f
option makes strace
trace child process of ./sig.sh
process
as well. -b execve
makes strace
detach from traced process when
execve
is reached. -o sig.strace
writes the output of strace
to
the file sig.strace
. I have reproduced the relevant content of
sig.strace
below.
3814 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb0a8961a10) = 3815
3815 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
3815 rt_sigaction(SIGTSTP, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], 0}, 8) = 0
3815 rt_sigaction(SIGTTIN, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], 0}, 8) = 0
3815 rt_sigaction(SIGTTOU, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], 0}, 8) = 0
3815 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, 8) = 0
3815 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_IGN, [], SA_RESTORER, 0x7fb0a7f93250}, 8) = 0
3815 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7fb0a7f93250}, {0x441200, [], SA_RESTORER|SA_RESTART, 0x7fb0a7f93250}, 8) = 0
3815 open("/dev/null", O_RDONLY) = 3
3815 dup2(3, 0) = 0
3815 close(3) = 0
3815 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, 8) = 0
3815 rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, 8) = 0
3815 execve("./sig", ["./sig"], [/* 24 vars */] <detached ...>
So between fork
(clone
actually) and execve
, a lot of things
happended here. In particular, SIGINT
signal handler has been
installed on the process as shown below. It has been set to SIG_IGN
,
which ignores SIGINT
entirely.
3815 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fb0a7f93250}, {SIG_DFL, [], SA_RESTORER, 0x7fb0a7f93250}, 8) = 0
Now I found the culprit, it's bash! Bash inserted multiple signal
handlers after we forked, and that's why SIGINT
cannot terminate
the ./sig
program.
So If I want to restore default SIGINT
handler to my program, I just
need to add signal(SIGINT, SIG_DFL);
to the top of the main function
in my sig.c
program. Problem solved!
Conclusion
Bash installs signal handler for subcommands executed in bash scripts. This may not be the desired behavior for a low level signal handling C program, so do not rely on a signal handler being the default one, but always set it to the default signal handler explicitly.
Here the bash documentation
specifies that SIGINT
is ignored when executed as a asynchronous
command, which is the case in my example above.
Non-builtin commands started by Bash have signal handlers set to the values inherited by the shell from its parent. When job control is not in effect, asynchronous commands ignore SIGINT and SIGQUIT in addition to these inherited handlers. Commands run as a result of command substitution ignore the keyboard-generated job control signals SIGTTIN, SIGTTOU, and SIGTSTP.
Also bash source code provides a good insight on how this is done in bash's C implementation.