Notifications with file descriptors
Linux is known for its "everything is a file" mantra inherited from
Unix. This means that a wide range of system resources are represented
by files. Consequently, in a program, these resources are managed as
file descriptors with a common set of primitives such as read(), write() or close().
Along with files, block devices, sockets or pipes, Linux also provides a way to manage some notifications with file descriptors. For instance, a program can listen to events like "a file was created in a directory" or "a timer has expired" with file descriptors.
Let's have a look at a few Linux notification mechanisms based on file descriptors: inotify, timerfd and signalfd, and at a way to combine them: epoll.
Common usage
The file descriptor is returned by an init function. Most of the time, the last function argument is called flags and can be used to change the behaviour. The flags can be set with a bitwise OR. Setting the flags to 0 selects the default behaviour.
A new file descriptor is returned if the initialization is
successful. Otherwise, the function returns -1 and sets the errno
variable.
    #include <sys/timerfd.h>

    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    if (fd == -1) {
        /* timerfd_create() failed; check errno for details. */
    }
The notifications can be read from the file descriptor with the read() system call. The data returned by read() contain information about the notification. The format of these data depends on the kind of notification.
Timer notification with timerfd
A "file descriptor timer" is created with timerfd_create(). After the timer is created, it can be started with timerfd_settime(). The timer properties are set using a struct itimerspec variable with two fields: it_interval, the period of the timer, and it_value, its initial expiration.
Waiting for the timer to expire is just a matter of calling read() on the timer file descriptor. The buffer provided to read() is a uint64_t integer that read() fills with the number of timer expirations that have occurred since the last read().
The following program shows timerfd in action:
    /* This program prints "Beep!" every second with timerfd. */
    #include <inttypes.h>
    #include <stdio.h>
    #include <sys/timerfd.h>
    #include <unistd.h>

    int main(void)
    {
        int fd;
        struct itimerspec timer = {
            .it_interval = {1, 0}, /* 1 second */
            .it_value = {1, 0},
        };
        uint64_t count;

        /* No error checking! */
        fd = timerfd_create(CLOCK_MONOTONIC, 0);
        timerfd_settime(fd, 0, &timer, NULL);

        for (;;) {
            read(fd, &count, sizeof(count));
            printf("Beep!\n");
        }

        return 0;
    }
Signal notification with signalfd
The traditional way to handle a signal on Linux is to install a signal handler with signal() or sigaction(). When a signal is received, the program is interrupted: one of its threads is stopped and the signal handler function is invoked.
With signalfd(), no signal handler is invoked when a signal is received. Instead, the signal notification is read from the corresponding file descriptor.
Unlike the timerfd API, signalfd uses a single function to create and modify a signal file descriptor:
int signalfd(int fd, const sigset_t *mask, int flags);
Passing -1 for the fd argument makes signalfd() create and return a new file descriptor. Passing an existing signal file descriptor replaces the signal mask associated with it.
The signal mask should be built with functions like sigemptyset() or sigaddset(). Moreover, it is important to invoke sigprocmask() with SIG_BLOCK to prevent the signals from being handled according to their default dispositions.
When a signal is received, read() fills one or more signalfd_siginfo structures that give information about the signal.
    /* This program waits for signals and prints their signal number. */
    /* Send `kill -9` to stop this program. */
    #include <inttypes.h>
    #include <signal.h>
    #include <sys/signalfd.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd;
        sigset_t sigmask;
        struct signalfd_siginfo siginfo;

        /* No error checking! */
        sigfillset(&sigmask);
        sigprocmask(SIG_BLOCK, &sigmask, NULL);
        fd = signalfd(-1, &sigmask, 0);

        for (;;) {
            read(fd, &siginfo, sizeof(siginfo));
            /* ssi_signo is a uint32_t. */
            printf("Received signal number %" PRIu32 "\n", siginfo.ssi_signo);
        }

        close(fd);
        return 0;
    }
File system notification with inotify
Compared to signalfd and timerfd, inotify is older. It is also more complex, mostly because dealing with file system events is a large task. In my opinion, getting inotify right is tedious but there is no better alternative to get file system notifications on Linux right now.
The inotify API provides several functions that must be combined to monitor file system events. An "inotify file descriptor" is called an "inotify instance". Associated with an inotify instance is a list of directories or files to monitor called "the watch list". Maintaining the watch list as files and directories change is the most difficult part.
An inotify instance is created with inotify_init():
int inotify_init(void);
As you can see, the inotify_init() function does not have a flags argument. A variant that supports a flags parameter was added later as inotify_init1(). Like dup3(), pipe2() or epoll_create1(), inotify_init1() is part of the gang of functions added later to fix APIs whose behaviour could not be tuned when they were first defined.
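To make the difference concrete, here is a sketch that obtains a non-blocking, close-on-exec inotify instance in both ways. The function names are invented for this example. The older variant needs extra fcntl() calls, and in a multi-threaded program another thread could fork() and exec() between inotify_init() and the fcntl() that sets FD_CLOEXEC.

```c
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: one call, flags applied atomically. */
int open_inotify_modern(void)
{
    return inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
}

/* Hypothetical helper: same result with the original API. */
int open_inotify_legacy(void)
{
    int fd = inotify_init();
    int flags;

    if (fd == -1)
        return -1;

    /* Extra steps: set the status flag, then the descriptor flag. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ||
        fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }

    return fd;
}
```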
After creating an inotify instance, the files and events of interest should be registered with one or several calls to inotify_add_watch().
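A registration could look like the sketch below. The helper name watch_directory() and the choice of events are assumptions made for this example; inotify(7) lists all the available event bits.

```c
#include <sys/inotify.h>

/*
 * Hypothetical helper: watch a directory for entries that are created,
 * deleted, or moved in or out. Returns a watch descriptor that
 * identifies this entry of the watch list, or -1 on error.
 */
int watch_directory(int ifd, const char *path)
{
    return inotify_add_watch(ifd, path,
                             IN_CREATE | IN_DELETE |
                             IN_MOVED_FROM | IN_MOVED_TO);
}
```

The returned watch descriptor appears in the wd field of each event, which is how a program maps an event back to the watched path. Registering a path that is already watched returns the existing watch descriptor, and inotify_rm_watch() removes an entry from the watch list.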
File notifications are consumed with a call to read() that fills one or several inotify_event structures. Special attention should be paid because the size of an inotify event varies depending on its name field.
Thus, the loop to iterate over inotify events obtained with read() must take into account the varying length of the event's name:
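    #define INOTIFY_BUF_LEN (16 * (sizeof(struct inotify_event) + NAME_MAX + 1))

    /* inotify_fd and handle_event() are defined elsewhere. */
    /* Aligned so that the buffer can be accessed as struct inotify_event. */
    char buf[INOTIFY_BUF_LEN]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    struct inotify_event *event;
    ssize_t nr;
    char *p;

    nr = read(inotify_fd, buf, INOTIFY_BUF_LEN);
    for (p = buf; p < buf + nr; ) {
        event = (struct inotify_event *) p;
        p += sizeof(struct inotify_event) + event->len; /* Set p to the next event. */
        handle_event(event);
    }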
One file descriptor to rule them all: epoll
To combine several sources of notifications delivered with file descriptors in a single event loop, you can use the epoll API.
Unsurprisingly, a file descriptor must be created -- this time with epoll_create1() [1] to create an "epoll instance". Then the file descriptors to monitor must be registered with epoll_ctl(). A struct epoll_event that describes the event must be passed to epoll_ctl().
After that, a call to epoll_wait() waits for events on the epoll file descriptor. On return, epoll_wait() fills one or several epoll_event structures. The data.fd field of an epoll_event, which the program sets when it registers the descriptor, indicates which file descriptor is ready.
The following snippet shows how multiple file descriptors can be monitored in a single event loop:
    /* Print "Beep!" every second and the number of each signal received. */
    /* Send `kill -9` to stop this program. */
    #include <inttypes.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/timerfd.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    #define MAX_EVENTS 2

    int main(void)
    {
        int efd, sfd, tfd;
        struct itimerspec timer = {
            .it_interval = {1, 0}, /* 1 second */
            .it_value = {1, 0},
        };
        uint64_t count;
        sigset_t sigmask;
        struct signalfd_siginfo siginfo;
        struct epoll_event ev, events[MAX_EVENTS];
        int nr_events; /* epoll_wait() returns an int, -1 on error */
        int i;

        /* No error checking! */
        tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        timerfd_settime(tfd, 0, &timer, NULL);

        sigfillset(&sigmask);
        sigprocmask(SIG_BLOCK, &sigmask, NULL);
        sfd = signalfd(-1, &sigmask, 0);

        efd = epoll_create1(0);

        ev.events = EPOLLIN;
        ev.data.fd = tfd;
        epoll_ctl(efd, EPOLL_CTL_ADD, tfd, &ev);

        ev.events = EPOLLIN;
        ev.data.fd = sfd;
        epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &ev);

        for (;;) {
            nr_events = epoll_wait(efd, events, MAX_EVENTS, -1);
            for (i = 0; i < nr_events; i++) {
                if (events[i].data.fd == tfd) {
                    read(tfd, &count, sizeof(count));
                    printf("Beep!\n");
                } else if (events[i].data.fd == sfd) {
                    read(sfd, &siginfo, sizeof(siginfo));
                    printf("Received signal number %" PRIu32 "\n", siginfo.ssi_signo);
                }
            }
        }

        return 0;
    }

[Image: Two file descriptors multiplexed with epoll?! Anna Fasshauer, Happy Humphrey, Jardin des Tuileries during FIAC 2015]
[1] It is more elegant to use epoll_create1() rather than epoll_create() because the latter has a confusing prototype with a size argument which is unused but must be set to a value greater than 0.