Notifications with file descriptors

Christophe Vu-Brugier

2014-11-12

Linux is known for its "everything is a file" mantra inherited from Unix. A wide range of system resources are represented by files. Thus, in programs, system resources are managed as file descriptors with a common set of primitives such as read(), write() or close().

Besides regular files, block devices, sockets and pipes, Linux also represents some kinds of notifications as file descriptors. For instance, a program can listen for events like "a file was created in a directory" or "a timer has expired" by reading from a file descriptor.

This article discusses several Linux notification mechanisms based on file descriptors (timerfd, signalfd and inotify) and a way to combine them: epoll.

Common usage

Each of these mechanisms has an init function that returns the file descriptor. Most of the time, the last argument of this function is called flags and changes its behaviour: setting flags to 0 selects the default behaviour, and other flags can be combined with a bitwise OR.

A new file descriptor is returned if the initialization is successful. Otherwise, the function returns -1 and sets the errno variable.

#include <sys/timerfd.h>

int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
if (fd == -1) {
    /* timerfd_create() failed; check errno for details. */
}

The notifications can be read from the file descriptor with the read() system call. The data returned by read() contain information about the notification. The format of these data depends on the kind of notification.

Timer notification with timerfd

A "file descriptor timer" is created with timerfd_create(). After the timer is created, it can be started with timerfd_settime(). The timer properties are set using a struct itimerspec variable that describes the interval and initial expiration for the timer. Here is the layout of struct itimerspec:

struct itimerspec layout

Calling read() on the timer file descriptor blocks until the timer expires. The buffer passed to read() must be a uint64_t (8 bytes). read() fills it with the number of expirations that have occurred since the timer was last set with timerfd_settime() or since the last successful read().

The following program shows timerfd in action:

/* This program prints "Beep!" every second with timerfd. */
#include <inttypes.h>
#include <stdio.h>
#include <sys/timerfd.h>
#include <unistd.h>

int main(void)
{
    int fd;
    struct itimerspec timer = {
        .it_interval = {1, 0},  /* 1 second */
        .it_value    = {1, 0},
    };
    uint64_t count;

    /* No error checking! */
    fd = timerfd_create(CLOCK_MONOTONIC, 0);
    timerfd_settime(fd, 0, &timer, NULL);

    for (;;) {
        read(fd, &count, sizeof(count));
        printf("Beep!\n");
    }

    return 0;
}

Signal notification with signalfd

The traditional way to handle a signal on Linux is to install a signal handler with signal() or sigaction(). When a signal is received, the program is interrupted: one of its threads is stopped and the signal handler function is invoked.

With signalfd(), no signal handler is invoked when a signal is received. Instead, the signal notification is read from the corresponding file descriptor.

Unlike the timerfd API, the signalfd API provides a single function both to create a signal file descriptor and to change it:

int signalfd(int fd, const sigset_t *mask, int flags);

Set fd to -1 to make signalfd() create and return a new file descriptor.
Set fd to an existing file descriptor to change it.

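The signal mask is built with functions like sigemptyset() and sigaddset(). The signals in the mask must also be blocked with sigprocmask() and SIG_BLOCK. Otherwise they are still delivered the usual way (default action or signal handler) and never show up on the file descriptor.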

When signals are pending, read() fills one signalfd_siginfo structure per signal, as many as fit in the supplied buffer. If no signal is pending, read() blocks, unless the descriptor was created with SFD_NONBLOCK.

/* This program waits for signals and prints their signal numbers. */
/* Send `kill -9` to stop this program. */
#include <signal.h>
#include <sys/signalfd.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd;
    sigset_t sigmask;
    struct signalfd_siginfo siginfo;

    /* No error checking! */
    sigfillset(&sigmask);
    sigprocmask(SIG_BLOCK, &sigmask, NULL);
    fd = signalfd(-1, &sigmask, 0);
    for (;;) {
        read(fd, &siginfo, sizeof(siginfo));
        printf("Received signal number %u\n", siginfo.ssi_signo);  /* ssi_signo is a uint32_t */
    }
    close(fd);

    return 0;
}

File system notification with inotify

Compared to signalfd and timerfd, inotify is older. It is also more complex, mostly because dealing with file system events is difficult. In my opinion, using inotify correctly is tedious, but there is no better alternative to get file system notifications on Linux right now.

The inotify API provides several functions that must be combined to watch file system events. An inotify file descriptor is called an "inotify instance". Associated with each inotify instance is a list of files and directories to watch, called "the watch list". Keeping the watch list up to date as files and directories are created, moved and deleted is the most difficult part.

inotify_init() creates an inotify instance:

int inotify_init(void);

inotify_init() has no flags argument. A variant with a flags parameter was added later to the API: inotify_init1().

After creating an inotify instance, the files and events of interest are registered with inotify_add_watch().

File notifications are consumed with read(), which fills one or more inotify_event structures. Be careful: the size of an inotify event depends on the length of its name field.

struct inotify_event layout

Thus, the loop to iterate over inotify events obtained with read() must take into account the varying length of the event's name:

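#include <limits.h>        /* NAME_MAX */
#include <sys/inotify.h>

#define INOTIFY_BUF_LEN (16 * (sizeof(struct inotify_event) + NAME_MAX + 1))

/* The buffer must be suitably aligned for struct inotify_event. */
char buf[INOTIFY_BUF_LEN]
    __attribute__((aligned(__alignof__(struct inotify_event))));
const struct inotify_event *event;
ssize_t nr;
char *p;

nr = read(inotify_fd, buf, sizeof(buf));
for (p = buf; nr > 0 && p < buf + nr; ) {
    event = (const struct inotify_event *) p;
    p += sizeof(struct inotify_event) + event->len; /* Set p to the next event. */
    handle_event(event); /* application-specific */
}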

One file descriptor to rule them all: epoll

The epoll API makes it possible to multiplex several sources of notifications delivered through file descriptors in a single event loop.

Unsurprisingly, a file descriptor must be created first, this time with epoll_create1()¹, to obtain an "epoll instance". Then each file descriptor to monitor is registered with epoll_ctl(). This call takes a struct epoll_event that describes the events of interest (for example EPOLLIN, "ready for reading") along with user data that epoll_wait() hands back later.

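After that, epoll_wait() waits for events on the epoll file descriptor. On return, it fills one or more epoll_event structures and returns how many it filled. The data field of each structure holds whatever the program stored when it registered the descriptor. Storing the file descriptor itself in data.fd, as the program below does, is the simplest way to tell which one is ready.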

The following snippet shows how multiple file descriptors can be monitored in a single event loop:

/* Print "Beep!" every second and the signal number of every signal received. */
/* Send `kill -9` to stop this program. */
#include <inttypes.h>
#include <signal.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
    int efd, sfd, tfd;
    struct itimerspec timer = {
        .it_interval = {1, 0},  /* 1 second */
        .it_value    = {1, 0},
    };
    uint64_t count;
    sigset_t sigmask;
    struct signalfd_siginfo siginfo;
#define MAX_EVENTS 2
    struct epoll_event ev, events[MAX_EVENTS];
    int nr_events;  /* epoll_wait() returns an int, -1 on error */
    int i;          /* signed, so the comparison with nr_events is safe */

    /* No error checking! */
    tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    timerfd_settime(tfd, 0, &timer, NULL);

    sigfillset(&sigmask);
    sigprocmask(SIG_BLOCK, &sigmask, NULL);
    sfd = signalfd(-1, &sigmask, 0);

    efd = epoll_create1(0);

    ev.events = EPOLLIN;
    ev.data.fd = tfd;
    epoll_ctl(efd, EPOLL_CTL_ADD, tfd, &ev);

    ev.events = EPOLLIN;
    ev.data.fd = sfd;
    epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &ev);

    for (;;) {
        nr_events = epoll_wait(efd, events, MAX_EVENTS, -1);
        for (i = 0; i < nr_events; i++) {
            if (events[i].data.fd == tfd) {
                read(tfd, &count, sizeof(count));
                printf("Beep!\n");
            } else if (events[i].data.fd == sfd) {
                read(sfd, &siginfo, sizeof(siginfo));
                printf("Received signal number %u\n", siginfo.ssi_signo);
            }
        }
    }
    return 0;
}


¹ epoll_create1() is preferable to epoll_create(). The latter has a confusing prototype: its size argument is ignored (since Linux 2.6.8) but must still be greater than 0. It also has no flags argument, so it cannot set EPOLL_CLOEXEC.