Did you know? eBPF virtual filesystem

This brief description of the eBPF virtual filesystem was initially published in January 2021 by the Cilium community as part of the eBPF Updates #3 on ebpf.io.

eBPF objects, such as a program or a map, reside in kernel memory until they are no longer needed. Internally, the kernel uses reference counters to keep track of the number of “handles” pointing to such objects. When the number of references comes down to zero, the program or the map is destroyed. The references to a program would typically be a hook where the user has attached the program (such as a tc filter or a kernel probe), or file descriptors that were returned from the kernel when loading the program with the bpf() system call. Similarly, references to an eBPF map can be held by eBPF programs using the map or by a user program that retrieved a file descriptor.

As a consequence, if a process loaded an eBPF program without attaching it, the program will be destroyed when the process exits and its file descriptors are closed. There are ways to share file descriptors between processes, but to make it easier to reference eBPF objects between user applications, or simply to make them persistent at a time when they have no reference in the kernel (such as a detached program or an unused map), another mechanism has been created: the eBPF virtual filesystem.

The eBPF virtual (or pseudo) filesystem, often called bpffs, is traditionally mounted at /sys/fs/bpf, but any alternative mount point can work. It is possible to pin objects to this virtual filesystem, which is rendered as file paths. Calling the bpf() system call with its BPF_OBJ_PIN subcommand pins an object. Then, using the BPF_OBJ_GET subcommand on a bpffs path retrieves a file descriptor to this pinned object. Removing a pinned path simply involves a call to unlink(), just like for regular paths. Pinned paths (and the eBPF objects they reference) are not persistent after reboot.

Note that the use of periods (.) in pinned paths is restricted. The glyph has long been unused, but a recent feature introduced it to mark paths to specific eBPF iterators that the system can preload, maps.debug and progs.debug (but let’s keep this for another time). You can have any other character allowed in UNIX names. Yes, /sys/fs/bpf/🐝 is a valid path.

Here is a concrete example. We create an eBPF map with bpftool. Because no program uses the map yet, the only reference created is a file descriptor, which is closed when bpftool exits. To avoid losing the map at this stage, bpftool takes a path name and will use it to pin the map.

# bpftool map create /sys/fs/bpf/🍯 \
        type array key 4 value 32 entries 8 name honeypot
# bpftool --bpffs map show pinned /sys/fs/bpf/🍯
42: array  name foo  flags 0x0
        key 4B  value 32B  max_entries 8  memlock 4096B
        pinned /sys/fs/bpf/🍯

We can then reuse this map when loading a program:

# bpftool prog load bee.o /sys/fs/bpf/🐝 map name honeypot pinned /sys/fs/bpf/🍯

Of course, you do not have to use emojis. More information on the virtual eBPF filesystem is available (although somewhat scattered) in the BPF and XDP Reference Guide. A post called Lifetime of BPF objects, from Alexei Starovoitov, is an excellent resource to learn more about how eBPF objects are managed in the kernel. More information on bpftool usage is available from the dedicated man pages.

Note that there are a few other eBPF objects (BTF, links, iterators) and that some of them are not handled exactly in the same manner. There are also other ways to reference programs and maps, such as references in program array maps or maps of maps.