Did you know? eBPF virtual filesystem
This brief description of the eBPF virtual filesystem was initially published in January 2021 by the Cilium community as part of the eBPF Updates #3 on ebpf.io.
eBPF objects, such as a program or a map, reside in kernel memory until they
are no longer needed. Internally, the kernel uses reference counters to keep
track of the number of "handles" pointing to such objects. When the number of
references comes down to zero, the program or the map is destroyed. The
references to a program would typically be a hook where the user has attached
the program (such as a tc filter or a kernel probe), or file descriptors that
were returned from the kernel when loading the program with the bpf() system
call. Similarly, references to an eBPF map can be held by eBPF programs using
the map or by a user program that retrieved a file descriptor.
As a consequence, if a process loaded an eBPF program without attaching it, the program will be destroyed when the process exits and its file descriptors are closed. There are ways to share file descriptors between processes, but to make it easier to reference eBPF objects between user applications, or simply to make them persistent at a time when they have no reference in the kernel (such as a detached program or an unused map), another mechanism has been created: the eBPF virtual filesystem.
The eBPF virtual (or pseudo) filesystem, often called bpffs, is traditionally
mounted at /sys/fs/bpf, but any alternative mount point can work. Objects can
be pinned to this virtual filesystem, where they appear as file paths. Calling
the bpf() system call with its BPF_OBJ_PIN subcommand pins an object. Then,
using the BPF_OBJ_GET subcommand on a bpffs path retrieves a file descriptor to
this pinned object. Removing a pinned path simply involves a call to unlink(),
just like for regular paths. Pinned paths (and the eBPF objects they reference)
do not persist across reboots.
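Before pinning anything, it can help to confirm that a bpffs instance is actually mounted where you expect it. The following is a minimal sketch, not an official tool: the function name bpffs_mounted_at and its optional second argument are made up for this illustration, and it assumes a mounts table in the usual /proc/mounts format, where bpffs shows up with filesystem type "bpf".

```shell
#!/bin/sh
# Sketch: report whether a bpffs instance is mounted at a given path.
# bpffs_mounted_at is a hypothetical helper name. The optional second
# argument (a mounts table to read) exists only so the function can be
# tried against a sample file instead of the live /proc/mounts.
bpffs_mounted_at() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    [ -r "$mounts_file" ] || return 2
    # Fields in /proc/mounts: device, mount point, filesystem type, options...
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "bpf" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

if bpffs_mounted_at /sys/fs/bpf; then
    echo "bpffs is mounted at /sys/fs/bpf"
else
    echo "bpffs not found at /sys/fs/bpf; as root, try: mount -t bpf bpf /sys/fs/bpf"
fi
```

The script only reads the mounts table and prints a hint; it never mounts anything itself, so it is safe to run as an unprivileged user.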
Note that the use of periods (.) in pinned paths is restricted. The glyph has
long been unused, but a recent feature introduced it to mark paths to specific
eBPF iterators that the system can preload, maps.debug and progs.debug (but
let's keep this for another time). You can have any other character allowed in
UNIX names. Yes, /sys/fs/bpf/🐝 is a valid path.
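The restriction on periods can be checked in user space before attempting to pin. This is only a sketch: pin_name_allowed is a hypothetical helper, it looks at the last path component alone, and the kernel remains the final authority on which names it accepts.

```shell
#!/bin/sh
# Sketch of a user-space pre-check (hypothetical helper, not part of bpftool).
# Succeeds (exit status 0) only if the last path component is non-empty
# and contains no period.
pin_name_allowed() {
    name=${1##*/}            # strip everything up to the last slash
    case $name in
        ''|*.*) return 1 ;;  # empty name, or a name containing "."
        *)      return 0 ;;
    esac
}

for path in /sys/fs/bpf/honeypot /sys/fs/bpf/honey.pot /sys/fs/bpf/maps.debug; do
    if pin_name_allowed "$path"; then
        echo "ok:       $path"
    else
        echo "rejected: $path"
    fi
done
```

Only the first of the three sample paths passes the check; the other two contain a period in their final component.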
Here is a concrete example. We create an eBPF map with bpftool. Because no program uses the map yet, the only reference created is a file descriptor, which is closed when bpftool exits. To avoid losing the map at this stage, bpftool takes a path name and will use it to pin the map.
# bpftool map create /sys/fs/bpf/🍯 \
    type array key 4 value 32 entries 8 name honeypot
# bpftool --bpffs map show pinned /sys/fs/bpf/🍯
42: array  name honeypot  flags 0x0
        key 4B  value 32B  max_entries 8  memlock 4096B
        pinned /sys/fs/bpf/🍯
We can then reuse this map when loading a program:
# bpftool prog load bee.o /sys/fs/bpf/🐝 map name honeypot pinned /sys/fs/bpf/🍯
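Once the program and the map are pinned, they can be inspected and later unpinned with ordinary tools. The sketch below is an illustration under assumptions: the ASCII names bee and honeypot are stand-ins for the pinned paths used above, run and unpin_examples are hypothetical helpers, and the script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: inspect, then unpin, a pinned program and a pinned map.
# "bee" and "honeypot" are hypothetical ASCII stand-ins for the pinned paths.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

unpin_examples() {
    bpffs=${1:-/sys/fs/bpf}
    run bpftool prog show pinned "$bpffs/bee"
    run bpftool map dump pinned "$bpffs/honeypot"
    # Removing the pinned paths drops the references held through bpffs.
    run rm "$bpffs/bee" "$bpffs/honeypot"
}

unpin_examples
```

Setting DRY_RUN=0 and running as root would execute the commands for real. If the program is not attached anywhere and nothing else holds a file descriptor, removing both paths leaves the program and the map without references, so the kernel destroys them.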
Of course, you do not have to use emojis. More information on the eBPF virtual filesystem is available (although somewhat scattered) in the BPF and XDP Reference Guide. A post called Lifetime of BPF objects, by Alexei Starovoitov, is an excellent resource for learning how eBPF objects are managed in the kernel. More information on bpftool usage is available in the dedicated man pages.
Note that there are a few other eBPF objects (BTF, links, iterators) and that some of them are not handled exactly in the same manner. There are also other ways to reference programs and maps, such as references in program array maps or maps of maps.