Understanding tc “direct action” mode for BPF
- A “direct-action” Mode for TC
- Back to the Basics
- Hooking eBPF Programs
- Concept of the “direct-action”
- The “clsact” Qdisc
- Example Usage
- The Code
- Conclusion
- References
This post was left aside as a draft for a long time. Most of it was written in
August 2016, at a time when no documentation for the direct-action mode for TC
was available. It is probably less relevant today, but I publish it in case
it might help readers understand a bit more how this flag works.
A “direct-action” Mode for TC
The Linux Traffic Control subsystem, or “TC” for short, has been in the kernel
for years, and yet it is still under active development. A major addition
occurred with kernel version 4.1, when new hooks were added to run eBPF
programs as TC “classifiers” (also known as “filters”) or “actions”. About six
months later, alongside kernel 4.4, iproute2 got a curious direct-action
mode that has been little documented… [1]
Back to the Basics
Before we see what direct-action can be used for, we need a short reminder
about the classic usage of traffic control on Linux. Effective traffic control
happens in the kernel: different algorithms can be used to throttle or
prioritise some flows on an interface. When a user wants to set up traffic
control, they usually rely on the tc utility, from the iproute2 package, which
is the user-space counterpart of the kernel TC subsystem, and communicates with
the latter through Netlink messages (mostly).
TC is a powerful, yet complex framework (and it is somewhat documented). It relies on the notions of “queueing disciplines” (qdiscs), “classes”, “classifiers” (filters) and actions. A very simplified description might be the following:
- The user defines a qdisc, a shaper that applies a specific policy to different classes of traffic. The qdisc is attached to a network interface (ingress or egress).
- The user defines classes of traffic, and attaches them to the qdisc.
- Filters are attached to the qdisc. They are used to classify the traffic
intercepted on this interface, and to dispatch the packets into the
different classes. A filter is run on every packet, and it can return
one of the following values:
  - 0, which denotes a mismatch. Next filters, if any, are run on the packet.
  - -1, which denotes a match, the packet being sent to the default classid configured for this filter.
  - Any other value is taken as the identifier of the class where the packet should be sent, thus allowing for non-linear classification.
- Additionally, an action to be applied to all matching packets can be added to a filter. For example, selected packets could be dropped, or mirrored on another network interface, etc.
- New nested qdiscs can be attached to the classes, and receive classes in their turn. The complete policy diagram is in fact a tree spanning under the root qdisc. But we do not need this information for the rest of the article.
An example of this workflow could be the following (inspired by the documentation of the HTB shaper):
tc qdisc add dev eth0 root handle 1: htb default 11
tc class add dev eth0 parent 1: classid 1:1 htb rate 100kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 30kbps ceil 100kbps
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 10kbps ceil 100kbps
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
    match ip src 1.2.3.4 match ip dport 80 0xffff flowid 1:10
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
    match ip src 1.2.3.4 action drop
In this setup, packets with source IP address 1.2.3.4 and with L4
destination port 80 (HTTP) will be sent to the first queue, i.e. the class
1:10, with a rate of 30 kbps. Packets from the same host, to different ports,
are dropped. All other packets go to the second queue (class 1:11, the default
set on the qdisc), with a 10 kbps rate.
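The return-value convention for classic classifiers can be modelled in a few lines of user-space C. This is only a sketch of the rules listed earlier, not kernel code: the toy_pkt and toy_filter types, the TOY_CLASSID() macro and the function names are all made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet: only the fields the toy filter looks at. */
struct toy_pkt {
    uint32_t src_ip;  /* host byte order, for readability */
    uint16_t dport;
};

/* Hypothetical filter: a match function plus its default classid. */
struct toy_filter {
    int (*run)(const struct toy_pkt *pkt);
    uint32_t default_classid;
};

/* Class handles are written major:minor, in hexadecimal (1:10 is 0x10). */
#define TOY_CLASSID(major, minor) (((uint32_t)(major) << 16) | (uint32_t)(minor))

/*
 * Walk the filters in order and interpret each return value:
 *    0    -> mismatch, try the next filter
 *   -1    -> match, use the default classid of the filter
 *   other -> match, the value itself is the classid
 * If no filter matches, fall back to the default class of the qdisc.
 */
static uint32_t toy_classify(const struct toy_filter *filters, size_t n,
                             const struct toy_pkt *pkt, uint32_t qdisc_default)
{
    assert(filters != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int ret = filters[i].run(pkt);

        if (ret == 0)
            continue;
        if (ret == -1)
            return filters[i].default_classid;
        return (uint32_t)ret;
    }
    return qdisc_default;
}

/* Toy filter mirroring the first u32 rule: source 1.2.3.4, port 80. */
static int match_http_from_host(const struct toy_pkt *pkt)
{
    return (pkt->src_ip == 0x01020304 && pkt->dport == 80) ? -1 : 0;
}
```

With a single filter whose default classid is TOY_CLASSID(1, 0x10) and a qdisc default of TOY_CLASSID(1, 0x11), a packet from 1.2.3.4 to port 80 lands in 1:10 and anything else in 1:11. The drop action of the second rule is deliberately left out of this model.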
Hooking eBPF Programs
Enter eBPF (extended Berkeley Packet Filter). Basically, it consists of a restricted assembly-like language that can be used to produce programs run in the kernel in a safe fashion. They can be hooked at several points in the kernel, mostly for packet processing or for monitoring tasks. Two of these hooks are related to TC: eBPF programs, since kernel 4.1, can be attached as classifiers or actions.
Used as classifiers, eBPF programs bring more flexibility for parsing packets, and even allow stateful processing or interaction with user space via specific data structures called maps. But in the end, the classifiers remain the same: they return a value that can be 0 for a mismatch, -1 for a match, or any other class identifier.
Used as actions, eBPF programs behave differently, their possible return values
indicating what action should actually be performed on the packet (description
from the tc-bpf(8) manual page):
- TC_ACT_UNSPEC (-1): Use the default action configured from tc (similarly as returning -1 from a classifier).
- TC_ACT_OK (0): Terminate the packet processing pipeline and allow the packet to proceed.
- TC_ACT_RECLASSIFY (1): Terminate the packet processing pipeline and start classification from the beginning.
- TC_ACT_SHOT (2): Terminate the packet processing pipeline and drop the packet.
- TC_ACT_PIPE (3): Iterate to the next action, if available.
- And a few others. They are defined in file include/uapi/linux/pkt_cls.h of the kernel tree. The BPF and XDP Reference Guide from Cilium gives more details on their usage.
- Values not defined in that file are unspecified return codes.
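As a quick reference, the five codes listed here can be put into a small lookup helper. This is a sketch only: the values are the ones quoted from the manual page, redefined locally so that the snippet builds without kernel headers, and the tc_act_describe(), tc_act_is_terminal() and tc_act_selfcheck() helpers are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Same values as in include/uapi/linux/pkt_cls.h, redefined here only
 * so that the sketch compiles without the kernel headers. */
#ifndef TC_ACT_OK
#define TC_ACT_UNSPEC      (-1)
#define TC_ACT_OK            0
#define TC_ACT_RECLASSIFY    1
#define TC_ACT_SHOT          2
#define TC_ACT_PIPE          3
#endif

/* Hypothetical helper: short description of an action return code. */
static const char *tc_act_describe(int code)
{
    switch (code) {
    case TC_ACT_UNSPEC:     return "use default action";
    case TC_ACT_OK:         return "pass";
    case TC_ACT_RECLASSIFY: return "reclassify";
    case TC_ACT_SHOT:       return "drop";
    case TC_ACT_PIPE:       return "next action";
    default:                return "not covered by this sketch";
    }
}

/* Hypothetical helper: does this code end the processing pipeline,
 * according to the descriptions in the list? */
static int tc_act_is_terminal(int code)
{
    return code == TC_ACT_OK ||
           code == TC_ACT_RECLASSIFY ||
           code == TC_ACT_SHOT;
}

/* Small usage example. */
static void tc_act_selfcheck(void)
{
    assert(strcmp(tc_act_describe(TC_ACT_SHOT), "drop") == 0);
    assert(tc_act_is_terminal(TC_ACT_OK));
    assert(!tc_act_is_terminal(TC_ACT_PIPE));
}
```

The remaining codes from pkt_cls.h are intentionally not modelled; they fall into the default branch.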
Concept of the “direct-action”
Although eBPF has restrictions—it has a limited number of instructions, and only bounded loops are allowed, for example—it provides a powerful language for packet processing. A consequence is that for a number of use cases, eBPF classifiers alone are enough to filter and process the packets, and do not need additional qdiscs or classes to be attached to them. This is particularly true when packets should be filtered (passed, or dropped) at the TC interface level. Classifiers do need, however, an additional action to actually drop the packets: the value returned by a classifier cannot be used to tell the system to drop a packet.
To avoid adding such simple TC actions and to simplify those use cases where
the classifier does all the work, a new flag was added to TC for eBPF
classifiers: direct-action, also available as da for short. This flag, used
at filter attach time, tells the system that the return value from the
filter should be interpreted as the return value of an action instead. This
means that an eBPF program attached as a TC classifier can now return
TC_ACT_SHOT, TC_ACT_OK, or another one of the reserved values, and that value
is interpreted as such: no need to add another TC action object to drop or
mirror the packet. In terms of performance, this is also more efficient,
because the TC subsystem no longer needs to call into an additional,
separate action module.
For using TC with eBPF, using the direct-action flag is the simplest, the
fastest, and is now the recommended way to go.
What about the TC eBPF actions then? Could they not be used to achieve the same result in the first place, to process the packet and return the correct “pass” or “drop” value? The answer is negative: actions are not attached directly to a qdisc, they are only used after a packet has been through a classifier, which means you need a classifier anyway. But then, does this mean that TC eBPF actions are now useless? Well, eBPF actions can still be used after other filters. We can imagine attaching a u32 classifier to a qdisc to filter on some bit fields in the packets, then dropping the packets if they meet a given additional condition. An eBPF action would work in that case. But honestly, the use cases I have seen usually just use eBPF both for filtering and returning the action, without the need for additional filters.
Yet another question about the change of meaning for the return values:
does this mean that when used with the direct-action flag, the eBPF
classifier loses its ability to indicate to what class the packet should be
dispatched? Once more, the answer is “no”: the special field tc_classid of
the struct __sk_buff, which is passed as the only argument to the filter
program, can be used instead of the program return value to tell the system
where to send the packet when the program allows it to pass.
The “clsact” Qdisc
A few months after the direct-action mode was introduced into the kernel and
iproute2, a new qdisc appeared in Linux 4.5: clsact. It is similar to the
ingress qdisc, to which we can attach eBPF programs with the direct-action
mode, and which does not perform any queuing. But clsact acts as a superset
of ingress, in the sense that it also allows attaching programs with
direct-action on the egress path, which was not possible before. It is
the recommended qdisc for attaching eBPF programs in direct-action mode.
More details on the clsact qdisc are available in the relevant commit log
or from the Cilium Guide.
Example Usage
Let’s write a sample program that filters packets, possibly modifies their data, and passes or drops them, depending on the policy.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <linux/swab.h>

/*
 * Packet processing is not implemented in this sample. Parse
 * IPv4 header, possibly push/pop encapsulation headers, update
 * header fields, drop or transmit based on network policy,
 * collect statistics and store them in an eBPF map...
 * This placeholder only lets the sample compile.
 */
static inline __attribute__((always_inline))
int process_packet(struct __sk_buff *skb)
{
    return TC_ACT_OK;
}

int classifier(struct __sk_buff *skb)
{
    void *data_end = (void *)(unsigned long long)skb->data_end;
    void *data = (void *)(unsigned long long)skb->data;
    struct ethhdr *eth = data;

    /* Make sure the Ethernet header lies within the packet data. */
    if (data + sizeof(struct ethhdr) > data_end)
        return TC_ACT_SHOT;

    /* Byte swap yields network byte order on little-endian hosts. */
    if (eth->h_proto == ___constant_swab16(ETH_P_IP))
        return process_packet(skb);
    else
        return TC_ACT_OK;
}
Let’s compile our program with clang/LLVM:
$ clang -O2 -emit-llvm -c foo.c -o - | \
    llc -march=bpf -mcpu=probe -filetype=obj -o foo.o
Now let’s load it. We first need a qdisc to attach the filter to.
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 ingress bpf direct-action obj foo.o sec .text
# tc filter show dev eth0
$ tc filter show dev eth0 ingress
filter protocol all pref 49152 bpf chain 0
filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[.text] direct-action not_in_hw id 11 tag ebe28a8e9a2e747f
The eBPF program loaded from foo.o appears on the second line of output. Note
the mention of the direct-action flag. This program is enough to run both
classification and action selection on the traffic.
Remove with:
# tc qdisc del dev eth0 clsact
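The process_packet() function is not implemented in the sample above. As a rough idea of what its parsing step could look like, here is a sketch written as plain user-space C so that it can be tested without loading anything in the kernel. It works on raw bytes rather than on struct __sk_buff, the toy_process_packet() name and the "drop packets from one blocked source address" policy are assumptions made for the example, and the TOY_ACT_* values simply mirror TC_ACT_OK and TC_ACT_SHOT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ACT_OK    0  /* same value as TC_ACT_OK */
#define TOY_ACT_SHOT  2  /* same value as TC_ACT_SHOT */

#define TOY_ETH_HLEN     14      /* Ethernet header length */
#define TOY_ETH_P_IP     0x0800  /* IPv4 EtherType */
#define TOY_IP_MIN_HLEN  20      /* IPv4 header without options */

/*
 * Hypothetical policy: drop IPv4 packets coming from blocked_src, let
 * everything else through. Every access is preceded by a bounds check,
 * in the same spirit as the data/data_end checks of the eBPF sample.
 */
static int toy_process_packet(const uint8_t *data, const uint8_t *data_end,
                              const uint8_t blocked_src[4])
{
    size_t len, ihl;
    const uint8_t *ip;

    assert(data != NULL && data_end >= data);
    len = (size_t)(data_end - data);

    if (len < TOY_ETH_HLEN)
        return TOY_ACT_SHOT;

    /* EtherType is stored in network byte order at offset 12. */
    if ((((unsigned)data[12] << 8) | data[13]) != TOY_ETH_P_IP)
        return TOY_ACT_OK;

    if (len < TOY_ETH_HLEN + TOY_IP_MIN_HLEN)
        return TOY_ACT_SHOT;
    ip = data + TOY_ETH_HLEN;

    /* Header length is the low nibble of the first byte, in 32-bit words. */
    ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < TOY_IP_MIN_HLEN || len < TOY_ETH_HLEN + ihl)
        return TOY_ACT_SHOT;

    /* Source address sits at offset 12 of the IPv4 header. */
    if (memcmp(ip + 12, blocked_src, 4) == 0)
        return TOY_ACT_SHOT;

    return TOY_ACT_OK;
}
```

In a real eBPF program the same checks would be expressed on skb->data and skb->data_end, typically through struct ethhdr and struct iphdr, so that the verifier can prove every access is in bounds.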
The Code
The kernel support for the flag was added in commit 045efa82ff56, with the following log:
cls_bpf: introduce integrated actions
Often cls_bpf classifier is used with single action drop attached.
Optimize this use case and let cls_bpf return both classid and action.
For backwards compatibility reasons enable this feature under
TCA_BPF_FLAG_ACT_DIRECT flag.
Then more interesting programs like the following are easier to write:
    int cls_bpf_prog(struct __sk_buff *skb)
    {
        /* classify arp, ip, ipv6 into different traffic classes
         * and drop all other packets
         */
        switch (skb->protocol) {
        case htons(ETH_P_ARP):
            skb->tc_classid = 1;
            break;
        case htons(ETH_P_IP):
            skb->tc_classid = 2;
            break;
        case htons(ETH_P_IPV6):
            skb->tc_classid = 3;
            break;
        default:
            return TC_ACT_SHOT;
        }
        return TC_ACT_OK;
    }
In particular, it adds the following chunk to function cls_bpf_classify() in
net/sched/cls_bpf.c (the boolean prog->exts_integrated is set if the
direct-action flag was passed):
    if (prog->exts_integrated) {
        res->class = prog->res.class;
        res->classid = qdisc_skb_cb(skb)->tc_classid;
        ret = cls_bpf_exec_opcode(filter_res);
        if (ret == TC_ACT_UNSPEC)
            continue;
        break;
    }
So with the direct-action mode, the classid is retrieved from the
tc_classid field accessible to the eBPF program through the
struct __sk_buff *skb context, instead of being set to the return value from
the classifier. This return value is assigned to the ret variable instead, and
we break from the loop, whereas if the direct-action mode was not used, we
would have executed ret = tcf_exts_exec(skb, &prog->exts, res); to call into
the relevant action module and find the action to perform on the packet.
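To make the difference between the two paths easier to follow, here is a user-space model of that loop. It is a simplified sketch, not the kernel function: the toy_* types and names are invented, the program is reduced to a callback that may write a tc_classid, the verdict normally obtained from tcf_exts_exec() is reduced to a fixed attached_action field, and the handling of the non-direct-action path reflects my reading of the surrounding kernel code rather than the chunk quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ACT_UNSPEC     (-1)
#define TOY_ACT_OK           0
#define TOY_ACT_RECLASSIFY   1
#define TOY_ACT_SHOT         2
#define TOY_ACT_PIPE         3

struct toy_result {
    uint32_t classid;
};

/* Hypothetical stand-in for one attached cls_bpf program. */
struct toy_prog {
    int exts_integrated;               /* set when direct-action is used */
    int (*run)(uint32_t *tc_classid);  /* program; may set tc_classid */
    uint32_t default_classid;          /* classid given on the command line */
    int attached_action;               /* verdict of the separate action */
};

/* Only the codes modelled here are kept; the rest become UNSPEC. */
static int toy_exec_opcode(int code)
{
    switch (code) {
    case TOY_ACT_UNSPEC:
    case TOY_ACT_OK:
    case TOY_ACT_RECLASSIFY:
    case TOY_ACT_SHOT:
    case TOY_ACT_PIPE:
        return code;
    default:
        return TOY_ACT_UNSPEC;
    }
}

static int toy_cls_bpf_classify(const struct toy_prog *progs, size_t n,
                                struct toy_result *res)
{
    int ret = TOY_ACT_UNSPEC;

    assert(res != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct toy_prog *prog = &progs[i];
        uint32_t tc_classid = 0;
        int filter_res = prog->run(&tc_classid);

        if (prog->exts_integrated) {
            /* direct-action: classid from the context field,
             * verdict from the program return value. */
            res->classid = tc_classid;
            ret = toy_exec_opcode(filter_res);
            if (ret == TOY_ACT_UNSPEC)
                continue;
            break;
        }

        /* Classic mode: return value selects the class... */
        if (filter_res == 0)
            continue;
        res->classid = (filter_res == -1) ? prog->default_classid
                                          : (uint32_t)filter_res;
        /* ...and a separate action provides the verdict. */
        ret = prog->attached_action;
        if (ret < 0)
            continue;
        break;
    }
    return ret;
}

/* Two toy programs to exercise both paths. */
static int toy_da_drop(uint32_t *tc_classid)
{
    *tc_classid = 3;
    return TOY_ACT_SHOT;
}

static int toy_classic_match(uint32_t *tc_classid)
{
    (void)tc_classid;
    return -1;
}
```

Running toy_da_drop with exts_integrated set yields TOY_ACT_SHOT with classid 3, while toy_classic_match without the flag takes the default classid and returns whatever the attached action decided.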
The iproute2 counterpart was added in commit faa8a463002f, and adds the use of
the da|direct-action flag on the tc command line.
Conclusion
Hopefully this post helped you understand what this da flag appearing on tc
commands means, and why it is relevant for eBPF programs. As of this writing,
the direct-action mode is not only the recommended way to use eBPF programs
with TC, but also the only one that people use in practice, as far as I know.
eBPF is extremely flexible and powerful, and it is only sensible to use it both
for filtering packets and returning the code of the action to perform. It is
also easier (no action to add) and more performant. Use it, and have fun with
TC and eBPF!
References
- On getting tc classifier fully programmable with cls_bpf (Daniel Borkmann, netdev 1.1, Sevilla, February 2016)
- Linux kernel commit 045efa82ff56: cls_bpf: introduce integrated actions (Daniel Borkmann and Alexei Starovoitov, September 2015)
- Linux kernel commit 1f211a1b929c: net, sched: add clsact qdisc (Daniel Borkmann, January 2016)
- iproute2 commit faa8a463002f: f_bpf: allow for optional classid and add flags (Daniel Borkmann, September 2015)
- iproute2 commit 8f9afdd53156: tc, clsact: add clsact frontend (Daniel Borkmann, January 2016)
[1] There was no documentation for the direct-action mode other than the commit logs when I drafted this article. But by the time I published it, the Cilium Guide referenced it, and the tc-bpf(8) manual page also provides a brief description, stating that the mode “instructs eBPF classifier to not invoke external TC actions, instead use the TC actions return codes (TC_ACT_OK, TC_ACT_SHOT etc.) for classifiers.”