powered by Caper
authored by Hyunsuk Bang
While tcpdump and libpcap already generate BPF code from pcap expressions, Caper offers some unique features.
Pcap expressions form a high-level language for describing packet filters, but they often carry hidden ambiguities. Caper fully expands a pcap expression and removes those ambiguities, so users can see exactly what the expression matches.
$ ./caper.byte -q -p -e "tcp or udp"
ether proto \ip &&
(ip proto \tcp || ip proto \udp) ||
ether proto \ip6 &&
(ip6 proto \tcp || ip6 proto \udp)
Caper shows that the pcap expression "tcp or udp" actually matches TCP or UDP packets carried over IPv4 or IPv6.
Further details about these expansions can be found in 'What we talk about when we talk about pcap expressions' by Nik Sultana.
A contribution from Marelle León lets Caper render pcap expressions in plain English, making network filters easier for newcomers to understand.
$ ./caper.byte -engl-out -q -e "ip host 192.168.0.2"
IPv4 that has a host of 192.168.0.2
$ ./caper.byte -engl-out -q -e "tcp port 80 or 443"
tcp that has a port which is one of [80, 443]
Caper also compiles pcap expressions to BPF.
$ ./caper.byte -BPF_optimized -q -p -e "tcp or udp"
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 5
(002) ldb [23]
(003) jeq #0x6 jt 13 jf 4
(004) jeq #0x11 jt 13 jf 14
(005) jeq #0x86dd jt 6 jf 14
(006) ldb [20]
(007) jeq #0x6 jt 13 jf 8
(008) jeq #0x2c jt 9 jf 11
(009) ldb [54]
(010) jeq #0x6 jt 13 jf 12
(011) jeq #0x11 jt 13 jf 14
(012) jeq #0x11 jt 13 jf 14
(013) ret #262144
(014) ret #0
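To make the listing above easier to follow, here is a minimal sketch of an interpreter for the four instructions it uses (ldh, ldb, jeq, ret), run against hand-built frames. The names run, ethernet, ipv4 and ipv6 are hypothetical helpers written for this illustration and are not part of Caper or libpcap. The sketch assumes untagged Ethernet II frames and treats an out-of-bounds load as "reject", which is how classic BPF behaves.

```python
# Toy interpreter for the subset of classic BPF used in the "tcp or udp" listing.
# Each instruction is (opcode, k, jt, jf); jump targets are absolute indices,
# exactly as the listing prints them.
PROGRAM = [
    ("ldh", 12, None, None),      # 000: load EtherType
    ("jeq", 0x800, 2, 5),         # 001: IPv4?
    ("ldb", 23, None, None),      # 002: IPv4 protocol field (14 + 9)
    ("jeq", 0x6, 13, 4),          # 003: TCP?
    ("jeq", 0x11, 13, 14),        # 004: UDP?
    ("jeq", 0x86DD, 6, 14),       # 005: IPv6?
    ("ldb", 20, None, None),      # 006: IPv6 next header (14 + 6)
    ("jeq", 0x6, 13, 8),          # 007: TCP?
    ("jeq", 0x2C, 9, 11),         # 008: fragment header?
    ("ldb", 54, None, None),      # 009: next header inside the fragment header
    ("jeq", 0x6, 13, 12),         # 010: TCP?
    ("jeq", 0x11, 13, 14),        # 011: UDP?
    ("jeq", 0x11, 13, 14),        # 012: UDP?
    ("ret", 262144, None, None),  # 013: accept
    ("ret", 0, None, None),       # 014: reject
]


def run(program, packet):
    """Execute the program on packet (bytes) and return the ret value."""
    acc = 0
    pc = 0
    while True:
        op, k, jt, jf = program[pc]
        if op == "ldh":
            if k + 2 > len(packet):
                return 0  # out-of-bounds load rejects the packet
            acc = int.from_bytes(packet[k:k + 2], "big")
            pc += 1
        elif op == "ldb":
            if k + 1 > len(packet):
                return 0
            acc = packet[k]
            pc += 1
        elif op == "jeq":
            pc = jt if acc == k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unsupported opcode: {op}")


def ethernet(ethertype, payload):
    """Untagged Ethernet II frame with zeroed MAC addresses."""
    return bytes(12) + ethertype.to_bytes(2, "big") + payload


def ipv4(protocol):
    """20-byte IPv4 header with only version/IHL and protocol filled in."""
    return bytes([0x45]) + bytes(8) + bytes([protocol]) + bytes(10)


def ipv6(next_header, payload=b""):
    """40-byte IPv6 fixed header with only version and next header filled in."""
    return bytes([0x60]) + bytes(5) + bytes([next_header]) + bytes(33) + payload


if __name__ == "__main__":
    fragment_then_tcp = bytes([6]) + bytes(7)  # 8-byte fragment header, next header = TCP
    print(run(PROGRAM, ethernet(0x0800, ipv4(6))))                      # IPv4 / TCP
    print(run(PROGRAM, ethernet(0x86DD, ipv6(44, fragment_then_tcp))))  # IPv6 / fragment / TCP
    print(run(PROGRAM, ethernet(0x0806, bytes(28))))                    # ARP
```

A non-zero return value means the packet is accepted (the value is the snapshot length); zero means it is dropped.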
You might wonder why you should bother with Caper when libpcap already does this job. libpcap has been around for a long time, but it is slowly turning into legacy code, which makes contributors hesitant to engage with it. Caper is more open to change: we welcome feedback and actively consider user requests to refine its BPF capabilities. Caper also introduces features that go beyond libpcap's BPF functionality.
IPv6 support is vital in today's network landscape, but libpcap occasionally stumbles in handling certain expressions. For instance, consider the following tcpdump expression:
$ tcpdump -d "ip6 and tcp[tcpflags]=tcp-ack"
tcpdump: expression rejects all packets
tcpdump reports that this seemingly straightforward expression rejects all packets. Caper handles such edge cases and generates working BPF code for IPv6 expressions like the following:
$ ./caper.byte -BPF_optimized -q -p -e "ip6 and tcp[tcpflags]=tcp-ack"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 7
(002) ldb [20]
(003) jeq #0x6 jt 4 jf 7
(004) ldb [67]
(005) jeq #0x10 jt 6 jf 7
(006) ret #262144
(007) ret #0
$ ./caper.byte -BPF_optimized -q -p -e "ip6 and (udp port 546 or udp port 547) and (udp[8] == 7)"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 13
(002) ldb [20]
(003) jeq #0x11 jt 4 jf 13
(004) ldh [54]
(005) jeq #0x222 jt 10 jf 6
(006) jeq #0x223 jt 10 jf 7
(007) ldh [56]
(008) jeq #0x222 jt 10 jf 9
(009) jeq #0x223 jt 10 jf 13
(010) ldb [62]
(011) jeq #0x7 jt 12 jf 13
(012) ret #262144
(013) ret #0
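The absolute offsets in the two listings above (67, 54, 56 and 62) can be reproduced with simple arithmetic. The sketch below assumes an untagged Ethernet II frame (14 bytes) followed directly by the 40-byte IPv6 fixed header, with no extension headers in between; frame_offset is a hypothetical helper name used only for this illustration.

```python
# Where the transport-layer fields used in the listings sit within the frame,
# assuming Ethernet II (14 bytes) + IPv6 fixed header (40 bytes) and no
# extension headers.
ETHERNET_HEADER_LEN = 14
IPV6_FIXED_HEADER_LEN = 40


def frame_offset(l4_offset):
    """Translate an offset within the TCP/UDP header into a frame offset."""
    return ETHERNET_HEADER_LEN + IPV6_FIXED_HEADER_LEN + l4_offset


OFFSETS = {
    "tcp[tcpflags]": frame_offset(13),   # TCP flags byte
    "udp src port": frame_offset(0),     # first 2 bytes of the UDP header
    "udp dst port": frame_offset(2),     # next 2 bytes
    "udp[8]": frame_offset(8),           # first byte after the 8-byte UDP header
}

if __name__ == "__main__":
    for name, offset in OFFSETS.items():
        print(f"{name}: [{offset}]")
    # The port constants in the second listing are plain hex encodings.
    print(hex(546), hex(547))
```

The same arithmetic explains the constants in the comparisons: 0x10 is the TCP ACK bit, and 0x222 and 0x223 are ports 546 and 547.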
'Protochain' is designed for chasing an arbitrary number of extension headers in IPv4 or IPv6. Because the number of headers is not known in advance, libpcap compiles protochain into a loop, that is, BPF code with backward jumps. Kernel filter engines do not support such code, so filtering falls back to user space. Caper resolves this issue by flattening the recursion at compile time, so the generated BPF code complies with kernel constraints and packet processing is not forced into user space.
Consider the following tcpdump expression, where protochain induces a loop (note the backward jumps "ja 4" at instructions 018 and 031):
$ tcpdump -i en0 -d "ip6 protochain 6"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 35
(002) ldb [20]
(003) ldx #0x28
(004) jeq #0x6 jt 32 jf 5
(005) jeq #0x3b jt 32 jf 6
(006) jeq #0x0 jt 10 jf 7
...(omitted for brevity)
(014) mul #8
(015) add x
(016) tax
(017) ld M[0]
(018) ja 4
(019) jeq #0x33 jt 20 jf 32
...(omitted for brevity)
(029) tax
(030) ld M[0]
(031) ja 4
(032) add #0
(033) jeq #0x6 jt 34 jf 35
(034) ret #524288
(035) ret #0
Caper resolves this issue with the -max_rec flag, allowing you to specify the maximum number of recursions:
$ ./caper.byte -BPF_optimized -max_rec 2 -q -e "ip6 protochain 6"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 88
(002) ldx #0x28
(003) ldb [20]
(004) jeq #0x6 jt 87 jf 5
(005) jeq #0x29 jt 6 jf 13
(006) ldb [x + 20]
(007) st M[15]
(008) ld #0x28
(009) add x
(010) tax
(011) ld M[15]
(012) jeq #0x6 jt 87 jf 13
(013) jeq #0x4 jt 14 jf 23
(014) ldb [x + 23]
(015) st M[15]
(016) ldb [x + 14]
(017) and #0xf
(018) mul #0x4
(019) add x
(020) tax
(021) ld M[15]
(022) jeq #0x6 jt 87 jf 23
(023) jeq #0x33 jt 24 jf 33
(024) ldb [x + 14]
...(truncated for brevity)
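The effect of a recursion bound can be modelled in a few lines. The sketch below is not Caper's implementation; it is a simplified illustration under stated assumptions: it only skips the hop-by-hop (0), routing (43), fragment (44) and destination options (60) headers, computes each header's length as (second byte + 1) * 8, and ignores the tunnelling and AH cases that appear in the listing above. The function name ip6_protochain and the interpretation of max_rec as "number of headers that may be skipped" are assumptions. Because the loop runs a fixed number of times, a compiler can emit one copy of its body per iteration, leaving only forward jumps.

```python
# Simplified model of a bounded "ip6 protochain" check.
ETHERNET_HEADER_LEN = 14
IPV6_FIXED_HEADER_LEN = 40
SKIPPABLE_HEADERS = {0, 43, 44, 60}  # hop-by-hop, routing, fragment, dest. options


def ip6_protochain(frame, target, max_rec):
    """Return True if target is reached after skipping at most max_rec headers."""
    if len(frame) < ETHERNET_HEADER_LEN + IPV6_FIXED_HEADER_LEN:
        return False
    if frame[12:14] != b"\x86\xdd":
        return False
    next_header = frame[ETHERNET_HEADER_LEN + 6]
    offset = ETHERNET_HEADER_LEN + IPV6_FIXED_HEADER_LEN
    # Fixed iteration count: this is what allows the loop to be unrolled
    # at compile time into straight-line code.
    for _ in range(max_rec):
        if next_header == target:
            return True
        if next_header not in SKIPPABLE_HEADERS:
            return False
        if offset + 2 > len(frame):
            return False
        header_len = (frame[offset + 1] + 1) * 8
        next_header = frame[offset]
        offset += header_len
    return next_header == target


def build_frame(chain, final_payload_len=20):
    """Build Ethernet / IPv6 / minimal 8-byte extension headers for testing.

    chain lists the next-header values in order, e.g. [0, 60, 6] means
    IPv6 -> hop-by-hop -> destination options -> TCP.
    """
    frame = bytes(12) + b"\x86\xdd"
    frame += bytes([0x60]) + bytes(5) + bytes([chain[0]]) + bytes(33)
    for following in chain[1:]:
        frame += bytes([following, 0]) + bytes(6)
    return frame + bytes(final_payload_len)


if __name__ == "__main__":
    frame = build_frame([0, 60, 6])
    print(ip6_protochain(frame, 6, max_rec=2))
    print(ip6_protochain(frame, 6, max_rec=1))
```

With two extension headers in front of TCP, a bound of 2 is enough to reach it while a bound of 1 is not, which is the trade-off a recursion limit introduces.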
Caper also allows protochain to be attached at another point: after certain ICMP or ICMPv6 error messages. These error messages encapsulate part of the original packet that triggered the error. They include destination unreachable, source quench (ICMP only, deprecated), time exceeded, parameter problem, packet too big (ICMPv6 only), and redirect. Handling protochain after ICMP is relatively straightforward, because ICMP carries only the original IP header and the first 8 bytes of the original datagram's payload. ICMPv6 is more complex: its error messages have variable length and may include as much of the original packet as fits within the Maximum Transmission Unit (MTU). This makes ICMPv6 error messages a natural place to apply protochain.
Consider the following example:
$ ./caper.byte -not_expand -BPF_optimized -max_rec 3 -q -e "ether proto \ip6 && ip6 protochain 58 && icmp6 protochain 17"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 274
(002) ldx #0x28
(003) ldb [20]
(004) jeq #0x3a jt 128 jf 5
(005) jeq #0x29 jt 6 jf 13
(006) ldb [x + 20]
(007) st M[15]
(008) ld #0x28
(009) add x
(010) tax
(011) ld M[15]
(012) jeq #0x3a jt 128 jf 13
...(omitted for brevity)
(259) jeq #0x11 jt 273 jf 260
(260) jeq #0x0 jt 264 jf 261
(261) jeq #0x3c jt 264 jf 262
(262) jeq #0x2b jt 264 jf 263
(263) jeq #0x2c jt 264 jf 274
(264) ldb [x + 14]
(265) st M[15]
(266) ldb [x + 15]
(267) add #0x1
(268) mul #0x8
(269) add x
(270) tax
(271) ld M[15]
(272) jeq #0x11 jt 273 jf 274
(273) ret #262144
(274) ret #0
The BPF code above matches packets with the following structure: ETHER / IPv6 / (up to three extension headers) / ICMPv6 / IPv6 / (up to three extension headers) / UDP.
IMPORTANT
The code above is semantically correct, but it will cause an error when you try to attach it to the kernel. For the reason, see 'BPF's Jump Target Mismatch'.
Recent Linux kernels strip the VLAN header from VLAN-tagged packets as soon as they are received and keep the tag in auxiliary data instead. The example below comes from a system running Ubuntu 20.04 with kernel version 5.15. Note that the negative offsets in the example do not point into the incoming packet itself; they refer to auxiliary data stored within the kernel. On Linux, -4048 corresponds to SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT and -4052 to SKF_AD_OFF + SKF_AD_VLAN_TAG.
$ tcpdump -d "vlan 200"
(000) ldb [-4048]
(001) jeq #0x1 jt 6 jf 2
(002) ldh [12]
(003) jeq #0x8100 jt 6 jf 4
(004) jeq #0x88a8 jt 6 jf 5
(005) jeq #0x9100 jt 6 jf 14
(006) ldb [-4048]
(007) jeq #0x1 jt 8 jf 10
(008) ldb [-4052]
(009) ja 11
(010) ldh [14]
(011) and #0xfff
(012) jeq #0xc8 jt 13 jf 14
(013) ret #262144
(014) ret #0
The BPF code above may run into problems on other operating systems such as macOS, because the macOS kernel rejects packet processing when it encounters negative offsets. Caper addresses this by reordering the BPF code so that it is usable across different operating systems and kernels: in the listing below, the VLAN tag inside the packet is checked first and the auxiliary data is consulted only afterwards. Below is the BPF code Caper generates for "vlan 200".
$ ./caper.byte -BPF_optimized -linux_vlan -q -p -e "vlan 200"
(000) ldh [12]
(001) jeq #0x8100 jt 4 jf 2
(002) jeq #0x88a8 jt 4 jf 3
(003) jeq #0x9100 jt 4 jf 7
(004) ldh [14]
(005) and #0xfff
(006) jeq #0xc8 jt 12 jf 7
(007) ldh [-4048]
(008) jeq #0x1 jt 9 jf 13
(009) ldh [-4052]
(010) and #0xfff
(011) jeq #0xc8 jt 12 jf 13
(012) ret #262144
(013) ret #0
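The control flow of the Caper listing above can be summarized as follows. This is a rough sketch, not Caper code: matches_vlan is a hypothetical function, and the kernel's auxiliary data is modelled as two plain parameters (aux_tag_present and aux_tci) standing in for the values read at offsets -4048 and -4052.

```python
# Model of the reordered "vlan 200" check: look at the in-packet tag first,
# then fall back to the kernel's auxiliary VLAN data.
VLAN_ETHERTYPES = {0x8100, 0x88A8, 0x9100}


def matches_vlan(frame, vlan_id, aux_tag_present=False, aux_tci=0):
    """Return True if the frame carries vlan_id in-packet or in auxiliary data."""
    if len(frame) < 14:
        return False
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype in VLAN_ETHERTYPES:
        if len(frame) < 16:
            return False
        tci = int.from_bytes(frame[14:16], "big")
        if tci & 0xFFF == vlan_id:  # low 12 bits of the TCI are the VLAN ID
            return True
    # Instructions 007-011 of the listing: auxiliary data check.
    return bool(aux_tag_present) and (aux_tci & 0xFFF) == vlan_id


if __name__ == "__main__":
    tagged = bytes(12) + (0x8100).to_bytes(2, "big") + (200).to_bytes(2, "big") + bytes(46)
    stripped = bytes(12) + (0x0800).to_bytes(2, "big") + bytes(46)
    print(matches_vlan(tagged, 200))
    print(matches_vlan(stripped, 200, aux_tag_present=True, aux_tci=200))
    print(matches_vlan(stripped, 200))
```

The first call models a frame that still carries its tag; the second models a frame whose tag was stripped by the kernel and is only visible through auxiliary data.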
People often need to filter not just a single VLAN tag but a range of tags. Caper supports this with pcap expression syntax such as "vlan 2000-3000", which generates BPF code matching packets whose VLAN tag falls within the range 2000 to 3000. Below is the BPF code Caper generates for "vlan 2000-3000".
$ ./caper.byte -not_expand -BPF_optimized -q -p -e "vlan 2000-3000"
(000) ldh [12]
(001) jeq #0x8100 jt 4 jf 2
(002) jeq #0x88a8 jt 4 jf 3
(003) jeq #0x9100 jt 4 jf 9
(004) ldh [14]
(005) and #0xfff
(006) jge #0x7d0 jt 7 jf 9
(007) jgt #0xbb8 jt 9 jf 8
(008) ret #262144
(009) ret #0
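The last four instructions of the listing above implement an inclusive range test on the low 12 bits of the tag: jge #0x7d0 rejects values below 2000 and jgt #0xbb8 rejects values above 3000. A minimal equivalent, using the hypothetical function name vlan_in_range, might look like this:

```python
# Inclusive range test on the 12-bit VLAN ID, mirroring the
# and #0xfff / jge #0x7d0 / jgt #0xbb8 sequence.
def vlan_in_range(tci, low=2000, high=3000):
    """Return True if the VLAN ID in the 16-bit TCI lies in [low, high]."""
    vlan_id = tci & 0xFFF      # and #0xfff: drop the priority and DEI bits
    if vlan_id < low:          # jge #0x7d0 fails
        return False
    if vlan_id > high:         # jgt #0xbb8 succeeds
        return False
    return True


if __name__ == "__main__":
    for tci in (1999, 2000, 2500, 3000, 3001):
        print(tci, vlan_in_range(tci))
```

Both bounds are inclusive, so tags 2000 and 3000 themselves match.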