BPF Simulator

powered by Caper
authored by Hyunsuk Bang

While tcpdump and libpcap already generate BPF code from pcap expressions, Caper offers some unique features.

  1. Expanded Pcap Expressions

    Pcap expressions are written in a high-level language for describing packet filters, but they often carry hidden ambiguities. Caper fully expands a pcap expression, spelling out the protocol checks it implies, so that users can see exactly what the expression means.

    $ ./caper.byte -q -p -e "tcp or udp"
    ether proto \ip &&
    (ip proto \tcp || ip proto \udp) ||
    ether proto \ip6 &&
    (ip6 proto \tcp || ip6 proto \udp)
                            

    The expansion shows that the pcap expression "tcp or udp" actually matches TCP or UDP packets carried over either IPv4 or IPv6.

    Further details about these expansions can be found in 'What we talk about when we talk about pcap expressions' by Nik Sultana.
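
    The expanded form above can be read as an ordinary boolean predicate, because && binds tighter than ||. The following is a minimal Python sketch of that reading, not Caper's implementation; the function name and its two parameters are hypothetical, and the numeric constants are the standard EtherType and IP protocol numbers.

```python
ETHERTYPE_IP = 0x0800   # "ether proto \ip"
ETHERTYPE_IP6 = 0x86DD  # "ether proto \ip6"
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def tcp_or_udp(ether_proto: int, ip_proto: int) -> bool:
    """Evaluate the expanded form of "tcp or udp" (hypothetical helper).

    ether_proto is the EtherType of the frame; ip_proto is the IPv4
    protocol field or the IPv6 next-header field, whichever applies.
    """
    over_ipv4 = ether_proto == ETHERTYPE_IP and (
        ip_proto == IPPROTO_TCP or ip_proto == IPPROTO_UDP
    )
    over_ipv6 = ether_proto == ETHERTYPE_IP6 and (
        ip_proto == IPPROTO_TCP or ip_proto == IPPROTO_UDP
    )
    return over_ipv4 or over_ipv6


print(tcp_or_udp(ETHERTYPE_IP, IPPROTO_TCP))   # True
print(tcp_or_udp(0x0806, IPPROTO_TCP))         # False: ARP is neither IPv4 nor IPv6
```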

  2. Converts Pcap Expressions to English

    A contribution from Marelle León lets Caper translate pcap expressions into plain English, making network filters easier for newcomers to understand.

    $ ./caper.byte -engl-out -q -e "ip host 192.168.0.2"
    IPv4 that has a host of 192.168.0.2
    
    $ ./caper.byte -engl-out -q -e "tcp port 80 or 443"
    tcp that has a port which is one of [80, 443]
                            
  3. Caper's BPF Compiler

    Caper provides its own compiler from pcap expressions to BPF.

    $ ./caper.byte -BPF_optimized -q -p -e "tcp or udp"
    (000) ldh [12]
    (001) jeq #0x800               jt 2      jf 5
    (002) ldb [23]
    (003) jeq #0x6                 jt 13     jf 4
    (004) jeq #0x11                jt 13     jf 14
    (005) jeq #0x86dd              jt 6      jf 14
    (006) ldb [20]
    (007) jeq #0x6                 jt 13     jf 8
    (008) jeq #0x2c                jt 9      jf 11
    (009) ldb [54]
    (010) jeq #0x6                 jt 13     jf 12
    (011) jeq #0x11                jt 13     jf 14
    (012) jeq #0x11                jt 13     jf 14
    (013) ret #262144
    (014) ret #0
                            

    Impressive! But you might wonder why bother with Caper when libpcap already does the job. libpcap has been around for a long time, but it is slowly turning into legacy code, which makes contributors hesitant to engage with it. Caper is more flexible: it can embrace changes and improve continuously. We welcome feedback and actively consider user requests to refine our BPF capabilities. Caper also introduces features that go beyond libpcap's BPF functionality.
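
    To trace by hand how the "tcp or udp" listing above decides on a packet, a very small interpreter is enough. The sketch below is an illustration only: it supports just the four instructions that appear in that listing, treats jt/jf as absolute instruction indices exactly as printed, and assumes untagged Ethernet frames. The names Insn, TCP_OR_UDP and run are hypothetical.

```python
from typing import List, Tuple

# (opcode, k, jt, jf); jt and jf are absolute indices, as printed in the listing.
Insn = Tuple[str, int, int, int]

TCP_OR_UDP: List[Insn] = [
    ("ldh", 12, 0, 0),
    ("jeq", 0x800, 2, 5),
    ("ldb", 23, 0, 0),
    ("jeq", 0x6, 13, 4),
    ("jeq", 0x11, 13, 14),
    ("jeq", 0x86DD, 6, 14),
    ("ldb", 20, 0, 0),
    ("jeq", 0x6, 13, 8),
    ("jeq", 0x2C, 9, 11),
    ("ldb", 54, 0, 0),
    ("jeq", 0x6, 13, 12),
    ("jeq", 0x11, 13, 14),
    ("jeq", 0x11, 13, 14),
    ("ret", 262144, 0, 0),
    ("ret", 0, 0, 0),
]


def run(prog: List[Insn], pkt: bytes) -> int:
    """Return the value of the 'ret' reached (0 means the packet is rejected)."""
    acc = 0  # the accumulator register A
    pc = 0
    while pc < len(prog):
        op, k, jt, jf = prog[pc]
        if op in ("ldh", "ldb"):
            size = 2 if op == "ldh" else 1
            if k + size > len(pkt):
                return 0  # a load past the end of the packet rejects it
            acc = int.from_bytes(pkt[k:k + size], "big")
            pc += 1
        elif op == "jeq":
            pc = jt if acc == k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return 0


# Synthetic frames: only the bytes the filter inspects are non-zero.
ipv4_tcp = bytes(12) + b"\x08\x00" + bytes(9) + bytes([6]) + bytes(10)
ipv6_udp = bytes(12) + b"\x86\xdd" + bytes(6) + bytes([17]) + bytes(33)
arp = bytes(12) + b"\x08\x06" + bytes(28)

print(run(TCP_OR_UDP, ipv4_tcp))  # 262144
print(run(TCP_OR_UDP, ipv6_udp))  # 262144
print(run(TCP_OR_UDP, arp))       # 0
```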

  4. IPv6 Advancements

    IPv6 support is vital in today's network landscape, but libpcap occasionally stumbles in handling certain expressions. For instance, consider the following tcpdump expression:

    $ tcpdump -d "ip6 and tcp[tcpflags]=tcp-ack"
    tcpdump: expression rejects all packets
                            

    tcpdump refuses to compile this seemingly straightforward expression, reporting that it rejects all packets. Caper handles such edge cases and generates working BPF code for IPv6 expressions like these:

    $ ./caper.byte -BPF_optimized -q -p -e "ip6 and tcp[tcpflags]=tcp-ack"
    (000) ldh [12]
    (001) jeq #0x86dd              jt 2      jf 7
    (002) ldb [20]
    (003) jeq #0x6                 jt 4      jf 7
    (004) ldb [67]
    (005) jeq #0x10                jt 6      jf 7
    (006) ret #262144
    (007) ret #0
                            
    $ ./caper.byte -BPF_optimized -q -p -e "ip6 and (udp port 546 or udp port 547) and (udp[8] == 7)"
    (000) ldh [12]
    (001) jeq #0x86dd              jt 2      jf 13
    (002) ldb [20]
    (003) jeq #0x11                jt 4      jf 13
    (004) ldh [54]
    (005) jeq #0x222               jt 10     jf 6
    (006) jeq #0x223               jt 10     jf 7
    (007) ldh [56]
    (008) jeq #0x222               jt 10     jf 9
    (009) jeq #0x223               jt 10     jf 13
    (010) ldb [62]
    (011) jeq #0x7                 jt 12     jf 13
    (012) ret #262144
    (013) ret #0
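
    The absolute offsets in the two listings above follow from simple header arithmetic. The sketch below reproduces them under two assumptions that the listings appear to share: an untagged 14-byte Ethernet header and a 40-byte IPv6 header with no extension headers. The helper names are hypothetical.

```python
ETHER_LEN = 14  # Ethernet II header, no VLAN tag (assumption)
IPV6_LEN = 40   # fixed IPv6 header, no extension headers (assumption)


def ip6_offset(field_offset: int) -> int:
    """Absolute offset of a byte inside the IPv6 header."""
    return ETHER_LEN + field_offset


def l4_offset(field_offset: int) -> int:
    """Absolute offset of a byte inside the transport header."""
    return ETHER_LEN + IPV6_LEN + field_offset


offsets = {
    "ip6 next header": ip6_offset(6),          # ldb [20]
    "tcp[tcpflags] (tcp[13])": l4_offset(13),  # ldb [67]
    "udp source port": l4_offset(0),           # ldh [54]
    "udp destination port": l4_offset(2),      # ldh [56]
    "udp[8]": l4_offset(8),                    # ldb [62]
}

for name, value in offsets.items():
    print(f"{name}: {value}")

# The compared constants are the same values written in hexadecimal.
print(hex(546), hex(547))  # 0x222 0x223
```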
                            
  5. Protochain Handling

    'Protochain' checks for a protocol behind an arbitrary number of extension headers in IPv4 or IPv6. Because the number of headers is not known in advance, a direct translation needs a loop in BPF, which in-kernel BPF does not allow. Caper resolves this by flattening the recursion at compile time, up to a bounded depth. The resulting BPF code complies with kernel constraints, so packet processing is not forced into user space.

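    Consider the following tcpdump output for a loop-inducing protochain expression. The 'ja 4' instructions at (018) and (031) jump backward to instruction (004), which forms the loop: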

    $ tcpdump -i en0 -d "ip6 protochain 6"
    (000) ldh [12]
    (001) jeq #0x86dd          jt 2      jf 35
    (002) ldb [20]
    (003) ldx #0x28
    (004) jeq #0x6             jt 32     jf 5
    (005) jeq #0x3b            jt 32     jf 6
    (006) jeq #0x0             jt 10     jf 7
    ...(omitted for brevity)
    (014) mul #8
    (015) add x
    (016) tax
    (017) ld M[0]
    (018) ja 4
    (019) jeq #0x33            jt 20     jf 32
    ...(omitted for brevity)
    (029) tax
    (030) ld M[0]
    (031) ja 4
    (032) add #0
    (033) jeq #0x6             jt 34     jf 35
    (034) ret #524288
    (035) ret #0
                            

    Caper resolves this issue with the -max_rec flag, allowing you to specify the maximum number of recursions:

    $ ./caper.byte -BPF_optimized -max_rec 2 -q -e "ip6 protochain 6"
    (000) ldh [12]
    (001) jeq #0x86dd              jt 2      jf 88
    (002) ldx #0x28
    (003) ldb [20]
    (004) jeq #0x6                 jt 87     jf 5
    (005) jeq #0x29                jt 6      jf 13
    (006) ldb [x + 20]
    (007) st M[15]
    (008) ld #0x28
    (009) add x
    (010) tax
    (011) ld M[15]
    (012) jeq #0x6                 jt 87     jf 13
    (013) jeq #0x4                 jt 14     jf 23
    (014) ldb [x + 23]
    (015) st M[15]
    (016) ldb [x + 14]
    (017) and #0xf
    (018) mul #0x4
    (019) add x
    (020) tax
    (021) ld M[15]
    (022) jeq #0x6                 jt 87     jf 23
    (023) jeq #0x33                jt 24     jf 33
    (024) ldb [x + 14]
    ...(truncated for brevity)
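
    The idea of bounding the header walk can be sketched in Python. This is a simplified model, not Caper's algorithm: it handles only the IPv6 extension headers whose length is (Hdr Ext Len + 1) * 8 bytes, it ignores the tunnelling and AH cases visible in the listing, and it assumes that max_rec means "the number of extension headers that may be skipped", which may differ from the exact meaning of Caper's -max_rec flag. The function name is hypothetical.

```python
ETHER_LEN = 14
IPV6_LEN = 40
ETHERTYPE_IP6 = 0x86DD

# Hop-by-Hop (0), Routing (43), Fragment (44), Destination Options (60).
# The Fragment header fits the same length formula because its second
# byte is reserved and zero, giving (0 + 1) * 8 = 8 bytes.
SKIPPABLE = {0, 43, 44, 60}


def ip6_protochain(pkt: bytes, proto: int, max_rec: int) -> bool:
    """Bounded walk over IPv6 extension headers (simplified sketch)."""
    if len(pkt) < ETHER_LEN + IPV6_LEN:
        return False
    if int.from_bytes(pkt[12:14], "big") != ETHERTYPE_IP6:
        return False
    next_hdr = pkt[ETHER_LEN + 6]
    offset = ETHER_LEN + IPV6_LEN
    # A fixed iteration count plays the role of compile-time flattening:
    # each pass corresponds to one unrolled copy of the header check.
    for _ in range(max_rec):
        if next_hdr == proto:
            return True
        if next_hdr not in SKIPPABLE or offset + 2 > len(pkt):
            return False
        hdr_len = (pkt[offset + 1] + 1) * 8
        next_hdr = pkt[offset]
        offset += hdr_len
    return next_hdr == proto


# Ethernet / IPv6 / Hop-by-Hop / Destination Options / TCP (synthetic).
eth = bytes(12) + b"\x86\xdd"
ip6 = bytes(6) + bytes([0]) + bytes(33)   # next header: Hop-by-Hop
hbh = bytes([60, 0]) + bytes(6)           # next header: Destination Options
dst = bytes([6, 0]) + bytes(6)            # next header: TCP
pkt = eth + ip6 + hbh + dst + bytes(20)

print(ip6_protochain(pkt, 6, max_rec=1))  # False: TCP sits behind two headers
print(ip6_protochain(pkt, 6, max_rec=2))  # True
```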
                            

    In Caper, protochain has another attachment point: after certain ICMP or ICMPv6 error messages. These error messages encapsulate part of the original packet that triggered the error. They cover destination unreachable, source quench (deprecated, ICMP only), time exceeded, parameter problem, packet too big (ICMPv6 only), and redirect.

    Handling protochain for ICMP is relatively straightforward, because ICMP carries only the original IP header and the first 8 bytes of the IP datagram. ICMPv6 is more complex: its error messages have variable length and may include as much of the original packet as fits within the Maximum Transmission Unit (MTU). This makes ICMPv6 error messages a natural place for protochain examination.

    Consider the following example:

    $ ./caper.byte -not_expand -BPF_optimized -max_rec 3 -q -e "ether proto \ip6 && ip6 protochain 58 && icmp6 protochain 17"
    (000) ldh [12]
    (001) jeq #0x86dd              jt 2      jf 274
    (002) ldx #0x28
    (003) ldb [20]
    (004) jeq #0x3a                jt 128    jf 5
    (005) jeq #0x29                jt 6      jf 13
    (006) ldb [x + 20]
    (007) st M[15]
    (008) ld #0x28
    (009) add x
    (010) tax
    (011) ld M[15]
    (012) jeq #0x3a                jt 128    jf 13
    ...(omitted for brevity)
    (259) jeq #0x11                jt 273    jf 260
    (260) jeq #0x0                 jt 264    jf 261
    (261) jeq #0x3c                jt 264    jf 262
    (262) jeq #0x2b                jt 264    jf 263
    (263) jeq #0x2c                jt 264    jf 274
    (264) ldb [x + 14]
    (265) st M[15]
    (266) ldb [x + 15]
    (267) add #0x1
    (268) mul #0x8
    (269) add x
    (270) tax
    (271) ld M[15]
    (272) jeq #0x11                jt 273    jf 274
    (273) ret #262144
    (274) ret #0
                            

    The BPF code above matches packets with the following structure: ETHER / IPv6 / (up to three extension headers) / ICMPv6 / IPv6 / (up to three extension headers) / UDP.
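
    For the simplest instance of this structure, with no extension headers at either level, the position of the embedded packet can be computed directly. The sketch below assumes an untagged Ethernet header and the 8-byte ICMPv6 error header (type, code, checksum, and a 4-byte field) that precedes the embedded packet. It does not inspect the ICMPv6 type, and the names are hypothetical.

```python
ETHER_LEN = 14
IPV6_LEN = 40
ICMP6_HDR_LEN = 8  # type, code, checksum, 4-byte field; embedded packet follows

OUTER_IP6 = ETHER_LEN                 # 14
ICMP6 = OUTER_IP6 + IPV6_LEN          # 54
INNER_IP6 = ICMP6 + ICMP6_HDR_LEN     # 62
INNER_L4 = INNER_IP6 + IPV6_LEN       # 102

IPPROTO_ICMPV6 = 58
IPPROTO_UDP = 17


def matches_simple_chain(pkt: bytes) -> bool:
    """ETHER / IPv6 / ICMPv6 / IPv6 / UDP with no extension headers."""
    if len(pkt) < INNER_IP6 + IPV6_LEN:
        return False
    return (
        int.from_bytes(pkt[12:14], "big") == 0x86DD
        and pkt[OUTER_IP6 + 6] == IPPROTO_ICMPV6   # outer next header
        and pkt[INNER_IP6 + 6] == IPPROTO_UDP      # embedded next header
    )


# Synthetic packet: a "destination unreachable" (type 1) carrying a UDP datagram.
eth = bytes(12) + b"\x86\xdd"
outer = bytes(6) + bytes([IPPROTO_ICMPV6]) + bytes(33)
icmp6 = bytes([1, 0, 0, 0, 0, 0, 0, 0])
inner = bytes(6) + bytes([IPPROTO_UDP]) + bytes(33)
sample = eth + outer + icmp6 + inner + bytes(8)

print(INNER_IP6 + 6)                 # 68: offset of the embedded next-header byte
print(matches_simple_chain(sample))  # True
```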

    IMPORTANT
    The code above is semantically correct, but attaching it to the kernel will cause an error. Want to know why? See BPF's Jump Target Mismatch.
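
    The referenced explanation is not reproduced here. One constraint the listing visibly runs into, offered as an assumption about what is meant, is that a classic BPF conditional jump stores jt and jf as unsigned 8-bit offsets relative to the next instruction, so a target more than 255 instructions ahead cannot be encoded. The sketch below checks a few of the jumps printed above; the names are hypothetical.

```python
from typing import List, Tuple

MAX_JUMP_OFFSET = 255  # jt and jf are unsigned 8-bit fields in classic BPF

# (index, jt, jf) for some conditional jumps copied from the listing above.
JUMPS: List[Tuple[int, int, int]] = [
    (1, 2, 274),
    (4, 128, 5),
    (12, 128, 13),
    (259, 273, 260),
    (263, 264, 274),
    (272, 273, 274),
]


def out_of_range(jumps: List[Tuple[int, int, int]]) -> List[Tuple[int, str, int, int]]:
    """Return (index, branch, target, offset) for jumps that do not fit in 8 bits."""
    bad = []
    for index, jt, jf in jumps:
        for branch, target in (("jt", jt), ("jf", jf)):
            offset = target - (index + 1)  # relative to the next instruction
            if not 0 <= offset <= MAX_JUMP_OFFSET:
                bad.append((index, branch, target, offset))
    return bad


print(out_of_range(JUMPS))  # [(1, 'jf', 274, 272)]
```

    Under this reading, instruction (001) is the problem: its false branch would need an offset of 272.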

  6. Linux Vlan

    Recent Linux kernels remove the VLAN header from incoming VLAN-tagged packets right at the outset. The example below illustrates this; it comes from a system running Ubuntu 20.04 with kernel version 5.15. Note that the negative offsets in the example do not address bytes of the incoming packet. They refer to auxiliary data stored within the kernel.

    $ tcpdump -d "vlan 200"
    (000) ldb [-4048]
    (001) jeq #0x1             jt 6      jf 2
    (002) ldh [12]
    (003) jeq #0x8100          jt 6      jf 4
    (004) jeq #0x88a8          jt 6      jf 5
    (005) jeq #0x9100          jt 6      jf 14
    (006) ldb [-4048]
    (007) jeq #0x1             jt 8      jf 10
    (008) ldb [-4052]
    (009) ja 11
    (010) ldh [14]
    (011) and #0xfff
    (012) jeq #0xc8            jt 13     jf 14
    (013) ret #262144
    (014) ret #0
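
    The two negative offsets in this listing can be decoded with the ancillary-data constants from the Linux header linux/filter.h: SKF_AD_OFF is -0x1000, and the VLAN entries sit at 44 and 48 relative to it. The small Python helper below is a hypothetical illustration that covers only these two entries.

```python
SKF_AD_OFF = -0x1000  # base of the ancillary-data offsets (-4096)

ANCILLARY = {
    44: "SKF_AD_VLAN_TAG",          # the VLAN tag kept by the kernel
    48: "SKF_AD_VLAN_TAG_PRESENT",  # 1 if the kernel holds a VLAN tag
}


def describe_offset(k: int) -> str:
    """Name the location a BPF load offset refers to (sketch, two entries only)."""
    if k >= 0:
        return f"packet byte {k}"
    name = ANCILLARY.get(k - SKF_AD_OFF)
    return name if name is not None else f"unknown ancillary offset {k}"


print(describe_offset(-4048))  # SKF_AD_VLAN_TAG_PRESENT
print(describe_offset(-4052))  # SKF_AD_VLAN_TAG
print(describe_offset(12))     # packet byte 12
```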
                            

    The BPF code above can run into problems on other operating systems such as macOS, whose kernel rejects the packet when it encounters a negative offset. Caper addresses this by reordering the BPF code: the checks on the packet's own bytes come first, and the negative-offset loads are reached only if those checks fail. This makes the same filter usable across different operating systems and kernels. Below is the BPF that Caper generates for "vlan 200".

    $ ./caper.byte -BPF_optimized -linux_vlan -q -p -e "vlan 200"
    (000) ldh [12]
    (001) jeq #0x8100              jt 4      jf 2
    (002) jeq #0x88a8              jt 4      jf 3
    (003) jeq #0x9100              jt 4      jf 7
    (004) ldh [14]
    (005) and #0xfff
    (006) jeq #0xc8                jt 12     jf 7
    (007) ldh [-4048]
    (008) jeq #0x1                 jt 9      jf 13
    (009) ldh [-4052]
    (010) and #0xfff
    (011) jeq #0xc8                jt 12     jf 13
    (012) ret #262144
    (013) ret #0
                            
  7. Vlanrange

    Users frequently need to filter on a range of VLAN tags rather than a single one. Caper supports this with pcap expression syntax such as "vlan 2000-3000", which generates BPF code matching packets whose VLAN tag falls within the range 2000 to 3000, inclusive. Below is the BPF that Caper generates for "vlan 2000-3000".

    $ ./caper.byte -not_expand -BPF_optimized -q -p -e "vlan 2000-3000"
    (000) ldh [12]
    (001) jeq #0x8100              jt 4      jf 2
    (002) jeq #0x88a8              jt 4      jf 3
    (003) jeq #0x9100              jt 4      jf 9
    (004) ldh [14]
    (005) and #0xfff
    (006) jge #0x7d0               jt 7      jf 9
    (007) jgt #0xbb8               jt 9      jf 8
    (008) ret #262144
    (009) ret #0
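
    Read as ordinary code, the listing above checks the tag protocol identifier, masks the low 12 bits of the following 16-bit field, and compares the result against both bounds. A minimal Python sketch of the same logic follows; the function name and its parameters are hypothetical.

```python
VLAN_TPIDS = (0x8100, 0x88A8, 0x9100)  # the three EtherTypes tested in the listing


def vlan_in_range(pkt: bytes, low: int = 2000, high: int = 3000) -> bool:
    """Match frames whose in-packet VLAN ID lies in [low, high]."""
    if len(pkt) < 16:
        return False
    tpid = int.from_bytes(pkt[12:14], "big")        # ldh [12]
    if tpid not in VLAN_TPIDS:
        return False
    vid = int.from_bytes(pkt[14:16], "big") & 0xFFF  # ldh [14]; and #0xfff
    return low <= vid <= high                        # jge #0x7d0, then jgt #0xbb8


def tagged(tci: int, tpid: int = 0x8100) -> bytes:
    """Build a synthetic tagged frame carrying the given 16-bit TCI."""
    return bytes(12) + tpid.to_bytes(2, "big") + tci.to_bytes(2, "big") + bytes(46)


print(vlan_in_range(tagged(2000)))               # True: lower bound is inclusive
print(vlan_in_range(tagged(3001)))               # False
print(vlan_in_range(tagged((5 << 13) | 2500)))   # True: priority bits are masked off
```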