powered by Caper
authored by Hyunsuk Bang
Consider a piece of semantically correct BPF code that the kernel nevertheless rejects when we try to attach it.
$ ./caper.byte -not_expand -BPF_optimized -max_rec 3 -q -e "ether proto \ip6 && ip6 protochain 58 && icmp6 protochain 17"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 274
...(omitted for brevity)
(273) ret #262144
(274) ret #0
When this code is converted from its symbolic representation to its byte form and then decoded again, a discrepancy emerges.
(001) jeq #0x86dd jt 2 jf 274 => { 0x15, 0, 16, 0x000086dd },
{ 0x15, 0, 16, 0x000086dd } => (001) jeq #0x86dd jt 2 jf 18
The symbolic representation appears accurate, yet the jump instruction's target line number diverges - from 'jf 274' to 'jf 18'. Why is this happening?
The answer lies in the 'jt' (Jump true) and 'jf' (Jump false) fields of BPF's sock_filter struct. These fields are not absolute line numbers, as one might assume. Instead, they hold offsets relative to the instruction that follows the current one. Below is the struct definition of sock_filter (BPF) from linux/filter.h.
struct sock_filter /* Filter block */
{
__u16 code; /* Actual filter code */
__u8 jt; /* Jump true */
__u8 jf; /* Jump false */
__u32 k; /* Generic multiuse field */
};
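To make the field widths concrete, here is a minimal Python sketch that packs one instruction with the standard struct module. It assumes the struct above is laid out as 8 bytes with no padding (u16, u8, u8, u32). The names SOCK_FILTER, pack_insn and fits_in_u8 are hypothetical helpers for illustration and are not part of Caper or the kernel.

```python
import struct

# Assumed layout of struct sock_filter: u16 code, u8 jt, u8 jf, u32 k.
# "=" selects native byte order with standard sizes and no padding.
SOCK_FILTER = struct.Struct("=HBBI")


def pack_insn(code, jt, jf, k):
    """Pack one classic BPF instruction into its 8-byte form."""
    return SOCK_FILTER.pack(code, jt, jf, k)


def fits_in_u8(offset):
    """Return True if a jump offset can be stored in jt or jf."""
    return 0 <= offset <= 0xFF


if __name__ == "__main__":
    raw = pack_insn(0x15, 0, 16, 0x000086DD)
    print(len(raw), SOCK_FILTER.unpack(raw))
    # An offset of 16 fits in jf, an offset of 272 does not.
    print(fits_in_u8(16), fits_in_u8(272))
```

Python's struct module refuses to pack 272 into a B field and raises struct.error. An encoder that silently keeps only the low 8 bits would store 16 instead.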
Note that the 'jt' and 'jf' fields are 8-bit unsigned integers. This detail looks insignificant, but it explains the behavior we observe. A conditional jump that tries to skip more than 255 instructions overflows the field, and only the low 8 bits of the offset survive. In our case, the jump from line 1 to line 274 wraps around and lands on line 18.
target_line(274) - current_line(1) - 1 - max_u8(255) - bit_overflow(1) = 16
current_line(1) + 16 + 1 = 18
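The same arithmetic can be written as a short Python sketch. It assumes the encoder stores only the low 8 bits of the offset, which matches the value 16 observed in the byte form above. The function names are hypothetical.

```python
def relative_offset(current_line, target_line):
    """Offset counted from the instruction after the current one."""
    return target_line - current_line - 1


def truncate_to_u8(offset):
    """Keep only the low 8 bits, as happens when storing into jt/jf."""
    return offset & 0xFF


def decoded_target(current_line, stored_offset):
    """Line number a disassembler would print for a stored offset."""
    return current_line + 1 + stored_offset


if __name__ == "__main__":
    wanted = relative_offset(1, 274)   # offset the filter needs
    stored = truncate_to_u8(wanted)    # offset that fits in jf
    print(wanted, stored, decoded_target(1, stored))  # prints: 272 16 18
```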
Although the jump no longer does what the filter intends, a jump to line 18 is still a valid instruction in itself. So we need to find out what actually causes the error once execution lands on line 18.
(015): st M[15]
(016): ldb [x + 14]
(017): and #0xf
(018): mul #0x4
(019): add x
(020): tax
(021): ld M[15]
(022): jeq #0x3a jt l128 jf l023
A jump from line 1 straight to line 18 skips lines 2 through 17, including the 'st M[15]' at line 15. On this path, 'add x' at line 19 and 'ld M[15]' at line 21 both read state that was never written. That gives two potential culprits:
First, is using the X register without prior initialization the root of the issue? To test this, consider a dummy BPF program that I wrote.
l000: ld #0x0
l001: add x
l002: jeq #0x0, l003 , l004
l003: ret #262144
l004: ret #0
When I compile the above BPF and attach it to the kernel, the kernel doesn't complain and accepts all incoming packets. From this, it becomes apparent that the kernel implicitly assigns 0 to the X register without explicit initialization.
Second, is loading from an uninitialized scratch memory slot (M[15]) the problem? Below are the dummy BPF programs that I wrote to find out.
l000: ld #0x0
l001: ld M[15]
l002: jeq #0x0, l003 , l004
l003: ret #262144
l004: ret #0
The above code fails with the error "Warning: Kernel filter failed: Invalid argument". To confirm that the load from an uninitialized memory slot is what makes it illegal, let's try the same program with a store added before the load.
l000: ld #0x0
l001: st M[15]
l002: ld M[15]
l003: jeq #0x0, l004 , l005
l004: ret #262144
l005: ret #0
The above BPF code doesn't cause any problems and accepts all incoming packets. It becomes clear that the kernel rejects a filter that loads from a scratch memory slot before anything has been stored to it.
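The two experiments above can be modeled with a small Python sketch that walks a program once and rejects any 'ld M[n]' that can be reached without a prior 'st M[n]' on every path. This is a simplified illustration of the kind of check that would explain the observed behavior, not the kernel's actual code. The tuple format and the name loads_are_initialized are assumptions made for this example, and jump targets are written as absolute line numbers for readability.

```python
def loads_are_initialized(program):
    """Return False if some path reaches 'ld M[n]' before 'st M[n]'.

    Jumps in classic BPF only go forward, so a single pass in program
    order sees every predecessor of a line before the line itself.
    """
    incoming = {0: frozenset()}  # line -> slots written on all paths to it
    for pc, insn in enumerate(program):
        if pc not in incoming:
            continue  # unreachable line
        written = incoming[pc]
        op = insn[0]
        if op == "ld_mem" and insn[1] not in written:
            return False
        if op == "st":
            written = written | {insn[1]}
        if op == "ret":
            successors = []
        elif op == "jeq":
            successors = [insn[2], insn[3]]  # jt and jf targets
        else:
            successors = [pc + 1]
        for nxt in successors:
            # A slot counts as written only if every path wrote it.
            incoming[nxt] = written if nxt not in incoming else incoming[nxt] & written
    return True


# The dummy program that the kernel rejected.
rejected = [("ld_imm", 0), ("ld_mem", 15), ("jeq", 0, 3, 4),
            ("ret", 262144), ("ret", 0)]
# The dummy program that the kernel accepted.
accepted = [("ld_imm", 0), ("st", 15), ("ld_mem", 15), ("jeq", 0, 4, 5),
            ("ret", 262144), ("ret", 0)]

if __name__ == "__main__":
    print(loads_are_initialized(rejected), loads_are_initialized(accepted))
```

Under this model, a jump that skips over a store has the same effect as leaving the store out: the load behind it is no longer covered on every path.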
Caper works around the problem with the unconditional 'jmp' instruction (BPF_JA). Unlike jt and jf, which are only 8 bits wide, 'jmp' takes its target from the unsigned 32-bit 'k' field. The target is still encoded as an offset from the following instruction, and the disassembly prints it as a resolved line number, but a 32-bit field is far larger than any jump a filter needs. A conditional jump that would overflow is therefore turned into a short conditional jump onto a 'jmp', and the overflow described above no longer occurs. Observe the demonstration below.
$ ./caper.byte -not_expand -BPF_optimized -max_rec 3 -q -p -e "ether proto \ip6 && ip6 protochain 58 && icmp6 protochain 17"
(000) ldh [12]
(001) jeq #0x86dd jt 3 jf 2
(002) jmp (275)
...(omitted for brevity)
(274) ret #262144
(275) ret #0
This output is both semantically correct and suitable for attachment to the kernel.
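The pattern in lines 1 and 2 of this output can be reproduced with a short Python sketch. It assumes the classic BPF encoding, in which the 'k' field of an unconditional jump (opcode 0x05) holds an offset counted from the next instruction and the disassembler prints the resolved line. The helper names are hypothetical and this is not Caper's implementation.

```python
BPF_JEQ_K = 0x15  # jeq #k, as in the byte form shown earlier
BPF_JA = 0x05     # unconditional jump, offset stored in the 32-bit k field


def long_false_jump(line, k, far_target):
    """Encode 'jeq #k' at `line` whose false branch is too far for jf.

    The false branch falls into a 'jmp' placed on the next line, and the
    true branch skips over that 'jmp'. Tuples are (code, jt, jf, k).
    """
    jmp_line = line + 1
    jeq = (BPF_JEQ_K, 1, 0, k)
    jmp = (BPF_JA, 0, 0, far_target - jmp_line - 1)
    return [jeq, jmp]


def resolve(line, offset):
    """Line number that a stored relative offset points to."""
    return line + 1 + offset


if __name__ == "__main__":
    jeq, jmp = long_false_jump(1, 0x86DD, 275)
    print("jt", resolve(1, jeq[1]), "jf", resolve(1, jeq[2]))  # prints: jt 3 jf 2
    print("jmp", resolve(2, jmp[3]))                           # prints: jmp 275
```

Both 8-bit fields now hold small values (1 and 0), and the large offset of 272 sits in the 32-bit field where it fits.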
Takeaways from this experiment: the kernel most likely rejects the BPF code because the overflowed jump lets an 'ld' instruction read from a memory slot (M[15]) that was never written. Keep in mind that BPF behavior can vary with kernel versions and configurations. By using 'jmp' instructions for long jumps, Caper avoids the 8-bit overflow and, with it, the error that followed.
In your journey through the world of BPF, always remember that the devil lies in the details, and diligent investigation can unravel the enigmatic code behaviors.
Happy Filtering!