Consider a piece of semantically correct BPF code that causes an error when attaching it to the kernel.
$ ./caper.byte -not_expand -BPF_optimized -max_rec 3 -q -e "ether proto \ip6 && ip6 protochain 58 && icmp6 protochain 17"
(000) ldh [12]
(001) jeq #0x86dd jt 2 jf 274
...(omitted for brevity)
(273) ret #262144
(274) ret #0
When this code is converted from its symbolic representation to byte form and back again, a discrepancy emerges.
(001) jeq #0x86dd jt 2 jf 274 => { 0x15, 0, 16, 0x000086dd },
{ 0x15, 0, 16, 0x000086dd } => (001) jeq #0x86dd jt 2 jf 18
The symbolic representation appears accurate, yet the jump instruction's target line number diverges - from 'jf 274' to 'jf 18'. Why is this happening?
The answer lies in the 'jt' (jump true) and 'jf' (jump false) fields of BPF's sock_filter struct. These fields are not absolute line numbers, as one might assume. Instead, they hold relative offsets, counted from the instruction that follows the current one. Below is the struct definition of sock_filter (BPF) from linux/filter.h.
struct sock_filter /* Filter block */
{
__u16 code; /* Actual filter code */
__u8 jt; /* Jump true */
__u8 jf; /* Jump false */
__u32 k; /* Generic multiuse field */
};
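To make the layout concrete, here is a minimal Python sketch that serializes one instruction with the same field widths as the struct above. The names SOCK_FILTER, pack_insn and unpack_insn are hypothetical helpers for illustration, not part of Caper or of any kernel API, and the "=" prefix assumes the host's native byte order is the one you want.

```python
import struct

# Field widths taken from the struct above:
# H = __u16 code, B = __u8 jt, B = __u8 jf, I = __u32 k
SOCK_FILTER = struct.Struct("=HBBI")


def pack_insn(code, jt, jf, k):
    """Serialize one classic BPF instruction to its 8-byte form."""
    return SOCK_FILTER.pack(code, jt, jf, k)


def unpack_insn(raw):
    """Inverse of pack_insn: bytes -> (code, jt, jf, k)."""
    return SOCK_FILTER.unpack(raw)


# The byte-form instruction shown earlier: { 0x15, 0, 16, 0x000086dd }
raw = pack_insn(0x15, 0, 16, 0x000086DD)
print(len(raw), unpack_insn(raw))

# A jf value above 255 cannot be stored in a one-byte field at all.
try:
    pack_insn(0x15, 0, 272, 0x000086DD)
except struct.error as exc:
    print("cannot encode:", exc)
```

Python refuses the out-of-range value outright, whereas an encoder that simply casts to an 8-bit type would keep only the low byte.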
The detail that matters here is that the 'jt' and 'jf' fields are 8-bit unsigned integers.
This seemingly minor detail explains the behavior we observe.
A conditional jump that tries to skip more than 255 instructions overflows the field.
In this case, the attempt to reach line 274 from line 1 wraps around to line 18.
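(
true_offset = target_line(274) - current_line(1) - 1 = 272
stored_offset = true_offset(272) - max_u8(255) - bit_overflow(1) = 16 (that is, 272 mod 256)
decoded_line = current_line(1) + stored_offset(16) + 1 = 18
)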
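The same arithmetic can be written as a small Python sketch. The function names are hypothetical, and the sketch assumes the encoder silently keeps only the low 8 bits of the offset, which is what the observed 'jf 18' suggests.

```python
U8_MASK = 0xFF  # jt and jf are 8-bit unsigned fields


def encode_offset(current_line, target_line):
    """Relative offset as it ends up in an 8-bit jt/jf field.

    Assumption: the encoder truncates to 8 bits instead of reporting an error.
    """
    true_offset = target_line - current_line - 1
    return true_offset & U8_MASK


def decode_target(current_line, stored_offset):
    """Absolute line a disassembler shows for a stored relative offset."""
    return current_line + 1 + stored_offset


stored = encode_offset(1, 274)
print(stored, decode_target(1, stored))  # 16 18
```

Any target within 255 instructions of the jump survives the round trip unchanged; only farther targets wrap.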
Although the semantics of the BPF program are violated, a jump to line 18 is still a valid instruction. We must find out what causes the error at line 18.
(015): st M[15]
(016): ldb [x + 14]
(017): and #0xf
(018): mul #0x4
(019): add x
(020): tax
(021): ld M[15]
(022): jeq #0x3a jt l128 jf l023
Starting at line 18 of the BPF code above, two potential culprits come under scrutiny: using the X register without initializing it (line 19), and loading from M[15] without a prior store (line 21).
First, we examine whether using the X register without prior initialization is the root of the issue. To test this, here is a dummy BPF program that I wrote.
l000: ld #0x0
l001: add x
l002: jeq #0x0, l003 , l004
l003: ret #262144
l004: ret #0
When I compile the BPF code above and attach it to the kernel, the kernel does not complain and accepts all incoming packets. From this, it appears that the kernel implicitly assigns 0 to the X register when it has not been explicitly initialized.
Second, we examine the use of an uninitialized memory address (M[15]). Below is the dummy BPF code that I wrote to find out the answer.
l000: ld #0x0
l001: ld M[15]
l002: jeq #0x0, l003 , l004
l003: ret #262144
l004: ret #0
The code above fails with the error "Warning: Kernel filter failed: Invalid argument". To confirm that loading a value from an uninitialized memory address is what makes it illegal, let's look at another BPF program.
l000: ld #0x0
l001: st M[15]
l002: ld M[15]
l003: jeq #0x0, l004 , l005
l004: ret #262144
l005: ret #0
The BPF code above does not cause any problem and accepts all incoming packets. It becomes clear that loading from an uninitialized memory address is illegal in BPF.
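The experiments above can be reproduced offline with a simplified model of such a check. The sketch below is loosely modeled on my understanding of the load/store validation the kernel applies to classic BPF filters. It is not the kernel's code, the tuple encoding is a hypothetical mini-format invented for this example, and jump targets are assumed to be in range.

```python
MEMWORDS = 16  # classic BPF scratch memory has 16 cells, M[0]..M[15]


def loads_are_initialized(program):
    """Return True if every load from M[k] is preceded by a store on all paths.

    program is a list of tuples in a hypothetical mini-encoding:
      ("ld_imm", k), ("ld_mem", k), ("st", k), ("ret", k),
      ("jeq", k, jt, jf) with jt/jf as relative offsets,
      ("ja", k)          with k as a relative offset.
    """
    all_valid = (1 << MEMWORDS) - 1
    # masks[pc]: cells known to be written on every jump arriving at pc
    masks = [all_valid] * len(program)
    valid = 0  # nothing has been written at program entry
    for pc, insn in enumerate(program):
        valid &= masks[pc]
        op = insn[0]
        if op == "st":
            valid |= 1 << insn[1]
        elif op == "ld_mem":
            if not valid & (1 << insn[1]):
                return False  # load from a cell that may be unwritten
        elif op == "ja":
            masks[pc + 1 + insn[1]] &= valid
            valid = all_valid  # the next line is only reachable via jumps
        elif op == "jeq":
            _, _, jt, jf = insn
            masks[pc + 1 + jt] &= valid
            masks[pc + 1 + jf] &= valid
            valid = all_valid
    return True


# Second dummy program: load without a store
no_store = [("ld_imm", 0), ("ld_mem", 15), ("jeq", 0, 0, 1),
            ("ret", 262144), ("ret", 0)]
# Third dummy program: store, then load
with_store = [("ld_imm", 0), ("st", 15), ("ld_mem", 15), ("jeq", 0, 0, 1),
              ("ret", 262144), ("ret", 0)]
# A jump that skips the store, like the wrapped 'jf 18' skipping 'st M[15]'
skips_store = [("jeq", 0, 0, 1), ("st", 15), ("ld_mem", 15), ("ret", 0)]

print(loads_are_initialized(no_store))     # False
print(loads_are_initialized(with_store))   # True
print(loads_are_initialized(skips_store))  # False
```

The third case mirrors the original failure: the store exists in the program, but one path into the load bypasses it.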
Caper approaches the challenge by leveraging the unconditional 'jmp' (jump always) instruction.
Unlike conditional jumps, 'jmp' does not use the 8-bit jt and jf fields;
instead, it carries its jump offset in the unsigned 32-bit 'k' field, which is wide enough to reach any line in the program (the disassembly shows the resolved target line number).
This effectively removes the jump overflow and, with it, the problem described above.
Observe the demonstration below.
$ ./caper.byte -not_expand -BPF_optimized -max_rec 3 -q -p -e "ether proto \ip6 && ip6 protochain 58 && icmp6 protochain 17"
(000) ldh [12]
(001) jeq #0x86dd jt 3 jf 2
(002) jmp (275)
...(omitted for brevity)
(274) ret #262144
(275) ret #0
This output encapsulates both semantic correctness and suitability for attachment to the kernel.
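The rewrite visible in lines 001 and 002 can be sketched as follows. This is an illustration of the idea, not Caper's actual implementation: lower_jeq is a hypothetical helper, it handles only a far 'jf' target, it expects targets already expressed in the final line numbering, and it assumes the 'k' of an unconditional jump is an offset from the following instruction, as in classic BPF.

```python
BPF_JEQ_K = 0x15  # BPF_JMP | BPF_JEQ | BPF_K
BPF_JA = 0x05     # BPF_JMP | BPF_JA (unconditional jump)
U8_MAX = 255


def lower_jeq(pc, k, jt_target, jf_target):
    """Encode 'jeq #k' at line pc as a list of (code, jt, jf, k) tuples.

    Targets are absolute line numbers in the final program.
    """
    jt_off = jt_target - pc - 1
    jf_off = jf_target - pc - 1
    if 0 <= jt_off <= U8_MAX and 0 <= jf_off <= U8_MAX:
        return [(BPF_JEQ_K, jt_off, jf_off, k)]
    if not 1 <= jt_off <= U8_MAX:
        raise ValueError("jt must be near and must skip the inserted jmp")
    # jf falls through (offset 0) to a jmp placed at pc + 1,
    # and the jmp carries the long distance in its 32-bit k field.
    far_off = jf_target - (pc + 1) - 1
    return [(BPF_JEQ_K, jt_off, 0, k), (BPF_JA, 0, 0, far_off)]


print(lower_jeq(1, 0x86DD, 3, 275))
```

For the filter above this yields a 'jeq' with jt offset 1 and jf offset 0, followed by a jump with k = 272, which corresponds to '(001) jeq #0x86dd jt 3 jf 2' and '(002) jmp (275)'.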
Takeaways from this experiment:
The kernel's rejection of the BPF code most likely stems from an 'ld' instruction that loads from an uninitialized memory location (M[15]).
Keep in mind that BPF behavior can differ in its details across kernel versions and configurations.
By using 'jmp' instructions, the bit overflow and the kernel error that follows from it are sidestepped.
In your journey through the world of BPF, always remember that the devil lies in the details, and diligent investigation can unravel the enigmatic code behaviors.
Happy Filtering!