Conversation
One of the main improvements recent eBPF verifiers have is the ability to jump backwards in code. This means that the OFF field of the opcode can be negative, and thus should be interpreted as signed. eBPF used to have load instructions, specific to loading data from socket buffer. Today, they are deprecated, but it is still good practice to support them, as it's very likely older programs might use them. Some of the newer instructions added by the specification require additional fields to dictate what is the exact operation to be performed (eg. CALL requires 'src' to decide whether 'imm' should interpreted as BTF function ID, or as offset from IP. MOV requires 'off' field in order to decide whether a regular MOV or rather a MOVSX should be performed) * Move BPF opcode setting to the disassembler * Change BPF JMP instruction naming to conventional JA * Fix the sign interpretation of 'off', 'imm', etc * Implement LDIND* and LDABS* legacy packet instructions * Implement JUMP32 instruction class * Move BPF opcode setting to disassembler * Correct and remove branch + load tests, and add more exhaustive ones * Correct and remove malformed LD tests, and add legacy packet tests
* Implement `movs`, `sdiv`, `div` instructions * Implement atomic ALU and complex operations (atomic `add`, `or`, `and`, `xor`, and `cmp`, `cmpxchg` respectively) * Remove outdated `xadd` tests, and add new ones for the new instructions
There was a problem hiding this comment.
Excellent job! Thanks a lot for all the effort!
I added quite some comments. Most of them are pretty simple.
Additionally to them, please:
- Add your copyright in SPDX style to the file headers. E.g.:
// SPDX-FileCopyrightText: 2023 Rot127 <unisono@quyllur.org> // SPDX-License-Identifier: BSD-3 - please run
clang-format -i arch/bpf/*on the files.
Please add some tests to cover:
is_signedandis_pkt. You have to add the fields intest_detail_bpf.h/canddetails.py::test_expected_bpf().
(You can check what how theimmis hanndled over all files).
Use be tested withcompare_tbool(). In the yaml files you set the values with: 1 == true, 0 == unset, -1 == false.- The new groups you added.
I know this is tedious stuff, but doing it now, will save us from many bugs in the future.
I also added quite some comments for using the helper functions we already have.
This is important IMO, because those helper functions are better tested, have more asserts and working with them is easier for other devs.
Because one doesn't have to check how "this specific BPF function" works. Also single point of failure etc.
Regarding the name changes:
The specifications didn't contain information regarding how exactly the packet load instructions look and work, so I followed what llvm-objdump and the Linux dissembler showed (GNU objdump doesn't parse it correctly)
This is ok.
For the others. The problem is usually solved via enabling/disabling features (for x86 for example it is Intel or ATT syntax).
The problem with changing names is, people can't google the mnemonics anymore. But when they attempt to, one can't be sure if they found the correct definition of the instruction or deals with some weird undocumented edge case.
In general it is ok to change syntax for better readability. But please make it opt-in via a flag.
By default the syntax should always be as close as possible to the ISA (or RFC in this case).
You can add a CS_OPT_SYNTAX_BPF_CS in capstone.h::cs_opt_value and handle it in BPF_option().
Check the syntax flag in the printer with MI->csh->syntax & CS_OPT_SYNTAX_BPF_CS and set the alternative mnemonic if true.
To extend cstool, just add the option to cstool.c::all_opts.
|
Ah, one last request. Please add a short description to the changes you made in |
|
It appears the formatting is wrong even though I ran |
7e5bbb7 to
75fa4a4
Compare
Rot127
left a comment
There was a problem hiding this comment.
It appears the formatting is wrong even though I ran clang-format?
Most files are not formatted yet. But I try to keep the formatting diversion on an acceptable level. This is why I asked you to only format the bpf files :)
Please revert the formatting in the others.
bda1cf3 to
163c545
Compare
Rot127
left a comment
There was a problem hiding this comment.
Nice, almost done. Please just address the last few comments.
|
Any idea why the macOS tests fail now? I find why |
There was a problem hiding this comment.
I think this is the issue with the test. Just a guess though. My mistake, the error is with is_pkt. Not is_subtracted.
Guess you have to check with the debugger why it is set to true. But remove the is_signed: 0 line first or set it to -1. Maybe this is the issue for whatever reason.
I review a last time tomorrow.
kabeor
left a comment
There was a problem hiding this comment.
Great, thanks for your contribution.
Your checklist for this pull request
Detailed description
Full details in commit messages.
Resources used:
In case of inconsistencies I followed the resources in the order listed
Some important notes:
movs32/movs(for any s8,s16,s32),movsx\movsx64was used, wherexis the size of the moveacmp, we useacmpxchg. It's misleading naming itacmp, since this instruction also swaps the data if the comparison is correct.na/64instead of32/nais simply because that's what is used so far. Doesn't really matter IMO, (and anyway it makes more sense, since32bit is the default instruction class, and64are the eBPF added ones.)llvm-objdumpand the Linux dissembler showed (GNU objdump doesn't parse it correctly)Test plan
All tests green.
...
Closing issues
...