
Questions regarding your paper's evaluation #2

@nbars


Hello, congratulations on your paper -- nice work, interesting idea! I was interested in reproducing your results and was looking into your artifact for that. During this, I noticed a few things that I would like clarified.

A note before my questions: Since not all of your artifact's code is part of your main GitHub repository (some of it can only be found in the Docker image pulled as part of the reproduction setup), I created a GitHub repository that contains a copy of the data located in your Docker image adamstorek/fox:latest (sha256:11ac4f0ceb501d734af81aa4dc4d9ca7d4c87d55239ccf0218271d48bea8b78d) under /workspace/fuzzopt-eval. I'll use that repository to refer to specific parts of your code.

Coverage on standalone targets

You evaluated 15 standalone targets in your paper to compare against the state of the art. Table 4 displays the results of these experiments. I included the table below for convenience.
(Screenshot of Table 4 from the paper)

Reproduction steps I've performed

To check the setup used for evaluation, I followed the instructions you provided as part of your artifact. First, I pulled the provided Docker image via docker pull adamstorek/fox:latest (sha256:11ac4f0ceb501d734af81aa4dc4d9ca7d4c87d55239ccf0218271d48bea8b78d). Second, I spawned a Docker container using your image as described:

docker run --privileged --network='host' -d -it --name optfuzz_eval adamstorek/fox:latest
docker exec -it optfuzz_eval /bin/bash

Next, I switched to the /workspace/fuzzopt-eval/fuzzdeployment/targets folder and modified the set_all_targets.sh script to build libarchive (bsdtar) and ffmpeg by setting TARGETS="ffmpeg libarchive". I chose these targets because you reported coverage increases of up to 97.25% and 49.04% for them in your paper, respectively.
After executing set_all_targets.sh and waiting for the build process to finish, I inspected the resulting binaries.

To do so, I executed the following command, which prints the size of the SanitizerCoverage guard section of each binary built for the libarchive and ffmpeg targets:

# ffmpeg binaries
as5827@19e7b435119d:/workspace/fuzzopt-eval/fuzzdeployment/targets$ find ffmpeg/binaries/  -type f -executable -print -exec bash -c 'readelf -S {} | grep guard' \;
ffmpeg/binaries/optfuzz_build/ffmpeg
 [28] __sancov_guards   PROGBITS         0000000008ddcd6c  08ddbd6c
ffmpeg/binaries/cmplog_build/ffmpeg
 [28] __sancov_guards   PROGBITS         0000000005d94494  05d93494
ffmpeg/binaries/aflpp_build/ffmpeg
 [28] __sancov_guards   PROGBITS         0000000005b29d54  05b28d54

# libarchive binaries
as5827@19e7b435119d:/workspace/fuzzopt-eval/fuzzdeployment/targets$ find libarchive/binaries/  -type f -executable -print -exec bash -c 'readelf -S {} | grep guard' \;
libarchive/binaries/optfuzz_build/bsdtar
 [28] __sancov_guards   PROGBITS         000000000072540c  0072440c
libarchive/binaries/cmplog_build/bsdtar
 [28] __sancov_guards   PROGBITS         00000000004636b4  004626b4
libarchive/binaries/aflpp_build/bsdtar
 [28] __sancov_guards   PROGBITS         00000000004243f4  004233f4

This map contains one 4-byte entry per edge in the target. As we can see, the map sizes differ significantly. I believe this is due to additional instrumentation added by your modified AFL++ pass (for example, here).
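Given the 4-bytes-per-guard layout, an edge count can be estimated directly from the size of the guard section. A minimal sketch (the helper function is mine, not part of the artifact):

```python
# SanitizerCoverage's trace-pc-guard instrumentation allocates one
# 32-bit (4-byte) guard slot per instrumented edge, so the number of
# guards is the __sancov_guards section size divided by four.
GUARD_SIZE_BYTES = 4

def guard_count(section_size_bytes: int) -> int:
    """Estimate the number of instrumented edges from the guard section size."""
    return section_size_bytes // GUARD_SIZE_BYTES

# Example: a 4 KiB guard section corresponds to 1024 instrumented edges.
print(guard_count(0x1000))  # 1024
```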

Computing the relative guard section size between your fuzzer (optfuzz), AFL++ (aflpp), and AFL++ cmplog (cmplog_build) yields the following results.

For ffmpeg, we are getting the following numbers:
optfuzz / aflpp = 0x0000000008ddcd6c / 0x0000000005b29d54 = 1.56
optfuzz / cmplog_build = 0x0000000008ddcd6c / 0x0000000005d94494 = 1.52

And for libarchive, we are getting the following ratios:
optfuzz / aflpp = 0x000000000072540c / 0x00000000004243f4 = 1.73
optfuzz / cmplog_build = 0x000000000072540c / 0x00000000004636b4 = 1.63
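For completeness, these ratios can be re-derived from the hex values in a couple of lines of Python (values rounded to two decimals):

```python
# Hex values taken from the readelf output above.
sections = {
    "ffmpeg":     {"optfuzz": 0x08DDCD6C, "cmplog": 0x05D94494, "aflpp": 0x05B29D54},
    "libarchive": {"optfuzz": 0x0072540C, "cmplog": 0x004636B4, "aflpp": 0x004243F4},
}

for target, s in sections.items():
    print(f"{target}: optfuzz/aflpp = {s['optfuzz'] / s['aflpp']:.2f}, "
          f"optfuzz/cmplog = {s['optfuzz'] / s['cmplog']:.2f}")
```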

As we can see, the guard sections again differ considerably in size. Essentially, this means that the binaries compiled with your fuzzer contain a considerable number of additional edges compared to those of AFL++. This, in turn, has ramifications for computing coverage over time (at least the way you do it to generate the plots for your paper): if your fuzzer's binary has significantly more edges (added by your instrumentation), it is naturally easier to cover more edges than with the baseline fuzzer, whose binary has fewer edges.

Looking at the script you use for coverage computation (after the fuzzing runs), it appears that you parse AFL++'s fuzzer_stats or plot_data to calculate the coverage:

https://github.com/fuzz-evaluator/FOX-fuzzopt-eval-upstream/blob/23a3277c6604616157bb085dd26ba2f365780ff1/fuzzdeployment/process_results/parse_results.py#L305-L335
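For readers unfamiliar with those files, extracting an edge count from plot_data amounts to something like the following sketch (this is not the authors' parse_results.py; the sample column layout is an assumption based on recent AFL++ versions, which is why the code locates the edges_found column via the header instead of hard-coding an index):

```python
# Sketch: read the final edges_found value out of an AFL++ plot_data file.
# SAMPLE mimics the plot_data format; real files are comma-separated with
# a "#"-prefixed header line.
SAMPLE = """# relative_time, cycles_done, cur_item, corpus_count, pending_total, pending_favs, map_size, saved_crashes, saved_hangs, max_depth, execs_per_sec, total_execs, edges_found
0, 0, 0, 1, 1, 1, 0.01%, 0, 0, 1, 250.00, 1000, 812
60, 1, 3, 17, 9, 4, 0.02%, 0, 0, 2, 240.00, 15000, 1473
"""

def final_edges_found(plot_data: str) -> int:
    lines = [l.strip() for l in plot_data.splitlines() if l.strip()]
    header = [col.strip() for col in lines[0].lstrip("# ").split(",")]
    idx = header.index("edges_found")   # find the column by name
    return int(lines[-1].split(",")[idx].strip())

print(final_edges_found(SAMPLE))  # 1473
```

The key point is that edges_found is an absolute count over the fuzzer's own guard map, which is exactly why map-size differences between binaries matter.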

I don't think this comparison is fair: Since your binaries contain more than 50% additional edges for both targets (compared to the other fuzzer's binaries), this inflates your results artificially.
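To illustrate the inflation with purely hypothetical numbers (these are made up for the example, not measurements): a fuzzer whose binary carries 1.5x as many guards can report more raw edges while actually exercising a smaller fraction of its own map.

```python
# Hypothetical guard totals and results, for illustration only.
total_guards = {"optfuzz": 150_000, "aflpp": 100_000}
edges_found  = {"optfuzz":  30_000, "aflpp":  25_000}

for fuzzer, total in total_guards.items():
    found = edges_found[fuzzer]
    print(f"{fuzzer}: {found} raw edges, {found / total:.0%} of its own map")
# optfuzz "wins" on raw edges (30000 > 25000) yet covers a smaller
# fraction of its own map (20% vs 25%).
```

Normalizing by map size is itself only a rough correction; the usual way to sidestep the problem (as FuzzBench does) is to replay every fuzzer's corpus against a single, uniformly instrumented measurement binary.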

Now, all of the above is based on my still-superficial understanding (I'll definitely take a closer look in the coming days), so please let me know if I missed or misunderstood anything. I'd appreciate it if you could outline why/how the issue I'm describing is taken care of.

On another note, I'm curious whether there is any place where I can find your FuzzBench configuration so that I can rerun your exact evaluation. Thanks in advance!
