fuzzer: don't remove or modify byte of empty input #23180
Closed
McSinyx wants to merge 1 commit into ziglang:master from McSinyx:fuzz-bound
McSinyx (Contributor, Author) commented:
🥺 May I have some eyes over the patch, pwetty pwease?
gooncreeper added a commit to gooncreeper/zig that referenced this pull request Mar 31, 2025
This PR significantly improves the capabilities of the fuzzer. For
comparison, here is a ten-minute head-to-head between the old and new
fuzzer implementations (with newly included fuzz tests):
-- Old --
```
Total Runs: 49020931
Unique Runs: 1044131 (2.1%)
Speed (Runs/Second): 81696
Coverage: 2069 / 15866 (13.0%)
```
(note: Unique Runs is highly inflated because of the inefficiency of the
old implementation)
-- New --
```
Total Runs: 537039526
Unique Runs: 1511 (0.0%)
Speed (Runs/Second): 894950
Coverage: 3000 / 15719 (19.1%)
Examples: `while(C)i(){}else|`
`{y:n()align(b)addrspace`
`switch(P){else=>`
`[:l]align(_:r:l)R`
`(if(b){defer{nosuspend`
`union(enum(I))`
```
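As a quick sanity check, the percentages and the speed-up can be recomputed from the raw counts quoted above. This is only a small Python sketch; the dictionary keys and the helper name are made up for illustration and do not come from the fuzzer.

```python
# Raw figures copied from the two ten-minute runs quoted above.
old = {"total_runs": 49_020_931, "speed": 81_696, "covered": 2_069, "pcs": 15_866}
new = {"total_runs": 537_039_526, "speed": 894_950, "covered": 3_000, "pcs": 15_719}


def coverage_percent(run):
    """Share of instrumented pcs that were hit, in percent."""
    return 100.0 * run["covered"] / run["pcs"]


# Ratio of runs per second between the new and old implementations.
speedup = new["speed"] / old["speed"]

print(f"old coverage: {coverage_percent(old):.1f}%")
print(f"new coverage: {coverage_percent(new):.1f}%")
print(f"speed-up: {speedup:.1f}x")
```

The recomputed coverage values match the reported 13.0% and 19.1%, and the new implementation runs roughly eleven times as many inputs per second.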
NOTE: You have to rebuild the compiler due to new fuzzing
instrumentation being enabled for memory loads.
The changes made to the fuzzer to accomplish this mostly include
tracking memory reads from .rodata to determine new runs, new
mutations (especially the ones that insert const values from .rodata
reads and __sanitizer_cov_trace_const_cmp), and minimizing found inputs.
Additionally, runs per second have increased greatly due to
generating smaller inputs and avoiding clearing the 8-bit pc counters.
An additional feature is that the length of the input file is now
stored and the old input file is rerun upon start, though this does not
close ziglang#20803 since it does not output the input (though it can be
very easily retrieved from the cache directory).
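One way to picture the "insert const values" mutation: constants seen by the comparison hooks are collected and later spliced into an input at a random offset. The following is a minimal Python sketch of that idea, not the Zig code from this commit; `insert_constant` and `observed_constants` are hypothetical names, and the sample constants are made up.

```python
import random


def insert_constant(data: bytes, observed_constants: list[bytes], rng: random.Random) -> bytes:
    """Splice one previously observed constant into `data` at a random offset."""
    if not observed_constants:
        return data  # nothing collected yet, leave the input unchanged
    constant = rng.choice(observed_constants)
    # randint is inclusive on both ends, so offset == len(data) appends.
    offset = rng.randint(0, len(data))
    return data[:offset] + constant + data[offset:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
constants = [b"while", b"align", b"switch"]  # hypothetical values captured from comparisons
mutated = insert_constant(b"(){}", constants, rng)
print(mutated)
```

Because the offset range includes `len(data)`, the same function also works on an empty input, where it simply returns the chosen constant.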
Other changes made to the fuzzer include more logical initialization,
using one shared file `in` for inputs, creating corpus files with
proper sizes, and using hexadecimal-numbered corpus files for
simplicity. Additionally, volatile was removed from MemoryMappedList
since all that is needed is a guarantee that the compiler has done the
writes, which is already accomplished with atomic ordering.
Furthermore, I added several new fuzz tests to gauge the fuzzer's
efficiency. I also tried to add a test for zstandard decompression,
which it crashed within 60,000 runs (less than a second).
Bug fixes include:
* Fixed a race condition when multiple fuzzer processes needed to use
the same coverage file.
* Web interface stats now update even when unique runs are not changing.
* Fixed tokenizer.testPropertiesUpheld to allow stray carriage returns
since they are valid whitespace.
* Closes ziglang#23180
POSSIBLE IMPROVEMENTS:
* Remove the 8-bit pc counting code in favor of a call to a sanitizer
function that updates a flag if a new pc hit happened (similar to how
the __sanitizer_cov_load functions already operate).
* Less basic input minimization function. It could also try splitting
inputs into two between each byte to see if they both hit the same pcs.
This is useful as smaller inputs are usually much more efficient.
* Deterministic mutations when a new input is found.
* Culling out corpus inputs that are redundant due to smaller inputs
already hitting their pcs and memory addresses.
* Applying multiple mutations during dry spells.
* Prioritizing some corpus inputs.
* Creating a list of the most successful input splices (which would
likely contain grammar keywords) and creating a custom mutation for
adding them.
* Removing some less-efficient mutations.
* Store effective mutations to the disk for the benefit of future runs.
* Counting __sanitizer_cov `@returnAddress`es in determining unique
runs.
* Optimize __sanitizer_cov_trace_const_cmp methods (the use of an
ArrayHashMap is not too fast).
* Processor affinity
* Exclude fuzzer's .rodata
Nevertheless, I feel like the fuzzer is in a viable place to start
being useful (as demonstrated in ziglang#23413)
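The splitting idea from the list of possible improvements can be read as: cut the input at every byte boundary and keep a half if it still hits the same pcs as the whole. The Python sketch below shows that one reading under stated assumptions; `minimize_by_splitting` and `toy_coverage` are hypothetical, and the toy coverage function merely stands in for running the instrumented target.

```python
from typing import Callable, FrozenSet


def minimize_by_splitting(data: bytes, coverage_of: Callable[[bytes], FrozenSet[int]]) -> bytes:
    """Return the shortest prefix or suffix of `data` that keeps the same coverage."""
    target = coverage_of(data)
    best = data
    # Try every split point strictly inside the input.
    for split in range(1, len(data)):
        for half in (data[:split], data[split:]):
            if len(half) < len(best) and coverage_of(half) == target:
                best = half
    return best


# Toy stand-in for the instrumented target: each distinct byte value is one "pc".
def toy_coverage(data: bytes) -> FrozenSet[int]:
    return frozenset(data)


print(minimize_by_splitting(b"ababab", toy_coverage))
```

This costs two target runs per split point, so a real implementation would likely want to bound the number of attempts or reuse coverage results.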
gooncreeper added a commit to gooncreeper/zig that referenced this pull request Jul 12, 2025
This PR significantly improves the capabilities of the fuzzer. The changes made to the fuzzer to accomplish this mostly include tracking memory reads from .rodata to determine fresh inputs, new mutations (especially the ones that insert const values from .rodata reads and __sanitizer_cov_trace_const_cmp), and minimizing found inputs. Additionally, runs per second have increased greatly due to generating smaller inputs and avoiding clearing the 8-bit pc counters. An additional feature is that the length of the input file is now stored and the old input file is rerun upon start.
Other changes made to the fuzzer include more logical initialization, using one shared file `in` for inputs, creating corpus files with proper sizes, and using hexadecimal-numbered corpus files for simplicity.
Furthermore, I added several new fuzz tests to gauge the fuzzer's efficiency. I also tried to add a test for zstandard decompression, which it crashed within 60,000 runs (less than a second).
Bug fixes include:
* Fixed a race condition when multiple fuzzer processes needed to use the same coverage file.
* Web interface stats now update even when unique runs are not changing.
* Fixed tokenizer.testPropertiesUpheld to allow stray carriage returns since they are valid whitespace.
* Closes ziglang#23180
McSinyx (Contributor, Author) commented:
Closing as superseded by GH-23416, which is merged.
The actual patch is rather trivial, but the debugging process reveals more hidden problems. In today's episode of Who Fuzzes the Fuzzer?, I got a segfault with the following:
Relevant log:
Yes, part of the trace is missing, but I managed to pinpoint the bug to the code below, which runs when `old_input.len == 0`, leading to `omitted_index == std.math.maxInt(usize)`:
zig/lib/fuzzer.zig, lines 314 to 318 in 8e0a4ca
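The fix named in the title amounts to never choosing a remove or modify mutation when there is no byte to remove or modify. Below is a minimal Python sketch of such a guard, assuming a simplified three-way mutation choice. The `mutate` function and its structure are illustrative only and are not the code in lib/fuzzer.zig.

```python
import random


def mutate(old_input: bytes, rng: random.Random) -> bytes:
    """Apply one random single-byte mutation (hypothetical, simplified)."""
    # An empty input has no byte to remove or modify, so only insertion is valid.
    # Without this guard there is no valid index to pick from an empty range.
    if len(old_input) == 0:
        return bytes([rng.randrange(256)])

    choice = rng.randrange(3)
    index = rng.randrange(len(old_input))  # safe: the length is at least 1 here
    if choice == 0:  # remove one byte
        return old_input[:index] + old_input[index + 1:]
    if choice == 1:  # overwrite one byte with a random value
        return old_input[:index] + bytes([rng.randrange(256)]) + old_input[index + 1:]
    # insert one random byte before `index`
    return old_input[:index] + bytes([rng.randrange(256)]) + old_input[index:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(mutate(b"", rng))
```

With the guard in place, an empty input can only grow by one byte, so no index is ever derived from a length of zero.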
Mysteriously, assertions are evaded, unreachables are reached, bound checks are ignored and panics don't pan above the segfaulting statement. I took a look, and `lib/fuzzer/web/main.zig`'s `panic` function does indeed call `@trap`:
zig/lib/fuzzer/web/main.zig, lines 33 to 38 in 8e0a4ca