134 changes: 134 additions & 0 deletions fuzz/README.md
@@ -0,0 +1,134 @@
# libyaml fuzzing harnesses

This directory contains experimental libFuzzer harnesses for several libyaml APIs.

## Targets

- `fuzz_scan.cpp`: targets `yaml_parser_scan()`. Exercises the lower-level scanner/tokenization layer.

- `fuzz_parse.cpp`: targets `yaml_parser_parse()`. Exercises the parser event pipeline and reaches code in `reader.c`, `scanner.c`, and `parser.c`.

- `fuzz_load.cpp`: targets `yaml_parser_load()`. Extends parser-side exploration into document composition and loader logic, including `loader.c`.

- `fuzz_emit.cpp`: targets `yaml_emitter_emit()` using simple valid event streams. This is the initial emitter-side harness.

- `fuzz_emit_nested.cpp`: targets `yaml_emitter_emit()` using nested valid event streams. This improves emitter-side exploration by generating more structured mappings and sequences.

- `fuzz_roundtrip_parse_emit.cpp`: targets a parse → emit round-trip workflow. The harness parses fuzz input into events using `yaml_parser_parse()` and then emits those events using `yaml_emitter_emit()`. This helps explore parser/emitter interaction and significantly improves emitter coverage.
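
In `fuzz_emit.cpp`, the first input byte acts as a selector and the remaining bytes become scalar payload: `selector % 3` picks the document shape, `(selector >> 2) % 3` picks the scalar style, and bit 4 chooses flow or block style for collections. The sketch below restates that decoding as a standalone function. The `Shape`, `ScalarStyle`, `Plan`, and `DecodeSelector` names are illustrative only and do not appear in the harness.

```cpp
#include <cstdint>

// Illustrative stand-ins for the libyaml enums used by the real harness.
enum class Shape { Scalar, Sequence, Mapping };
enum class ScalarStyle { Plain, SingleQuoted, DoubleQuoted };

struct Plan {
  Shape shape;
  ScalarStyle scalar_style;
  bool flow;  // flow vs. block style; only meaningful for collections
};

// Mirrors the bit layout used in fuzz_emit.cpp (hypothetical helper).
Plan DecodeSelector(uint8_t selector) {
  Plan plan{};

  switch (selector % 3) {
    case 0: plan.shape = Shape::Scalar; break;
    case 1: plan.shape = Shape::Sequence; break;
    default: plan.shape = Shape::Mapping; break;
  }

  switch ((selector >> 2) % 3) {
    case 0: plan.scalar_style = ScalarStyle::Plain; break;
    case 1: plan.scalar_style = ScalarStyle::SingleQuoted; break;
    default: plan.scalar_style = ScalarStyle::DoubleQuoted; break;
  }

  plan.flow = ((selector >> 4) & 1) != 0;
  return plan;
}
```

Packing all choices into one byte keeps every input structurally valid, so the fuzzer spends its time on emitter logic instead of on rejected event sequences.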

## Example build

Build libyaml first so that `src/.libs/libyaml.a` exists, then compile a harness against it, for example:

    clang++ fuzz/fuzz_parse.cpp src/.libs/libyaml.a -I include -fsanitize=fuzzer,address,undefined -g -O1 -fno-omit-frame-pointer -o fuzz_parse

The other harnesses build the same way:

    clang++ fuzz/fuzz_scan.cpp src/.libs/libyaml.a -I include -fsanitize=fuzzer,address,undefined -g -O1 -fno-omit-frame-pointer -o fuzz_scan
    clang++ fuzz/fuzz_load.cpp src/.libs/libyaml.a -I include -fsanitize=fuzzer,address,undefined -g -O1 -fno-omit-frame-pointer -o fuzz_load
    clang++ fuzz/fuzz_emit.cpp src/.libs/libyaml.a -I include -fsanitize=fuzzer,address,undefined -g -O1 -fno-omit-frame-pointer -o fuzz_emit
    clang++ fuzz/fuzz_emit_nested.cpp src/.libs/libyaml.a -I include -fsanitize=fuzzer,address,undefined -g -O1 -fno-omit-frame-pointer -o fuzz_emit_nested
    clang++ fuzz/fuzz_roundtrip_parse_emit.cpp src/.libs/libyaml.a -I include -fsanitize=fuzzer,address,undefined -g -O1 -fno-omit-frame-pointer -o fuzz_roundtrip_parse_emit

## Example run

    ./fuzz_parse corpus_dir

Other examples:

    ./fuzz_scan corpus_dir
    ./fuzz_load corpus_dir
    ./fuzz_emit corpus_dir
    ./fuzz_emit_nested corpus_dir
    ./fuzz_roundtrip_parse_emit corpus_dir

## Coverage observations

Local coverage measurements with llvm-cov showed the following.

### Parser-side coverage

yaml_parser_parse()
- parser.c: 92.90% line coverage
- reader.c: 94.85% line coverage
- scanner.c: 94.01% line coverage

Overall parse coverage binary
- total line coverage: 77.07%
- total branch coverage: 72.13%
- total region coverage: 78.25%

yaml_parser_load()
- loader.c: 89.55% line coverage
- parser.c: 91.85% line coverage
- reader.c: 94.85% line coverage
- scanner.c: 94.01% line coverage

Overall load coverage binary
- total line coverage: 77.56%
- total branch coverage: 71.80%
- total region coverage: 78.34%

### Emitter-side coverage

Initial yaml_emitter_emit() harness
- emitter.c: 59.31% line coverage
- writer.c: 35.44% line coverage

Overall emit coverage binary
- total line coverage: 50.75%
- total branch coverage: 44.94%
- total region coverage: 50.69%

Nested emitter harness
- emitter.c: 68.65% line coverage
- writer.c: 25.32% line coverage

Overall nested emitter coverage binary
- total line coverage: 58.43%
- total branch coverage: 52.12%
- total region coverage: 58.67%

Parse → emit round-trip harness
- emitter.c: 85.35% line coverage
- writer.c: 25.32% line coverage
- parser.c: 92.90% line coverage
- reader.c: 94.85% line coverage
- scanner.c: 94.01% line coverage

Overall round-trip coverage binary
- total line coverage: 79.64%
- total branch coverage: 71.51%
- total region coverage: 79.55%

## Corpus

During local fuzzing runs, seed corpora were generated and minimized using libFuzzer's merge mode.

These corpora are not included in this repository because they are large and mostly machine-generated. The harnesses are designed to work with any YAML seed corpus.
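
If no corpus is at hand, a few hand-written documents are enough to get a run started. The following is a minimal sketch, not part of this change: `SeedDocuments` and `WriteSeeds` are hypothetical helpers, and the seed contents are chosen only to touch block, flow, anchor/alias, and block-scalar syntax.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// A handful of small, valid YAML documents (illustrative choices).
std::vector<std::string> SeedDocuments() {
  return {
      "key: value\n",
      "- a\n- b\n",
      "{a: [1, 2], b: 'x'}\n",
      "base: &b {x: 1}\nref: *b\n",
      "--- |\n  literal\n...\n",
  };
}

// Writes each seed to <dir>/seed_<n>.yaml and returns the number of files
// written. The directory must already exist.
std::size_t WriteSeeds(const std::string &dir) {
  std::size_t written = 0;
  const std::vector<std::string> seeds = SeedDocuments();
  for (std::size_t i = 0; i < seeds.size(); ++i) {
    std::ofstream out(dir + "/seed_" + std::to_string(i) + ".yaml",
                      std::ios::binary);
    if (!out) continue;
    out << seeds[i];
    if (out) ++written;
  }
  return written;
}
```

Calling `WriteSeeds("corpus_dir")` once before the first run gives each harness a starting point. The emitter harnesses interpret the same bytes as selector and payload data instead of as YAML text.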

## Summary

The strongest parser-side harnesses are fuzz_parse.cpp and fuzz_load.cpp.
The strongest emitter-side harness is fuzz_roundtrip_parse_emit.cpp.

In local testing, the round-trip harness raised emitter.c line coverage from 59.31% (initial standalone emitter harness) to 85.35%.

## Notes

During local experimentation, an earlier round-trip harness using a fixed-size emitter output buffer triggered a double-free candidate during cleanup. After replacing that output path with a sink callback, the issue no longer reproduced, so it is not treated as a confirmed libyaml vulnerability.
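
A sink callback of the kind mentioned above can be very small. The sketch below assumes libyaml's write-handler signature, `int handler(void *data, unsigned char *buffer, size_t size)`, where a return value of 1 means success. `SinkState` and `DiscardSink` are hypothetical names, and this is not necessarily the code used by the round-trip harness in this change.

```cpp
#include <cstddef>

// Optional bookkeeping so the emitted bytes are at least counted.
struct SinkState {
  std::size_t total_bytes = 0;
};

// Accepts and discards every chunk the emitter flushes.
// Signature matches yaml_write_handler_t without needing yaml.h here.
int DiscardSink(void *data, unsigned char *buffer, std::size_t size) {
  (void)buffer;  // output content is intentionally ignored
  auto *state = static_cast<SinkState *>(data);
  if (state != nullptr) {
    state->total_bytes += size;
  }
  return 1;  // never reports a write error
}
```

A harness would register it with `yaml_emitter_set_output(&emitter, DiscardSink, &state)` in place of `yaml_emitter_set_output_string()`. Because the sink never fails, the emitter does not hit the write-error path that a full fixed-size buffer produces.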

These harnesses are intended as a starting point for continued fuzzing and may also be useful for future OSS-Fuzz-style integration.
160 changes: 160 additions & 0 deletions fuzz/fuzz_emit.cpp
@@ -0,0 +1,160 @@
#include <stddef.h>
#include <stdint.h>

#include <yaml.h>

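// yaml_emitter_emit() takes ownership of the event and destroys it even when
// it fails (see the API comment in yaml.h), so the caller must not call
// yaml_event_delete() here; doing so risks a double free once
// yaml_emitter_delete() releases events still queued in the emitter.
static bool EmitOwned(yaml_emitter_t *emitter, yaml_event_t *event) {
  return yaml_emitter_emit(emitter, event) != 0;
}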

static bool EmitScalar(yaml_emitter_t *emitter,
                       const uint8_t *data,
                       size_t size,
                       yaml_scalar_style_t style) {
  yaml_event_t event;
  static yaml_char_t tag[] = "tag:yaml.org,2002:str";

  if (!yaml_scalar_event_initialize(
          &event,
          nullptr,              // anchor
          tag,                  // tag
          (yaml_char_t *)data,  // value
          (int)size,            // length
          1,                    // plain_implicit
          1,                    // quoted_implicit
          style)) {
    return false;
  }

  return EmitOwned(emitter, &event);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  yaml_emitter_t emitter;
  yaml_event_t event;

  unsigned char output[8192];
  size_t written = 0;

  // Initialize these before the first `goto done` so that no jump bypasses
  // an initialization.
  uint8_t selector = (size > 0) ? data[0] : 0;
  const uint8_t *payload = (size > 0) ? data + 1 : data;
  size_t payload_size = (size > 0) ? size - 1 : 0;

  if (!yaml_emitter_initialize(&emitter)) {
    return 0;
  }

  yaml_emitter_set_output_string(&emitter, output, sizeof(output), &written);
  yaml_emitter_set_unicode(&emitter, 1);
  yaml_emitter_set_encoding(&emitter, YAML_UTF8_ENCODING);

  // 1. Stream start
  if (!yaml_stream_start_event_initialize(&event, YAML_UTF8_ENCODING) ||
      !EmitOwned(&emitter, &event)) {
    goto done;
  }

  // 2. Document start
  if (!yaml_document_start_event_initialize(&event, nullptr, nullptr, nullptr, 1) ||
      !EmitOwned(&emitter, &event)) {
    goto done;
  }

  yaml_scalar_style_t scalar_style;
  switch ((selector >> 2) % 3) {
    case 0:
      scalar_style = YAML_PLAIN_SCALAR_STYLE;
      break;
    case 1:
      scalar_style = YAML_SINGLE_QUOTED_SCALAR_STYLE;
      break;
    default:
      scalar_style = YAML_DOUBLE_QUOTED_SCALAR_STYLE;
      break;
  }

  switch (selector % 3) {
    case 0: {
      // Single scalar document
      if (!EmitScalar(&emitter, payload, payload_size, scalar_style)) {
        goto done;
      }
      break;
    }

    case 1: {
      // Sequence with up to 2 scalars
      yaml_sequence_style_t seq_style =
          ((selector >> 4) & 1) ? YAML_FLOW_SEQUENCE_STYLE
                                : YAML_BLOCK_SEQUENCE_STYLE;

      if (!yaml_sequence_start_event_initialize(&event, nullptr, nullptr, 1, seq_style) ||
          !EmitOwned(&emitter, &event)) {
        goto done;
      }

      size_t mid = payload_size / 2;

      if (!EmitScalar(&emitter, payload, mid, scalar_style)) {
        goto done;
      }
      if (!EmitScalar(&emitter, payload + mid, payload_size - mid, scalar_style)) {
        goto done;
      }

      if (!yaml_sequence_end_event_initialize(&event) ||
          !EmitOwned(&emitter, &event)) {
        goto done;
      }
      break;
    }

    case 2: {
      // Mapping with one key/value pair
      yaml_mapping_style_t map_style =
          ((selector >> 4) & 1) ? YAML_FLOW_MAPPING_STYLE
                                : YAML_BLOCK_MAPPING_STYLE;

      if (!yaml_mapping_start_event_initialize(&event, nullptr, nullptr, 1, map_style) ||
          !EmitOwned(&emitter, &event)) {
        goto done;
      }

      size_t mid = payload_size / 2;

      if (!EmitScalar(&emitter, payload, mid, YAML_PLAIN_SCALAR_STYLE)) {
        goto done;
      }
      if (!EmitScalar(&emitter, payload + mid, payload_size - mid, scalar_style)) {
        goto done;
      }

      if (!yaml_mapping_end_event_initialize(&event) ||
          !EmitOwned(&emitter, &event)) {
        goto done;
      }
      break;
    }
  }

  // 3. Document end
  if (!yaml_document_end_event_initialize(&event, 1) ||
      !EmitOwned(&emitter, &event)) {
    goto done;
  }

  // 4. Stream end
  if (!yaml_stream_end_event_initialize(&event) ||
      !EmitOwned(&emitter, &event)) {
    goto done;
  }

done:
  yaml_emitter_delete(&emitter);
  return 0;
}