fix(fuzz): eliminate spurious context deadline exceeded failures when fuzz time expires by thieman · Pull Request #156 · DataDog/rshell

thieman · 2026-03-26T15:03:39Z

Problem

Fuzz CI was failing in multiple ways:

Coordinator boundary timeout — When -fuzztime=30s expires, Go's internal fuzz coordinator cancels any in-flight worker iteration and emits context deadline exceeded with no file:line reference. This is not a test failure, but the CI step was treating it as one.
Test context not derived from t.Context(): Fuzz iterations used context.WithTimeout(context.Background(), 5*time.Second) instead of deriving from t.Context(). An in-flight iteration at the 30s boundary would therefore hit its own deadline, return context.DeadlineExceeded, and call t.Fatal(), causing a spurious CI failure.
FuzzPSFlags hanging on Windows — getSession in procinfo_windows.go walked the PPID chain with only an immediate self-reference guard (p.PPID == cur). On Windows CI the process table can contain longer PPID cycles (A→B→C→A), causing an infinite loop that hung for the full 120s seed corpus timeout.

Fix

1. Fuzz test context propagation (18 files)

Add early-exit guard at top of every f.Fuzz body: if t.Context().Err() != nil { return }
Derive per-iteration contexts from t.Context() so cancellation propagates when fuzz time ends

2. CI workflow boundary timeout handling (`.github/workflows/fuzz.yml`)

Wrap all go test -fuzz=... invocations in a fuzz_run() shell helper that:

Captures output and exit code
If exit non-zero, checks for _test.go:NNN: file:line references — real assertion failures always have these; coordinator boundary timeouts don't
If no file:line found → logs a NOTE and exits 0 (suppresses spurious failure)
If file:line found → propagates the real failure

Also reduces per-invocation -timeout from 300s (shared across all functions) to 90s (30s fuzz + 60s grace per function call).

Applies to both fuzz and fuzz-differential CI jobs.
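
The decision rule inside fuzz_run() is small enough to state as code. The PR implements it as a shell helper in the workflow file; the sketch below restates the same rule in Go purely for illustration. stepExitCode, fileLineRef, and the sample output strings are hypothetical, and the regular expression is an assumption about what "a file:line reference" looks like, based on the foo_test.go:42: example in the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// fileLineRef matches references such as "foo_test.go:42:".
// The exact pattern is an assumption, not taken from the workflow file.
var fileLineRef = regexp.MustCompile(`_test\.go:\d+:`)

// stepExitCode is a hypothetical Go rendering of the fuzz_run decision rule:
// given the captured output and exit code of one `go test -fuzz` invocation,
// it returns the exit code the CI step should report.
func stepExitCode(output string, exitCode int) int {
	if exitCode == 0 {
		return 0
	}
	if fileLineRef.MatchString(output) {
		return exitCode // real assertion failure: propagate
	}
	fmt.Println("NOTE: non-zero exit without a file:line reference; treating as fuzz-time boundary")
	return 0
}

func main() {
	// Made-up sample outputs, reduced to the parts the rule inspects.
	boundary := "context deadline exceeded"
	assertion := "foo_test.go:42: unexpected output"

	fmt.Println(stepExitCode(boundary, 1))
	fmt.Println(stepExitCode(assertion, 1))
}
```

The rule is a heuristic: it relies on every real failure printing a _test.go file:line reference, so a crash that reports no such reference would also be suppressed.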

3. `procinfo_windows.go` PPID cycle detection

Replace the narrow self-reference guard with a proper visited set:

visited := make(map[int]bool)
for cur > 0 {
    if visited[cur] {
        break // cycle detected in PPID chain
    }
    visited[cur] = true
    ancestors[cur] = true
    ...
}
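
To see why the visited set terminates where the self-reference guard did not, here is a self-contained sketch over an in-memory process table. It is not the code in procinfo_windows.go: ancestorsOf and the parentOf map are hypothetical, and the real function reads the Windows process table instead.

```go
package main

import "fmt"

// ancestorsOf walks a PPID chain starting at pid. parentOf is a hypothetical
// in-memory process table mapping each PID to its parent PID.
func ancestorsOf(parentOf map[int]int, pid int) map[int]bool {
	ancestors := make(map[int]bool)
	visited := make(map[int]bool)
	for cur := pid; cur > 0; {
		if visited[cur] {
			break // cycle detected in PPID chain
		}
		visited[cur] = true
		ancestors[cur] = true

		parent, ok := parentOf[cur]
		if !ok {
			break // parent is not in the table
		}
		cur = parent
	}
	return ancestors
}

func main() {
	// A three-process cycle (10→20→30→10), which a check of the form
	// p.PPID == cur would never catch.
	cyclic := map[int]int{10: 20, 20: 30, 30: 10}
	fmt.Println(len(ancestorsOf(cyclic, 10)))

	// An ordinary chain that ends at PID 0.
	linear := map[int]int{5: 4, 4: 1, 1: 0}
	fmt.Println(len(ancestorsOf(linear, 5)))
}
```

Each PID enters visited at most once, so the loop runs at most once per entry in the table regardless of how the PPIDs are linked.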

Test plan

All fuzz jobs pass in CI (no context deadline exceeded failures)
Fuzz seed corpus (windows-latest) passes (FuzzPSFlags no longer hangs)
Fuzz Differential (wc) passes consistently
Real fuzz failures (with file:line references) still propagate as failures

🤖 Generated with Claude Code

…z timeout

When go test -fuzz -fuzztime=30s completes its budget, Go cancels t.Context(). Previously, fuzz iterations derived their context from context.Background(), so an in-flight iteration would hit its own 5s per-iteration deadline at the fuzz-time boundary, return context.DeadlineExceeded, and the test would call t.Fatal(), reporting a spurious CI failure.

Two-part fix across all fuzz test files:

1. Check t.Context().Err() at the top of every f.Fuzz body; return immediately if fuzz time has already elapsed.
2. Derive per-iteration contexts from t.Context() so they are cancelled when fuzz time ends, and treat context cancellation as a clean exit rather than a test failure by adding t.Context().Err() guards before every t.Errorf/t.Fatalf call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When -fuzztime=30s expires, Go's internal fuzz coordinator cancels any in-flight worker iteration and emits "context deadline exceeded" with no file:line reference. This is not a test failure; it is the normal fuzz time boundary. Real test assertion failures always include a file:line reference (e.g. foo_test.go:42:).

Wrap all fuzz invocations in a fuzz_run() helper that:

1. Captures output and exit code
2. If exit non-zero, checks for file:line references in the output
3. If no file:line found, treats as boundary timeout and exits 0
4. If file:line found, propagates the real failure

Also reduces per-invocation -timeout from 300s to 90s (30s fuzz + 60s grace) since each go test call fuzzes exactly one function.

Applies to both the fuzz and fuzz-differential CI jobs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The PPID chain walk only guarded against immediate self-reference (p.PPID == cur) but not longer cycles like A→B→C→A. On Windows CI the process table can contain such cycles, causing FuzzPSFlags to hang indefinitely in the seed corpus run.

Replace the narrow self-reference check with a visited set that detects any cycle in the PPID chain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

With bash -e active in GitHub Actions, output=$(go test ...) causes the entire function to exit immediately when go test returns non-zero. This meant:

- echo "$output" never ran (no visible log output)
- The grep check for real failures never ran
- The step silently failed rather than logging what went wrong

Replace with tee to a tmpfile so:

1. go test output streams live to the Actions log in real time
2. PIPESTATUS[0] correctly captures go test's exit code
3. The tmpfile is available for grep after the command finishes

Applies to both fuzz and fuzz-differential jobs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman and others added 4 commits March 26, 2026 11:03

thieman marked this pull request as ready for review March 26, 2026 15:50

thieman requested review from AlexandreYang, astuyve, julesmcrt and matt-dz as code owners March 26, 2026 15:50

matt-dz approved these changes Mar 26, 2026

thieman added this pull request to the merge queue Mar 26, 2026

Merged via the queue into main with commit ff6ded4 Mar 26, 2026
34 checks passed

thieman deleted the thieman/fix-fuzz-context-timeouts branch March 26, 2026 17:00

thieman merged 4 commits into main from thieman/fix-fuzz-context-timeouts
