fix(fuzz): eliminate spurious context deadline exceeded failures when fuzz time expires #156

Merged
thieman merged 4 commits into main from thieman/fix-fuzz-context-timeouts on Mar 26, 2026

@thieman thieman commented Mar 26, 2026

Problem

Fuzz CI was failing in multiple ways:

  1. Coordinator boundary timeout — When -fuzztime=30s expires, Go's internal fuzz coordinator cancels any in-flight worker iteration and emits context deadline exceeded with no file:line reference. This is not a test failure, but the CI step was treating it as one.

  2. Test context not derived from t.Context() — Fuzz iterations used context.WithTimeout(context.Background(), 5s) instead of deriving from t.Context(). This meant an in-flight iteration at the 30s boundary would hit its own deadline, return context.DeadlineExceeded, and call t.Fatal() — causing a spurious CI failure.

  3. FuzzPSFlags hanging on Windows: getSession in procinfo_windows.go walked the PPID chain with only an immediate self-reference guard (p.PPID == cur). On Windows CI the process table can contain longer PPID cycles (A→B→C→A), causing an infinite loop that hung for the full 120s seed corpus timeout.

Fix

1. Fuzz test context propagation (18 files)

  • Add early-exit guard at top of every f.Fuzz body: if t.Context().Err() != nil { return }
  • Derive per-iteration contexts from t.Context() so cancellation propagates when fuzz time ends

2. CI workflow boundary timeout handling (.github/workflows/fuzz.yml)

Wrap all go test -fuzz=... invocations in a fuzz_run() shell helper that:

  1. Captures output and exit code
  2. If exit non-zero, checks for _test.go:NNN: file:line references — real assertion failures always have these; coordinator boundary timeouts don't
  3. If no file:line found → logs a NOTE and exits 0 (suppresses spurious failure)
  4. If file:line found → propagates the real failure
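
The decision rule in the steps above can be sketched as a small function. This is a Go rendering for illustration only: the workflow itself implements fuzz_run() as a shell helper, and the names fileLineRef and effectiveExitCode, as well as the exact pattern, are assumptions rather than the workflow's actual contents.

```go
package main

import "regexp"

// fileLineRef matches test file references such as "foo_test.go:42:".
// The pattern is an assumption modelled on the _test.go:NNN: form
// described in the PR.
var fileLineRef = regexp.MustCompile(`_test\.go:[0-9]+:`)

// effectiveExitCode returns the exit code the CI step should report,
// given the exit code and combined output of one go test -fuzz run.
func effectiveExitCode(exitCode int, output string) int {
	if exitCode == 0 {
		return 0 // the run passed
	}
	if fileLineRef.MatchString(output) {
		return exitCode // a real assertion failure: propagate it
	}
	// Non-zero exit without any file:line reference: treated as the
	// coordinator boundary timeout and suppressed.
	return 0
}
```

Under this rule, a failing run whose output contains only context deadline exceeded maps to exit code 0, while output containing a reference like foo_test.go:42: keeps its original exit code.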

Also reduces per-invocation -timeout from 300s (shared across all functions) to 90s (30s fuzz + 60s grace per function call).

Applies to both fuzz and fuzz-differential CI jobs.

3. procinfo_windows.go PPID cycle detection

Replace the narrow self-reference guard with a proper visited set:

visited := make(map[int]bool)
for cur > 0 {
    if visited[cur] {
        break // cycle detected in PPID chain
    }
    visited[cur] = true
    ancestors[cur] = true
    ...
}
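
A self-contained version of the walk, as a sketch only. The real getSession reads the Windows process table; here a hypothetical ppidOf map stands in for that lookup, and ancestorsOf is an illustrative name. The two maps are kept separate to mirror the snippet above, although in this reduced form they hold the same keys.

```go
package main

// ancestorsOf walks the parent chain starting at pid and returns every
// PID encountered, including pid itself. ppidOf maps a PID to its parent
// PID and is a stand-in for a real process table lookup.
func ancestorsOf(pid int, ppidOf map[int]int) map[int]bool {
	ancestors := make(map[int]bool)
	visited := make(map[int]bool)
	cur := pid
	for cur > 0 {
		if visited[cur] {
			break // cycle detected in PPID chain
		}
		visited[cur] = true
		ancestors[cur] = true

		parent, ok := ppidOf[cur]
		if !ok {
			break // parent is not in the table: end of chain
		}
		cur = parent
	}
	return ancestors
}
```

With a table containing the cycle 10→20→30→10, the walk visits each PID once and stops when it reaches 10 a second time, where a guard that only checks p.PPID == cur would loop forever.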

Test plan

  • All fuzz jobs pass in CI (no context deadline exceeded failures)
  • Fuzz seed corpus (windows-latest) passes (FuzzPSFlags no longer hangs)
  • Fuzz Differential (wc) passes consistently
  • Real fuzz failures (with file:line references) still propagate as failures

🤖 Generated with Claude Code

thieman and others added 4 commits March 26, 2026 11:03
…z timeout

When go test -fuzz -fuzztime=30s completes its budget, Go cancels
t.Context(). Previously, fuzz iterations derived their context from
context.Background(), so an in-flight iteration would hit its own 5s
per-iteration deadline at the fuzz-time boundary, return
context.DeadlineExceeded, and the test would call t.Fatal() — reporting
a spurious CI failure.

Two-part fix across all fuzz test files:
1. Check t.Context().Err() at the top of every f.Fuzz body; return
   immediately if fuzz time has already elapsed.
2. Derive per-iteration contexts from t.Context() so they are cancelled
   when fuzz time ends, and treat context cancellation as a clean exit
   rather than a test failure by adding t.Context().Err() guards before
   every t.Errorf/t.Fatalf call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When -fuzztime=30s expires, Go's internal fuzz coordinator cancels any
in-flight worker iteration and emits "context deadline exceeded" with no
file:line reference. This is not a test failure — it's the normal fuzz
time boundary. Real test assertion failures always include a file:line
reference (e.g. foo_test.go:42:).

Wrap all fuzz invocations in a fuzz_run() helper that:
1. Captures output and exit code
2. If exit non-zero, checks for file:line references in the output
3. If no file:line found, treats as boundary timeout and exits 0
4. If file:line found, propagates the real failure

Also reduces per-invocation -timeout from 300s to 90s (30s fuzz +
60s grace) since each go test call fuzzes exactly one function.

Applies to both the fuzz and fuzz-differential CI jobs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The PPID chain walk only guarded against immediate self-reference
(p.PPID == cur) but not longer cycles like A→B→C→A. On Windows CI
the process table can contain such cycles, causing FuzzPSFlags to
hang indefinitely in the seed corpus run.

Replace the narrow self-reference check with a visited set that
detects any cycle in the PPID chain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

With bash -e active in GitHub Actions, output=$(go test ...) causes
the entire function to exit immediately when go test returns non-zero.
This meant:
- echo "$output" never ran (no visible log output)
- The grep check for real failures never ran
- The step silently failed rather than logging what went wrong

Replace with tee to a tmpfile so:
1. go test output streams live to the Actions log in real time
2. PIPESTATUS[0] correctly captures go test's exit code
3. The tmpfile is available for grep after the command finishes

Applies to both fuzz and fuzz-differential jobs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman marked this pull request as ready for review March 26, 2026 15:50
@thieman thieman added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit ff6ded4 Mar 26, 2026
34 checks passed
@thieman thieman deleted the thieman/fix-fuzz-context-timeouts branch March 26, 2026 17:00