feat(prefork): graceful shutdown, leak fixes, hook robustness (re-open of #2180 follow-up)#2199
Open
ReneWerner87 wants to merge 5 commits into
Open
feat(prefork): graceful shutdown, leak fixes, hook robustness (re-open of #2180 follow-up)#2199ReneWerner87 wants to merge 5 commits into
ReneWerner87 wants to merge 5 commits into
Conversation
Addresses outstanding review concerns and several adjacent issues surfaced during a follow-up review pass. Lifecycle / supervision - Track every per-child Wait goroutine via sync.WaitGroup and unblock pending sigCh sends through a context.Cancel so early-return paths (OnChildSpawn / OnMasterReady error, recovery doCommand error, ErrOverRecovery) can no longer leak goroutines or stall children. - Install signal.Notify(SIGTERM, SIGINT) in the master so deploy/ rolling-restart signals enter the shutdown path instead of killing the master without graceful teardown. - Replace the unconditional SIGKILL defer with a SIGTERM-then-SIGKILL sequence gated by a configurable ShutdownGracePeriod (defaults to 5s, Windows path stays SIGKILL since Signal(SIGTERM) is unsupported). API - OnChildRecover now returns error so callers can implement recovery policies (circuit-breaker etc.); panic in any hook is recovered and surfaced as the returned error, with diagnostic logging. - Add RecoverInterval (optional crash-loop backoff) and ShutdownGracePeriod fields with safe zero-value defaults. - Export ErrCommandProducerNilCmd and ErrCommandProducerNotStarted sentinel errors so callers can errors.Is them. - Rename oldPid/newPid to oldPID/newPID per Go initialism convention. - Logger interface now declares an explicit compile-time compatibility check with fasthttp.Logger. Resource hygiene - Master closes both the original tcpListener and the duped fd in p.files when prefork() returns; previously the duped fd leaked once per call. - doCommand wraps every error path with %w + fmt.Errorf so caller-side diagnostics keep stage context. - Strip pre-existing FASTHTTP_PREFORK_CHILD entries before appending so child env never carries duplicate keys. - Extract magic numbers as package constants (inheritedListenerFD, masterPollInterval, defaultShutdownGracePeriod, preforkChildEnvValue). - Rename the inherited listener fd via os.NewFile so net.FileListener errors are diagnosable. Tests - Migrate to t.Setenv (drop the global setUp/tearDown helpers) — fixes the env-mutation-vs-parallel race. - Replace rand.Intn port helper with `:0` + Listener.Addr() to remove port-collision flakes under -count and parallel runs. - Collapse the three near-identical Test_ListenAndServe* tests into a single table-driven subtest that actually asserts the args forwarded to ServeFunc/ServeTLSFunc/ServeTLSEmbedFunc. - Add coverage for the previously untested branches: CommandProducer returning err / nil cmd / unstarted cmd, initial OnChildSpawn error, OnMasterReady error, hook panic surfacing, RecoverInterval enforcement. - noopChildProducer helper kills + waits any spawned child binaries during cleanup so failed tests no longer leave subprocesses around.
There was a problem hiding this comment.
Pull request overview
This PR revisits the prefork package lifecycle/supervision logic to make shutdown and recovery more robust (avoid goroutine/fd leaks, improve signal handling), while expanding the hook API and strengthening test coverage around failure modes.
Changes:
- Add master signal handling plus a structured shutdown path (SIGTERM → grace → SIGKILL) and goroutine-leak prevention for per-child
Waitroutines. - Harden hook execution (panic recovery) and extend the public API (new fields + exported sentinel errors).
- Refactor and expand tests to reduce flakes (no random ports / env races) and cover new/previously-untested branches.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
prefork/prefork.go |
Adds shutdown orchestration, hook recovery, new exported errors/config knobs, and resource-hygiene improvements. |
prefork/prefork_test.go |
Refactors tests for determinism and adds coverage for new error paths, hook behavior, and recovery timing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- listen(): the *os.File wrapping the inherited fd was never closed. net.FileListener dups the fd, so the original was leaking on every child startup. Close it explicitly and return the dup'd listener. - setTCPListenerFiles(): if tcpListener.File() failed, the bound net.Listener stayed open and p.ln pointed at it. Close the listener on the error path and only assign p.ln after the dup succeeds. - prefork(): replace time.After in the RecoverInterval branch with a time.NewTimer that we Stop+drain when a shutdown signal wins the select, so the timer goroutine and channel allocation don't linger during crash-loop shutdown. - invokeHook(): drop the panic log line. The hook caller logs the returned error already, so logging in the recover block produced duplicate output for the same panic.
erikdubbelboer
requested changes
May 3, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Re-opens the follow-up commit that was applied to master right before #2180 was merged and then reverted in e9208ec because @erikdubbelboer hadn't reviewed it. Splitting it back out so it can be reviewed on its own merits.
Original PR (merged): #2180
Reverted commit: 262ea09 → e9208ec
Original summary
Addresses outstanding review concerns from #2180 plus several adjacent issues surfaced during a follow-up review pass.
Lifecycle / supervision
Waitgoroutine viasync.WaitGroupand unblock pendingsigChsends through acontext.Cancelso early-return paths (OnChildSpawn/OnMasterReadyerror, recoverydoCommanderror,ErrOverRecovery) can no longer leak goroutines or stall children.signal.Notify(SIGTERM, SIGINT)in the master so deploy / rolling-restart signals enter the shutdown path instead of killing the master without graceful teardown.SIGKILLdefer with aSIGTERM-then-SIGKILLsequence gated by a configurableShutdownGracePeriod(defaults to 5s; the Windows path staysSIGKILLsinceProcess.Signal(SIGTERM)is unsupported there).API
OnChildRecovernow returnserrorso callers can implement recovery policies (circuit-breaker etc.); panics in any hook are recovered and surfaced as the returned error, with diagnostic logging.RecoverInterval(optional crash-loop backoff) andShutdownGracePeriodfields with safe zero-value defaults — both are opt-in, existing behaviour is preserved when zero.ErrCommandProducerNilCmdandErrCommandProducerNotStartedsentinel errors so callers canerrors.Isthem.oldPid/newPidtooldPID/newPIDper Go initialism convention.var _ Logger = fasthttp.Logger(nil)check so the localLoggerinterface stays in sync withfasthttp.Logger.Resource hygiene
tcpListenerand the duped fd inp.fileswhenprefork()returns; previously the duped fd leaked once per call.doCommandwraps every error path with%w+fmt.Errorfso caller-side diagnostics keep stage context.FASTHTTP_PREFORK_CHILDentries before appending so child env never carries duplicate keys.inheritedListenerFD,masterPollInterval,defaultShutdownGracePeriod,preforkChildEnvValue).os.NewFilesonet.FileListenererrors are diagnosable.Tests
t.Setenv(drop the globalsetUp/tearDownhelpers) — fixes the env-mutation-vs-parallel race.rand.Intnport helper with:0+Listener.Addr()to remove port-collision flakes under-count/ parallel runs.Test_ListenAndServe*tests into a single table-driven subtest that actually asserts the args forwarded toServeFunc/ServeTLSFunc/ServeTLSEmbedFunc.CommandProducerreturning err / nil cmd / unstarted cmd, initialOnChildSpawnerror,OnMasterReadyerror, hook panic surfacing,RecoverIntervalenforcement.noopChildProducerhelper kills + waits any spawned child binaries during cleanup so failed tests no longer leave subprocesses around.Local run on this branch:
go test -race ./prefork/...andgolangci-lint run ./prefork/...are both clean. Happy to split this into smaller PRs if you'd prefer to land the goroutine-leak fix and the API change separately from the new fields.