feat(prefork): Enhance prefork management with WatchMaster, CommandProducer, and Windows support#2180
Conversation
… callback comment
Integrate upstream's OnMasterDeath callback (replaces WatchMaster bool), os.Executable() for child command, and watchMaster as method on Prefork. Keep our OnChildSpawn, OnMasterReady, OnChildRecover callbacks and CommandProducer. Update tests accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.Getppid() returns a static PID that doesn't change when the parent exits (no reparenting). Use FindProcess+Wait instead, which correctly detects parent exit. Also document why masterPID comparison works for Docker containers (master PID 1 case). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The three ListenAndServe* methods had identical child setup code (listen, set ln, watch master). Extract to listenAsChild() for cleaner code. Also add comment for the magic file descriptor number 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep upstream's (addr, certKey, certFile) signature to avoid breaking callers. Fix the doc comment to match the actual parameter order instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Enhances prefork to reduce duplication in downstream consumers (e.g., Fiber) by exposing lifecycle hooks and customizable child command creation, while also improving child/master monitoring behavior on Windows.
Changes:
- Added optional lifecycle callbacks (
OnChildSpawn,OnMasterReady,OnChildRecover) and aCommandProducerhook for custom child command creation. - Refactored child listener setup into
listenAsChild()to deduplicate logic acrossListenAndServe*methods. - Updated master-watch logic to correctly detect master exit on Windows and fixed goroutine loop-variable capture in spawn/recovery.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
prefork/prefork.go |
Adds callbacks + CommandProducer, refactors child setup, improves Windows master monitoring, and adjusts spawn/recovery loop behavior. |
prefork/prefork_test.go |
Adds tests around logger/callback fields and CommandProducer usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Lint fixes: - Remove unused Reuseport field write in test (govet/unusedwrite) - Replace fmt.Errorf with errors.New for static errors (perfsprint) Review feedback (Copilot): - Validate CommandProducer returns a started command (nil/Process check) - Clarify ListenAndServeTLS doc: parameter order and internal forwarding - Use hermetic test binary re-exec instead of external 'go' binary - Rename misleading test to reflect what it actually asserts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@erikdubbelboer WDYT? |
|
will check the hints and answer tomorrow, thx |
|
I will implement the changes suggested in the comments |
- watchMaster: log errors from FindProcess/Wait instead of swallowing - watchMaster: don't call OnMasterDeath if FindProcess fails - OnChildRecover: change signature to func(pid int), drop unused error return - OnChildSpawn: add comment clarifying deferred cleanup handles the child - CommandProducer: improve docs describing contract and use cases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
ready for a recheck @erikdubbelboer |
- OnChildRecover: signature changed to func(oldPid, newPid int) so callers can track which process was replaced - OnChildSpawn: also called for recovered children (a recovered child is still a spawned child) - watchMaster: call OnMasterDeath when FindProcess fails (process is most likely gone) - CommandProducer: document that FASTHTTP_PREFORK_CHILD=1 must be set in the child env, and what the default does when nil Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@erikdubbelboer something else or ready for merge ? |
- Move Wait() goroutine before OnChildSpawn so Kill()+Wait() works correctly if a callback fails and the deferred cleanup runs - Add Wait() call in deferred cleanup after Kill() to reap children - Same fix in recovery loop - Remove shallow callback tests that only tested Go compiler - Add Test_Prefork_Lifecycle: runs full prefork with CommandProducer, verifies callbacks fire in correct order with correct arguments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
All feedback addressed:
Ready for another look @erikdubbelboer |
Addresses outstanding review concerns and several adjacent issues surfaced during a follow-up review pass. Lifecycle / supervision - Track every per-child Wait goroutine via sync.WaitGroup and unblock pending sigCh sends through a context.Cancel so early-return paths (OnChildSpawn / OnMasterReady error, recovery doCommand error, ErrOverRecovery) can no longer leak goroutines or stall children. - Install signal.Notify(SIGTERM, SIGINT) in the master so deploy/ rolling-restart signals enter the shutdown path instead of killing the master without graceful teardown. - Replace the unconditional SIGKILL defer with a SIGTERM-then-SIGKILL sequence gated by a configurable ShutdownGracePeriod (defaults to 5s, Windows path stays SIGKILL since Signal(SIGTERM) is unsupported). API - OnChildRecover now returns error so callers can implement recovery policies (circuit-breaker etc.); panic in any hook is recovered and surfaced as the returned error, with diagnostic logging. - Add RecoverInterval (optional crash-loop backoff) and ShutdownGracePeriod fields with safe zero-value defaults. - Export ErrCommandProducerNilCmd and ErrCommandProducerNotStarted sentinel errors so callers can errors.Is them. - Rename oldPid/newPid to oldPID/newPID per Go initialism convention. - Logger interface now declares an explicit compile-time compatibility check with fasthttp.Logger. Resource hygiene - Master closes both the original tcpListener and the duped fd in p.files when prefork() returns; previously the duped fd leaked once per call. - doCommand wraps every error path with %w + fmt.Errorf so caller-side diagnostics keep stage context. - Strip pre-existing FASTHTTP_PREFORK_CHILD entries before appending so child env never carries duplicate keys. - Extract magic numbers as package constants (inheritedListenerFD, masterPollInterval, defaultShutdownGracePeriod, preforkChildEnvValue). - Rename the inherited listener fd via os.NewFile so net.FileListener errors are diagnosable. Tests - Migrate to t.Setenv (drop the global setUp/tearDown helpers) — fixes the env-mutation-vs-parallel race. - Replace rand.Intn port helper with `:0` + Listener.Addr() to remove port-collision flakes under -count and parallel runs. - Collapse the three near-identical Test_ListenAndServe* tests into a single table-driven subtest that actually asserts the args forwarded to ServeFunc/ServeTLSFunc/ServeTLSEmbedFunc. - Add coverage for the previously untested branches: CommandProducer returning err / nil cmd / unstarted cmd, initial OnChildSpawn error, OnMasterReady error, hook panic surfacing, RecoverInterval enforcement. - noopChildProducer helper kills + waits any spawned child binaries during cleanup so failed tests no longer leave subprocesses around.
|
Thanks! I'll tag a new release next week. |
|
I now see that you pushed another commit right between me checking your previous commit and merging this. Github didn't show this 😠 |
|
Oh, sorry, I didn't realize you were about to merge. Usually a little time passes before that happens. Feel free to review the last commit again, and I'll fix whatever you notice. |
|
I have reverted that commit on master: e9208ec |
|
ok |
We've been working on reducing code duplication between Fiber and fasthttp's prefork package. Right now Fiber maintains its own prefork implementation (~200 lines) that reimplements most of what
fasthttp/preforkalready does - process spawning, child management, the recovery loop - just to hook into a few lifecycle events (logging when a child spawns, showing a startup message after all workers are up, etc.).This PR adds a small set of optional callbacks to the
Preforkstruct that make it possible for consumers like Fiber to plug into the prefork lifecycle without copying the whole thing:OnChildSpawn(pid int) error- called when a new child is forked, so the caller can track PIDs or run hooksOnMasterReady(childPIDs []int) error- called once after all children are up, useful for startup messages or metricsOnChildRecover(pid int) error- called when a crashed child gets replaced, for logging/alertingCommandProducer(files []*os.File) (*exec.Cmd, error)- allows overriding how child commands are created, which is useful for testing (injecting dummy commands instead of re-executing the binary)All fields are optional and nil-safe - existing code doesn't need to change. The zero value works exactly as before.
Additionally:
listenAsChild()extracted to remove duplicated setup code across the threeListenAndServe*methodswatchMasternow handles Windows correctly (usingFindProcess+Waitinstead ofGetppidpolling, which doesn't detect parent exit on Windows)cmdandpidas parameters instead of capturing the loop variableYou can see the practical usage in the Fiber PR that builds on this: gofiber/fiber#4210 - it replaces Fiber's entire prefork implementation with ~50 lines that configure these callbacks. The rest is handled by this package.