Skip to content

feat(prefork): Enhance prefork management with WatchMaster, CommandProducer, and Windows support#2180

Merged
erikdubbelboer merged 17 commits into
valyala:masterfrom
ReneWerner87:prefork_optimization
May 2, 2026
Merged

feat(prefork): Enhance prefork management with WatchMaster, CommandProducer, and Windows support#2180
erikdubbelboer merged 17 commits into
valyala:masterfrom
ReneWerner87:prefork_optimization

Conversation

@ReneWerner87
Copy link
Copy Markdown
Contributor

We've been working on reducing code duplication between Fiber and fasthttp's prefork package. Right now Fiber maintains its own prefork implementation (~200 lines) that reimplements most of what fasthttp/prefork already does - process spawning, child management, the recovery loop - just to hook into a few lifecycle events (logging when a child spawns, showing a startup message after all workers are up, etc.).

This PR adds a small set of optional callbacks to the Prefork struct that make it possible for consumers like Fiber to plug into the prefork lifecycle without copying the whole thing:

  • OnChildSpawn(pid int) error - called when a new child is forked, so the caller can track PIDs or run hooks
  • OnMasterReady(childPIDs []int) error - called once after all children are up, useful for startup messages or metrics
  • OnChildRecover(pid int) error - called when a crashed child gets replaced, for logging/alerting
  • CommandProducer(files []*os.File) (*exec.Cmd, error) - allows overriding how child commands are created, which is useful for testing (injecting dummy commands instead of re-executing the binary)

All fields are optional and nil-safe - existing code doesn't need to change. The zero value works exactly as before.

Additionally:

  • listenAsChild() extracted to remove duplicated setup code across the three ListenAndServe* methods
  • watchMaster now handles Windows correctly (using FindProcess+Wait instead of Getppid polling, which doesn't detect parent exit on Windows)
  • Goroutine closures in the spawn/recovery loop pass cmd and pid as parameters instead of capturing the loop variable

You can see the practical usage in the Fiber PR that builds on this: gofiber/fiber#4210 - it replaces Fiber's entire prefork implementation with ~50 lines that configure these callbacks. The rest is handled by this package.

ReneWerner87 and others added 8 commits January 28, 2026 08:50
Integrate upstream's OnMasterDeath callback (replaces WatchMaster bool),
os.Executable() for child command, and watchMaster as method on Prefork.
Keep our OnChildSpawn, OnMasterReady, OnChildRecover callbacks and
CommandProducer. Update tests accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.Getppid() returns a static PID that doesn't change when
the parent exits (no reparenting). Use FindProcess+Wait instead, which
correctly detects parent exit. Also document why masterPID comparison
works for Docker containers (master PID 1 case).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The three ListenAndServe* methods had identical child setup code
(listen, set ln, watch master). Extract to listenAsChild() for
cleaner code. Also add comment for the magic file descriptor number 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep upstream's (addr, certKey, certFile) signature to avoid breaking
callers. Fix the doc comment to match the actual parameter order instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enhances prefork to reduce duplication in downstream consumers (e.g., Fiber) by exposing lifecycle hooks and customizable child command creation, while also improving child/master monitoring behavior on Windows.

Changes:

  • Added optional lifecycle callbacks (OnChildSpawn, OnMasterReady, OnChildRecover) and a CommandProducer hook for custom child command creation.
  • Refactored child listener setup into listenAsChild() to deduplicate logic across ListenAndServe* methods.
  • Updated master-watch logic to correctly detect master exit on Windows and fixed goroutine loop-variable capture in spawn/recovery.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
prefork/prefork.go Adds callbacks + CommandProducer, refactors child setup, improves Windows master monitoring, and adjusts spawn/recovery loop behavior.
prefork/prefork_test.go Adds tests around logger/callback fields and CommandProducer usage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go
Comment thread prefork/prefork_test.go Outdated
Comment thread prefork/prefork_test.go Outdated
Comment thread prefork/prefork.go Outdated
Lint fixes:
- Remove unused Reuseport field write in test (govet/unusedwrite)
- Replace fmt.Errorf with errors.New for static errors (perfsprint)

Review feedback (Copilot):
- Validate CommandProducer returns a started command (nil/Process check)
- Clarify ListenAndServeTLS doc: parameter order and internal forwarding
- Use hermetic test binary re-exec instead of external 'go' binary
- Rename misleading test to reflect what it actually asserts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ReneWerner87
Copy link
Copy Markdown
Contributor Author

@erikdubbelboer WDYT?

Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go
Comment thread prefork/prefork.go Outdated
@ReneWerner87
Copy link
Copy Markdown
Contributor Author

will check the hints and answer tomorrow, thx

@ReneWerner87
Copy link
Copy Markdown
Contributor Author

I will implement the changes suggested in the comments

- watchMaster: log errors from FindProcess/Wait instead of swallowing
- watchMaster: don't call OnMasterDeath if FindProcess fails
- OnChildRecover: change signature to func(pid int), drop unused error return
- OnChildSpawn: add comment clarifying deferred cleanup handles the child
- CommandProducer: improve docs describing contract and use cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ReneWerner87
Copy link
Copy Markdown
Contributor Author

ready for a recheck @erikdubbelboer

Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go
Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go Outdated
ReneWerner87 and others added 2 commits April 25, 2026 12:36
- OnChildRecover: signature changed to func(oldPid, newPid int) so
  callers can track which process was replaced
- OnChildSpawn: also called for recovered children (a recovered child
  is still a spawned child)
- watchMaster: call OnMasterDeath when FindProcess fails (process is
  most likely gone)
- CommandProducer: document that FASTHTTP_PREFORK_CHILD=1 must be set
  in the child env, and what the default does when nil

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ReneWerner87
Copy link
Copy Markdown
Contributor Author

@erikdubbelboer something else or ready for merge ?

Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork_test.go Outdated
Comment thread prefork/prefork_test.go Outdated
- Move Wait() goroutine before OnChildSpawn so Kill()+Wait() works
  correctly if a callback fails and the deferred cleanup runs
- Add Wait() call in deferred cleanup after Kill() to reap children
- Same fix in recovery loop
- Remove shallow callback tests that only tested Go compiler
- Add Test_Prefork_Lifecycle: runs full prefork with CommandProducer,
  verifies callbacks fire in correct order with correct arguments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ReneWerner87
Copy link
Copy Markdown
Contributor Author

All feedback addressed:

  • Wait goroutine now starts before OnChildSpawn (both in spawn and recovery loop) to avoid zombies
  • Added Wait() in deferred cleanup after Kill()
  • Removed the shallow callback tests, replaced with Test_Prefork_Lifecycle that runs the full prefork loop and verifies callback order/arguments

Ready for another look @erikdubbelboer

Comment thread prefork/prefork_test.go Outdated
Comment thread prefork/prefork_test.go Outdated
Comment thread prefork/prefork_test.go
Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go Outdated
Comment thread prefork/prefork.go Outdated
Addresses outstanding review concerns and several adjacent issues
surfaced during a follow-up review pass.

Lifecycle / supervision
- Track every per-child Wait goroutine via sync.WaitGroup and unblock
  pending sigCh sends through a context.Cancel so early-return paths
  (OnChildSpawn / OnMasterReady error, recovery doCommand error,
  ErrOverRecovery) can no longer leak goroutines or stall children.
- Install signal.Notify(SIGTERM, SIGINT) in the master so deploy/
  rolling-restart signals enter the shutdown path instead of killing
  the master without graceful teardown.
- Replace the unconditional SIGKILL defer with a SIGTERM-then-SIGKILL
  sequence gated by a configurable ShutdownGracePeriod (defaults to 5s,
  Windows path stays SIGKILL since Signal(SIGTERM) is unsupported).

API
- OnChildRecover now returns error so callers can implement recovery
  policies (circuit-breaker etc.); panic in any hook is recovered and
  surfaced as the returned error, with diagnostic logging.
- Add RecoverInterval (optional crash-loop backoff) and
  ShutdownGracePeriod fields with safe zero-value defaults.
- Export ErrCommandProducerNilCmd and ErrCommandProducerNotStarted
  sentinel errors so callers can errors.Is them.
- Rename oldPid/newPid to oldPID/newPID per Go initialism convention.
- Logger interface now declares an explicit compile-time compatibility
  check with fasthttp.Logger.

Resource hygiene
- Master closes both the original tcpListener and the duped fd in
  p.files when prefork() returns; previously the duped fd leaked once
  per call.
- doCommand wraps every error path with %w + fmt.Errorf so caller-side
  diagnostics keep stage context.
- Strip pre-existing FASTHTTP_PREFORK_CHILD entries before appending so
  child env never carries duplicate keys.
- Extract magic numbers as package constants
  (inheritedListenerFD, masterPollInterval, defaultShutdownGracePeriod,
  preforkChildEnvValue).
- Rename the inherited listener fd via os.NewFile so net.FileListener
  errors are diagnosable.

Tests
- Migrate to t.Setenv (drop the global setUp/tearDown helpers) — fixes
  the env-mutation-vs-parallel race.
- Replace rand.Intn port helper with `:0` + Listener.Addr() to remove
  port-collision flakes under -count and parallel runs.
- Collapse the three near-identical Test_ListenAndServe* tests into a
  single table-driven subtest that actually asserts the args forwarded
  to ServeFunc/ServeTLSFunc/ServeTLSEmbedFunc.
- Add coverage for the previously untested branches:
  CommandProducer returning err / nil cmd / unstarted cmd,
  initial OnChildSpawn error, OnMasterReady error,
  hook panic surfacing, RecoverInterval enforcement.
- noopChildProducer helper kills + waits any spawned child binaries
  during cleanup so failed tests no longer leave subprocesses around.
@erikdubbelboer erikdubbelboer merged commit 481e579 into valyala:master May 2, 2026
10 of 11 checks passed
@erikdubbelboer
Copy link
Copy Markdown
Collaborator

Thanks! I'll tag a new release next week.

@erikdubbelboer
Copy link
Copy Markdown
Collaborator

I now see that you pushed another commit right between me checking your previous commit and merging this. Github didn't show this 😠

@ReneWerner87
Copy link
Copy Markdown
Contributor Author

Oh, sorry, I didn't realize you were about to merge. Usually a little time passes before that happens.

Feel free to review the last commit again, and I'll fix whatever you notice.

@erikdubbelboer
Copy link
Copy Markdown
Collaborator

I have reverted that commit on master: e9208ec
Please make a new pull request for these changes.

@ReneWerner87
Copy link
Copy Markdown
Contributor Author

ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants