Fix child process termination in containerized environments #3
Merged
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
mudler
reviewed
Jan 10, 2026
Copilot AI changed the title from "[WIP] Fix child process termination in containerized environments" to "Fix child process termination in containerized environments" on Jan 10, 2026
mudler
reviewed
Jan 10, 2026
// reparented to the nearest still living ancestor subreaper.
// This is useful in containerized environments to ensure proper cleanup of
// orphaned child processes.
func SetSubreaper() error {
Owner
@copilot it seems you forgot to actually set the process as a subreaper and wait for its children
Contributor
Author
This has been addressed in commit c1a9ec4. The process is now properly set as a subreaper by calling SetSubreaper() in Run(), and reapChildren() continuously waits for and reaps orphaned child processes using Wait4(-1, ...) with WNOHANG.
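As a rough illustration of the mechanism described in this reply, the sketch below marks the current process as a subreaper via prctl and then drains exited children with a non-blocking Wait4(-1, ...). It is not the code from commit c1a9ec4: the names `setSubreaper` and `reapOnce` are hypothetical, and the constant is defined locally so the sketch only needs the standard library. It is Linux-only.

```go
//go:build linux

package main

import (
	"errors"
	"syscall"
)

// prSetChildSubreaper is the value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>.
// Defined locally (an assumption of this sketch) to stay within the standard library.
const prSetChildSubreaper = 36

// setSubreaper (hypothetical name) marks the calling process as a child
// subreaper, so orphaned descendants are reparented to it instead of PID 1.
func setSubreaper() error {
	_, _, errno := syscall.RawSyscall(syscall.SYS_PRCTL, prSetChildSubreaper, 1, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// reapOnce (hypothetical name) collects every child that has already exited,
// without blocking, and returns how many were reaped.
func reapOnce() (int, error) {
	reaped := 0
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		switch {
		case errors.Is(err, syscall.EINTR):
			continue // interrupted by a signal, retry
		case errors.Is(err, syscall.ECHILD):
			return reaped, nil // the process has no children at all
		case err != nil:
			return reaped, err
		case pid == 0:
			return reaped, nil // children exist, but none has exited yet
		}
		reaped++
	}
}
```

A long-running reaper would call something like `reapOnce` periodically or on SIGCHLD. Note that Wait4(-1, ...) collects any child, so in a real program it can race with `cmd.Wait()` on processes started through os/exec; how the PR avoids that is not shown in this conversation.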
mudler
reviewed
Jan 10, 2026
-// Stop stops the running process by senging KillSignal to the PID annotated in the pidfile
+// Stop stops the running process by sending KillSignal to the PID annotated in the pidfile
 func (p *Process) Stop() error {
Owner
@copilot since we are touching Stop() here, let's improve the returned errors so that we can better identify where they come from. Don't shadow errors; wrap them where necessary.
Contributor
Author
Improved in 2b22273. The Stop() method now:
- Wraps errors with fmt.Errorf and %w to preserve error chains
- Includes contextual information (PID values, signal types) in error messages
- Checks and reports errors from killProcess() calls instead of ignoring them
- Provides specific error messages for each failure scenario (read PID, parse PID, send signal)
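To make the wrapping pattern concrete, here is a minimal sketch of the three failure scenarios named above (read PID, parse PID, send signal), each wrapped with `%w` and its own context. The function `stopFromPidfile` is a hypothetical stand-in, not the actual Stop() from commit 2b22273, and it uses syscall.Kill directly rather than the project's killProcess(). Unix only.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// stopFromPidfile (hypothetical) reads a PID from pidfile and sends sig to it.
// Every failure is wrapped with %w so callers can still use errors.Is/As
// on the underlying cause while seeing which step failed.
func stopFromPidfile(pidfile string, sig syscall.Signal) error {
	raw, err := os.ReadFile(pidfile)
	if err != nil {
		return fmt.Errorf("reading pidfile %q: %w", pidfile, err)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return fmt.Errorf("parsing PID from pidfile %q: %w", pidfile, err)
	}
	if err := syscall.Kill(pid, sig); err != nil {
		return fmt.Errorf("sending %v to PID %d: %w", sig, pid, err)
	}
	return nil
}
```

With this shape, a caller can test for a missing pidfile with `errors.Is(err, os.ErrNotExist)` and still log a message that names the failing step.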
Summary
This PR fixes child process termination issues in containerized environments, addressing resource leaks (especially GPU memory) when stopping processes. Fixes mudler/LocalAI#7958.
Changes Implemented
- Add GracefulTimeout (default 15s) and KillProcessGroup (default true) config options
- Add WithGracefulTimeout() and WithKillProcessGroup() option functions
- Set Setpgid: true in Run() (Unix/Linux)
- Signal the whole process group in Stop() (Unix/Linux)
- Add SetSubreaper() function for Linux (using prctl)
- Call SetSubreaper() in Run() and implement child reaping
- Add reapChildren() to wait for and reap orphaned child processes
- Improve error handling in Stop() with proper context and wrapping

Root Causes Addressed
✅ No process group handling → Added Setpgid: true for Unix/Linux
✅ Signals only sent to direct child → Sending to entire process group via negative PID
✅ Immediate SIGKILL after SIGTERM → Added 15-second graceful timeout
✅ No subreaper in containers → Linux-specific subreaper support via prctl + child reaping
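
The first three fixes above can be sketched together: start the child in its own process group, send SIGTERM to the group through a negative PID, and only fall back to SIGKILL after a grace period. This is an illustrative sketch, not the PR's code; `startInGroup`, `stopGroup`, and `runAndStop` are hypothetical names, and for simplicity it only watches the direct child for exit. Unix only.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// startInGroup starts cmd in a new process group whose ID equals the child's PID.
func startInGroup(cmd *exec.Cmd) error {
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd.Start()
}

// stopGroup sends SIGTERM to the whole group, waits up to grace for the
// direct child to exit, then falls back to SIGKILL for the group.
func stopGroup(cmd *exec.Cmd, grace time.Duration) error {
	pgid := cmd.Process.Pid // valid because Setpgid made the child a group leader
	// A negative PID addresses every process in the group.
	if err := syscall.Kill(-pgid, syscall.SIGTERM); err != nil {
		return fmt.Errorf("sending SIGTERM to process group %d: %w", pgid, err)
	}
	done := make(chan struct{})
	go func() {
		_ = cmd.Wait() // the exit status is not needed in this sketch
		close(done)
	}()
	select {
	case <-done:
		return nil // simplification: descendants that ignore SIGTERM may remain
	case <-time.After(grace):
	}
	if err := syscall.Kill(-pgid, syscall.SIGKILL); err != nil && !errors.Is(err, syscall.ESRCH) {
		return fmt.Errorf("sending SIGKILL to process group %d: %w", pgid, err)
	}
	<-done
	return nil
}

// runAndStop is a small driver that exercises the two helpers above.
func runAndStop(grace time.Duration, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	if err := startInGroup(cmd); err != nil {
		return fmt.Errorf("starting %s: %w", name, err)
	}
	return stopGroup(cmd, grace)
}
```

For example, `runAndStop(15*time.Second, "sleep", "30")` would start `sleep`, terminate its group, and return once the child has exited. The 15-second value mirrors the default stated in this summary.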
Subreaper Implementation
The subreaper functionality now properly:
- Calls SetSubreaper() when starting a process to mark the current process as a subreaper
- Spawns a reapChildren() goroutine that continuously waits for and reaps orphaned child processes

Error Handling Improvements
The Stop() method now provides better error context:
- Wraps errors with fmt.Errorf and %w to preserve error chains
- Checks and reports errors from killProcess() calls

Testing
Files Changed
- config.go: Added new config fields with defaults
- options.go: Added option functions
- process.go: Updated Run() to call SetSubreaper(), monitor() to spawn reaper, improved Stop() error handling
- process_unix.go: Unix-specific process group handling + reapChildren() implementation
- process_windows.go: Windows-specific process handling + no-op reapChildren()
- subreaper_linux.go: Linux subreaper support
- subreaper_other.go: No-op for non-Linux platforms
- process_test.go: Added process group termination test
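
For readers unfamiliar with the option-function style mentioned for config.go and options.go, the sketch below shows one plausible shape. Only the names GracefulTimeout, KillProcessGroup, WithGracefulTimeout(), and WithKillProcessGroup() and the two defaults come from this PR; the struct layout, parameter types, and the `DefaultConfig` helper are assumptions made for illustration.

```go
package main

import "time"

// Config holds the two settings named in the PR; the struct shape is assumed.
type Config struct {
	GracefulTimeout  time.Duration
	KillProcessGroup bool
}

// Option mutates a Config (assumed signature).
type Option func(*Config)

// WithGracefulTimeout sets how long to wait after SIGTERM before escalating.
func WithGracefulTimeout(d time.Duration) Option {
	return func(c *Config) { c.GracefulTimeout = d }
}

// WithKillProcessGroup toggles signaling the whole process group.
func WithKillProcessGroup(kill bool) Option {
	return func(c *Config) { c.KillProcessGroup = kill }
}

// DefaultConfig (hypothetical helper) starts from the defaults stated in the
// PR summary, 15s and true, then applies opts in order.
func DefaultConfig(opts ...Option) *Config {
	c := &Config{GracefulTimeout: 15 * time.Second, KillProcessGroup: true}
	for _, o := range opts {
		o(c)
	}
	return c
}
```

A caller that wants a shorter grace period and no group signaling would write `DefaultConfig(WithGracefulTimeout(5*time.Second), WithKillProcessGroup(false))`.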