Skip to content

fix: prevent goroutine leak and data race in terminal exec timeout#126

Closed
mason5052 wants to merge 1 commit intovxcontrol:masterfrom
mason5052:fix/terminal-exec-goroutine-leak
Closed

fix: prevent goroutine leak and data race in terminal exec timeout#126
mason5052 wants to merge 1 commit intovxcontrol:masterfrom
mason5052:fix/terminal-exec-goroutine-leak

Conversation

@mason5052
Copy link
Contributor

Description of the Change

Problem

In getExecResult() (terminal.go), when command execution times out, three issues occur:

  1. Goroutine leak: The goroutine running io.Copy(&dst, resp.Reader) remains blocked on the reader indefinitely. While defer resp.Close() eventually runs, there is a window where the goroutine is still blocked between the timeout and function return. Under sustained load with many timeouts, this leads to goroutine accumulation and memory bloat.

  2. Data race on buffer: On timeout, dst.String() is called (line 223) while the goroutine may still be writing to dst via io.Copy — a concurrent read/write on bytes.Buffer.

  3. Data race on err variable: The goroutine writes to the shared err variable (line 216), while the main goroutine also writes to err in the timeout case (line 224) — a classic data race.

Solution

  • Close resp immediately on timeout to unblock io.Copy — prevents goroutine leak
  • Wait for goroutine to finish (with 5s safety timeout) before reading dst.String() — eliminates buffer data race
  • Use separate copyErr variable for the goroutine, assigned to err only after goroutine completes — eliminates err variable data race

The existing defer resp.Close() serves as a safety net for the normal (non-timeout) path. Double-close on HijackedResponse is safe.

Closes #108

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • 🛡️ Security update

Areas Affected

  • Security Tools Integration

Testing and Verification

Test Steps

  1. Build the backend: cd backend && go build ./...
  2. Run a long-running command in a terminal container with a short timeout
  3. Monitor goroutine count before and after timeout (e.g., via pprof or runtime.NumGoroutine())
  4. Verify goroutine count returns to baseline after timeout

Security Considerations

Goroutine leaks can lead to resource exhaustion (memory, file descriptors) during sustained penetration testing operations. Fixing this prevents potential DoS conditions when many commands time out simultaneously.

Performance Impact

Positive impact: eliminates goroutine leaks that would otherwise accumulate during long-running pentest sessions with command timeouts.

Checklist

Code Quality

  • My code follows the project's coding standards
  • All new and existing tests pass
  • I have run go fmt and go vet (for Go code)

Security

  • I have considered security implications
  • Changes maintain or improve the security model

Compatibility

  • Changes are backward compatible

When command execution times out in getExecResult(), the goroutine
running io.Copy() remains blocked on resp.Reader indefinitely, causing
a goroutine leak. Additionally, there are two data races:

1. The goroutine writes to dst via io.Copy while the main goroutine
   reads dst.String() on timeout (concurrent read/write on buffer).
2. The goroutine writes to the shared err variable while the main
   goroutine also writes to it in the timeout case.

Fix by:
- Closing resp immediately on timeout to unblock the io.Copy goroutine
- Waiting for the goroutine to finish (with 5s safety timeout) before
  reading dst, eliminating the data race on the buffer
- Using a separate copyErr variable for the goroutine, only assigning
  to err after the goroutine completes, eliminating the data race on err

The existing defer resp.Close() serves as a safety net for the normal
path; double-close on HijackedResponse is safe.

Ref: vxcontrol#108

Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
Copilot AI review requested due to automatic review settings February 22, 2026 22:46
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes critical concurrency issues in the terminal exec timeout handling that could lead to goroutine leaks and data races. The fix addresses three problems: goroutine leaks when commands timeout, concurrent read/write data races on the output buffer, and data races on the error variable.

Changes:

  • Introduced a separate copyErr variable to eliminate data race on the error variable
  • Added immediate resp.Close() call in the timeout case to unblock the io.Copy goroutine
  • Added a wait mechanism with 5-second safety timeout before reading the buffer to prevent data races

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

case <-time.After(5 * time.Second):
}
result := fmt.Sprintf("temporary output: %s", dst.String())
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the timeout case, the error set at line 234 will be further wrapped by the error handling at line 236-237 with "failed to copy output", which is misleading. The timeout error is not a copy failure - it's a timeout. Consider returning early after setting the timeout error to avoid this misleading double-wrapping, or check specifically for timeout errors before the generic error handling.

Suggested change
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
return "", err

Copilot uses AI. Check for mistakes.
Comment on lines +229 to 234
select {
case <-done:
case <-time.After(5 * time.Second):
}
result := fmt.Sprintf("temporary output: %s", dst.String())
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goroutine doesn't complete within the 5-second grace period at line 231, the code proceeds to read dst.String() at line 233 while the goroutine may still be writing to dst via io.Copy. This creates a data race on the buffer. Consider checking which case was selected and logging a warning if the timeout occurred, or avoid reading dst.String() if the goroutine hasn't completed.

Suggested change
select {
case <-done:
case <-time.After(5 * time.Second):
}
result := fmt.Sprintf("temporary output: %s", dst.String())
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
copyDone := false
select {
case <-done:
copyDone = true
case <-time.After(5 * time.Second):
}
if copyDone {
result := fmt.Sprintf("temporary output: %s", dst.String())
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
} else {
logrus.Warn("io.Copy did not complete within the grace period; omitting temporary output to avoid data race")
err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w", ctx.Err())
}

Copilot uses AI. Check for mistakes.
@asdek
Copy link
Contributor

asdek commented Feb 23, 2026

hey @mason5052

these changes was included by #124 into temporary branch
https://github.com/vxcontrol/pentagi/commits/feature/project_improvements/

it'll be added today in the master branch

thank you for suggestion!

@asdek asdek closed this Feb 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Goroutine and Resource Leak in Terminal Exec Timeout Handling

3 participants