fix: prevent goroutine leak and data race in terminal exec timeout#126
fix: prevent goroutine leak and data race in terminal exec timeout#126mason5052 wants to merge 1 commit intovxcontrol:masterfrom
Conversation
When command execution times out in getExecResult(), the goroutine running io.Copy() remains blocked on resp.Reader indefinitely, causing a goroutine leak. Additionally, there are two data races: 1. The goroutine writes to dst via io.Copy while the main goroutine reads dst.String() on timeout (concurrent read/write on buffer). 2. The goroutine writes to the shared err variable while the main goroutine also writes to it in the timeout case. Fix by: - Closing resp immediately on timeout to unblock the io.Copy goroutine - Waiting for the goroutine to finish (with 5s safety timeout) before reading dst, eliminating the data race on the buffer - Using a separate copyErr variable for the goroutine, only assigning to err after the goroutine completes, eliminating the data race on err The existing defer resp.Close() serves as a safety net for the normal path; double-close on HijackedResponse is safe. Ref: vxcontrol#108 Signed-off-by: mason5052 <ehehwnwjs5052@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes critical concurrency issues in the terminal exec timeout handling that could lead to goroutine leaks and data races. The fix addresses three problems: goroutine leaks when commands timeout, concurrent read/write data races on the output buffer, and data races on the error variable.
Changes:
- Introduced a separate
copyErrvariable to eliminate data race on the error variable - Added immediate
resp.Close()call in the timeout case to unblock the io.Copy goroutine - Added a wait mechanism with 5-second safety timeout before reading the buffer to prevent data races
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case <-time.After(5 * time.Second): | ||
| } | ||
| result := fmt.Sprintf("temporary output: %s", dst.String()) | ||
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result) |
There was a problem hiding this comment.
In the timeout case, the error set at line 234 will be further wrapped by the error handling at line 236-237 with "failed to copy output", which is misleading. The timeout error is not a copy failure - it's a timeout. Consider returning early after setting the timeout error to avoid this misleading double-wrapping, or check specifically for timeout errors before the generic error handling.
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result) | |
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result) | |
| return "", err |
| select { | ||
| case <-done: | ||
| case <-time.After(5 * time.Second): | ||
| } | ||
| result := fmt.Sprintf("temporary output: %s", dst.String()) | ||
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result) |
There was a problem hiding this comment.
If the goroutine doesn't complete within the 5-second grace period at line 231, the code proceeds to read dst.String() at line 233 while the goroutine may still be writing to dst via io.Copy. This creates a data race on the buffer. Consider checking which case was selected and logging a warning if the timeout occurred, or avoid reading dst.String() if the goroutine hasn't completed.
| select { | |
| case <-done: | |
| case <-time.After(5 * time.Second): | |
| } | |
| result := fmt.Sprintf("temporary output: %s", dst.String()) | |
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result) | |
| copyDone := false | |
| select { | |
| case <-done: | |
| copyDone = true | |
| case <-time.After(5 * time.Second): | |
| } | |
| if copyDone { | |
| result := fmt.Sprintf("temporary output: %s", dst.String()) | |
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result) | |
| } else { | |
| logrus.Warn("io.Copy did not complete within the grace period; omitting temporary output to avoid data race") | |
| err = fmt.Errorf("timeout value is too low, use greater value if you need so: %w", ctx.Err()) | |
| } |
|
hey @mason5052 these changes was included by #124 into temporary branch it'll be added today in the master branch thank you for suggestion! |
Description of the Change
Problem
In
getExecResult()(terminal.go), when command execution times out, three issues occur:Goroutine leak: The goroutine running
io.Copy(&dst, resp.Reader)remains blocked on the reader indefinitely. Whiledefer resp.Close()eventually runs, there is a window where the goroutine is still blocked between the timeout and function return. Under sustained load with many timeouts, this leads to goroutine accumulation and memory bloat.Data race on buffer: On timeout,
dst.String()is called (line 223) while the goroutine may still be writing todstviaio.Copy— a concurrent read/write onbytes.Buffer.Data race on err variable: The goroutine writes to the shared
errvariable (line 216), while the main goroutine also writes toerrin the timeout case (line 224) — a classic data race.Solution
respimmediately on timeout to unblockio.Copy— prevents goroutine leakdst.String()— eliminates buffer data racecopyErrvariable for the goroutine, assigned toerronly after goroutine completes — eliminates err variable data raceThe existing
defer resp.Close()serves as a safety net for the normal (non-timeout) path. Double-close onHijackedResponseis safe.Closes #108
Type of Change
Areas Affected
Testing and Verification
Test Steps
cd backend && go build ./...runtime.NumGoroutine())Security Considerations
Goroutine leaks can lead to resource exhaustion (memory, file descriptors) during sustained penetration testing operations. Fixing this prevents potential DoS conditions when many commands time out simultaneously.
Performance Impact
Positive impact: eliminates goroutine leaks that would otherwise accumulate during long-running pentest sessions with command timeouts.
Checklist
Code Quality
go fmtandgo vet(for Go code)Security
Compatibility