fix(file_search): wrap walker in spawn_blocking + 30s timeout#1790
fix(file_search): wrap walker in spawn_blocking + 30s timeout#1790h3c-hexin wants to merge 1 commit into
Conversation
`FileSearchTool::execute` is `async fn` but `search_files()` synchronously
iterates the entire directory tree via `ignore::WalkBuilder`. On large trees
(e.g. `$HOME`, mounted network volumes, repos with deep `node_modules/`) this
blocks the tokio task for tens of seconds to several minutes.
While the walker runs, the engine's turn loop cannot observe the shared
`CancellationToken` — `Op::Cancel` and frontend stop buttons are effectively
no-ops until the walker finishes naturally. Observed in a wrapping GUI
(pinvou3): the stop button was unresponsive for 90+ seconds when the model
ran `file_search` with `workspace=$HOME`.
This patch wraps `search_files()` in `tokio::task::spawn_blocking` and races
it against a 30s timeout:
- Blocking walker now runs on the blocking thread pool, freeing the async
runtime to poll cancel signals between iterations.
- 30s timeout returns a `ToolError` with an actionable hint ("Narrow the
search with `path` or specific `extensions`"). The caller (typically an
LLM) can then issue a scoped search instead of repeating the broad one.
- 30s is generous for normal workspace usage (<1s typical) and protective
against pathological cases.
The orphaned `spawn_blocking` task continues until natural completion after
the timeout fires — there is no portable way to abort a blocking task
mid-iteration without `unsafe` thread interruption. In practice this is
acceptable: user-facing cancellation works, and the orphan eventually
finishes without holding shared state.
The deeper fix is to expose `cancel_token` through `ToolContext` so
synchronous tools can periodically check it; that's out of scope here and
tracked separately as a follow-up discussion issue.
Existing tests in the module continue to pass — they exercise the new
spawn_blocking path. No new timeout-specific test added: the behavior is
hard to test deterministically without an injectable clock, and the
spawn_blocking + timeout pattern is widely understood.
There was a problem hiding this comment.
Code Review
This pull request moves the synchronous file search operation into a blocking task with a 30-second timeout to prevent blocking the asynchronous event loop. Feedback suggests utilizing the existing CancellationToken from ToolContext to allow for immediate cancellation and avoid orphaned background threads.
| /// against pathological cases. A long-term fix to expose `cancel_token` | ||
| /// through `ToolContext` is tracked separately. |
There was a problem hiding this comment.
The ToolContext already provides a cancel_token: Option<CancellationToken> (see crates/tui/src/tools/spec.rs).
You can clone this token and pass it into the spawn_blocking closure, then update search_files to check token.is_cancelled() periodically during iteration. This would allow the tool to stop immediately when the user clicks the stop button, preventing orphaned threads from continuing to walk the file system in the background after the engine has moved on.
|
This PR was opened before the v0.8.41 rebrand and is now stale. Feel free to rebase onto current |
Summary
Wrap
FileSearchTool's synchronousWalkBuildertraversal inspawn_blocking+ 30stimeoutso the engine turn loop stays responsive to cancellation.Problem
FileSearchTool::executeisasync fnbutsearch_files()is fully synchronous — it iterates the entire directory tree viaignore::WalkBuilderwhile blocking the tokio task. When the model runsfile_searchover a large tree like$HOME, the call blocks for tens of seconds to several minutes.During this blocking call, the engine's turn loop cannot observe
CancellationTokenupdates —Op::Canceland frontend stop buttons are effectively no-ops until the walker finishes naturally.Observed in a wrapping GUI (pinvou3): stop button unresponsive for 90+ seconds when the model searched
workspace=$HOMEfornotes.md.Fix
Wrap
search_files()intokio::task::spawn_blockingand race it against a 30s timeout:ToolErrorwith an actionable hint. The model can then fall back to a scoped search.Limitations
spawn_blockingtask keeps running until natural completion. There is no portable way to abort it mid-iteration withoutunsafethread interruption. Acceptable in practice: user-facing cancellation works, and the orphan eventually finishes without holding shared state.cancel_tokenthroughToolContextso synchronous tools can periodically check it. That's out of scope here — I'll open a follow-up discussion issue covering this and the other affected tools (grep_files,list_dir).Test plan
cargo test --bin deepseek-tui file_search— all 6 existing tests pass (they exercise the new spawn_blocking path).cargo fmt --check— clean.cargo clippy --workspace --all-targets --all-features— clean.workspace=$HOME) now return within 30s and stop button works.No new timeout-specific test added: the behavior is hard to test deterministically without an injectable clock, and the spawn_blocking + timeout pattern is widely understood. Happy to add one if you'd like.