Skip to content

fix(file_search): wrap walker in spawn_blocking + 30s timeout#1790

Closed
h3c-hexin wants to merge 1 commit into
Hmbown:mainfrom
h3c-hexin:fix/file-search-timeout
Closed

fix(file_search): wrap walker in spawn_blocking + 30s timeout#1790
h3c-hexin wants to merge 1 commit into
Hmbown:mainfrom
h3c-hexin:fix/file-search-timeout

Conversation

@h3c-hexin
Copy link
Copy Markdown
Contributor

Summary

Wrap FileSearchTool's synchronous WalkBuilder traversal in spawn_blocking + 30s timeout so the engine turn loop stays responsive to cancellation.

Problem

FileSearchTool::execute is async fn but search_files() is fully synchronous — it iterates the entire directory tree via ignore::WalkBuilder while blocking the tokio task. When the model runs file_search over a large tree like $HOME, the call blocks for tens of seconds to several minutes.

During this blocking call, the engine's turn loop cannot observe CancellationToken updatesOp::Cancel and frontend stop buttons are effectively no-ops until the walker finishes naturally.

Observed in a wrapping GUI (pinvou3): stop button unresponsive for 90+ seconds when the model searched workspace=$HOME for notes.md.

Fix

Wrap search_files() in tokio::task::spawn_blocking and race it against a 30s timeout:

  • Blocking walker now runs on the blocking thread pool, freeing the async runtime to poll cancel signals.
  • 30s timeout returns a ToolError with an actionable hint. The model can then fall back to a scoped search.
  • 30s is generous for normal workspaces (<1s typical) and protective against pathological cases.

Limitations

  • After the timeout fires, the orphaned spawn_blocking task keeps running until natural completion. There is no portable way to abort it mid-iteration without unsafe thread interruption. Acceptable in practice: user-facing cancellation works, and the orphan eventually finishes without holding shared state.
  • The deeper fix is to expose cancel_token through ToolContext so synchronous tools can periodically check it. That's out of scope here — I'll open a follow-up discussion issue covering this and the other affected tools (grep_files, list_dir).

Test plan

  • cargo test --bin deepseek-tui file_search — all 6 existing tests pass (they exercise the new spawn_blocking path).
  • cargo fmt --check — clean.
  • cargo clippy --workspace --all-targets --all-features — clean.
  • Manually verified in pinvou3 GUI: previously hung searches (workspace=$HOME) now return within 30s and stop button works.

No new timeout-specific test added: the behavior is hard to test deterministically without an injectable clock, and the spawn_blocking + timeout pattern is widely understood. Happy to add one if you'd like.

`FileSearchTool::execute` is `async fn` but `search_files()` synchronously
iterates the entire directory tree via `ignore::WalkBuilder`. On large trees
(e.g. `$HOME`, mounted network volumes, repos with deep `node_modules/`) this
blocks the tokio task for tens of seconds to several minutes.

While the walker runs, the engine's turn loop cannot observe the shared
`CancellationToken` — `Op::Cancel` and frontend stop buttons are effectively
no-ops until the walker finishes naturally. Observed in a wrapping GUI
(pinvou3): the stop button was unresponsive for 90+ seconds when the model
ran `file_search` with `workspace=$HOME`.

This patch wraps `search_files()` in `tokio::task::spawn_blocking` and races
it against a 30s timeout:

- Blocking walker now runs on the blocking thread pool, freeing the async
  runtime to poll cancel signals between iterations.
- 30s timeout returns a `ToolError` with an actionable hint ("Narrow the
  search with `path` or specific `extensions`"). The caller (typically an
  LLM) can then issue a scoped search instead of repeating the broad one.
- 30s is generous for normal workspace usage (<1s typical) and protective
  against pathological cases.

The orphaned `spawn_blocking` task continues until natural completion after
the timeout fires — there is no portable way to abort a blocking task
mid-iteration without `unsafe` thread interruption. In practice this is
acceptable: user-facing cancellation works, and the orphan eventually
finishes without holding shared state.

The deeper fix is to expose `cancel_token` through `ToolContext` so
synchronous tools can periodically check it; that's out of scope here and
tracked separately as a follow-up discussion issue.

Existing tests in the module continue to pass — they exercise the new
spawn_blocking path. No new timeout-specific test added: the behavior is
hard to test deterministically without an injectable clock, and the
spawn_blocking + timeout pattern is widely understood.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request moves the synchronous file search operation into a blocking task with a 30-second timeout to prevent blocking the asynchronous event loop. Feedback suggests utilizing the existing CancellationToken from ToolContext to allow for immediate cancellation and avoid orphaned background threads.

Comment on lines +19 to +20
/// against pathological cases. A long-term fix to expose `cancel_token`
/// through `ToolContext` is tracked separately.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ToolContext already provides a cancel_token: Option<CancellationToken> (see crates/tui/src/tools/spec.rs).

You can clone this token and pass it into the spawn_blocking closure, then update search_files to check token.is_cancelled() periodically during iteration. This would allow the tool to stop immediately when the user clicks the stop button, preventing orphaned threads from continuing to walk the file system in the background after the engine has moved on.

@Hmbown
Copy link
Copy Markdown
Owner

Hmbown commented May 23, 2026

This PR was opened before the v0.8.41 rebrand and is now stale. Feel free to rebase onto current main and reopen. 鲸鱼兄弟们等你 🐋

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants