
[Feature] Enhanced File Operations System with Workspace-Aware @file References #1214

@zerob13

What do you need?

We would like to add a stronger, IDE‑grade file tools layer to DeepChat, focused on search, editing and LLM‑friendly file references.

1. High‑performance file search tools

Add two built‑in tools for file search:

  1. Glob Tool

    • Match files using glob patterns, e.g. **/*.ts, with an optional root directory.
    • Automatically respects .gitignore / common ignore patterns (e.g. .git, node_modules).
    • Returns a formatted file list, sorted by last modified time, with a configurable maximum number of results.
  2. Grep Tool (content search)

    • Use ripgrep to search file contents with regular expressions.
    • Support options such as:
      • include glob (e.g. *.ts)
      • outputMode: content | files_with_matches | count
      • context lines (like -A/-B/-C) and case‑insensitive mode
    • Results are returned as text plus metadata (match count, file count), suitable for feeding back into an LLM.
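As a rough illustration of how the Grep Tool options above could map onto ripgrep's command line, here is a minimal sketch. The `GrepOptions` shape and the `buildRipgrepArgs` name are assumptions made for this example, not an existing DeepChat API; only the `rg` flags themselves (`--files-with-matches`, `--count`, `--line-number`, `--context`, `--ignore-case`, `--glob`, `--regexp`) are real ripgrep options.

```typescript
// Hypothetical option names for illustration; not an existing DeepChat type.
type OutputMode = "content" | "files_with_matches" | "count";

interface GrepOptions {
  pattern: string;          // regular expression passed to rg
  root?: string;            // search root, defaults to "."
  include?: string;         // include glob, e.g. "*.ts"
  outputMode?: OutputMode;  // defaults to "content"
  context?: number;         // lines of context around each match (like -C)
  ignoreCase?: boolean;     // case-insensitive search
}

function buildRipgrepArgs(opts: GrepOptions): string[] {
  const args: string[] = [];
  const mode = opts.outputMode ?? "content";

  if (mode === "files_with_matches") {
    args.push("--files-with-matches");
  } else if (mode === "count") {
    args.push("--count");
  } else {
    // Content mode: show line numbers and, optionally, surrounding context.
    args.push("--line-number");
    if (opts.context !== undefined && opts.context > 0) {
      args.push("--context", String(opts.context));
    }
  }

  if (opts.ignoreCase) args.push("--ignore-case");
  if (opts.include) args.push("--glob", opts.include);

  // "--regexp" marks the pattern explicitly, so a pattern that starts
  // with "-" is not mistaken for a flag; "--" ends option parsing.
  args.push("--regexp", opts.pattern, "--", opts.root ?? ".");
  return args;
}
```

Returning the arguments as an array, rather than a single command string, means they can be handed to `child_process.spawn` without going through a shell, so the pattern needs no shell quoting.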

Ripgrep (rg) itself is not managed by DeepChat directly. Instead, we rely on the upstream project tiny-runtime-injector to download and inject a small, per‑platform rg binary into a known directory, and DeepChat just executes that binary from the configured path.

2. Robust file read / write / edit tools

Add a unified file operations toolset with the following behavior:

  1. Read Tool

    • Read text files with line‑based paging: offset (0‑based line index) and limit (default 2000 lines).
    • Returns output formatted like cat -n (line numbers + content) and metadata (totalLines, returnedLines, hasMore).
    • Built‑in protections:
      • Sensitive file patterns are blocked by default (e.g. .env, keys, credentials).
      • Binary files are detected via extension + content heuristics and rejected.
  2. Write Tool

    • Write full file content to a path.
    • Returns bytes written plus optional diagnostics produced by a language server (LSP) after the write.
    • Can be integrated with an event bus (e.g. file.written) for UI or plugin hooks.
  3. Edit Tool

    • Apply "smart" text edits with multiple matching strategies, from strict exact match to more tolerant variants (whitespace‑normalized, block‑anchored with Levenshtein similarity, multi‑occurrence replacement, etc.).
    • Produces a unified diff as output for review.
    • Edit is guarded by:
      • A per‑session timestamp check: the file must be read first, and must not have changed since the last read.
      • A per‑file lock to serialize concurrent edits and avoid race conditions.
      • A permission check that can show the diff and ask for confirmation before writing.
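The first two Edit Tool guards above (read-before-edit with a staleness check, and a per-file lock) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `recordRead`, `withFileLock` and `guardedEdit` are hypothetical names, state is kept in module-level maps rather than per session, the modification time stands in for the "timestamp", and the edit itself is a plain exact-match replacement instead of the multi-strategy matching described above.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical in-memory state; a real implementation would scope it per session.
const lastReadMtimeMs = new Map<string, number>();
const fileLocks = new Map<string, Promise<unknown>>();

// Read a file and remember its modification time at the moment of the read.
async function recordRead(filePath: string): Promise<string> {
  // Stat first: if the file changes between stat and read, the next
  // edit fails the staleness check instead of silently passing it.
  const { mtimeMs } = await fs.stat(filePath);
  const content = await fs.readFile(filePath, "utf8");
  lastReadMtimeMs.set(filePath, mtimeMs);
  return content;
}

// Serialize tasks per file by chaining each one onto the previous task.
function withFileLock<T>(filePath: string, task: () => Promise<T>): Promise<T> {
  const previous = fileLocks.get(filePath) ?? Promise.resolve();
  const run = previous.catch(() => undefined).then(task);
  // Store a never-rejecting tail so one failed edit does not block later ones.
  fileLocks.set(filePath, run.catch(() => undefined));
  return run;
}

async function guardedEdit(filePath: string, oldText: string, newText: string): Promise<void> {
  await withFileLock(filePath, async () => {
    const seen = lastReadMtimeMs.get(filePath);
    if (seen === undefined) {
      throw new Error("File must be read before it is edited");
    }
    const { mtimeMs } = await fs.stat(filePath);
    if (mtimeMs !== seen) {
      throw new Error("File changed since the last read; read it again");
    }
    const content = await fs.readFile(filePath, "utf8");
    const index = content.indexOf(oldText);
    if (index === -1) throw new Error("oldText not found in file");
    const updated = content.slice(0, index) + newText + content.slice(index + oldText.length);
    await fs.writeFile(filePath, updated, "utf8");
    // Our own write counts as the latest known state of the file.
    lastReadMtimeMs.set(filePath, (await fs.stat(filePath)).mtimeMs);
  });
}
```

Because the staleness check runs inside the lock, two edits queued for the same file are applied one after the other, and the second one sees the timestamp recorded by the first.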

From the caller's perspective (LLM / MCP tool layer), all of these tools share a consistent, typed API with parameters validated via Zod and structured ToolResult outputs.
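To make the shared "validated parameters in, structured result out" shape more concrete, here is a small sketch of the Read Tool's paging logic. It is dependency-free on purpose: a hand-written check stands in for the Zod schema mentioned above, and the names `ReadParams`, `ReadResult` and `pageLines` are assumptions for illustration. The metadata fields (`totalLines`, `returnedLines`, `hasMore`) and the 2000-line default follow the description of the Read Tool.

```typescript
interface ReadParams {
  offset?: number; // 0-based index of the first line to return
  limit?: number;  // maximum number of lines to return
}

interface ReadResult {
  output: string; // cat -n style: right-aligned line number, tab, content
  metadata: { totalLines: number; returnedLines: number; hasMore: boolean };
}

const DEFAULT_LIMIT = 2000;

// Stand-in for schema validation (the proposal uses Zod for this step).
function validateReadParams(raw: ReadParams): Required<ReadParams> {
  const offset = raw.offset ?? 0;
  const limit = raw.limit ?? DEFAULT_LIMIT;
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("offset must be a non-negative integer");
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  return { offset, limit };
}

function pageLines(content: string, params: ReadParams = {}): ReadResult {
  const { offset, limit } = validateReadParams(params);
  const lines = content.split(/\r?\n/);
  // A trailing newline produces one empty trailing element; drop it.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();

  const slice = lines.slice(offset, offset + limit);
  const output = slice
    .map((line, i) => `${String(offset + i + 1).padStart(6)}\t${line}`)
    .join("\n");

  return {
    output,
    metadata: {
      totalLines: lines.length,
      returnedLines: slice.length,
      hasMore: offset + slice.length < lines.length,
    },
  };
}
```

A caller that sees `hasMore: true` can request the next page by passing `offset + returnedLines` as the new offset.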

3. @file reference support in messages (workspace‑aware enhancement)

DeepChat already supports @file‑style references in messages, but this feature can be made more robust and workspace‑aware:

  1. Workspace‑scoped path resolution

    • Each DeepChat session has a well‑defined workspace root (for example, the MCP workspace root or a project directory selected by the user).
    • When parsing @path references (such as @src/index.ts or @./config/app.json), resolution should:
      • Always start from the current session's workspace root.
      • Support . and .. relative paths, but reject any resolution that escapes the workspace root (hard boundary).
      • Optionally support ~ expansion only if it maps inside the workspace; otherwise it should be treated as invalid.

This ensures that @file references always point to files within the active workspace for that conversation, instead of arbitrary locations on the host.
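The resolution rules above could look roughly like the following sketch. `resolveInWorkspace` is a hypothetical helper name, and the containment check uses `path.relative` as one possible approach rather than the way DeepChat necessarily implements it.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Resolve an @path reference against the workspace root, rejecting anything
// that ends up outside it. Hypothetical helper, for illustration only.
function resolveInWorkspace(workspaceRoot: string, reference: string): string {
  const root = path.resolve(workspaceRoot);
  const raw = reference.startsWith("@") ? reference.slice(1) : reference;
  if (raw.length === 0) throw new Error("Empty file reference");

  // Optional ~ expansion: expand first, then apply the same boundary check.
  const expanded =
    raw === "~" || raw.startsWith("~/") ? path.join(os.homedir(), raw.slice(1)) : raw;

  // path.resolve normalizes "." and ".." segments; an absolute input
  // simply replaces the root and is caught by the check below.
  const candidate = path.resolve(root, expanded);
  const relative = path.relative(root, candidate);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows

  if (escapes) {
    throw new Error(`Reference "${reference}" resolves outside the workspace`);
  }
  return candidate;
}
```

This check is purely lexical. A symlink inside the workspace can still point outside it, so a stricter version would also compare `fs.realpath` results for the root and the candidate before reading the file.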

  2. Stronger matching against workspace contents

    • When a bare or ambiguous path is provided (e.g. @index.ts), the resolver can:
      • First attempt a direct relative match under the workspace root.
      • Optionally fall back to a constrained glob search within the workspace to find the closest matching file(s).
    • All matches are restricted to the current workspace tree and never cross workspace boundaries.
  3. Sandboxed Node command execution

    • Any Node‑based command or script execution triggered by tools (such as file search, code analysis, or build/test commands) must be sandboxed to the same workspace directory used for @file resolution.
    • Concretely:
      • cwd for child_process.spawn / exec is always set to the workspace root (or a subdirectory under it derived from the tool parameters).
      • Paths passed into commands are validated to ensure they remain inside the workspace root.
      • Commands that attempt to operate outside the workspace (via ../ or absolute paths) are rejected by path sanitization logic.

With these enhancements, @file references become tightly coupled to DeepChat's workspace model: users can rely on @path to always target files in the current workspace, and all Node commands run in a sandboxed directory that matches what the UI shows as the project context.
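The command-execution rules in this section (workspace-bound cwd, validated path arguments) might be sketched as below. `runInWorkspace`, `RunOptions` and its `subdir` / `pathArgs` fields are assumptions invented for this example; `spawn` and its `cwd` option are the standard Node.js `child_process` API.

```typescript
import { spawn } from "node:child_process";
import * as path from "node:path";

// True when target is the root itself or lies below it (lexical check only).
function isInside(root: string, target: string): boolean {
  const relative = path.relative(root, target);
  return !(
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative)
  );
}

// Hypothetical options: subdir selects the cwd below the workspace root,
// pathArgs are arguments that name files and must stay inside the workspace.
interface RunOptions {
  subdir?: string;
  pathArgs?: string[];
}

function runInWorkspace(
  workspaceRoot: string,
  command: string,
  args: string[],
  options: RunOptions = {},
): Promise<{ code: number | null; stdout: string }> {
  const root = path.resolve(workspaceRoot);
  const cwd = path.resolve(root, options.subdir ?? ".");
  if (!isInside(root, cwd)) {
    return Promise.reject(new Error("Working directory escapes the workspace"));
  }
  const pathArgs = options.pathArgs ?? [];
  for (const p of pathArgs) {
    if (!isInside(root, path.resolve(cwd, p))) {
      return Promise.reject(new Error(`Path argument "${p}" escapes the workspace`));
    }
  }

  return new Promise((resolve, reject) => {
    // No shell: arguments are passed as-is and cannot be reinterpreted.
    const child = spawn(command, [...args, ...pathArgs], { cwd, shell: false });
    let stdout = "";
    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });
    child.on("error", reject);
    child.on("close", (code) => resolve({ code, stdout }));
  });
}
```

Setting `cwd` and validating arguments confines what the tool layer asks for, but it is not an OS-level sandbox: the spawned program can still open absolute paths on its own, so stronger isolation would need a separate mechanism.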

4. Ripgrep provisioning via tiny-runtime-injector

To make this work reliably on all platforms without asking users to install rg manually:

  • Use tiny-runtime-injector in deployment scripts or a separate setup step to download and unpack a small rg binary for Windows, macOS and Linux.
  • Configure DeepChat with the resulting ripgrep path (for example via an environment variable or config option like fileTools.ripgrepPath).
  • The search tools then call child_process.spawn(ripgrepPath, [...args]) and do not assume that rg exists globally on the system.
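A minimal sketch of the lookup-then-spawn flow described in this section follows. The `fileTools.ripgrepPath` option comes from the text above; the `DEEPCHAT_RIPGREP_PATH` environment variable name and the `resolveRipgrepPath` / `runRipgrep` helpers are assumptions for illustration. The exit-code handling relies on ripgrep's documented behavior of exiting with 1 when there are no matches and 2 on errors.

```typescript
import { spawn } from "node:child_process";
import { existsSync } from "node:fs";

interface FileToolsConfig {
  fileTools?: { ripgrepPath?: string };
}

// Prefer the explicit config option, then fall back to an environment
// variable (hypothetical name). Never fall back to a global "rg" on PATH.
function resolveRipgrepPath(
  config: FileToolsConfig,
  env: Record<string, string | undefined> = process.env,
): string {
  const candidate = config.fileTools?.ripgrepPath ?? env["DEEPCHAT_RIPGREP_PATH"];
  if (!candidate) {
    throw new Error("ripgrep path is not configured");
  }
  if (!existsSync(candidate)) {
    throw new Error(`ripgrep binary not found at ${candidate}`);
  }
  return candidate;
}

function runRipgrep(ripgrepPath: string, args: string[], cwd: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(ripgrepPath, args, { cwd });
    let stdout = "";
    let stderr = "";
    child.stdout.on("data", (chunk: Buffer) => { stdout += chunk.toString("utf8"); });
    child.stderr.on("data", (chunk: Buffer) => { stderr += chunk.toString("utf8"); });
    child.on("error", reject);
    child.on("close", (code) => {
      // Exit code 1 means "no matches", which is not a failure for a search tool.
      if (code === 0 || code === 1) resolve(stdout);
      else reject(new Error(`rg exited with code ${String(code)}: ${stderr}`));
    });
  });
}
```

Failing fast when the configured binary is missing gives a clearer error than a later `ENOENT` from `spawn`, and makes it obvious that the provisioning step has not run yet.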

Summary

With these features, DeepChat gains:

  • IDE‑like file search and editing capabilities that are safe under concurrent use and LLM‑friendly.
  • A simple @file UX for attaching code and documents to conversations.
  • A clean separation where DeepChat focuses on behavior, while tiny-runtime-injector takes care of distributing the actual ripgrep binary across platforms.
  • Workspace‑scoped path resolution and sandboxed command execution that keep all file operations within the user's project boundary.

Labels: enhancement (New feature or request)
