feat(agent): add agent mode for cross-document evaluation#74

Open
oshorefueled wants to merge 14 commits into main from feat/agent

Conversation

@oshorefueled
Contributor

@oshorefueled oshorefueled commented Mar 19, 2026

Why

VectorLint could only evaluate files in isolation, which missed cross-document issues such as corpus-level consistency and structural gaps. This change adds an explicit agent mode so users can run multi-file evidence gathering while preserving the existing lint workflow as the default path.

What

  • GitHub-scoped PR contents: 28 files across 13 commits.
  • Add --mode CLI option with lint (default) and agent paths.
  • Introduce agent executor and finding model:
    • src/agent/agent-executor.ts
    • src/agent/types.ts
    • src/agent/merger.ts
  • Add read-only agent tool suite under src/agent/tools/:
    • lint, read_file, search_content, search_files, list_directory
  • Wire orchestrator agent branch to run one agent execution per rule and merge findings.
  • Add line/json output support for agent findings.
  • Expose provider language model access for agent tool-loop execution.
  • Add comprehensive agent tests in tests/agent/.
  • Harden failure behavior after local review:
    • surface agent execution failures (no silent success)
    • propagate operational errors to exit behavior
    • improve fallback parity/validation in content search
    • improve path safety and check-mode scoring behavior in lint tool

Scope

In scope

  • Agent-mode architecture and tooling in CLI runtime.
  • Output path updates for agent findings.
  • Tests for new agent modules.

Out of scope

  • Any write/edit/exec tools for agent mode.
  • Bugsy-triggered review workflow changes.
  • Posting implementation artifacts as PR comments.

Behavior impact

  • Existing users remain on lint mode by default (--mode lint).
  • --mode agent now enables cross-document evaluation through read-only tools.
  • Agent-mode runtime failures now surface as operational failures instead of being silently treated as zero findings.
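As a rough illustration of the default-preserving behavior described above, the sketch below resolves a `--mode` value to `lint` unless `agent` is requested explicitly. The `resolveMode` helper and its error message are hypothetical. The PR lists src/cli/mode.ts and src/schemas/cli-schemas.ts for this wiring, and their contents are not shown here.

```typescript
// Hypothetical sketch; not the PR's actual implementation.
type Mode = "lint" | "agent";

const MODES: readonly Mode[] = ["lint", "agent"];

function resolveMode(raw?: string): Mode {
  // Omitting --mode keeps existing users on the lint path.
  if (raw === undefined) return "lint";
  const match = MODES.find((m) => m === raw);
  if (match === undefined) {
    throw new Error(`Invalid --mode "${raw}". Expected one of: ${MODES.join(", ")}`);
  }
  return match;
}

console.log(resolveMode());        // default path
console.log(resolveMode("agent")); // explicit opt-in
```

Rejecting unknown values, rather than silently falling back to `lint`, fits the PR's stated goal of surfacing misconfiguration as an explicit failure.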

Risk

  • New mode introduces additional execution path complexity (tool loop + multi-rule concurrency).
  • Misconfiguration risk in model/provider setup is mitigated by explicit operational failure reporting.
  • Path traversal risk is reduced with stricter root checks.

How to test / verify

Checks run

  • npm run test:run
  • npm run lint

Manual verification

  1. Ensure local config exists (.vectorlint.ini and provider env) then run:
    • npm run dev -- --mode agent <path>
  2. Confirm agent output appears (line or json mode).
  3. Validate non-agent behavior remains unchanged with default mode.

Rollback

  • Revert agent module additions and CLI --mode agent orchestration branch.
  • Revert provider interface extension (getLanguageModel) and output helper additions.
  • Re-run npm run lint && npm run test:run.

Summary by CodeRabbit

  • New Features

    • Added --mode CLI option (lint | agent) and a full “agent” evaluation mode with integrated file reading, searching, directory listing, and rule-scoped linting.
    • Agent findings include file/line context, messages, optional suggestions, and are rendered to the console or JSON.
  • Documentation

    • Published an agentic capabilities execution log documenting agent-run output.
  • Tests

    • Added comprehensive tests covering agent executor, tools, path utilities, types, and result merging.

@coderabbitai

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d77e26c6-889c-4e70-8857-d07f49074afc

📥 Commits

Reviewing files that changed from the base of the PR and between 2a43d21 and fdb0d5a.

📒 Files selected for processing (5)
  • src/cli/commands.ts
  • src/cli/mode.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/schemas/cli-schemas.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/cli/commands.ts
  • src/cli/types.ts
  • src/schemas/cli-schemas.ts
  • src/cli/orchestrator.ts

📝 Walkthrough

Walkthrough

Adds an agent execution path: an LLM-driven agent executor with workspace‑scoped tools (read/search/list/lint), type-validated agent finding schemas, CLI "agent" mode wiring, orchestration changes for concurrent rule runs, reporting/export adjustments, and comprehensive tests and docs.

Changes

Cohort / File(s) Summary
Agent executor & orchestration
src/agent/agent-executor.ts, src/cli/orchestrator.ts
New agent executor that builds prompts, exposes tools to the model, enforces output schema, and returns run results; orchestrator branches on mode==='agent', runs agents concurrently, aggregates results, and alters printing/JSON output.
Agent subsystem exports & aggregation
src/agent/index.ts, src/agent/merger.ts, src/agent/types.ts, src/agent/tools/index.ts
Barrel exports, type/Zod schemas for findings, and a simple collector that flattens agent run results.
Tools: read/search/list
src/agent/tools/read-file.ts, src/agent/tools/search-content.ts, src/agent/tools/search-files.ts, src/agent/tools/list-directory.ts
Workspace-root constrained file read (line pagination/truncation), content search (ripgrep with JS fallback, context/limit), glob-based file discovery with limits, and sorted directory listing with dotfile support and truncation notices.
Tool: lint sub-tool
src/agent/tools/lint-tool.ts
Rule-scoped lint tool exposing execute({file, ruleId}), blocks traversal, resolves rule, runs evaluator, normalizes judge-style vs violations-style outputs, and computes scores/violation lists.
Path utilities & safety
src/agent/tools/path-utils.ts
Home expansion, cwd-relative resolution, and in-root verification using realpath normalization to prevent traversal/symlink escapes.
CLI mode, options & schemas
src/cli/mode.ts, src/cli/commands.ts, src/cli/types.ts, src/schemas/cli-schemas.ts
Introduces `lint
Provider surface
src/providers/llm-provider.ts, src/providers/vercel-ai-provider.ts
Adds getLanguageModel() to LLMProvider and implements it on VercelAIProvider so orchestrator/agent can obtain a model instance.
Output/reporting
src/output/reporter.ts, src/output/json-formatter.ts
Adds printAgentFinding for inline/top-level rendering; JSON Issue shape extended with optional `source?: 'lint'
Tests
tests/agent/* (many files)
Extensive Vitest coverage for executor, tools, path utils, types, merger, and listing/search behaviors.
Docs / logs
docs/logs/2026-03-17-agentic-capabilities.log.md
Execution log documenting the agentic capabilities rollout and artifacts.
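The "line pagination/truncation" behavior listed for the read tool could look roughly like the following. The `offset` and `limit` parameter names, the default limit, and the notice wording are assumptions made for illustration; the summary above only states that reads are paginated by line and truncated.

```typescript
// Hypothetical sketch of line-based pagination with a truncation notice.
interface PageOptions {
  offset?: number; // 1-based first line to return (assumed convention)
  limit?: number; // maximum number of lines to return
}

const DEFAULT_LINE_LIMIT = 200; // illustrative value only

function paginateLines(content: string, opts: PageOptions = {}): string {
  const lines = content.split("\n");
  const start = Math.max((opts.offset ?? 1) - 1, 0);
  const limit = opts.limit ?? DEFAULT_LINE_LIMIT;
  const page = lines.slice(start, start + limit);
  const remaining = lines.length - (start + page.length);
  if (remaining > 0) {
    // Tell the caller how to request the next page instead of cutting off silently.
    const next = start + page.length + 1;
    page.push(`[${remaining} more line(s); continue with offset=${next}]`);
  }
  return page.join("\n");
}

console.log(paginateLines("a\nb\nc\nd", { offset: 2, limit: 2 }));
```

An explicit notice gives the model a way to keep reading without guessing whether the file ended.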

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Orch as Orchestrator
    participant Agent as Agent Executor
    participant LLM as Language Model
    participant Tools as Tool Set
    participant FS as File System

    CLI->>Orch: evaluateFiles(targets, { mode: "agent" })
    Orch->>Orch: build tools and diffContext
    Orch->>Orch: model = provider.getLanguageModel()
    Orch->>Agent: runAgentExecutor({ rule, cwd, model, tools, diffContext })
    Agent->>LLM: generateText(systemPrompt + toolSchemas, stepLimit)
    loop Agent-driven tool calls
        LLM->>Tools: invoke tool (read_file / search_content / list_directory / lint)
        Tools->>Tools: resolve path & isWithinRoot check
        Tools->>FS: read/list/search files
        FS-->>Tools: content/results
        Tools-->>LLM: tool response (paginated/truncated/no-match)
    end
    LLM-->>Agent: structured JSON { findings: [...] }
    Agent->>Agent: validate with AGENT_OUTPUT_SCHEMA
    Agent-->>Orch: { findings, ruleId, error? }
    Orch->>Orch: collectAgentFindings(allResults)
    Orch-->>CLI: print findings / emit JSON summary
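
Read as code, the orchestration steps in the diagram amount to something like the sketch below: run one agent per rule concurrently, flatten the findings, and keep errors visible rather than counting them as zero findings. `runAgent` is a stand-in for the real executor, and the `AgentRunResult` shape is inferred from the `{ findings, ruleId, error? }` return shown in the diagram. The field names inside each finding are assumptions.

```typescript
// Hypothetical sketch; the injected runner replaces the LLM tool loop.
interface AgentFinding {
  file: string;
  message: string;
  startLine?: number;
}

interface AgentRunResult {
  ruleId: string;
  findings: AgentFinding[];
  error?: string;
}

type AgentRunner = (ruleId: string) => Promise<AgentFinding[]>;

async function runAllRules(ruleIds: string[], runAgent: AgentRunner): Promise<AgentRunResult[]> {
  // One agent execution per rule, started concurrently.
  return Promise.all(
    ruleIds.map(async (ruleId): Promise<AgentRunResult> => {
      try {
        return { ruleId, findings: await runAgent(ruleId) };
      } catch (err) {
        // Record the failure so it is not mistaken for "no findings".
        const error = err instanceof Error ? err.message : String(err);
        return { ruleId, findings: [], error };
      }
    }),
  );
}

function summarize(results: AgentRunResult[]) {
  const findings = results.flatMap((r) => r.findings);
  const failedRules = results.filter((r) => r.error !== undefined).map((r) => r.ruleId);
  return { findings, failedRules, hadOperationalError: failedRules.length > 0 };
}
```

A caller could map `hadOperationalError` to a non-zero exit status, which is one way to get the "no silent success" behavior the PR description asks for.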

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • ayo6706

Poem

🐰 I hopped through roots and files today,

Tools in paw, I showed the way.
LLM asked, I fetched a line,
Findings stitched in tidy sign.
A rabbit cheers—safe paths, hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.04%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: adding an agent mode for cross-document evaluation to the codebase. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (3)
tests/agent/list-directory.test.ts (1)

39-46: Minor redundancy: subdir is already created in beforeEach.

Line 40 re-creates subdir which is already set up by beforeEach at line 9. The recursive: true makes this harmless, but you could simplify by only writing the nested file.

♻️ Suggested simplification
   it('lists a specific subdirectory', async () => {
-    mkdirSync(path.join(TMP, 'subdir'), { recursive: true });
     writeFileSync(path.join(TMP, 'subdir', 'nested.md'), '');
     const tool = createListDirectoryTool(TMP);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agent/list-directory.test.ts` around lines 39 - 46, The test "lists a
specific subdirectory" redundantly recreates the subdir; remove the
mkdirSync(path.join(TMP, 'subdir'), { recursive: true }) call and keep only
writeFileSync(path.join(TMP, 'subdir', 'nested.md'), '') so the test uses the
setup from beforeEach; locate the test using the it block name and the
createListDirectoryTool(TMP) / tool.execute({ path: 'subdir' }) calls to update
the snippet.
src/agent/tools/list-directory.ts (1)

18-59: Consider using async/await for consistency.

The function returns Promise<string> but wraps synchronous operations in Promise.resolve/Promise.reject. Other tools in this module (e.g., search-files.ts) use async/await. For consistency and readability, consider making this an async function.

♻️ Proposed refactor using async/await
-    execute({ path: dirPath, limit }) {
-      try {
-        const absolutePath = resolveToCwd(dirPath || '.', cwd);
-
-        if (!isWithinRoot(absolutePath, cwd)) {
-          return Promise.reject(new Error(`Path traversal blocked: ${dirPath} is outside the allowed root`));
-        }
-
-        if (!existsSync(absolutePath)) {
-          return Promise.reject(new Error(`Directory not found: ${dirPath}`));
-        }
+    async execute({ path: dirPath, limit }) {
+      const absolutePath = resolveToCwd(dirPath || '.', cwd);
+
+      if (!isWithinRoot(absolutePath, cwd)) {
+        throw new Error(`Path traversal blocked: ${dirPath} is outside the allowed root`);
+      }
+
+      if (!existsSync(absolutePath)) {
+        throw new Error(`Directory not found: ${dirPath}`);
+      }
         // ... rest of implementation using return instead of Promise.resolve
-        return Promise.resolve(output);
-      } catch (error) {
-        return Promise.reject(error instanceof Error ? error : new Error(String(error)));
-      }
+      return output;
     },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tools/list-directory.ts` around lines 18 - 59, Convert the execute
method to an async function and stop wrapping sync results in
Promise.resolve/Promise.reject: change the execute signature to async execute({
path: dirPath, limit }) and inside use the existing try/catch but return plain
strings or throw Errors instead of Promise.resolve/Promise.reject; keep using
resolveToCwd, isWithinRoot, existsSync, readdirSync, statSync and path.join as
before, compute effectiveLimit from DEFAULT_LIMIT, build results, and when
errors occur throw the Error (or rethrow in catch using throw error instanceof
Error ? error : new Error(String(error))). Also remove any
Promise.resolve/Promise.reject uses and adjust the truncation message logic to
return the combined string directly.
src/agent/agent-executor.ts (1)

140-154: Consider adding fallback parsing for structured output robustness.

The Vercel AI SDK v6.x throws NoOutputGeneratedError when structured output parsing fails (commonly when finishReason is not "stop", especially with AI Gateway or certain provider configurations). While the existing try/catch handles the exception, the recommended pattern in the SDK's issue tracker is to catch NoOutputGeneratedError and fall back to manually parsing result.text as JSON when available. This improves robustness without changing the happy path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/agent-executor.ts` around lines 140 - 154, Catch the specific
NoOutputGeneratedError thrown by generateText and implement a fallback that
attempts to parse the raw text output (e.g., result.text or response.outputText)
as JSON when structured parsing fails; update the try/catch around generateText
in agent-executor (where generateText, AGENT_OUTPUT_SCHEMA, and
stepCountIs(MAX_AGENT_STEPS) are used) to detect NoOutputGeneratedError, attempt
JSON.parse on the raw result text to extract findings and ruleId, and only
rethrow if parsing is impossible so the existing happy path using
response.output.findings remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent/tools/path-utils.ts`:
- Around line 17-30: The containment check in isWithinRoot is unsafe because
normalizePath can use realpathSync for one input and path.resolve for the other;
fix by applying the same normalization strategy to both paths: attempt
realpathSync on both and if either call throws, fall back to using path.resolve
for both so both normalizedRoot and normalizedPath are produced by the same
method; keep the final startsWith/equals check and ensure you still append
path.sep when checking prefix so path boundary logic (normalizedRoot + path.sep)
remains correct.

In `@src/agent/tools/search-content.ts`:
- Line 159: Update the tool description string (the description property in the
search-content tool definition) to accurately state the default glob filter as
"**/*.md" instead of "*.md"; verify consistency with the other occurrences that
use "**/*.md" (seen near the uses at lines referencing the default glob in this
file) so the description matches the actual default behavior.
- Around line 104-109: The code constructs a RegExp directly from the untrusted
pattern (pattern / new RegExp(...)) which allows ReDoS; fix by validating or
replacing the engine before compiling: either validate the pattern with a safety
check (e.g., integrate safe-regex to reject dangerous patterns) or switch to a
backtracking-free engine (e.g., use re2 to instantiate the regex instead of
RegExp) and/or wrap regex execution in a short timeout guard to abort
long-running matches; update the logic around the RegExp creation in
search-content.ts where regex = new RegExp(pattern, opts.ignoreCase ? 'i' : ''),
rejecting unsafe patterns (returning the same error string) or using re2 and
ensure opts.ignoreCase behavior is preserved.

In `@src/agent/tools/search-files.ts`:
- Around line 25-30: The fast-glob results in searchFiles (function in
src/agent/tools/search-files.ts) return paths relative to searchRoot when a
subdirectory `path` is provided, but caller tools (read_file, lint) expect
repo-relative paths, so prepend the provided `path` prefix to each match before
returning; import Node's `path` module, update the function description to state
it returns repository-relative paths, and ensure both the code branches that map
`matches` (the arrays created around lines with fg(...) and the later mapping at
36-42) add `path.join(pathPrefix, match)` (or similar) only when a non-empty
`path` argument was passed.

In `@src/cli/orchestrator.ts`:
- Around line 190-196: The RdJson/ValeJson branches instantiate RdJsonFormatter
and ValeJsonFormatter but never add agent findings, causing formatter.toJson()
to emit empty output; update the branches handling OutputFormat.RdJson and
OutputFormat.ValeJson in orchestrator.ts (the sections creating RdJsonFormatter
and ValeJsonFormatter) to either (A) log a warning that RdJson/ValeJson are not
supported in agent mode and fall back to the existing JSON formatter path (e.g.,
reuse the code that populates findings for JSON output), or (B) map the
collected agent findings into the RdJsonFormatter/ValeJsonFormatter APIs before
calling formatter.toJson(); implement one of these fixes and ensure the
warning/fallback is clearly emitted when in agent mode so users don’t get empty
output.

In `@src/output/reporter.ts`:
- Line 229: The ternary that computes loc uses a truthy check that treats 0 as
absent; update the expression that sets loc to check explicitly for undefined
(reference.startLine !== undefined) so a valid startLine of 0 is preserved—loc
should remain `${reference.file}:${reference.startLine}` when startLine is any
number, and fall back to reference.file only when startLine is strictly
undefined; modify the assignment that references reference.startLine in
src/output/reporter.ts accordingly.
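
The reporter item above comes down to a truthiness pitfall, which a minimal sketch makes concrete. `FindingReference` and `formatLocation` are hypothetical names for illustration; only the `reference.file` and `reference.startLine` fields come from the comment.

```typescript
// Hypothetical sketch of the explicit-undefined check.
interface FindingReference {
  file: string;
  startLine?: number;
}

function formatLocation(reference: FindingReference): string {
  // A truthy test (reference.startLine ? ... : ...) would drop startLine === 0.
  return reference.startLine !== undefined
    ? `${reference.file}:${reference.startLine}`
    : reference.file;
}

console.log(formatLocation({ file: "docs/a.md", startLine: 0 })); // docs/a.md:0
console.log(formatLocation({ file: "docs/a.md" })); // docs/a.md
```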


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 00616295-063b-477b-b4e0-26cada6dd396

📥 Commits

Reviewing files that changed from the base of the PR and between 27097c3 and 2a43d21.

📒 Files selected for processing (28)
  • docs/logs/2026-03-17-agentic-capabilities.log.md
  • src/agent/agent-executor.ts
  • src/agent/index.ts
  • src/agent/merger.ts
  • src/agent/tools/index.ts
  • src/agent/tools/lint-tool.ts
  • src/agent/tools/list-directory.ts
  • src/agent/tools/path-utils.ts
  • src/agent/tools/read-file.ts
  • src/agent/tools/search-content.ts
  • src/agent/tools/search-files.ts
  • src/agent/types.ts
  • src/cli/commands.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/output/json-formatter.ts
  • src/output/reporter.ts
  • src/providers/llm-provider.ts
  • src/providers/vercel-ai-provider.ts
  • src/schemas/cli-schemas.ts
  • tests/agent/agent-executor.test.ts
  • tests/agent/list-directory.test.ts
  • tests/agent/merger.test.ts
  • tests/agent/path-utils.test.ts
  • tests/agent/read-file.test.ts
  • tests/agent/search-content.test.ts
  • tests/agent/search-files.test.ts
  • tests/agent/types.test.ts

Comment on lines +17 to +30
export function isWithinRoot(absolutePath: string, root: string): boolean {
  const normalizePath = (input: string): string => {
    try {
      return realpathSync(input);
    } catch {
      return path.resolve(input);
    }
  };

  const normalizedPath = normalizePath(absolutePath);
  const normalizedRoot = normalizePath(root);
  return normalizedPath.startsWith(normalizedRoot + path.sep) ||
    normalizedPath === normalizedRoot;
}

⚠️ Potential issue | 🟠 Major

Inconsistent path normalization creates a security gap.

normalizePath applies realpathSync or path.resolve independently to each path. If one succeeds with realpathSync while the other falls back to path.resolve, the resulting paths use incompatible normalization strategies, making the startsWith check unreliable.

For example, if root resolves via realpathSync to /real/path but absolutePath doesn't exist yet and falls back to path.resolve returning /symlink/../real/path/file, the containment check may incorrectly pass or fail.

Consider normalizing both paths with the same strategy:

Proposed fix
 export function isWithinRoot(absolutePath: string, root: string): boolean {
-  const normalizePath = (input: string): string => {
-    try {
-      return realpathSync(input);
-    } catch {
-      return path.resolve(input);
-    }
-  };
-
-  const normalizedPath = normalizePath(absolutePath);
-  const normalizedRoot = normalizePath(root);
+  let normalizedPath: string;
+  let normalizedRoot: string;
+
+  try {
+    // Try realpath for both - ensures symlinks are resolved consistently
+    normalizedRoot = realpathSync(root);
+    normalizedPath = realpathSync(absolutePath);
+  } catch {
+    // Fallback: use path.resolve for both to maintain consistency
+    normalizedRoot = path.resolve(root);
+    normalizedPath = path.resolve(absolutePath);
+  }
+
   return normalizedPath.startsWith(normalizedRoot + path.sep) ||
     normalizedPath === normalizedRoot;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tools/path-utils.ts` around lines 17 - 30, The containment check in
isWithinRoot is unsafe because normalizePath can use realpathSync for one input
and path.resolve for the other; fix by applying the same normalization strategy
to both paths: attempt realpathSync on both and if either call throws, fall back
to using path.resolve for both so both normalizedRoot and normalizedPath are
produced by the same method; keep the final startsWith/equals check and ensure
you still append path.sep when checking prefix so path boundary logic
(normalizedRoot + path.sep) remains correct.
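
To show why the `path.sep` suffix matters in the prefix check, here is a small lexical-only sketch. It uses `path.posix` so the output is the same on every platform, and it does not resolve symlinks, so it illustrates only the boundary logic and is not a replacement for the `realpathSync`-based check under review. `isWithinRootLexical` is a hypothetical name.

```typescript
import * as path from "node:path";

// Lexical containment check only: symlinks are NOT resolved here.
function isWithinRootLexical(candidate: string, root: string): boolean {
  const normalizedRoot = path.posix.resolve(root);
  const normalizedPath = path.posix.resolve(candidate);
  return (
    normalizedPath === normalizedRoot ||
    // Without the separator, "/work/repo-evil" would pass as a child of "/work/repo".
    normalizedPath.startsWith(normalizedRoot + path.posix.sep)
  );
}

console.log(isWithinRootLexical("/work/repo/docs/a.md", "/work/repo")); // true
console.log(isWithinRootLexical("/work/repo-evil/a.md", "/work/repo")); // false
console.log(isWithinRootLexical("/work/repo/../secrets", "/work/repo")); // false
```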

Comment on lines +104 to +109
let regex: RegExp;
try {
  regex = new RegExp(pattern, opts.ignoreCase ? 'i' : '');
} catch {
  return 'Invalid regex pattern';
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

How to prevent ReDoS attacks in Node.js regex?

💡 Result:

To prevent ReDoS (Regular Expression Denial of Service) attacks in Node.js, follow these key strategies:

  1. Write safe regex patterns: Avoid nested quantifiers (a * inside a *, or a + inside a +) and overlapping alternatives, which cause catastrophic backtracking. For example, replace /^([a-zA-Z]+)*$/ with /^[a-zA-Z]+$/ or /^(?:[a-zA-Z]{1,10})+$/. Use bounded repetitions like {1,1000} instead of unbounded * or +. Simplify patterns and make optional parts mandatory where possible, e.g., /^\w+(?:[-\s]\w+)*$/.
  2. Use static analysis tools: Check regex safety with packages like safe-regex or safe-regex2. Example: const safeRegex = require('safe-regex'); if (!safeRegex(pattern)) { throw new Error('Unsafe regex'); }
  3. Switch to the RE2 engine: Use the 're2' npm package, which implements a backtracking-free regex engine guaranteeing linear-time matching. It is a drop-in replacement for RegExp: const RE2 = require('re2'); const re = new RE2('your-pattern'); const match = re.exec(input); Note: RE2 lacks some JS features like lookaheads and backreferences, which are often ReDoS-prone anyway.
  4. Input validation and limits: Limit input length before regex processing (e.g., if (input.length > 1000) reject). Use Express body-parser limits: app.use(express.json({ limit: '100kb' }));. Implement rate limiting.
  5. Alternatives to regex: Prefer native string methods like trim, split, or every for simple validation when possible.
  6. Additional defenses: Use ESLint plugin-security to detect unsafe regex. Monitor regex performance with console.time. Consider WAFs like Cloudflare for filtering.

No official Node.js documentation mandates specific ReDoS prevention, but these practices align with security best practices from OWASP, Node.js security resources, and community tools.


🏁 Script executed:

# First, verify the code at lines 104-109
head -110 src/agent/tools/search-content.ts | tail -20

Repository: TRocket-Labs/vectorlint

Length of output: 537


🏁 Script executed:

# Check function context around the vulnerable code
sed -n '90,115p' src/agent/tools/search-content.ts

Repository: TRocket-Labs/vectorlint

Length of output: 725


🏁 Script executed:

# Check dependencies for safe-regex, re2, or similar security tools
cat package.json | grep -A 50 "dependencies"

Repository: TRocket-Labs/vectorlint

Length of output: 1103


ReDoS vulnerability from unsanitized user-supplied regex pattern.

The pattern parameter is passed directly to RegExp without validation. A malicious or complex pattern like (a+)+$ can cause catastrophic backtracking, freezing the process. No regex safety libraries are currently in dependencies.

Add regex validation before compilation. Options in order of strength:

  • Lightweight: Use the safe-regex npm package to validate pattern safety before instantiation
  • Strong: Switch to the re2 npm package (implements backtracking-free regex engine with O(n) guarantees)
  • Timeout guard: Add an execution timeout for regex operations to prevent indefinite blocking

Avoid simple regex-based pattern checks alone, as they cannot cover all ReDoS cases.

🧰 Tools
🪛 ast-grep (0.41.1)

[warning] 105-105: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(pattern, opts.ignoreCase ? 'i' : '')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tools/search-content.ts` around lines 104 - 109, The code
constructs a RegExp directly from the untrusted pattern (pattern / new
RegExp(...)) which allows ReDoS; fix by validating or replacing the engine
before compiling: either validate the pattern with a safety check (e.g.,
integrate safe-regex to reject dangerous patterns) or switch to a
backtracking-free engine (e.g., use re2 to instantiate the regex instead of
RegExp) and/or wrap regex execution in a short timeout guard to abort
long-running matches; update the logic around the RegExp creation in
search-content.ts where regex = new RegExp(pattern, opts.ignoreCase ? 'i' : ''),
rejecting unsafe patterns (returning the same error string) or using re2 and
ensure opts.ignoreCase behavior is preserved.
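
As a dependency-free illustration of "validate before compiling", the sketch below caps pattern length and rejects one obvious nested-quantifier shape before calling `new RegExp`. It is a heuristic with false positives and false negatives (for example, it misses `(a|a)*`), which is exactly the limitation the comment warns about, so treat it as a stopgap and not as a substitute for re2 or a dedicated checker. `compileSearchPattern`, `MAX_PATTERN_LENGTH`, and the error strings other than 'Invalid regex pattern' are hypothetical.

```typescript
// Hypothetical guard; a heuristic, not a complete ReDoS defense.
const MAX_PATTERN_LENGTH = 200; // illustrative cap

// Flags a quantified group whose body already contains * or +, e.g. (a+)+
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

type CompileResult = { ok: true; regex: RegExp } | { ok: false; error: string };

function compileSearchPattern(pattern: string, ignoreCase = false): CompileResult {
  if (pattern.length > MAX_PATTERN_LENGTH) {
    return { ok: false, error: "Pattern too long" };
  }
  if (NESTED_QUANTIFIER.test(pattern)) {
    return { ok: false, error: "Pattern rejected: nested quantifier" };
  }
  try {
    // Same flag handling as the code under review.
    return { ok: true, regex: new RegExp(pattern, ignoreCase ? "i" : "") };
  } catch {
    return { ok: false, error: "Invalid regex pattern" };
  }
}

console.log(compileSearchPattern("(a+)+$").ok); // false
console.log(compileSearchPattern("TODO|FIXME").ok); // true
```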

export function createSearchContentTool(cwd: string): SearchContentTool {
  return {
    name: 'search_content',
    description: 'Search file contents for a pattern. Returns file:line: matchedtext format. Default glob filter: *.md. Supports regex patterns.',

⚠️ Potential issue | 🟡 Minor

Description mentions *.md but default is **/*.md.

The description says "Default glob filter: *.md" but lines 97 and 168 use **/*.md. Update the description for accuracy.

📝 Proposed fix
-    description: 'Search file contents for a pattern. Returns file:line: matchedtext format. Default glob filter: *.md. Supports regex patterns.',
+    description: 'Search file contents for a pattern. Returns file:line: matchedtext format. Default glob filter: **/*.md. Supports regex patterns.',
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tools/search-content.ts` at line 159, update the tool description
string (the description property in the search-content tool definition) to
state the default glob filter as "**/*.md" instead of "*.md". Check the other
places in this file that apply the default glob "**/*.md" so the description
matches the actual default behavior.
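
One way to keep the description from drifting again is to build it from the same constant that supplies the default. This is only a sketch: `DEFAULT_GLOB` and `buildSearchContentDescription` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical constant; the PR as reviewed hard-codes the glob in several places.
export const DEFAULT_GLOB = '**/*.md';

// Builds the tool description from the same value used as the default filter,
// so the text cannot disagree with the behavior.
export function buildSearchContentDescription(defaultGlob: string = DEFAULT_GLOB): string {
  return (
    'Search file contents for a pattern. Returns file:line: matchedtext format. ' +
    `Default glob filter: ${defaultGlob}. Supports regex patterns.`
  );
}
```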

Comment on lines +25 to +30
const matches = await fg(pattern, {
cwd: searchRoot,
ignore: ['**/node_modules/**', '**/.git/**'],
onlyFiles: true,
followSymbolicLinks: false,
});

⚠️ Potential issue | 🟠 Major

Path reference mismatch when searching subdirectories.

When path is provided (e.g., "docs"), fast-glob returns paths relative to searchRoot (e.g., "api.md"). However, the agent passes these paths directly to read_file and lint tools, which expect paths relative to cwd (e.g., "docs/api.md"). This causes file-not-found errors when searching in subdirectories.

The description on line 15 states "Returns paths relative to the search root," but the consuming tools in agent-executor.ts expect paths relative to the repository root.

🛠️ Proposed fix: Prepend relative path prefix to results
       const matches = await fg(pattern, {
         cwd: searchRoot,
         ignore: ['**/node_modules/**', '**/.git/**'],
         onlyFiles: true,
         followSymbolicLinks: false,
       });
 
       if (matches.length === 0) {
         return 'No files found matching pattern';
       }
 
       const limited = matches.slice(0, effectiveLimit);
-      const output = limited.join('\n');
+      // Prepend the relative search directory so paths are relative to cwd
+      const relativePrefix = searchDir ? path.relative(cwd, searchRoot) : '';
+      const output = limited
+        .map(match => relativePrefix ? path.join(relativePrefix, match) : match)
+        .join('\n');
       if (matches.length > effectiveLimit) {
         return `${output}\n\n[${effectiveLimit} results limit reached. Refine your pattern for more specific results.]`;
       }
 
       return output;

You'll also need to import path:

 import fg from 'fast-glob';
+import * as path from 'node:path';
 import { resolveToCwd, isWithinRoot } from './path-utils.js';

And update the description:

-    description: 'Find files by glob pattern. Returns paths relative to the search root. Examples: **/*.md, docs/*.md, src/**/*.ts',
+    description: 'Find files by glob pattern. Returns paths relative to repo root. Examples: **/*.md, docs/*.md, src/**/*.ts',

Also applies to: 36-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tools/search-files.ts` around lines 25 - 30, the fast-glob results
in searchFiles (function in src/agent/tools/search-files.ts) are relative to
searchRoot when a subdirectory `path` is provided, but the caller tools
(read_file, lint) expect repo-relative paths. Prepend the provided `path`
prefix to each match before returning: import Node's `path` module, update the
function description to state that it returns repository-relative paths, and
make sure both places that build output from `matches` (the fg(...) call at
lines 25-30 and the result handling at lines 36-42) apply
`path.join(pathPrefix, match)` (or similar) only when a non-empty `path`
argument was passed.
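
The prefixing step described above can be sketched in isolation. `toRepoRelative` is a hypothetical helper name. The sketch assumes the matches use forward slashes, which is how fast-glob reports paths, and that `searchRoot` lies inside `cwd`.

```typescript
import * as path from 'node:path';

// Hypothetical helper: converts paths relative to searchRoot into paths relative to cwd.
export function toRepoRelative(cwd: string, searchRoot: string, matches: string[]): string[] {
  // Normalize the prefix to forward slashes so it joins cleanly with glob output.
  const prefix = path.relative(cwd, searchRoot).split(path.sep).join('/');
  if (prefix === '') {
    // Searching from cwd itself: matches are already repo-relative.
    return matches;
  }
  return matches.map((match) => path.posix.join(prefix, match));
}
```

With `cwd` set to `/repo` and `searchRoot` set to `/repo/docs`, a match of `api.md` becomes `docs/api.md`, which is the form read_file and lint expect.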

Comment on lines +190 to +196
} else if (outputFormat === OutputFormat.RdJson) {
const formatter = new RdJsonFormatter();
console.log(formatter.toJson());
} else if (outputFormat === OutputFormat.ValeJson) {
const formatter = new ValeJsonFormatter();
console.log(formatter.toJson());
}

⚠️ Potential issue | 🟡 Minor

Agent findings are not emitted for RdJson/ValeJson formats.

When outputFormat is RdJson or ValeJson, the formatters are created but no findings are added—formatter.toJson() outputs an empty structure. Users selecting these formats in agent mode will see empty output even when findings exist.

Consider either:

  1. Logging a warning that these formats aren't supported in agent mode
  2. Falling back to JSON format with a notice
  3. Mapping agent findings to these formats if feasible
Suggested warning
   } else if (outputFormat === OutputFormat.RdJson) {
+    console.warn('[vectorlint] RdJson output format is not supported in agent mode. Use --output json instead.');
     const formatter = new RdJsonFormatter();
     console.log(formatter.toJson());
   } else if (outputFormat === OutputFormat.ValeJson) {
+    console.warn('[vectorlint] ValeJson output format is not supported in agent mode. Use --output json instead.');
     const formatter = new ValeJsonFormatter();
     console.log(formatter.toJson());
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/orchestrator.ts` around lines 190 - 196, The RdJson/ValeJson branches
instantiate RdJsonFormatter and ValeJsonFormatter but never add agent findings,
causing formatter.toJson() to emit empty output; update the branches handling
OutputFormat.RdJson and OutputFormat.ValeJson in orchestrator.ts (the sections
creating RdJsonFormatter and ValeJsonFormatter) to either (A) log a warning that
RdJson/ValeJson are not supported in agent mode and fall back to the existing
JSON formatter path (e.g., reuse the code that populates findings for JSON
output), or (B) map the collected agent findings into the
RdJsonFormatter/ValeJsonFormatter APIs before calling formatter.toJson();
implement one of these fixes and ensure the warning/fallback is clearly emitted
when in agent mode so users don’t get empty output.
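
Option (A) above, warn and fall back to JSON, might look like the following sketch. The `OutputFormat` enum here is a local stand-in with assumed member values, and `resolveAgentOutputFormat` is a hypothetical helper; neither is copied from orchestrator.ts.

```typescript
// Local stand-in for the project's OutputFormat enum; member values are assumed.
export enum OutputFormat {
  Line = 'line',
  Json = 'json',
  RdJson = 'rdjson',
  ValeJson = 'vale-json',
}

export interface ResolvedFormat {
  format: OutputFormat;
  warning?: string;
}

// Maps formats that cannot carry agent findings onto JSON and reports why.
export function resolveAgentOutputFormat(requested: OutputFormat): ResolvedFormat {
  if (requested === OutputFormat.RdJson || requested === OutputFormat.ValeJson) {
    return {
      format: OutputFormat.Json,
      warning: `[vectorlint] ${requested} output is not supported in agent mode; falling back to json.`,
    };
  }
  return { format: requested };
}
```

The caller would print `warning` with `console.warn` when it is set, then proceed down the JSON branch, so findings are never silently dropped.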

if (finding.suggestion) console.log(` Suggestion: ${finding.suggestion}`);
if (finding.references && finding.references.length > 0) {
for (const reference of finding.references) {
const loc = reference.startLine ? `${reference.file}:${reference.startLine}` : reference.file;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether startLine is unconstrained numeric (so 0 is currently representable)
rg -n "startLine:\s*z\.number\(\)" src/agent/types.ts -C2

Repository: TRocket-Labs/vectorlint

Length of output: 338


🏁 Script executed:

sed -n '215,240p' src/output/reporter.ts

Repository: TRocket-Labs/vectorlint

Length of output: 799


🏁 Script executed:

cat -n src/agent/types.ts

Repository: TRocket-Labs/vectorlint

Length of output: 1569


🏁 Script executed:

rg "startLine.*[:\s]0" --type ts src/ tests/

Repository: TRocket-Labs/vectorlint

Length of output: 49


Use an explicit undefined check for startLine.

reference.startLine ? ... treats 0 as absent. Prefer checking reference.startLine !== undefined to handle all valid numeric values including 0.

💡 Suggested patch
-      const loc = reference.startLine ? `${reference.file}:${reference.startLine}` : reference.file;
+      const loc = reference.startLine !== undefined
+        ? `${reference.file}:${reference.startLine}`
+        : reference.file;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/output/reporter.ts` at line 229, The ternary that computes loc uses a
truthy check that treats 0 as absent; update the expression that sets loc to
check explicitly for undefined (reference.startLine !== undefined) so a valid
startLine of 0 is preserved—loc should remain
`${reference.file}:${reference.startLine}` when startLine is any number, and
fall back to reference.file only when startLine is strictly undefined; modify
the assignment that references reference.startLine in src/output/reporter.ts
accordingly.
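
The difference between the truthy check and the explicit check can be shown with a small standalone sketch. `FindingReference` and `formatLocation` are hypothetical names, and the interface is reduced to the two fields this expression reads.

```typescript
// Assumed shape, reduced to the fields used when building the location string.
export interface FindingReference {
  file: string;
  startLine?: number;
}

// Uses an explicit undefined check so that a startLine of 0 is still printed.
export function formatLocation(reference: FindingReference): string {
  return reference.startLine !== undefined
    ? `${reference.file}:${reference.startLine}`
    : reference.file;
}
```

With the original truthy check, `{ file: 'docs/api.md', startLine: 0 }` would print as `docs/api.md`; with the explicit check it prints as `docs/api.md:0`.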
