feat: improve Issue Location Accuracy with Line Numbering and Fuzzy Matching by hurshore · Pull Request #46 · TRocket-Labs/vectorlint

hurshore · 2025-12-27T17:49:56Z

Summary

This PR significantly improves the accuracy of issue location detection in VectorLint by implementing a multi-layered approach: line number hints from the LLM, quoted text matching, and robust fuzzy matching as fallback.

Changes

Line Numbering for Content Analysis
- Prepends line numbers to input content before sending to the LLM (format: 123\ttext)
- Adds line field to violation schema so the LLM can report which line the issue appears on
- Uses LLM-provided line numbers as hints for faster, more accurate location resolution
- Falls back to fuzzy matching when line hints don't resolve
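
As a rough illustration of the `123\ttext` format described above, the numbering step could be sketched as follows. The function names match the helpers listed for `src/output/line-numbering.ts`, but the signatures and bodies here are assumptions, not the code in this PR.

```typescript
// Sketch only: signatures and bodies are assumed, not copied from the PR.

// Prefix each line with its 1-based line number and a tab ("123\ttext").
function prependLineNumbers(content: string): string {
  return content
    .split("\n")
    .map((text, i) => `${i + 1}\t${text}`)
    .join("\n");
}

// Remove one leading "<digits><tab>" prefix from every line.
function stripLineNumbers(numbered: string): string {
  return numbered
    .split("\n")
    .map((line) => line.replace(/^\d+\t/, ""))
    .join("\n");
}

const numberedSample = prependLineNumbers("first line\nsecond line");
console.log(JSON.stringify(numberedSample)); // "1\tfirst line\n2\tsecond line"
```

Stripping removes only one prefix per line, so the round trip restores the original content even when a source line itself begins with digits.
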
Fuzzy Text Matching for LLM Output
- Adds fuzzball dependency for fuzzy string matching
- Implements multi-phase location strategy:
  - Phase 1: Exact matching (fastest)
  - Phase 2: Progressive substring matching (handles LLM adding/removing words)
  - Phase 3: Case-insensitive exact matching
  - Phase 4: Fuzzy line-by-line matching (handles typos)
  - Phase 5: Sliding window fuzzy matching (handles multi-line quotes)
- Returns confidence scores and strategy used for each match
- Adds tests for fuzzy matching
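
The cascade idea can be sketched in reduced form. This is a minimal illustration, assuming only two of the phases (exact and case-insensitive) and a hypothetical `locateCascade` helper; the substring and fuzzy phases are left out, and the confidence values (100 and 95) follow the figures quoted in the review.

```typescript
// Sketch only: a two-phase reduction of the strategy list above.
// `locateCascade` and `CascadeMatch` are hypothetical names.
interface CascadeMatch {
  index: number; // 0-based offset into the text
  match: string; // text as it actually appears in the source
  confidence: number; // 0-100
  strategy: "exact" | "case-insensitive";
}

function locateCascade(text: string, quote: string): CascadeMatch | null {
  if (quote.length === 0) return null;

  // Cheapest phase first: a verbatim match gets full confidence.
  const exact = text.indexOf(quote);
  if (exact !== -1) {
    return { index: exact, match: quote, confidence: 100, strategy: "exact" };
  }

  // Case-insensitive fallback with a small confidence penalty.
  // Assumes lowercasing does not change the string length.
  const folded = text.toLowerCase().indexOf(quote.toLowerCase());
  if (folded !== -1) {
    const match = text.slice(folded, folded + quote.length);
    return { index: folded, match, confidence: 95, strategy: "case-insensitive" };
  }

  // The substring and fuzzy phases would run here; this sketch gives up.
  return null;
}
```

Returning the strategy alongside the confidence lets a caller decide, for example, to report only matches above a chosen threshold.
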
Improved Issue Location Using Quoted Text
- Replaces pre/post anchor fields with more reliable quoted_text, context_before, context_after
- Updates LLM prompt directives to emphasize verbatim quoting:
  - Instructions to COPY-PASTE exact phrases (5-50 chars)
  - Critical rules against fabricating quotes
  - Requirement to verify quotes exist in input before reporting
- Moves reasoning field to appear first in schema (encouraging LLM to think before answering)

Files Changed

| File | Description |
| --- | --- |
| [src/output/line-numbering.ts](src/output/line-numbering.ts) | [NEW] Line numbering utilities |
| [src/output/location.ts](src/output/location.ts) | Multi-phase fuzzy matching implementation |
| [src/prompts/schema.ts](src/prompts/schema.ts) | Updated schema with `quoted_text`, `line`, and context fields |
| [src/prompts/directive-loader.ts](src/prompts/directive-loader.ts) | Enhanced LLM instructions for accurate quoting |
| [src/evaluators/base-evaluator.ts](src/evaluators/base-evaluator.ts) | Integration with line numbering |
| [src/cli/orchestrator.ts](src/cli/orchestrator.ts) | Line number prepending before LLM calls |
| [src/cli/types.ts](src/cli/types.ts) | Type updates for new fields |
| [tests/fuzzy-matching.test.ts](tests/fuzzy-matching.test.ts) | [NEW] Fuzzy matching tests |
| `package.json` | Added `fuzzball` dependency |

Why This Matters

LLMs often paraphrase, truncate, or slightly modify quotes when reporting issues. This makes locating the exact issue in the original text challenging. This PR addresses this by:

- Giving the LLM explicit line numbers to reference
- Accepting fuzzy matches when exact matching fails
- Providing confidence scores so downstream consumers know match quality

Testing

- Added comprehensive tests for fuzzy matching covering exact, case-insensitive, substring, and fuzzy strategies
- Tests verify confidence scores and match strategies are correctly reported

Summary by CodeRabbit

New Features
- Multi‑strategy quote verification with fuzzy matching and standardized report fields (line, quoted_text, context_before/context_after)
- Optional verbose logging to surface warnings and diagnostics
Bug Fixes
- Line-numbered content handling and de-duplication to avoid duplicate or unverifiable reports
- Graceful handling and reporting of unverifiable quotes
Chores
- Added fuzzball dependency for similarity scoring
Tests
- New tests for exact, case-insensitive, substring, fuzzy, and no-match quote location scenarios

coderabbitai · 2025-12-27T17:50:05Z

Walkthrough

Overhauls evidence location from pre/post context to quoted-text fuzzy matching with multiple fallbacks, adds line-numbering utilities, updates violation shapes to use quoted_text and contextual fields, propagates an optional verbose flag through evaluation flows, updates prompts/schema, and adds tests and a fuzzball dependency.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependency Management `package.json` | Added `fuzzball` `^2.2.3` dependency for fuzzy string matching |
| Type System Updates `src/cli/types.ts`, `src/prompts/schema.ts` | Replaced `pre`/`post` with `quoted_text`, `context_before`, `context_after` in violation shapes; added optional `verbose?: boolean` to evaluation/context types; updated LLM and evaluation result types |
| Evidence Location Refactor `src/output/location.ts` | Replaced pre/post locating with multi-strategy quoted-text pipeline (exact → context → substring → case-insensitive → fuzzy-line → fuzzy-window); added `QuotedTextEvidence`, enriched `LocationWithMatch` (match, confidence, strategy), `locateQuotedText`, `locateMultipleQuotes`; removed legacy locate/extract functions |
| Line Numbering Utilities `src/output/line-numbering.ts` | New helpers: `prependLineNumbers`, `stripLineNumbers`, `getLineContent`, `getLineStartIndex` for deterministic line handling |
| Orchestration & Reporting `src/cli/orchestrator.ts` | Now uses `locateQuotedText`, verifies matches, deduplicates violations by `quoted_text`+line, reports only verified unique violations, propagates `verbose` flag, and logs unverifiable quotes when verbose |
| Evaluator Base Changes `src/evaluators/base-evaluator.ts` | Prepend line numbers to content before LLM calls; mapping updated to new violation fields (`quoted_text`, `context_before`, `context_after`); minor signature/formatting changes |
| Prompt Directives `src/prompts/directive-loader.ts` | DEFAULT_DIRECTIVE rewritten to require reported fields `line`, `quoted_text`, `context_before`, `context_after`, `analysis`, `suggestion`; various string quoting standardized |
| Prompt Schema Updates `src/prompts/schema.ts` | Schemas and result types updated to the new violation shape and required fields |
| Tests `tests/fuzzy-matching.test.ts` | New test suite for `locateQuotedText` covering exact, context, case-insensitive, substring, fuzzy-line/window matching and no-match cases |

Sequence Diagram(s)

sequenceDiagram
  participant CLI as CLI Orchestrator
  participant Locator as locateQuotedText
  participant Evaluator as BaseEvaluator / LLM
  participant Reporter as reportIssue

  CLI->>Locator: request locate(quoted_text, context_before, context_after, content)
  alt Found
    Locator-->>CLI: { line, column, match, confidence, strategy }
  else Not found
    Locator-->>CLI: null
  end

  CLI->>CLI: filter verified matches, dedupe by (quoted_text,line)
  CLI->>Evaluator: send numberedContent + verified violations
  Evaluator-->>CLI: evaluation result (scores, messages, updated violations)

  CLI->>Reporter: reportIssue for each unique verified violation
  Reporter-->>CLI: ACK

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Fuzzy whiskers trace the quoted line,
Numbers march tidy, context in a row—
Verified quotes hop up, duplicates decline,
Reports skip the shadows, confidence aglow.
nibbles carrot, cheerful thump ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately summarizes the main change: improving issue location accuracy through line numbering and fuzzy matching, which is the core objective of this substantial PR. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5addc and 3e946ae.

📒 Files selected for processing (4)

src/cli/orchestrator.ts
src/cli/types.ts
src/output/location.ts
tests/fuzzy-matching.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

tests/fuzzy-matching.test.ts

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Use strict TypeScript with no any types; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks should use unknown type

Files:

src/cli/types.ts
src/cli/orchestrator.ts
src/output/location.ts

🧠 Learnings (1)

📚 Learning: 2025-12-27T17:28:16.346Z

Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T17:28:16.346Z
Learning: Applies to src/**/*.ts : Use custom error types with proper inheritance; catch blocks should use `unknown` type

Applied to files:

src/cli/orchestrator.ts

🧬 Code graph analysis (1)

src/cli/orchestrator.ts (2)

src/output/location.ts (1)

locateQuotedText (150-342)

src/errors/index.ts (1)

handleUnknownError (46-51)

🔇 Additional comments (6)

src/cli/types.ts (1)

59-59: LGTM! Type updates align with the new quoted-text location strategy.

The addition of verbose to EvaluationContext and the new violation fields (line, quoted_text, context_before, context_after) properly support the PR's shift from pre/post anchors to fuzzy-matched quoted text with contextual evidence.

Also applies to: 78-84

src/output/location.ts (2)

135-342: Excellent multi-phase fuzzy matching implementation.

The six-phase algorithm is well-structured and documented:

1. Line hint optimization for LLM-provided line numbers
2. Exact matching with context-based disambiguation
3. Progressive substring matching for partial text
4. Case-insensitive fallback
5. Fast fuzzy line-by-line matching
6. Slower sliding-window matching for multi-line quotes

The phase ordering optimizes for performance (fast paths first) while maintaining robust fallbacks. Confidence scoring is appropriate for each strategy.

1-1: The fuzzball library is already configured at the latest stable version (2.2.3) with no known security vulnerabilities. No action needed.

src/cli/orchestrator.ts (3)

162-222: LGTM! Verification-first approach with proper deduplication.

The refactored logic correctly:

- Calls locateQuotedText with the new quoted-text fields (lines 179-188)
- Skips unverifiable quotes and logs warnings when verbose (lines 190-199)
- Handles empty quoted_text in deduplication by creating a key only when quoted_text exists (lines 206-211), addressing the past review concern
- Logs location errors when verbose (lines 217-219), addressing another past review concern
- Collects only verified violations before reporting (lines 214, 224-245)

This ensures that only violations with successfully located evidence are reported to users.
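
The deduplication rule described above can be sketched roughly as follows. This is an illustration under assumptions: the `dedupeViolations` helper, the `LocatedViolation` shape, and the key format are hypothetical, and the sketch keeps (rather than drops) items that have no quote.

```typescript
// Sketch only: helper name, type, and key format are assumptions.
interface LocatedViolation {
  quoted_text?: string;
  line: number; // line resolved by the locator
  analysis?: string;
}

function dedupeViolations(items: LocatedViolation[]): LocatedViolation[] {
  const seen = new Set<string>();
  const unique: LocatedViolation[] = [];
  for (const item of items) {
    // Build a key only when quoted_text is non-empty, so violations
    // without a quote can never collide with each other.
    if (item.quoted_text) {
      const key = `${item.quoted_text}:${item.line}`;
      if (seen.has(key)) continue;
      seen.add(key);
    }
    unique.push(item);
  }
  return unique;
}
```

Because the numeric line comes last in the key, two different quote and line pairs cannot produce the same key string.
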

157-157: Verbose flag properly propagated through evaluation workflows.

The verbose parameter is correctly threaded through the entire evaluation pipeline:

- Added to function parameters (locateAndReportViolations, extractAndReportCriterion, routePromptResult)
- Extracted from options in evaluateFile (line 737)
- Consistently passed to downstream functions
- Used to gate debug logging (e.g., lines 192-196, 217-219)

This enables conditional verbose output throughout the evaluation flow without cluttering normal output.

Also applies to: 268-268, 414-414, 548-556, 581-581, 645-645, 737-737, 835-835

399-406: Violation type correctly updated to use quoted-text fields.

The type assertion properly reflects the new violation shape with line, quoted_text, context_before, and context_after fields, replacing the legacy pre/post anchors. This aligns with the broader refactor to fuzzy-matched quoted text.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cli/orchestrator.ts (2)
460-466: Update type cast to match new schema structure.

The inline type definition still references pre? and post? fields from the old evidence location approach. However, the function now uses quoted_text, context_before, and context_after fields (as seen in the locateQuotedText call at lines 236-245).
🔧 Proposed fix
     const violationResult = locateAndReportViolations({
       violations: violations as Array<{
-        pre?: string;
-        post?: string;
+        quoted_text?: string;
+        context_before?: string;
+        context_after?: string;
+        line?: number;
         analysis?: string;
         suggestion?: string;
       }>,
Alternatively, define a proper type in src/cli/types.ts and import it to avoid inline type casts.
695-705: Add missing verbose parameter.

The function signature of extractAndReportCriterion (line 317-331) includes a verbose parameter, and it's used internally (line 474). However, this call site doesn't pass the verbose parameter, which means verbose logging won't work for these violations.
🔧 Proposed fix
     const criterionResult = extractAndReportCriterion({
       exp,
       result,
       content,
       relFile,
       promptId,
       promptFilename: promptFile.filename,
       meta,
       outputFormat,
       jsonFormatter,
+      verbose,
     });

🧹 Nitpick comments (2)

tests/fuzzy-matching.test.ts (1)

24-33: Verify expected strategy for context disambiguation.

Per the locateQuotedText function documentation, when context is used to disambiguate multiple exact matches, the strategy should be "context", not "exact". However, in this test case there's only one occurrence of "quick brown fox" in the text, so context disambiguation isn't needed and "exact" is correct.

Consider adding a test with actual duplicate text to properly test the context disambiguation path returning "context" strategy.

src/cli/orchestrator.ts (1)

144-194: Consider simplifying or removing this legacy logic.

The extractMatchText function attempts to extract quoted text from the analysis message using regex patterns, which was useful in the old pre/post approach. However, with the new quoted-text-based strategy, violations already have an explicit quoted_text field, and locateQuotedText returns the actual matched text. This additional extraction and refinement step may be redundant and could cause confusion.

💡 Consider simplifying to just return the location from locateQuotedText

Since locateQuotedText already returns line, column, and match with confidence scoring, you might be able to simplify or remove this function entirely. If quote refinement from the analysis message is still needed for specific cases, document why and add tests for those scenarios.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efb0218 and c3f759b.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (9)

package.json
src/cli/orchestrator.ts
src/cli/types.ts
src/evaluators/base-evaluator.ts
src/output/line-numbering.ts
src/output/location.ts
src/prompts/directive-loader.ts
src/prompts/schema.ts
tests/fuzzy-matching.test.ts

🧰 Additional context used

📓 Path-based instructions (2)

src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Use strict TypeScript with no any types; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks should use unknown type

Files:

src/cli/types.ts
src/output/line-numbering.ts
src/prompts/directive-loader.ts
src/evaluators/base-evaluator.ts
src/output/location.ts
src/prompts/schema.ts
src/cli/orchestrator.ts

tests/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.test.ts: Use Vitest for testing framework; locate tests under tests/ with *.test.ts naming
Use dependency injection in tests: mock providers and do not hit network in unit tests

Files:

tests/fuzzy-matching.test.ts

🧠 Learnings (4)

📚 Learning: 2025-12-27T17:28:16.346Z

Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T17:28:16.346Z
Learning: Applies to tests/**/*.test.ts : Use Vitest for testing framework; locate tests under `tests/` with `*.test.ts` naming

Applied to files:

tests/fuzzy-matching.test.ts

📚 Learning: 2025-12-27T17:28:16.346Z

Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T17:28:16.346Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)

Applied to files:

src/evaluators/base-evaluator.ts

📚 Learning: 2025-12-27T17:28:16.346Z

Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T17:28:16.346Z
Learning: Separate concerns: evals define rubric; schemas enforce structure; CLI orchestrates; evaluators process; reporters format

Applied to files:

src/evaluators/base-evaluator.ts
src/prompts/schema.ts

📚 Learning: 2025-12-27T17:28:16.346Z

Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-27T17:28:16.346Z
Learning: Add providers by implementing `LLMProvider` or `SearchProvider` interfaces; add evaluators via registry pattern

Applied to files:

src/evaluators/base-evaluator.ts

🧬 Code graph analysis (3)

tests/fuzzy-matching.test.ts (1)

src/output/location.ts (1)

locateQuotedText (148-345)

src/evaluators/base-evaluator.ts (3)

src/prompts/schema.ts (6)

buildSubjectiveLLMSchema (3-55)

SubjectiveLLMResult (95-109)

SemiObjectiveResult (152-169)

buildSemiObjectiveLLMSchema (57-93)

SemiObjectiveLLMResult (111-120)

SemiObjectiveItem (143-150)

src/output/line-numbering.ts (1)

prependLineNumbers (13-18)

src/evaluators/evaluator-registry.ts (1)

registerEvaluator (65-67)

src/cli/orchestrator.ts (2)

src/output/location.ts (1)

locateQuotedText (148-345)

src/errors/index.ts (1)

handleUnknownError (46-51)

🪛 GitHub Actions: Lint

src/output/location.ts

[error] 172-172: 'lineStartIdx' is assigned a value but never used @typescript-eslint/no-unused-vars

🪛 GitHub Check: ESLint

tests/fuzzy-matching.test.ts

[failure] 4-4:
Variable name originalText must match one of the following formats: UPPER_CASE

src/output/location.ts

[failure] 172-172:
'lineStartIdx' is assigned a value but never used

🔇 Additional comments (32)

src/cli/types.ts (2)

59-59: LGTM!

The optional verbose flag is appropriately added to EvaluationContext for controlling logging behavior in the evaluation workflow.

91-97: LGTM!

The violation structure is updated to use the new quoted-text-based approach with line, quoted_text, context_before, and context_after. The fields are appropriately optional to handle cases where the LLM might not provide all evidence.

tests/fuzzy-matching.test.ts (2)

58-73: LGTM!

Fuzzy matching tests appropriately verify that imperfect quotes (missing words, reordered words) are still matched with reasonable confidence. The minimum confidence threshold of 80 aligns with the default minConfidence parameter.

76-90: LGTM!

Edge cases are well covered: unrelated text returns null, and empty quoted_text also returns null as expected.

src/output/line-numbering.ts (4)

13-18: LGTM!

Clean implementation using split/map/join pattern. The 1-based line numbering with tab separator is clear and deterministic.

27-32: LGTM!

The regex ^\d+\t correctly strips the line number prefix added by prependLineNumbers.

41-47: LGTM!

Proper bounds checking with 1-based indexing. The defensive || "" on line 46 handles potential edge cases.

56-63: LGTM!

Correctly accumulates character indices accounting for newline characters. The loop bounds are safe with the i < lines.length check.
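
For readers without the diff at hand, the index accumulation could look roughly like this. The name `getLineStartIndex` comes from the PR; the signature and the handling of out-of-range line numbers are assumptions.

```typescript
// Sketch only: signature and out-of-range behaviour are assumptions.
// Returns the 0-based character index at which a 1-based line starts.
function getLineStartIndex(content: string, lineNumber: number): number {
  const lines = content.split("\n");
  let index = 0;
  for (let i = 0; i < lineNumber - 1 && i < lines.length; i++) {
    // +1 accounts for the "\n" that split() removed.
    index += (lines[i] ?? "").length + 1;
  }
  return index;
}

const sampleText = "ab\ncde\nf";
console.log(sampleText[getLineStartIndex(sampleText, 3)]); // f
```
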

src/prompts/directive-loader.ts (2)

11-35: LGTM!

The updated directive is well-structured with clear instructions for the LLM:

- Explicit format requirements for line, quoted_text, and context fields
- Strong anti-hallucination rules (CRITICAL RULES 2-5)
- Requirement for step-by-step reasoning before reporting
- Clear guidance on verbatim copy-paste and verification

This aligns well with the new quoted-text-based location strategy.

37-49: LGTM!

The override loading logic is unchanged; only formatting adjustments.

src/evaluators/base-evaluator.ts (4)

16-16: LGTM!

Correctly imports the new line-numbering utility.

67-76: LGTM!

Line numbers are prepended before sending content to the LLM, enabling deterministic line reporting. The numbered content is correctly passed to runPromptStructured while preserving the original content reference for other calculations.

125-134: LGTM!

Consistent with the subjective path—line numbers are prepended before LLM evaluation for semi-objective mode as well.

152-159: LGTM!

The violation mapping correctly uses the new field names (quoted_text, context_before, context_after) with conditional spreading to handle optional values.
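
Conditional spreading, as mentioned above, might look something like the sketch below. The type and function names are hypothetical, and checking against `undefined` (rather than truthiness) is an assumption about how the mapping treats empty strings.

```typescript
// Sketch only: names and the undefined-check are assumptions.
interface LlmViolation {
  quoted_text?: string;
  context_before?: string;
  context_after?: string;
  analysis?: string;
}

function mapViolation(v: LlmViolation): LlmViolation {
  return {
    // Each spread contributes a key only when the source value is present,
    // so the result never carries explicit `undefined` properties.
    ...(v.quoted_text !== undefined ? { quoted_text: v.quoted_text } : {}),
    ...(v.context_before !== undefined ? { context_before: v.context_before } : {}),
    ...(v.context_after !== undefined ? { context_after: v.context_after } : {}),
    ...(v.analysis !== undefined ? { analysis: v.analysis } : {}),
  };
}
```

Omitting absent keys keeps the mapped object compatible with stricter compiler settings such as `exactOptionalPropertyTypes`.
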

src/prompts/schema.ts (5)

17-48: LGTM!

The subjective schema is well-designed:

reasoning moved to the start of the schema to encourage step-by-step thinking before scoring (chain-of-thought pattern)

line is optional (not in required) since LLM-provided line numbers are hints, not guarantees

quoted_text, context_before, context_after are required to ensure the fuzzy matching system has evidence to work with
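
To make the required/optional split concrete, here is a small hand-written guard for the violation shape. It is only a stand-in for whatever schema validation the project actually uses: the field names come from this PR, while `QuotedViolation` and `isQuotedViolation` are hypothetical.

```typescript
// Sketch only: field names come from the PR; the guard itself is hypothetical.
interface QuotedViolation {
  line?: number; // optional hint from the LLM, not a guarantee
  quoted_text: string;
  context_before: string;
  context_after: string;
}

function isQuotedViolation(value: unknown): value is QuotedViolation {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const line = v["line"];
  // `line` may be absent, but when present it must be a positive integer.
  const lineOk =
    line === undefined ||
    (typeof line === "number" && Number.isInteger(line) && line >= 1);
  return (
    lineOk &&
    typeof v["quoted_text"] === "string" &&
    typeof v["context_before"] === "string" &&
    typeof v["context_after"] === "string"
  );
}
```

Treating the LLM response as `unknown` and narrowing it with a guard follows the repository guideline of validating external data instead of casting it.
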

65-91: LGTM!

The semi-objective schema mirrors the subjective structure with required quoted-text evidence fields. This ensures consistent violation data across both evaluation paths.

101-119: LGTM!

Type definitions correctly reflect the updated violation structure for both SubjectiveLLMResult and SemiObjectiveLLMResult.

133-149: LGTM!

The runtime result types (SubjectiveResult, SemiObjectiveItem) are updated to match the new violation shape with quoted-text fields.

161-168: LGTM!

SemiObjectiveResult.violations correctly includes the optional criterionName field for downstream reporting.

src/output/location.ts (8)

1-1: LGTM!

Correctly imports the necessary fuzzy matching functions from fuzzball.

3-32: LGTM!

Well-defined interfaces:

- QuotedTextEvidence captures the LLM-provided evidence
- LocationWithMatch extends location with match metadata and strategy for debugging/confidence reporting
- FuzzyMatch is appropriately scoped as internal

52-89: LGTM!

findBestLineMatch efficiently scores each line using multiple fuzzball strategies (partial_ratio, token_sort_ratio, ratio) and takes the maximum. Skipping empty lines is a sensible optimization.
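
The "score with several strategies, keep the maximum" pattern can be illustrated without the library. In the sketch below, fuzzball is deliberately not used so the block stays self-contained: `similarity` and `tokenSortSimilarity` are simple Levenshtein-based stand-ins, not fuzzball's formulas, and `findBestLine` is a hypothetical name.

```typescript
// Sketch only: stand-in scorers, NOT fuzzball's ratio/token_sort_ratio.
type Scorer = (a: string, b: string) => number; // returns 0-100

// Classic two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j]! + 1, curr[j - 1]! + 1, prev[j - 1]! + cost));
    }
    prev = curr;
  }
  return prev[b.length]!;
}

const similarity: Scorer = (a, b) => {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 100 : Math.round((1 - levenshtein(a, b) / longest) * 100);
};

// Same score after sorting lowercase tokens, so word order is ignored.
const tokenSortSimilarity: Scorer = (a, b) => {
  const norm = (s: string) => s.toLowerCase().split(/\s+/).filter(Boolean).sort().join(" ");
  return similarity(norm(a), norm(b));
};

interface LineMatch {
  line: number; // 1-based
  text: string;
  score: number;
}

function findBestLine(content: string, quote: string, scorers: Scorer[]): LineMatch | null {
  const lines = content.split("\n");
  let best: LineMatch | null = null;
  for (let i = 0; i < lines.length; i++) {
    const text = lines[i] ?? "";
    if (text.trim() === "") continue; // skip empty lines
    // Take the most favourable score across all strategies.
    const score = Math.max(...scorers.map((s) => s(text, quote)));
    if (best === null || score > best.score) best = { line: i + 1, text, score };
  }
  return best;
}
```

A caller would typically compare the returned score against a minimum confidence (the review mentions a default of 80) before accepting the match.
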

95-131: LGTM!

The sliding window approach with 50%-150% size variation and step-by-5 granularity provides a good balance between accuracy and performance for multi-line quote matching.

219-278: LGTM!

Phase 2 exact matching is well-implemented:

- Collects all exact matches
- Single match returns immediately with 100% confidence
- Multiple matches use context for disambiguation
- Falls back to first match if context doesn't help

280-300: LGTM!

Progressive substring matching is a clever approach to handle LLM word additions/removals. Starting from words.length - 1 down to 3 finds the longest valid substring first, and confidence scales proportionally.
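
A minimal sketch of the progressive substring idea follows. It assumes contiguous word windows joined by single spaces and a confidence equal to the share of quote words that matched; the helper name and both of those choices are hypothetical, and the actual implementation may differ.

```typescript
// Sketch only: window selection and the confidence formula are assumptions.
interface SubstringMatch {
  index: number; // 0-based offset of the matched window
  match: string;
  confidence: number; // 0-100
}

function progressiveSubstring(text: string, quote: string, minWords = 3): SubstringMatch | null {
  const words = quote.split(/\s+/).filter(Boolean);
  // Try the longest word windows first so the best partial match wins.
  for (let size = words.length - 1; size >= minWords; size--) {
    for (let start = 0; start + size <= words.length; start++) {
      const candidate = words.slice(start, start + size).join(" ");
      const index = text.indexOf(candidate);
      if (index !== -1) {
        // Confidence shrinks with the share of quote words that were dropped.
        const confidence = Math.round((size / words.length) * 100);
        return { index, match: candidate, confidence };
      }
    }
  }
  return null;
}

const hit = progressiveSubstring(
  "The quick brown fox jumps over the lazy dog",
  "very quick brown fox jumps", // the LLM added "very"
);
console.log(hit?.match, hit?.confidence); // quick brown fox jumps 80
```
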

302-341: LGTM!

Phases 4-6 provide robust fallbacks:

- Case-insensitive at 95% confidence (reasonable penalty for case mismatch)
- Line-based fuzzy matching for fast handling of typos
- Window-based fuzzy matching as the last resort for multi-line issues

351-357: LGTM!

Simple and clean batch processing helper that applies locateQuotedText to each evidence item.

package.json (1)

61-61: LGTM!

The fuzzball dependency is correctly added to runtime dependencies and is a well-established library for fuzzy string matching. It's properly imported and used in src/output/location.ts for fuzzy matching functionality. The caret version specifier allows compatible updates.

src/cli/orchestrator.ts (4)

21-21: LGTM!

The import of locateQuotedText correctly supports the new quoted-text-based location strategy.

608-616: LGTM!

The verbose parameter is correctly destructured and propagated to locateAndReportViolations, enabling conditional logging throughout the violation reporting workflow.

Also applies to: 641-641

796-796: LGTM!

The verbose flag is correctly added to the function parameters and properly propagated to routePromptResult, ensuring consistent verbose behavior across the evaluation pipeline.

Also applies to: 894-894

219-256: Solid implementation of verification and deduplication workflow.

The new approach correctly:

- Attempts to locate each violation using locateQuotedText with fuzzy matching
- Skips unverifiable quotes with optional verbose warnings
- Tracks operational errors without blocking other violations
- Separates verification from reporting for cleaner flow

The multi-phase location strategy (exact, substring, fuzzy) with confidence thresholds is a robust improvement over the old pre/post approach.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/output/location.ts (1)

275-295: Consider making the minimum word count configurable.

Phase 3's progressive substring matching requires at least 3 words (line 278), which means 1-2 word quotes skip this phase entirely. While this might prevent false positives, it could also miss legitimate short-quote matches that have minor edits.

This is likely intentional, but consider documenting this threshold or making it configurable if short quotes are common in your use case.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f759b and 4a5addc.

📒 Files selected for processing (2)

src/output/location.ts
tests/fuzzy-matching.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

tests/fuzzy-matching.test.ts

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Use strict TypeScript with no any types; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks should use unknown type

Files:

src/output/location.ts

🔇 Additional comments (5)

src/output/location.ts (5)

1-1: LGTM: Fuzzy matching imports are appropriate.

The fuzzball library imports are correctly structured and all three functions are utilized in the multi-phase matching strategy.

3-32: Well-structured type definitions.

The interfaces clearly define the quoted-text evidence model and location results with rich metadata (confidence, strategy). The 1-based indexing is properly documented.

34-46: Line/column computation is correct.

The function properly converts absolute indices to 1-based line/column positions. The use of charCodeAt(i) === 10 correctly identifies newline characters.
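
The conversion described above can be sketched as a single pass over the text. The function name and signature are assumptions; only the `charCodeAt(i) === 10` newline test and the 1-based result are taken from the review.

```typescript
// Sketch only: name and signature are assumptions.
// Converts a 0-based absolute index into 1-based line and column numbers.
function computeLineColumn(text: string, index: number): { line: number; column: number } {
  let line = 1;
  let lastNewline = -1; // index of the most recent "\n" before `index`
  const end = Math.max(0, Math.min(index, text.length));
  for (let i = 0; i < end; i++) {
    if (text.charCodeAt(i) === 10) {
      // Character code 10 is "\n".
      line++;
      lastNewline = i;
    }
  }
  return { line, column: end - lastNewline };
}

console.log(computeLineColumn("ab\ncd", 4)); // { line: 2, column: 2 }
```
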

148-340: Well-architected multi-phase matching strategy.

The six-phase approach provides excellent fallback coverage, moving from fast exact matches to progressively more expensive fuzzy strategies. The confidence scoring and strategy labeling enable downstream consumers to make informed decisions about match quality.

The phase ordering (line hint → exact → context → substring → case-insensitive → fuzzy) is logical and well-documented.

346-352: LGTM: Batch processing helper is clean and correct.

The function appropriately maps over the evidences array, and the return type correctly represents that some quotes might not be located (null values).

refactor: improve violation processing and remove legacy code

- Update violation type cast to use new schema fields (line, quoted_text, context_before, context_after) instead of legacy pre/post fields
- Add missing verbose parameter to extractAndReportCriterion call
- Fix deduplication to skip when quoted_text is empty to avoid false collisions
- Log caught errors during evidence location when verbose is enabled
- Add test case with duplicate text for context disambiguation
- Remove legacy extractMatchText function; use locateQuotedText results directly
- Remove unused ExtractMatchTextParams and LocationMatch types

fix: correct column alignment in fuzzy matching functions

- Fix index/match mismatch in findBestLineMatch and findBestWindowMatch where trimmed match text didn't align with stored index position
- Add leadingWhitespace offset to ensure columns point to actual content start rather than leading whitespace
- Fix nested loop bug in lineHint fuzzy matching that could overwrite longer substring matches with shorter ones using labeled break

…atching (#46)

* feat: improve issue location using quoted text
* feat: implement fuzzy text matching for LLM output
* feat: implement line numbering for content analysis
* fix: resolve eslint errors
* refactor: improve violation processing and remove legacy code
* fix: correct column alignment in fuzzy matching functions

hurshore added 3 commits December 27, 2025 18:37

feat: improve issue location using quoted text

ac04d1e

feat: implement fuzzy text matching for LLM output

800ce36

feat: implement line numbering for content analysis

c3f759b

hurshore requested a review from ayo6706 December 27, 2025 17:49

fix: resolve eslint errors

4a5addc

coderabbitai bot reviewed Dec 27, 2025

coderabbitai bot reviewed Dec 27, 2025

hurshore added 2 commits December 27, 2025 19:12

oshorefueled merged commit 1287051 into main Dec 27, 2025
3 checks passed

This was referenced Jan 9, 2026

refactor: standardize evaluation terminology to Check and Judge #52

Merged

refactor: replace analysis field with issue/message pair #53

Open

Two-Phase Evaluation Architecture #55

Open

coderabbitai bot mentioned this pull request Feb 2, 2026

Always include user instruction in every run #56

Merged

coderabbitai bot mentioned this pull request Feb 23, 2026

Deduplicate issues when different rules flag the same issue #61

Open

coderabbitai bot mentioned this pull request Mar 19, 2026

feat(agent): add agent mode planner and execution tooling #73

Closed

oshorefueled merged 6 commits into main from feat/issue-location
