fix(core): use byte offsets for position reporting in raw-scoped script rules #1081

Merged
jdkato merged 1 commit into vale-cli:v3 from ryan-ronnander:fix/raw-scope-script-position on Mar 4, 2026

@ryan-ronnander ryan-ronnander commented Feb 23, 2026

Fixes #1083

I was trying to fix a raw ("entire document") scope rule that uses a Tengo script. While working through the rule with Claude Code, I noticed that the alert's line and column values were not always correct. Claude traced this to a bug in the upstream project. Here's the proposed fix.

Summary

When a Tengo script rule uses scope: raw, Vale does not use the begin and end byte offsets returned by the script to calculate the alert's line and column. Instead, Vale extracts the matched text (scope[begin:end]) and performs a text search in the parsed document to determine the position. If the matched text appears multiple times in the document, Vale always reports the position of the first occurrence, regardless of which occurrence the script intended to flag.

Vale version

v3.12.0 (still reproduces on v3.13.1)

Steps to reproduce

1. Create a script rule

styles/Example/FindTODO.yml:

extends: script
message: "Found a TODO."
scope: raw
level: warning
script: find-todo.tengo

styles/config/scripts/find-todo.tengo:

text := import("text")
matches := []

// Hardcode a match at byte offset 54, which is the second
// occurrence of "TODO" in the test document (line 3).
matches = append(matches, {
    begin: 54,
    end:   58
})

2. Create a test document

test.md (the word "TODO" appears on lines 1 and 3):

# TODO list for the project

This paragraph has a TODO that should be flagged by the script rule.

In this document:

  • TODO first appears at byte offset 2 (in the heading, line 1)
  • TODO next appears at byte offset 54 (in the body text, line 3)

3. Run Vale

vale test.md

Expected behavior

The alert should be reported at line 3 (where byte offset 54 falls), since the script returned begin: 54, end: 58.

Actual behavior

The alert is reported at line 1, column 3 — the position of the first occurrence of "TODO" in the document, not the occurrence at byte offset 54.

Additional observations

  • Hardcoding begin: 0, end: 1 (matching just the first byte of the file) still reports the alert at 1:3 — confirming that byte offsets are completely ignored and Vale is searching for the extracted text.
  • Returning two matches (both offset 2 and offset 54) correctly reports both 1:3 and 3:25 — because Vale finds the first and second occurrences in order.
  • Matching unique text (text that only appears once in the document) always maps to the correct position, because the text search finds the right (only) occurrence.
  • Small files appear to work correctly for common cases only because the matched text often happens to first appear at the intended location.

Root cause

The bug flows through three files:

1. internal/check/script.go — Script execution (correct)

The Run method correctly extracts begin/end from the Tengo script output and sets a.Span and a.Match correctly.

2. internal/core/file.go: AddAlert() discards byte offsets

For scope: raw, lintBlock is called with lookup=true. Inside AddAlert, because lookup=true, the code always falls through to FindLoc(), ignoring the byte offsets the script provided.

There is also a disambiguation attempt capped at 1000 characters:

if len(a.Offset) == 0 && strings.Count(ctx, a.Match) > 1 && len(ctx) < 1000 {
    a.Offset = append(a.Offset, strings.Fields(ctx[0:a.Span[0]])...)
}

For scope: raw, ctx is the entire document, so any file over 1000 characters skips this disambiguation entirely.

3. internal/core/location.go: initialPosition() searches for text

FindLoc delegates to initialPosition(), which builds a regex from a.Match, finds all occurrences, and always returns the first one — never consulting a.Span byte offsets.
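The following sketch illustrates the kind of lookup described here. It is not Vale's code: `matchOffsets` and `firstMatchOffset` are hypothetical names, and the sample document is invented. It only shows that a regex search over the matched text finds every occurrence, and that always taking the first one loses the script's intent.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchOffsets returns the starting byte offset of every literal
// occurrence of match in doc (hypothetical helper for illustration).
func matchOffsets(doc, match string) []int {
	re := regexp.MustCompile(regexp.QuoteMeta(match))
	var starts []int
	for _, loc := range re.FindAllStringIndex(doc, -1) {
		starts = append(starts, loc[0])
	}
	return starts
}

// firstMatchOffset mimics a lookup that always picks the first
// occurrence, or -1 when the text is not found.
func firstMatchOffset(doc, match string) int {
	starts := matchOffsets(doc, match)
	if len(starts) == 0 {
		return -1
	}
	return starts[0]
}

func main() {
	doc := "TODO one\n\nsecond TODO here\n"
	fmt.Println(matchOffsets(doc, "TODO"))     // every occurrence
	fmt.Println(firstMatchOffset(doc, "TODO")) // the one that gets reported
}
```

A script that flagged the second occurrence would still see the alert attached to the first, because the span it returned never enters the search.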

Fix

This PR adds a HasByteOffsets flag to the Alert struct and a locFromByteOffset() helper to compute line:column directly from byte offsets.

The approach:

  1. internal/core/alert.go — A new HasByteOffsets bool field on Alert signals that Span contains byte offsets into the raw document (not column ranges).
  2. internal/check/script.go — The script runner's Run method sets HasByteOffsets: true when building alerts from Tengo script matches, since scripts always return byte offsets.
  3. internal/core/file.go: AddAlert checks a.HasByteOffsets to decide whether to use locFromByteOffset() (direct byte-offset-to-line:column conversion) or the existing FindLoc text-search path.
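For readers unfamiliar with the conversion, here is a minimal sketch of turning a byte offset into a line and column. It is not the PR's locFromByteOffset() implementation; `lineColFromOffset` is a hypothetical name, and the 1-based numbering with columns counted in runes is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// lineColFromOffset converts a byte offset into doc to a 1-based line and
// column. Assumption for this sketch: columns are counted in runes, and
// out-of-range offsets are clamped to the document bounds.
func lineColFromOffset(doc string, offset int) (line, col int) {
	if offset < 0 {
		offset = 0
	}
	if offset > len(doc) {
		offset = len(doc)
	}
	prefix := doc[:offset]
	// Every newline before the offset moves the position down one line.
	line = strings.Count(prefix, "\n") + 1
	// The current line starts just after the last newline (or at 0).
	lineStart := strings.LastIndex(prefix, "\n") + 1
	col = utf8.RuneCountInString(prefix[lineStart:]) + 1
	return line, col
}

func main() {
	doc := "TODO one\n\nsecond TODO here\n"
	offset := strings.LastIndex(doc, "TODO") // second occurrence
	line, col := lineColFromOffset(doc, offset)
	fmt.Printf("%d:%d\n", line, col)
}
```

Because the position is derived only from the offset, repeated text elsewhere in the document cannot change the result.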

This is more precise than checking blk.Context == blk.Text to detect raw-scope blocks, which would also match non-script alerts (code comment blocks, plain text blocks) where Span contains column ranges rather than byte offsets.

The test expectation for Scripts.CustomMsg in checks.feature is updated from 4:19 to 1:2 — the old value was a side effect of the buggy text-search finding the matched text at the wrong position.

Related issues

…pt rules

Script rules with `scope: raw` return begin/end byte offsets in their
match arrays, but AddAlert ignores these and performs a text search via
FindLoc/initialPosition to determine the alert position. When the
matched text appears multiple times in the document, this always reports
the position of the first occurrence rather than the intended one.

Add locFromByteOffset() to compute line:column directly from the byte
offsets the script provides, bypassing the text-search path. The new
path activates when the alert carries valid byte offsets within a
raw-scope block, falling back to the existing FindLoc path otherwise.

Relates to vale-cli#869, vale-cli#272.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ryan-ronnander ryan-ronnander force-pushed the fix/raw-scope-script-position branch from 2287cbb to 5fa2949 on February 23, 2026 23:36
@jdkato jdkato merged commit f9f5e68 into vale-cli:v3 Mar 4, 2026
1 check passed

jdkato commented Mar 4, 2026

Thanks!

@ryan-ronnander ryan-ronnander deleted the fix/raw-scope-script-position branch March 4, 2026 22:21

Development

Successfully merging this pull request may close these issues.

scope: raw script rules report wrong position when matched text appears multiple times