Description
Check for existing issues
- Completed
Environment
- Vale version: 3.13.1 (also reproduces on 3.12.0)
- Operating system: Linux
- Installation method: Source / Docker
Describe the bug / provide steps to reproduce it
When a Tengo script rule uses scope: raw, Vale does not use the begin and end byte offsets returned by the script to calculate the alert's line and column. Instead, Vale extracts the matched text (scope[begin:end]) and performs a text search in the parsed document to determine the position. If the matched text appears multiple times in the document, Vale always reports the position of the first occurrence, regardless of which occurrence the script intended to flag.
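To make the mechanism concrete, here is a small, self-contained Go model of the lookup described above. It is not Vale's code. `locateByText` is a hypothetical helper that behaves as this report describes: it discards the span, searches for the matched text, and therefore snaps the position to the first occurrence.

```go
package main

import (
	"fmt"
	"strings"
)

// locateByText is a hypothetical, simplified model of the reported behavior:
// it extracts doc[begin:end] and searches for that text from the start of the
// document. strings.Index returns the first occurrence, so the begin offset
// supplied by the script is lost whenever the text repeats.
func locateByText(doc string, begin, end int) int {
	match := doc[begin:end]
	return strings.Index(doc, match)
}

func main() {
	doc := "# TODO list\n\nThis has a TODO too.\n"

	// Find the byte offset of the second "TODO" by searching after the first.
	first := strings.Index(doc, "TODO")
	rest := first + len("TODO")
	second := rest + strings.Index(doc[rest:], "TODO")

	fmt.Println("script reported begin:", second)
	fmt.Println("text search found:    ", locateByText(doc, second, second+len("TODO")))
}
```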
Steps to reproduce
1. Create a script rule
styles/Example/FindTODO.yml:
extends: script
message: "Found a TODO."
scope: raw
level: warning
script: find-todo.tengo
styles/config/scripts/find-todo.tengo:
text := import("text")
matches := []
// Hardcode a match at byte offset 54, which is the second
// occurrence of "TODO" in the test document (line 3).
matches = append(matches, {
begin: 54,
end: 58
})
2. Create a test document
test.md (the word "TODO" appears on lines 1 and 3):
# TODO list for the project

This paragraph has a TODO that should be flagged by the script rule.
In this file, TODO first appears at byte offset 2 (heading, line 1) and next appears at byte offset 54 (body text, line 3).
3. Run Vale
vale test.md
Expected: Alert at line 3 (byte offset 54)
Actual: Alert at line 1, column 3 (first occurrence of "TODO")
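Because the script hardcodes its offsets, it is worth confirming them against the actual bytes of test.md, since line endings (LF vs. CRLF) and trailing whitespace shift the numbers. A minimal Go helper for that is below. `nthIndex` is a name introduced here for illustration, not part of Vale.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// nthIndex returns the byte offset of the nth (1-based) occurrence of sub
// in s, or -1 if there are fewer than n occurrences.
func nthIndex(s, sub string, n int) int {
	offset := 0
	for i := 1; i <= n; i++ {
		j := strings.Index(s[offset:], sub)
		if j < 0 {
			return -1
		}
		if i == n {
			return offset + j
		}
		// Skip past this occurrence before searching for the next one.
		offset += j + len(sub)
	}
	return -1
}

func main() {
	data, err := os.ReadFile("test.md")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read test.md:", err)
		return
	}
	// Print begin/end byte offsets for every occurrence of "TODO".
	for n := 1; ; n++ {
		i := nthIndex(string(data), "TODO", n)
		if i < 0 {
			break
		}
		fmt.Printf("occurrence %d: begin=%d end=%d\n", n, i, i+len("TODO"))
	}
}
```

Run it from the directory containing test.md. The printed begin/end pairs are the values to hardcode in find-todo.tengo.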
Root cause
In AddAlert (internal/core/file.go), for scope: raw blocks, the code always falls through to FindLoc(), which performs a text search and returns the first occurrence. The byte offsets in a.Span are never consulted.
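For comparison, deriving the position from the span is a purely local computation on the byte offset. The sketch below assumes 1-based lines and 1-based, rune-counted columns; whether Vale counts columns in runes or bytes is an assumption here. `lineCol` is hypothetical and is not taken from internal/core/file.go.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// lineCol converts a byte offset into a 1-based line and column.
// Lines are counted by the newlines before the offset; the column counts
// runes from the start of that line.
func lineCol(doc string, offset int) (line, col int) {
	before := doc[:offset]
	line = strings.Count(before, "\n") + 1
	lineStart := strings.LastIndexByte(before, '\n') + 1 // 0 if on the first line
	col = utf8.RuneCountInString(before[lineStart:]) + 1
	return line, col
}

func main() {
	doc := "# TODO list\n\nThis has a TODO too.\n"
	// Offsets of the first and second "TODO" in doc.
	for _, off := range []int{2, 24} {
		l, c := lineCol(doc, off)
		fmt.Printf("offset %d -> line %d, column %d\n", off, l, c)
	}
}
```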
There is also a disambiguation attempt capped at 1000 characters:
if len(a.Offset) == 0 && strings.Count(ctx, a.Match) > 1 && len(ctx) < 1000 {
	a.Offset = append(a.Offset, strings.Fields(ctx[0:a.Span[0]])...)
}
For scope: raw, ctx is the entire document, so any file of 1000 bytes or more skips disambiguation entirely.
Related
- #869: "Regex caret (^) not working properly for scope: raw when there are over 999 characters in a Markdown file" (directly caused by the 1000-char cap)
- #272: "Exclusions don't seem to work properly in scope: raw"
- PR #1081: "fix(core): use byte offsets for position reporting in raw-scoped script rules", a proposed fix using a HasByteOffsets flag on the Alert struct
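PR #1081's actual diff is not reproduced here. As a rough sketch of how a flag like HasByteOffsets could select between the two code paths, assume a pared-down, hypothetical Alert type: only Match, Span, and the flag name come from the text above, while Line, Col, and the `place` function are illustrative. Columns are byte-based for simplicity.

```go
package main

import (
	"fmt"
	"strings"
)

// Alert is a hypothetical, pared-down stand-in for Vale's Alert struct.
type Alert struct {
	Match          string
	Span           [2]int // byte offsets into the raw document
	HasByteOffsets bool   // true when Span can be trusted as document offsets
	Line, Col      int
}

// place fills in Line and Col. With the flag set it uses Span[0] directly;
// otherwise it falls back to searching for the first occurrence of Match,
// which is the behavior reported in this issue.
func place(a *Alert, doc string) {
	offset := a.Span[0]
	if !a.HasByteOffsets {
		offset = strings.Index(doc, a.Match)
		if offset < 0 {
			return // match not found; leave the position unset
		}
	}
	before := doc[:offset]
	a.Line = strings.Count(before, "\n") + 1
	a.Col = len(before) - (strings.LastIndexByte(before, '\n') + 1) + 1
}

func main() {
	doc := "# TODO list\n\nThis has a TODO too.\n"
	for _, flag := range []bool{false, true} {
		// The span points at the second "TODO" in doc.
		a := Alert{Match: "TODO", Span: [2]int{24, 28}, HasByteOffsets: flag}
		place(&a, doc)
		fmt.Printf("HasByteOffsets=%v -> line %d, column %d\n", flag, a.Line, a.Col)
	}
}
```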