scope: raw script rules report wrong position when matched text appears multiple times #1083

@ryan-ronnander

Check for existing issues

  • Completed

Environment

  • Vale version: 3.13.1 (also reproduces on 3.12.0)
  • Operating system: Linux
  • Installation method: Source / Docker

Describe the bug / provide steps to reproduce it

When a Tengo script rule uses scope: raw, Vale does not use the begin and end byte offsets returned by the script to calculate the alert's line and column. Instead, Vale extracts the matched text (scope[begin:end]) and performs a text search in the parsed document to determine the position. If the matched text appears multiple times in the document, Vale always reports the position of the first occurrence, regardless of which occurrence the script intended to flag.
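The effect can be mimicked outside Vale with a few lines of Go. This is only a sketch of the behavior described above, not Vale's code: `locateBySearch` is a hypothetical stand-in for a text-search lookup, and the sample string is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// locateBySearch is a hypothetical stand-in for a text-search lookup:
// it extracts the matched text and returns the byte offset of its
// first occurrence, ignoring where the span actually came from.
func locateBySearch(doc string, begin, end int) int {
	match := doc[begin:end]
	return strings.Index(doc, match)
}

func main() {
	doc := "TODO one\nTODO two\n"

	// The script flags the second "TODO", which starts at byte 9.
	begin, end := 9, 13

	fmt.Println("script offset:", begin)                           // 9
	fmt.Println("search result:", locateBySearch(doc, begin, end)) // 0
}
```

The search lands on byte 0 even though the span points at byte 9, which is the same mismatch reported here.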

Steps to reproduce

1. Create a script rule

styles/Example/FindTODO.yml:

extends: script
message: "Found a TODO."
scope: raw
level: warning
script: find-todo.tengo

styles/config/scripts/find-todo.tengo:

text := import("text")
matches := []

// Hardcode a match at byte offset 50, which is the second
// occurrence of "TODO" in the test document (line 3).
matches = append(matches, {
    begin: 50,
    end:   54
})

2. Create a test document

test.md (the word "TODO" appears on lines 1 and 3):

# TODO list for the project

This paragraph has a TODO that should be flagged by the script rule.

Byte offsets, assuming "\n" line endings:

  • TODO first appears at byte offset 2 (heading, line 1)
  • TODO next appears at byte offset 50 (body text, line 3)

3. Run Vale

vale test.md

Expected: Alert at line 3 (byte offset 50)
Actual: Alert at line 1, column 3 (first occurrence of "TODO")
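The offsets used in the reproduction can be checked independently of Vale. The following Go sketch lists every occurrence of "TODO" with its byte offset and line number. It assumes the file consists of exactly the three lines shown for test.md with "\n" line endings. `occurrences` is a hypothetical helper written for this check.

```go
package main

import (
	"fmt"
	"strings"
)

// occurrences returns the byte offset of every non-overlapping
// occurrence of needle in doc.
func occurrences(doc, needle string) []int {
	var offsets []int
	if needle == "" {
		return offsets
	}
	for from := 0; ; {
		i := strings.Index(doc[from:], needle)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, from+i)
		from += i + len(needle)
	}
}

func main() {
	// Assumed contents of test.md, with "\n" line endings.
	doc := "# TODO list for the project\n\n" +
		"This paragraph has a TODO that should be flagged by the script rule.\n"

	for _, off := range occurrences(doc, "TODO") {
		line := 1 + strings.Count(doc[:off], "\n")
		fmt.Printf("byte offset %d -> line %d\n", off, line)
	}
}
```

Files saved with "\r\n" line endings shift the second offset by one byte per preceding line break, so the hardcoded span in the script would need adjusting.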

Root cause

In AddAlert (internal/core/file.go), for scope: raw blocks, the code always falls through to FindLoc(), which performs a text search and returns the first occurrence. The byte offsets from a.Span are never consulted.
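One possible direction, sketched here rather than taken from Vale's source, is to derive the position directly from the byte offset instead of searching for the text. `positionFromOffset` is a hypothetical helper. It assumes "\n" line endings and reports a 1-based column counted in runes, which may or may not match how Vale counts columns.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// positionFromOffset converts a byte offset into a 1-based line and
// column. Hypothetical helper for illustration, not part of Vale.
func positionFromOffset(doc string, offset int) (line, col int) {
	if offset < 0 || offset > len(doc) {
		return 0, 0 // out of range: no valid position
	}
	before := doc[:offset]
	line = 1 + strings.Count(before, "\n")
	// LastIndex returns -1 when there is no newline, so lineStart is 0.
	lineStart := strings.LastIndex(before, "\n") + 1
	col = 1 + utf8.RuneCountInString(before[lineStart:])
	return line, col
}

func main() {
	doc := "# TODO list\n\nbody with a TODO here\n"

	// Use the offset of the last "TODO" as the span the script reports.
	second := strings.LastIndex(doc, "TODO")
	line, col := positionFromOffset(doc, second)
	fmt.Printf("offset %d -> line %d, column %d\n", second, line, col)
}
```

Because the position comes from the offset alone, repeated text elsewhere in the document cannot change the result.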

There is also a disambiguation attempt capped at 1000 characters:

if len(a.Offset) == 0 && strings.Count(ctx, a.Match) > 1 && len(ctx) < 1000 {
    a.Offset = append(a.Offset, strings.Fields(ctx[0:a.Span[0]])...)
}

For scope: raw, ctx is the entire document, so any file of 1000 bytes or more (Go's len counts bytes, not characters) skips disambiguation entirely.

Related
