Regex caret (^) not working properly for scope: raw when there are over 999 characters in a Markdown file. #869

@michael-nok

Check for existing issues

  • Completed

Environment

  • Windows 10
  • Direct download of Windows executable
  • Vale 2.29.1 or later

Describe the bug / provide steps to reproduce it

The change introduced by commit f769fcd causes the problem. That commit added the following note:

	// NOTE: If the `ctx` document is large (as could be the case with
	// `scope: raw`) this is *slow*. Thus, the cap at 1k.
	//
	// TODO: Actually fix this.

I have a rule that looks for incorrectly indented content. It uses the following token:

extends: existence
message: 'Content must be indented using 4x spaces each time. "%s"'
level: error
nonword: true
scope: raw
tokens:
  - '^[ ]{1,3}\`'

When the Markdown file contains roughly 1,000 characters or more (that is, once ctx exceeds the 1k cap), the ^ in the token no longer anchors to the real start of each line and instead matches at incorrect starting positions.

Attached are sample files that exactly show the spillover in the logic:
vale.zip

Consequently, text with four spaces before the first ` is flagged as incorrect, and the starting position for the ^ is column 1.

The vale.exe from release 2.29.0 does not have this problem.
