Skip to content

perf: fix ~50% regression in FindIncludesInContent#21265

Merged
pelikhan merged 2 commits intomainfrom
copilot/fix-regression-find-includes
Mar 16, 2026
Merged

perf: fix ~50% regression in FindIncludesInContent#21265
pelikhan merged 2 commits intomainfrom
copilot/fix-regression-find-includes

Conversation

Copy link
Contributor

Copilot AI commented Mar 16, 2026

FindIncludesInContent regressed to ~50,715 ns/op vs the historical ~33,698 ns/op. Three hot-path issues in ParseImportDirective and findIncludesInContent:

  • Redundant regex — after matching IncludeDirectivePattern, isLegacy was re-evaluated via a second LegacyIncludeDirectivePattern.MatchString call. The captured groups from the first match already encode this: matches[2] != "" iff the legacy alternative matched.
  • No fast-path — the full regex ran on every line. Import directives must start with @ or {; a single byte check now rejects ordinary content lines before the regex engine is invoked.
  • Scanner allocationfindIncludesInContent allocated a bufio.Scanner + strings.Reader per call. Replaced with strings.Lines (Go 1.25), which is allocation-free. Since strings.Reader never produces I/O errors, scanner.Err() was always nil — no behavioral change.

Copilot AI and others added 2 commits March 16, 2026 16:27
Three targeted optimizations:
1. Fast-path byte check in ParseImportDirective before running regex
2. Eliminate redundant LegacyIncludeDirectivePattern regex by deriving
   isLegacy from captured groups of the first match
3. Replace bufio.Scanner allocation with strings.Lines in findIncludesInContent

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@github-actions
Copy link
Contributor

Hey @Copilot 👋 — great work tracking down this performance regression in FindIncludesInContent! The three-pronged approach (fast-path byte check, reusing existing capture groups instead of a second regex pass, and replacing the bufio.Scanner allocation with strings.Lines) is well-reasoned and the PR description does a thorough job explaining each change.

A couple of things to address before this is ready for review:

  • Tests are missing — There's an existing BenchmarkFindIncludesInContent in pkg/cli/commands_utils_test.go (line 378) that only uses @include lines. That benchmark won't exercise the new fast-path for ordinary content lines, which is where the biggest win comes from. It would be ideal to:
    1. Update BenchmarkFindIncludesInContent to use a more realistic content string (mostly regular prose lines with sparse @include directives) so the benchmark actually demonstrates the improvement.
    2. Add a BenchmarkParseImportDirective in pkg/parser/ that tests the fast-path skip on non-directive lines.
    3. Add unit tests for the new ParseImportDirective fast-path guard: empty string, a line starting with a regular letter, and a line starting with @/{.
  • Draft state — the PR is currently a draft; once tests are added, remember to run make agent-finish and mark it ready for review.

If you'd like a hand adding the missing coverage, you can assign this prompt to your coding agent:

Add benchmark and unit tests for the performance improvements in PR #21265 (pkg/parser/import_directive.go and pkg/cli/remove_command.go).

1. Update `BenchmarkFindIncludesInContent` in `pkg/cli/commands_utils_test.go`:
   - Change the benchmark content to be realistic: a long string of mostly plain prose lines (50–100 lines) with only 3–4 `@include shared/tools.md`-style directives mixed in.
   - This exercises the new fast-path byte guard on ordinary lines, which is the hot path.

2. Add a new `BenchmarkParseImportDirective` in `pkg/parser/import_directive_test.go` (create the file if it does not exist):
   - Bench a non-directive line (e.g., "This is plain text content.") — should exit via the fast-path.
   - Bench a legacy directive (e.g., "`@include` shared/tools.md").
   - Bench a new-syntax directive (e.g., "\{\\{\#import shared/tools.md}}").
   - Follow the benchmark pattern from `pkg/parser/frontmatter_benchmark_test.go` (use `b.Loop()`).

3. Add unit tests in the same file for the fast-path guard in `ParseImportDirective`:
   - Line starting with a regular letter → returns nil.
   - Empty string → returns nil.
   - Line with leading whitespace then a letter → returns nil.
   - Valid `@include` → returns a non-nil match with IsLegacy=true.
   - Valid `\{\\{\#import}}` → returns a non-nil match with IsLegacy=false.

4. Run `make agent-finish` and confirm all tests pass.

Generated by Contribution Check ·

@pelikhan pelikhan marked this pull request as ready for review March 16, 2026 17:43
Copilot AI review requested due to automatic review settings March 16, 2026 17:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves the performance of FindIncludesInContent / import-directive parsing by reducing regex work on the hot path and avoiding per-call scanner allocations.

Changes:

  • Adds a fast-path in ParseImportDirective to skip regex evaluation for lines that can’t be directives.
  • Removes a redundant legacy-regex match by deriving legacy/new syntax from the primary regex capture groups.
  • Replaces bufio.Scanner line iteration with strings.Lines to avoid per-call scanner/reader allocations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
pkg/parser/import_directive.go Adds directive fast-path and derives legacy/new syntax from existing regex captures to avoid extra regex work.
pkg/cli/remove_command.go Uses strings.Lines to iterate content lines without allocating a scanner and returns nil error accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

}

return includes, scanner.Err()
return includes, nil // strings.Lines iterates over an in-memory string; no I/O errors can occur.
@pelikhan pelikhan merged commit 7d1a279 into main Mar 16, 2026
88 checks passed
@pelikhan pelikhan deleted the copilot/fix-regression-find-includes branch March 16, 2026 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[performance] Regression in FindIncludesInContent: 50.5% slower

3 participants