fix: skip ignored directories early during file walk #225

Merged
CreatorHead merged 1 commit into main from fix/skip-ignored-directories
Apr 29, 2026

@CreatorHead
Contributor

Previously, ignore patterns were only checked against files; the walk still descended into every directory unconditionally. As a result, a directory like .venv with thousands of files was fully traversed, and each file inside it was filtered out individually, causing significant slowdowns.

Now, when a directory itself matches an ignore pattern, filepath.SkipDir is returned immediately, pruning the entire subtree in one step regardless of how many files are inside.

🛠️ Description

What was broken:

The file walk in both addlicense/main.go and cmd/headers.go only applied ignore pattern checks to files. When the walk encountered a directory, it always returned nil and descended into it unconditionally. A directory like .venv with thousands of Python files was therefore fully traversed: every file was visited, checked against the ignore patterns, and then discarded. The cost scaled linearly with the number of files in the ignored directory.
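To make the cost concrete, here is a minimal sketch of the old pattern, not the project's actual code. It assumes filepath.WalkDir as the walker; isIgnored, walkFilesOnly, and buildTree are hypothetical names, and isIgnored is a simplified stand-in for the real pattern matching.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// isIgnored is a hypothetical stand-in for the real pattern matcher: it
// reports whether rel is the .venv directory or anything beneath it.
func isIgnored(rel string) bool {
	rel = filepath.ToSlash(rel)
	return rel == ".venv" || strings.HasPrefix(rel, ".venv/")
}

// walkFilesOnly mimics the old behaviour: directories are always entered and
// the ignore check runs once per file. It returns how many files the walk
// visited and how many survived the filter.
func walkFilesOnly(root string) (visited, kept int, err error) {
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() {
			return nil // always descend, even into ignored directories
		}
		visited++
		rel, relErr := filepath.Rel(root, path)
		if relErr != nil {
			return relErr
		}
		if !isIgnored(rel) {
			kept++
		}
		return nil
	})
	return visited, kept, err
}

// buildTree creates a temp directory holding main.go plus n files in .venv.
func buildTree(n int) (string, error) {
	root, err := os.MkdirTemp("", "walk-demo")
	if err != nil {
		return "", err
	}
	if err := os.Mkdir(filepath.Join(root, ".venv"), 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(root, "main.go"), []byte("package main\n"), 0o644); err != nil {
		return "", err
	}
	for i := 0; i < n; i++ {
		name := filepath.Join(root, ".venv", fmt.Sprintf("f%d.py", i))
		if err := os.WriteFile(name, nil, 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := buildTree(200)
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	visited, kept, err := walkFilesOnly(root)
	if err != nil {
		panic(err)
	}
	fmt.Printf("visited=%d kept=%d\n", visited, kept) // visited=201 kept=1
}
```

All 200 files under .venv are visited only to be thrown away, which is the linear cost described above.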

What changed:

Added a directory-level ignore check in both walk functions. When a directory path matches an ignore pattern, filepath.SkipDir is returned immediately, which tells the walk not to descend into that directory, so its children are never read or visited. The path != start / path != "." guard ensures the root directory itself is never accidentally skipped.
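The directory-level check can be sketched roughly as follows. This is an illustration under assumptions rather than the code from this PR: it uses filepath.WalkDir, and isIgnoredDir, walkPruned, and buildTree are hypothetical names. The real matcher handles glob patterns; here it only matches directories named .venv.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isIgnoredDir is a hypothetical stand-in for the real pattern matcher.
func isIgnoredDir(name string) bool { return name == ".venv" }

// walkPruned returns the files it visited and the directories it pruned.
func walkPruned(root string) (files, skipped []string, err error) {
	err = filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() {
			// Guard: never prune the walk root, even if its name matches.
			if path != root && isIgnoredDir(d.Name()) {
				skipped = append(skipped, path)
				return filepath.SkipDir // do not descend into this directory
			}
			return nil
		}
		files = append(files, path)
		return nil
	})
	return files, skipped, err
}

// buildTree creates a temp directory holding main.go plus n files in .venv.
func buildTree(n int) (string, error) {
	root, err := os.MkdirTemp("", "walk-demo")
	if err != nil {
		return "", err
	}
	if err := os.Mkdir(filepath.Join(root, ".venv"), 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(root, "main.go"), []byte("package main\n"), 0o644); err != nil {
		return "", err
	}
	for i := 0; i < n; i++ {
		name := filepath.Join(root, ".venv", fmt.Sprintf("f%d.py", i))
		if err := os.WriteFile(name, nil, 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := buildTree(200)
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	files, skipped, err := walkPruned(root)
	if err != nil {
		panic(err)
	}
	fmt.Printf("files=%d skipped=%d\n", len(files), len(skipped)) // files=1 skipped=1

	// Starting the walk at .venv itself: the guard keeps the root from being pruned.
	files, skipped, err = walkPruned(filepath.Join(root, ".venv"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("files=%d skipped=%d\n", len(files), len(skipped)) // files=200 skipped=0
}
```

The second call shows why the root guard matters: without it, a walk started inside a directory whose name matches an ignore pattern would return immediately and process nothing.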

🔗 External Links

👍 Definition of Done

  • New functionality works? — verified across 17 test scenarios including directory skipping, nested patterns, and all pre-existing header behaviours
  • Tests added? — all existing tests pass; manual regression suite run below

Test Results

| # | Scenario | Expected | Result |
|---|----------|----------|--------|
| A | All headers correct, LICENSE up to date | No changes | ✅ Clean |
| B | Source file missing header, LICENSE already at current year | Header added, LICENSE untouched | ✅ Pass |
| C | Source file missing header, LICENSE behind (2025) | Header added + LICENSE bumped | ✅ Pass |
| D | Source file year outdated (2025), LICENSE behind | Both year-bumped to current year | ✅ Pass |
| E | `--plan` mode with missing header | Shows what would change, writes nothing | ✅ Pass |
| F | `--plan` mode with everything correct | No output, writes nothing | ✅ Clean |
| G | No LICENSE file in repo | Runs cleanly, no crash | ✅ Pass |
| H | Mixed: correct, missing, outdated in same repo | Only affected files changed, correct file untouched | ✅ Pass |
| I | `ignore_year1=true`, single-year headers already correct | No changes | ✅ Clean |
| J | Idempotency: run once then run again immediately | Second run always produces zero changes | ✅ Pass |
| K | Existing header year outdated, no missing headers (step 1 only) | Year bumped in source + LICENSE | ✅ Pass |
| L | Copyright holder mismatch (e.g. HashiCorp → Acme) | Holder updated, LICENSE untouched if already current year | ✅ Pass |
| M | `.venv` in `header_ignore`, 100 files inside | Files inside `.venv` not touched | ✅ Pass |
| N | Multiple ignored dirs (`vendor/**` + `**/node_modules/**`) | Neither directory traversed | ✅ Pass |
| O | Ignored dir alongside a source file missing a header | Source file updated, ignored dir untouched | ✅ Pass |
| P | Nested ignored dir (`**/vendor/**` inside a subdirectory) | Nested dir skipped correctly | ✅ Pass |
| Q | 200 files in `.venv`, verify single skip log entry | `skipping directory: .venv` appears exactly once | ✅ Pass |

🤔 Can be merged upon approval?

PCI review checklist

  • I have documented a clear reason for, and description of, the change I am making.
  • If applicable, I've documented a plan to revert these changes if they require more than reverting the pull request. — simple git revert; no schema, data, or config changes involved.
  • If applicable, I've documented the impact of any changes to security controls. — no security controls affected; this is a performance fix in directory traversal logic only.

@CreatorHead requested a review from a team as a code owner on April 29, 2026, 09:56
@CreatorHead merged commit b3e6599 into main on Apr 29, 2026
5 checks passed
@CreatorHead deleted the fix/skip-ignored-directories branch on April 29, 2026, 10:09