Skip to content

feat: gts validator resolves #58#59

Merged
Artifizer merged 2 commits intoGlobalTypeSystem:mainfrom
devjow:feat/gts-validator
Feb 23, 2026
Merged

feat: gts validator resolves #58#59
Artifizer merged 2 commits intoGlobalTypeSystem:mainfrom
devjow:feat/gts-validator

Conversation

@devjow
Copy link
Copy Markdown
Contributor

@devjow devjow commented Feb 18, 2026

Summary by CodeRabbit

  • New Features

    • Adds gts-validator: filesystem scanning and validation of GTS identifiers in Markdown, JSON, and YAML; supports vendor policies (allow-list, must-match), strict mode, skip-tokens, wildcard handling, file discovery limits, and returns a ValidationReport with human-readable and JSON outputs.
  • Documentation

    • New README with usage examples, architecture notes, and output options.
  • Tests

    • Extensive unit and integration tests covering scanners, normalization, validation, and end-to-end filesystem runs.
  • Chores

    • New workspace package and dependency additions.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new workspace crate gts-validator that scans filesystem sources (Markdown, JSON, YAML), normalizes and validates GTS identifiers, reports scan/validation errors, and provides JSON and human-readable output plus configuration for vendor policy and scanning limits.

Changes

Cohort / File(s) Summary
Workspace manifest
Cargo.toml
Adds gts-validator to workspace members and workspace dependencies; introduces glob and colored workspace deps.
Crate manifest & docs
gts-validator/Cargo.toml, gts-validator/README.md
New crate manifest and README with metadata, dependencies, usage examples, and architecture.
Public API & wiring
gts-validator/src/lib.rs
Adds validate_fs() entry, re-exports core types, wires internal modules (config, error, format, normalize, report, strategy, validator, output), and implements scan orchestration and allow-list filtering.
Config types
gts-validator/src/config.rs
New ValidationConfig, FsSourceConfig (with Default), VendorPolicy, and DiscoveryMode for vendor policy, discovery, paths, excludes, and scanning limits.
Error and report models
gts-validator/src/error.rs, gts-validator/src/report.rs
Adds ScanErrorKind/ScanError, ValidationError (with human formatting), and ValidationReport aggregation struct with helper methods and tests.
Format scanners
gts-validator/src/format/...
gts-validator/src/format/mod.rs, .../markdown.rs, .../json.rs, .../yaml.rs
Adds Markdown (stateful line scanner), JSON (tree-walker), YAML (multi-doc parsing + JSON walker) scanners exposing scan_*_content APIs and test wrappers; includes many unit tests.
Normalization & validation
gts-validator/src/normalize.rs, gts-validator/src/validator.rs
Adds normalization (strip quotes/prefix, reject fragments/queries) and validation that delegates to gts crate with wildcard handling, vendor checks, context helpers, and tests.
Filesystem strategy
gts-validator/src/strategy/...
gts-validator/src/strategy/mod.rs, .../fs.rs
Implements find_files, content format detection, bounded file reads, exclude patterns, skip dirs, depth/size limits, and returns scan errors for walk/IO issues.
Output formatting
gts-validator/src/output.rs
Adds write_json and write_human for serializing ValidationReport and rendering human summaries with conditional fix hints.
Integration tests
gts-validator/tests/validate_fs_tests.rs
Integration tests for validate_fs() covering path errors, formats, vendor policies, exclusions, size/encoding edge cases, skip_tokens, and output contracts.

Sequence Diagram

sequenceDiagram
    actor User
    participant Validator as validate_fs()
    participant Finder as fs::find_files()
    participant Reader as read_file_bounded()
    participant Scanner as FormatScanner
    participant NormVal as normalize::validator
    participant Reporter as ValidationReport

    User->>Validator: validate_fs(fs_config, validation_config)
    Validator->>Finder: find_files(fs_config)
    Finder-->>Validator: (paths, scan_errors)

    loop per path
        Validator->>Reader: read_file_bounded(path, max_file_size)
        Reader-->>Validator: (content, format) or ScanError
        Validator->>Scanner: scan_*_content(content, path, vendor, options)
        Scanner->>NormVal: normalize_candidate(...) / validate_candidate(...)
        NormVal-->>Scanner: Vec<error strings>
        Scanner-->>Validator: Vec<ValidationError>
    end

    Validator->>Reporter: aggregate(scan_errors, validation_errors)
    Reporter-->>User: ValidationReport
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hop through markdown, JSON, YAML light,
Sniffing gts ids from morning to night.
I normalize, validate, then cheer —
Reports appear, both far and near.
A crunchy carrot for each bug in sight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: gts validator resolves #58' is directly related to the main change: introducing a new gts-validator crate with comprehensive validation functionality. However, it is somewhat generic and vague—'gts validator' alone doesn't convey what specifically this validator does or its scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
gts-validator/src/report.rs (1)

10-19: Consider adding a constructor to ensure field consistency.

The ok, errors_count, and errors.len() could become inconsistent if the struct is constructed manually with mismatched values. A constructor method would enforce invariants:

💡 Optional: Add a constructor
impl ValidationReport {
    pub fn new(files_scanned: usize, errors: Vec<ValidationError>) -> Self {
        Self {
            files_scanned,
            errors_count: errors.len(),
            ok: errors.is_empty(),
            errors,
        }
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/report.rs` around lines 10 - 19, Add a constructor to
ValidationReport to enforce consistency between fields: implement impl
ValidationReport { pub fn new(files_scanned: usize, errors:
Vec<ValidationError>) -> Self { ... } } that sets errors_count = errors.len()
and ok = errors.is_empty(), and stores files_scanned and errors; update call
sites to use ValidationReport::new where appropriate to avoid manually
constructing mismatched field values.
gts-validator/src/lib.rs (1)

82-124: Consider surfacing skipped files in the report.

Unreadable/oversized/invalid-UTF8 files are silently skipped, so a run can appear “OK” even when nothing was scanned. Consider tracking and reporting skipped files to make this visible (requires updating ValidationReport and output formatting).

✅ Possible update (requires adding `skipped_files` to ValidationReport)
-    let mut errors = Vec::new();
+    let mut errors = Vec::new();
+    let mut skipped_files: usize = 0;
@@
-        let Some(item) = read_validation_item(file_path, fs_config.max_file_size) else {
-            continue; // skip unreadable/oversized files — don't count as scanned
-        };
+        let Some(item) = read_validation_item(file_path, fs_config.max_file_size) else {
+            skipped_files += 1;
+            continue; // skip unreadable/oversized files — don't count as scanned
+        };
@@
-    Ok(ValidationReport {
+    Ok(ValidationReport {
         files_scanned,
         errors_count,
         ok: errors.is_empty(),
+        skipped_files,
         errors,
     })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/lib.rs` around lines 82 - 124, The loop currently silently
skips files when read_validation_item(...) returns None or
content_format_for(...) is None; modify the logic to record skipped file paths
(and optionally reasons like "unreadable/oversized/invalid-UTF8" or "unknown
format") into a new Vec<String> (e.g., skipped_files) as you iterate, increment
files_scanned only for files actually scanned, and then add skipped_files to the
ValidationReport struct (update the ValidationReport definition and any
output/formatting code to display skipped_files); update references to
ValidationReport construction here and any callers to handle the new field.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gts-validator/src/config.rs`:
- Around line 1-59: CI failed due to rustfmt formatting issues; run `cargo fmt`
(or apply rustfmt) to reformat the file so it matches the repository style, then
commit the changes. Specifically ensure `ValidationConfig`, `FsSourceConfig`,
and the `impl Default for FsSourceConfig` block are formatted according to
rustfmt rules (spacing, indentation, line breaks, and attribute placement) so
the diffs disappear.

In `@gts-validator/src/format/markdown.rs`:
- Around line 1-474: The file fails rustfmt; run the standard formatter (cargo
fmt) to fix style/whitespace issues in this module — apply formatting to
gts-validator/src/format/markdown.rs (affecting items like
GTS_DISCOVERY_PATTERN_RELAXED, GTS_DISCOVERY_PATTERN_WELL_FORMED,
scan_markdown_content, and scan_markdown_file) and commit the resulting changes
so CI formatting checks pass.

In `@gts-validator/src/format/yaml.rs`:
- Around line 19-25: The current code uses serde_saphyr::from_str(content) which
fails on multi-document YAML and returns early with vec![], causing silent pass;
replace this with serde_saphyr::from_multiple(content) and iterate through the
returned iterator of documents, converting each document to a serde_json::Value
and processing/validating each one instead of the single `value` variable;
ensure you accumulate validation results across all documents (do not return
early on the first Err—handle or log per-document parse errors and continue with
others) so multi-document files are fully validated.

In `@gts-validator/src/strategy/fs.rs`:
- Around line 60-65: The code silently drops invalid glob patterns when building
exclude_patterns using Pattern::new in the block that iterates config.exclude;
change this to validate each pattern, collect any Pattern::Error for malformed
entries (e.g., by mapping to Result<Pattern, _>), and surface failures to the
caller instead of silently filtering them out — either by returning a Result
from the surrounding function (propagate an Err listing invalid patterns) or by
returning a structured error that includes the offending config.exclude entries;
update the code that uses exclude_patterns accordingly so callers can handle or
report invalid patterns rather than proceeding with a silently altered exclude
list.

In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 1-339: The file gts-validator/tests/validate_fs_tests.rs is not
rustfmt-formatted; run cargo fmt to fix formatting issues for the test suite
(e.g., functions like test_validate_fs_empty_paths_errors,
test_validate_fs_valid_markdown, test_write_human_failure_output, etc.). Execute
`cargo fmt` in the workspace root (or format just this file with rustfmt) and
recommit the reformatted gts-validator/tests/validate_fs_tests.rs so CI
formatting checks pass.

---

Nitpick comments:
In `@gts-validator/src/lib.rs`:
- Around line 82-124: The loop currently silently skips files when
read_validation_item(...) returns None or content_format_for(...) is None;
modify the logic to record skipped file paths (and optionally reasons like
"unreadable/oversized/invalid-UTF8" or "unknown format") into a new Vec<String>
(e.g., skipped_files) as you iterate, increment files_scanned only for files
actually scanned, and then add skipped_files to the ValidationReport struct
(update the ValidationReport definition and any output/formatting code to
display skipped_files); update references to ValidationReport construction here
and any callers to handle the new field.

In `@gts-validator/src/report.rs`:
- Around line 10-19: Add a constructor to ValidationReport to enforce
consistency between fields: implement impl ValidationReport { pub fn
new(files_scanned: usize, errors: Vec<ValidationError>) -> Self { ... } } that
sets errors_count = errors.len() and ok = errors.is_empty(), and stores
files_scanned and errors; update call sites to use ValidationReport::new where
appropriate to avoid manually constructing mismatched field values.

Comment thread gts-validator/src/config.rs Outdated
Comment thread gts-validator/src/format/markdown.rs
Comment thread gts-validator/src/format/yaml.rs Outdated
Comment thread gts-validator/src/strategy/fs.rs Outdated
Comment on lines +60 to +65
// Parse exclude patterns
let exclude_patterns: Vec<Pattern> = config
.exclude
.iter()
.filter_map(|p| Pattern::new(p).ok())
.collect();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the fs.rs file
find . -name "fs.rs" -path "*/strategy/*" | head -5

Repository: GlobalTypeSystem/gts-rust

Length of output: 103


🏁 Script executed:

# Get the full context around lines 60-65
cat -n gts-validator/src/strategy/fs.rs | sed -n '50,75p'

Repository: GlobalTypeSystem/gts-rust

Length of output: 1049


🏁 Script executed:

# Find the Pattern type definition and understand what errors it can raise
rg "struct Pattern|impl Pattern" --type rust -A 5 | head -30

Repository: GlobalTypeSystem/gts-rust

Length of output: 51


🏁 Script executed:

# Check how exclude_patterns is used after being collected
rg "exclude_patterns" gts-validator/src/strategy/fs.rs -B 2 -A 5

Repository: GlobalTypeSystem/gts-rust

Length of output: 1265


🏁 Script executed:

# Look for any tests related to exclude patterns
find . -name "*.rs" -exec rg -l "exclude.*pattern|Pattern.*exclude" {} \; | head -10

Repository: GlobalTypeSystem/gts-rust

Length of output: 175


🏁 Script executed:

# Find Pattern type definition - likely from glob or similar crate
rg "use.*Pattern|extern.*Pattern" gts-validator/src/strategy/fs.rs | head -10

Repository: GlobalTypeSystem/gts-rust

Length of output: 87


🏁 Script executed:

# Check Cargo.toml to see what crate Pattern comes from
cat gts-validator/Cargo.toml | grep -A 20 "^\[dependencies\]"

Repository: GlobalTypeSystem/gts-rust

Length of output: 457


🏁 Script executed:

# Look at test file for exclude patterns
cat gts-validator/tests/validate_fs_tests.rs

Repository: GlobalTypeSystem/gts-rust

Length of output: 11703


🏁 Script executed:

# Search for FsSourceConfig definition to understand exclude field
rg "struct FsSourceConfig|pub exclude" gts-validator/src/ -B 2 -A 5

Repository: GlobalTypeSystem/gts-rust

Length of output: 846


🏁 Script executed:

# Check if there's any logging or error reporting around exclude patterns
rg "exclude|Pattern" gts-validator/src/strategy/fs.rs -B 3 -A 3

Repository: GlobalTypeSystem/gts-rust

Length of output: 1538


🏁 Script executed:

# Search for any error handling in find_files or related functions
rg "find_files|matches_exclude" gts-validator/src/strategy/fs.rs -B 5 -A 10

Repository: GlobalTypeSystem/gts-rust

Length of output: 2131


🏁 Script executed:

# Check if there's any documentation or comments about the intentionality of silent dropping
rg "silent|ignore|filter_map" gts-validator/src/ -B 2 -A 2

Repository: GlobalTypeSystem/gts-rust

Length of output: 1569


🏁 Script executed:

# Look at the entire find_files function to see if there's any error collection
cat -n gts-validator/src/strategy/fs.rs | sed -n '55,95p'

Repository: GlobalTypeSystem/gts-rust

Length of output: 1625


🏁 Script executed:

# Check glob crate documentation or examples of what patterns can fail
web_search "glob::Pattern::new error cases invalid pattern examples rust"

Repository: GlobalTypeSystem/gts-rust

Length of output: 119


🏁 Script executed:

# Verify there's no alternative validation or error collection elsewhere in the call chain
rg "find_files" gts-validator/src/ -B 2 -A 2

Repository: GlobalTypeSystem/gts-rust

Length of output: 867


Invalid exclude patterns are silently dropped without notification.

Using filter_map(Pattern::new(p).ok()) discards malformed glob patterns, meaning users won't be aware if their exclude configurations are broken. This can result in files being unexpectedly included in scans. Consider collecting and reporting invalid patterns to the caller, or return a Result to propagate errors at the call site.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/strategy/fs.rs` around lines 60 - 65, The code silently
drops invalid glob patterns when building exclude_patterns using Pattern::new in
the block that iterates config.exclude; change this to validate each pattern,
collect any Pattern::Error for malformed entries (e.g., by mapping to
Result<Pattern, _>), and surface failures to the caller instead of silently
filtering them out — either by returning a Result from the surrounding function
(propagate an Err listing invalid patterns) or by returning a structured error
that includes the offending config.exclude entries; update the code that uses
exclude_patterns accordingly so callers can handle or report invalid patterns
rather than proceeding with a silently altered exclude list.

Comment thread gts-validator/tests/validate_fs_tests.rs
@devjow devjow force-pushed the feat/gts-validator branch from 6effc51 to f131d2e Compare February 18, 2026 12:02
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 28-35: Replace the hardcoded absolute path in
test_validate_fs_nonexistent_path_errors with a temp-derived path to ensure
portability: create a temporary directory (e.g., via std::env::temp_dir() or
tempfile::tempdir()) and append a unique non-existent subpath
(PathBuf::from("does_not_exist_...") or use rand/UUID) to form the PathBuf
passed to default_fs_config; keep the rest of the test using validate_fs and
default_validation_config and assert that result.is_err() and the error message
contains "does not exist".

---

Duplicate comments:
In `@gts-validator/src/format/yaml.rs`:
- Around line 19-25: The current parsing uses serde_saphyr::from_str which only
parses the first YAML document and silently ignores later docs; update the
parsing in the block that sets let value: Value (using content) to use
serde_saphyr::from_multiple (or serde_saphyr::read) to iterate over all
documents, parse each document into a serde_json::Value, and run the existing
validation logic for each parsed document instead of returning vec![] on the
first parse error—collect/merge validation results (or errors) from every
document so invalid IDs in subsequent documents are detected.

In `@gts-validator/src/strategy/fs.rs`:
- Around line 60-65: The current code silently drops invalid glob patterns by
using filter_map(Pattern::new(p).ok()) on config.exclude; instead validate each
pattern from config.exclude using Pattern::new and surface failures (e.g.,
return or propagate a parsing error or log/report the bad pattern) rather than
ignoring them. Update the creation of exclude_patterns (and any calling function
that expects Vec<Pattern>) to collect Results from Pattern::new for each entry,
and on Err include the offending pattern string and error (reference symbols:
config.exclude, Pattern::new, exclude_patterns) so the caller can fail fast or
provide a clear diagnostic instead of silently omitting patterns.

Comment on lines +28 to +35
#[test]
fn test_validate_fs_nonexistent_path_errors() {
let fs_config = default_fs_config(vec![PathBuf::from("/nonexistent/path/abc123")]);
let result = validate_fs(&fs_config, &default_validation_config());
assert!(result.is_err());
let msg = result.unwrap_err().to_string();
assert!(msg.contains("does not exist"), "got: {msg}");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a temp-based nonexistent path for portability.
Absolute /nonexistent/... can behave inconsistently across platforms or sandboxed CI environments. A temp-dir derived path guarantees non-existence everywhere.

🔧 Proposed fix
 #[test]
 fn test_validate_fs_nonexistent_path_errors() {
-    let fs_config = default_fs_config(vec![PathBuf::from("/nonexistent/path/abc123")]);
+    let tmp = TempDir::new().unwrap();
+    let missing = tmp.path().join("does-not-exist");
+    let fs_config = default_fs_config(vec![missing]);
     let result = validate_fs(&fs_config, &default_validation_config());
     assert!(result.is_err());
     let msg = result.unwrap_err().to_string();
     assert!(msg.contains("does not exist"), "got: {msg}");
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/tests/validate_fs_tests.rs` around lines 28 - 35, Replace the
hardcoded absolute path in test_validate_fs_nonexistent_path_errors with a
temp-derived path to ensure portability: create a temporary directory (e.g., via
std::env::temp_dir() or tempfile::tempdir()) and append a unique non-existent
subpath (PathBuf::from("does_not_exist_...") or use rand/UUID) to form the
PathBuf passed to default_fs_config; keep the rest of the test using validate_fs
and default_validation_config and assert that result.is_err() and the error
message contains "does not exist".

@devjow devjow force-pushed the feat/gts-validator branch from f131d2e to c085a9b Compare February 18, 2026 14:07
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
gts-validator/src/output.rs (1)

91-99: Error classification via string matching is fragile.

The categorization of errors relies on checking if e.error.contains("Vendor mismatch") or e.error.contains("Wildcard"). If error message wording changes in the validator, these hints will silently stop appearing. Consider using an error kind/category enum in ValidationError for more robust classification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/output.rs` around lines 91 - 99, The current classification
using string matching on report.errors[*].error is fragile; update the
ValidationError type to include an explicit enum (e.g., ValidationErrorKind with
variants VendorMismatch, Wildcard, ParseError/Other), ensure code that
constructs ValidationError (where errors are created) sets that kind, and then
replace the string-based checks (has_vendor_mismatch, has_wildcard_error,
has_parse_error) to inspect e.kind (or match on ValidationErrorKind) instead of
calling contains on the message; this keeps classification stable regardless of
message text and centralizes categories on the ValidationError struct.
gts-validator/src/format/markdown.rs (1)

75-79: Counter-intuitive strict parameter naming.

When strict=true, the code uses GTS_DISCOVERY_PATTERN_RELAXED, and when strict=false, it uses GTS_DISCOVERY_PATTERN_WELL_FORMED. While this is intentional (strict mode catches more errors by finding more candidates), the naming is confusing. Consider renaming the parameter to something like catch_malformed or adding a clarifying doc comment to explain this behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/format/markdown.rs` around lines 75 - 79, The parameter
name `strict` is misleading because when `strict == true` the code picks
`GTS_DISCOVERY_PATTERN_RELAXED` and vice versa; update the API to either rename
the parameter (e.g., to `catch_malformed` or `include_relaxed_matches`) or add a
clear doc comment on the function that explains the inverted behavior, and
update all callers and tests referencing `strict` accordingly; specifically
change the parameter name wherever the function/method that contains the
`pattern` selection (the branch toggling between `GTS_DISCOVERY_PATTERN_RELAXED`
and `GTS_DISCOVERY_PATTERN_WELL_FORMED`) is declared/used, or add a comment
above that function explaining that `true` means use relaxed matching to catch
malformed candidates.
gts-validator/src/normalize.rs (1)

38-42: Edge case: empty string after quote stripping.

If the input is "" (just two quotes), after stripping the result would be an empty string, which would then fail the gts. prefix check. This is acceptable behavior, but the error message "Does not start with 'gts.': '""'" might be confusing. Consider handling this edge case explicitly if such inputs are possible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/normalize.rs` around lines 38 - 42, The quote-stripping
block in normalize.rs that updates the local variable trimmed can produce an
empty string for inputs like "" which leads to a confusing "Does not start with
'gts.': '\"\"'" error; after the quote-strip (the block that assigns trimmed =
&trimmed[1..trimmed.len() - 1]) detect if trimmed.is_empty() and return or raise
a clearer error (or adjust the existing validation error) such as "Empty string
after stripping surrounding quotes" or include the original raw input in the
message; update the code paths that call the gts prefix check (the subsequent
gts. prefix validation) to use this new explicit error/message so the error is
unambiguous.
gts-validator/src/strategy/fs.rs (1)

133-138: Redundant content_format_for check.

content_format_for(path)? is called here at line 134, but the caller in lib.rs (line 91) also calls content_format_for(file_path) to dispatch to the appropriate scanner. Consider removing one of these checks or restructuring to avoid the redundancy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/strategy/fs.rs` around lines 133 - 138, The call to
content_format_for(path)? in gts-validator/src/strategy/fs.rs is redundant
because the caller (lib.rs) already performs this check; remove the
content_format_for(path)? line from this function so it simply reads the file
and returns Some(ValidationItem { content }) (or, if you prefer defensive
coding, replace the ?-error with a non-fatal guard that returns None only on
read/format failures). Update the function that constructs the ValidationItem to
rely on the caller's dispatch and ensure any existing error handling/tests still
pass.
gts-validator/src/format/json.rs (1)

83-89: Filename detection logic may have false positives/negatives.

The looks_like_filename check assumes a filename if there's a ~. pattern and a dot after the tilde. However:

  1. A GTS ID like gts.x.core.type.v1~ followed by text .foo in the same string would be incorrectly skipped
  2. The logic uses rfind which finds the last occurrence, but a chained ID could have multiple tildes

Consider a more precise check, such as looking for common file extensions after the tilde.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/format/json.rs` around lines 83 - 89, The current filename
heuristic in the conditional using looks_like_filename (computed from
candidate_str.contains("~.") and rfind checks) can mis-detect GTS IDs with
tildes; update the logic in the block that computes looks_like_filename and the
subsequent if that checks candidate_str.starts_with(...) so it looks
specifically for a tilde immediately followed by a dot and a known file
extension (or a regex that matches "~.<ext>" where <ext> is a typical filename
extension and appears before whitespace/end-of-string) rather than relying on
rfind of '.' and '~'. Locate the candidate_str usage and replace the fuzzy rfind
logic with a stricter pattern check (e.g., test the substring after the last
tilde for a recognized extension or use a regex anchored to tilde+extension) so
chained IDs with multiple tildes or trailing segments do not produce false
positives/negatives.
gts-validator/tests/validate_fs_tests.rs (1)

51-88: Consider consolidating duplicate vendor-mismatch tests.

test_validate_fs_invalid_markdown_vendor_mismatch and test_validate_fs_vendor_mismatch have nearly identical setup and overlap significantly. The latter is more comprehensive (checks error message content). Consider merging them into a single test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/tests/validate_fs_tests.rs` around lines 51 - 88, Merge the two
near-duplicate tests test_validate_fs_invalid_markdown_vendor_mismatch and
test_validate_fs_vendor_mismatch into a single test (e.g.,
test_validate_fs_vendor_mismatch_comprehensive) that uses the same setup
(TempDir, writing the markdown, fs_config via default_fs_config and
ValidationConfig with vendor = Some("x")), calls validate_fs, and preserves all
assertions from both tests: assert files_scanned == 1, assert report.ok is
false, assert report.errors_count > 0, and assert that report.errors contains an
error with "Vendor mismatch" (use the existing .iter().any(...) check). Remove
the now-redundant test_validate_fs_invalid_markdown_vendor_mismatch and keep
references to validate_fs, ValidationConfig, and default_fs_config to locate the
code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gts-validator/src/format/markdown.rs`:
- Around line 96-120: The parser currently ignores fence type when
entering/exiting fenced blocks; update the logic in the block around
fence_prefix and the MarkdownState::FencedBlock handling to record the fence
delimiter when opening (e.g., store the fence string or its char/marker in the
state) and only transition back to MarkdownState::Prose when a closing fence
with the same delimiter is seen; specifically, change how you compute and pass
the fence identifier in the opening branch (where you compute language from
trimmed_line.strip_prefix(prefix)) and modify MarkdownState::FencedBlock to
carry that identifier and the skip flag so the closing branch checks for a
matching fence (same char/token and length >= 3) before exiting.

In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 179-182: The comment above the test is misleading: it says a
directory is excluded but the test sets FsSourceConfig.exclude to "test.md"
which excludes a file; update the comment to state that the exclude pattern is a
filename (or file path) and that the file "test.md" should be skipped,
referencing the FsSourceConfig instance fs_config_with_exclude and its exclude
field to make the intent clear.

---

Duplicate comments:
In `@gts-validator/src/strategy/fs.rs`:
- Around line 60-65: The exclude pattern parsing currently drops invalid globs
silently (using config.exclude.iter().filter_map(|p| Pattern::new(p).ok())), so
change this to validate each Pattern::new(p) call: iterate over config.exclude,
collect both successful Pattern instances into exclude_patterns and collect any
Pattern::new errors into a separate list, and if that error list is non-empty
then surface it (return a Result::Err from the surrounding function or
log/propagate a clear error) rather than ignoring them; reference the
exclude_patterns variable and Pattern::new to locate where to implement this
validation and error aggregation.

In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 28-35: Replace the hardcoded absolute path in
test_validate_fs_nonexistent_path_errors with a temp-derived nonexistent path to
make the test portable: create a temporary directory (e.g., via
tempfile::tempdir() or std::env::temp_dir() combined with a uuid/random suffix),
build the nonexistent PathBuf by joining a non-existent child to that temp dir,
pass that PathBuf into default_fs_config (the same place where
PathBuf::from("/nonexistent/path/abc123") is used), then run
validate_fs(&fs_config, &default_validation_config()) and assert the same error
behavior; keep references to the same test function name and helper functions
(test_validate_fs_nonexistent_path_errors, default_fs_config, validate_fs,
default_validation_config).

---

Nitpick comments:
In `@gts-validator/src/format/json.rs`:
- Around line 83-89: The current filename heuristic in the conditional using
looks_like_filename (computed from candidate_str.contains("~.") and rfind
checks) can mis-detect GTS IDs with tildes; update the logic in the block that
computes looks_like_filename and the subsequent if that checks
candidate_str.starts_with(...) so it looks specifically for a tilde immediately
followed by a dot and a known file extension (or a regex that matches "~.<ext>"
where <ext> is a typical filename extension and appears before
whitespace/end-of-string) rather than relying on rfind of '.' and '~'. Locate
the candidate_str usage and replace the fuzzy rfind logic with a stricter
pattern check (e.g., test the substring after the last tilde for a recognized
extension or use a regex anchored to tilde+extension) so chained IDs with
multiple tildes or trailing segments do not produce false positives/negatives.

In `@gts-validator/src/format/markdown.rs`:
- Around line 75-79: The parameter name `strict` is misleading because when
`strict == true` the code picks `GTS_DISCOVERY_PATTERN_RELAXED` and vice versa;
update the API to either rename the parameter (e.g., to `catch_malformed` or
`include_relaxed_matches`) or add a clear doc comment on the function that
explains the inverted behavior, and update all callers and tests referencing
`strict` accordingly; specifically change the parameter name wherever the
function/method that contains the `pattern` selection (the branch toggling
between `GTS_DISCOVERY_PATTERN_RELAXED` and `GTS_DISCOVERY_PATTERN_WELL_FORMED`)
is declared/used, or add a comment above that function explaining that `true`
means use relaxed matching to catch malformed candidates.

In `@gts-validator/src/normalize.rs`:
- Around line 38-42: The quote-stripping block in normalize.rs that updates the
local variable trimmed can produce an empty string for inputs like "" which
leads to a confusing "Does not start with 'gts.': '\"\"'" error; after the
quote-strip (the block that assigns trimmed = &trimmed[1..trimmed.len() - 1])
detect if trimmed.is_empty() and return or raise a clearer error (or adjust the
existing validation error) such as "Empty string after stripping surrounding
quotes" or include the original raw input in the message; update the code paths
that call the gts prefix check (the subsequent gts. prefix validation) to use
this new explicit error/message so the error is unambiguous.

In `@gts-validator/src/output.rs`:
- Around line 91-99: The current classification using string matching on
report.errors[*].error is fragile; update the ValidationError type to include an
explicit enum (e.g., ValidationErrorKind with variants VendorMismatch, Wildcard,
ParseError/Other), ensure code that constructs ValidationError (where errors are
created) sets that kind, and then replace the string-based checks
(has_vendor_mismatch, has_wildcard_error, has_parse_error) to inspect e.kind (or
match on ValidationErrorKind) instead of calling contains on the message; this
keeps classification stable regardless of message text and centralizes
categories on the ValidationError struct.

In `@gts-validator/src/strategy/fs.rs`:
- Around line 133-138: The call to content_format_for(path)? in
gts-validator/src/strategy/fs.rs is redundant because the caller (lib.rs)
already performs this check; remove the content_format_for(path)? line from this
function so it simply reads the file and returns Some(ValidationItem { content
}) (or, if you prefer defensive coding, replace the ?-error with a non-fatal
guard that returns None only on read/format failures). Update the function that
constructs the ValidationItem to rely on the caller's dispatch and ensure any
existing error handling/tests still pass.

In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 51-88: Merge the two near-duplicate tests
test_validate_fs_invalid_markdown_vendor_mismatch and
test_validate_fs_vendor_mismatch into a single test (e.g.,
test_validate_fs_vendor_mismatch_comprehensive) that uses the same setup
(TempDir, writing the markdown, fs_config via default_fs_config and
ValidationConfig with vendor = Some("x")), calls validate_fs, and preserves all
assertions from both tests: assert files_scanned == 1, assert report.ok is
false, assert report.errors_count > 0, and assert that report.errors contains an
error with "Vendor mismatch" (use the existing .iter().any(...) check). Remove
the now-redundant test_validate_fs_invalid_markdown_vendor_mismatch and keep
references to validate_fs, ValidationConfig, and default_fs_config to locate the
code to change.

Comment thread gts-validator/src/format/markdown.rs Outdated
Comment on lines +179 to +182
// With exclude: excluded dir should be skipped, only included.md scanned
let mut fs_config_with_exclude = FsSourceConfig::default();
fs_config_with_exclude.paths = vec![tmp.path().to_path_buf()];
fs_config_with_exclude.exclude = vec!["test.md".to_owned()];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Misleading comment: excluding file, not directory.

The comment says "excluded dir should be skipped" but the exclude pattern is "test.md" which matches the filename, not the directory. The test works correctly, but the comment should reflect the actual behavior.

📝 Proposed fix
-    // With exclude: excluded dir should be skipped, only included.md scanned
+    // With exclude: test.md pattern skips excluded/test.md, only included.md scanned
     let mut fs_config_with_exclude = FsSourceConfig::default();
     fs_config_with_exclude.paths = vec![tmp.path().to_path_buf()];
     fs_config_with_exclude.exclude = vec!["test.md".to_owned()];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// With exclude: excluded dir should be skipped, only included.md scanned
let mut fs_config_with_exclude = FsSourceConfig::default();
fs_config_with_exclude.paths = vec![tmp.path().to_path_buf()];
fs_config_with_exclude.exclude = vec!["test.md".to_owned()];
// With exclude: test.md pattern skips excluded/test.md, only included.md scanned
let mut fs_config_with_exclude = FsSourceConfig::default();
fs_config_with_exclude.paths = vec![tmp.path().to_path_buf()];
fs_config_with_exclude.exclude = vec!["test.md".to_owned()];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/tests/validate_fs_tests.rs` around lines 179 - 182, The comment
above the test is misleading: it says a directory is excluded but the test sets
FsSourceConfig.exclude to "test.md" which excludes a file; update the comment to
state that the exclude pattern is a filename (or file path) and that the file
"test.md" should be skipped, referencing the FsSourceConfig instance
fs_config_with_exclude and its exclude field to make the intent clear.

@devjow devjow force-pushed the feat/gts-validator branch from c085a9b to 4932368 Compare February 18, 2026 14:35
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
gts-validator/src/format/yaml.rs (1)

12-34: Consider handling YAML document end markers (...).

The split_yaml_documents function splits on --- but YAML also supports ... as an explicit document end marker. While --- is more common, files using ... followed by --- might not split correctly. This is a minor edge case.

♻️ Optional enhancement
 fn split_yaml_documents(content: &str) -> Vec<String> {
     let mut documents = Vec::new();
     let mut current_doc: Vec<&str> = Vec::new();
 
     for line in content.lines() {
-        if line.trim() == "---" {
+        let trimmed = line.trim();
+        if trimmed == "---" || trimmed == "..." {
             let doc = current_doc.join("\n");
             if !doc.trim().is_empty() {
                 documents.push(doc);
             }
-            current_doc.clear();
+            // Only clear on --- (new doc start), not on ... (doc end)
+            if trimmed == "---" {
+                current_doc.clear();
+            }
             continue;
         }
         current_doc.push(line);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/format/yaml.rs` around lines 12 - 34, The
split_yaml_documents function currently only treats lines equal to "---" as
document separators; update it to treat both "---" and "..." as separators so
files using the explicit end marker are handled correctly. In practice, change
the separator check inside split_yaml_documents (where line.trim() == "---") to
a check that accepts either line.trim() == "---" || line.trim() == "..." so the
logic that joins current_doc, pushes non-empty docs, and clears current_doc
works the same for both markers (this prevents missing splits when a file
contains "..." followed by "---").
gts-validator/src/format/markdown.rs (1)

376-400: Tests confirm the strict flag behavior.

Looking at the tests, the intended behavior is:

  • strict=true: Uses relaxed pattern to catch and report malformed IDs (more errors reported)
  • strict=false: Uses well-formed pattern to only match obviously valid-looking IDs (fewer errors)

The naming is counterintuitive but the behavior is documented in the pattern comments (lines 47-52 vs 69-71). Consider renaming strict to something like catch_malformed or adding a doc comment to scan_markdown_content clarifying this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/format/markdown.rs` around lines 376 - 400, Tests show the
boolean parameter `strict` on scan_markdown_file/scan_markdown_content is
counterintuitive (true means "catch malformed" using the relaxed pattern);
either rename the parameter from `strict` to `catch_malformed` (and update all
callers and tests like test_scan_markdown_strict_mode_catches_*) or add a clear
doc comment on the scan_markdown_content function explaining that `strict =
true` enables the relaxed matcher to report malformed IDs (and `false` uses the
well-formed matcher), then run and update tests to reflect the new name or
preserved docstring.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gts-validator/src/format/markdown.rs`:
- Around line 97-101: The selection of discovery regex is inverted: the `strict`
parameter currently picks GTS_DISCOVERY_PATTERN_RELAXED when true and
GTS_DISCOVERY_PATTERN_WELL_FORMED when false; change the conditional so that
when `strict` is true it uses GTS_DISCOVERY_PATTERN_WELL_FORMED and when false
it uses GTS_DISCOVERY_PATTERN_RELAXED (update the `pattern` assignment in the
function that takes `strict`), or alternatively update the naming/docs if the
inverted behavior is intentional—refer to GTS_DISCOVERY_PATTERN_RELAXED,
GTS_DISCOVERY_PATTERN_WELL_FORMED, and the `pattern` variable to locate the
change.

---

Duplicate comments:
In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 179-182: The comment above the test using FsSourceConfig
(fs_config_with_exclude) is misleading: the exclude pattern is "test.md" which
excludes a file, not a directory; update the comment to say "With exclude:
excluded file should be skipped, only included.md scanned" or otherwise reflect
that exclude on FsSourceConfig.exclude matches filenames (referencing
FsSourceConfig and fs_config_with_exclude) so the test description correctly
describes the behavior.
- Around line 28-35: The test test_validate_fs_nonexistent_path_errors uses a
hardcoded absolute path which can be non-portable; replace it by creating a temp
directory (e.g., via tempfile::tempdir()) and then build a
guaranteed-nonexistent PathBuf under that temp dir (e.g.,
temp_dir.path().join("does_not_exist_...")) and pass that PathBuf into
default_fs_config; keep calling validate_fs and asserting is_err() and the error
message contains "does not exist" as before so the test is portable across
platforms and CI.

---

Nitpick comments:
In `@gts-validator/src/format/markdown.rs`:
- Around line 376-400: Tests show the boolean parameter `strict` on
scan_markdown_file/scan_markdown_content is counterintuitive (true means "catch
malformed" using the relaxed pattern); either rename the parameter from `strict`
to `catch_malformed` (and update all callers and tests like
test_scan_markdown_strict_mode_catches_*) or add a clear doc comment on the
scan_markdown_content function explaining that `strict = true` enables the
relaxed matcher to report malformed IDs (and `false` uses the well-formed
matcher), then run and update tests to reflect the new name or preserved
docstring.

In `@gts-validator/src/format/yaml.rs`:
- Around line 12-34: The split_yaml_documents function currently only treats
lines equal to "---" as document separators; update it to treat both "---" and
"..." as separators so files using the explicit end marker are handled
correctly. In practice, change the separator check inside split_yaml_documents
(where line.trim() == "---") to a check that accepts either line.trim() == "---"
|| line.trim() == "..." so the logic that joins current_doc, pushes non-empty
docs, and clears current_doc works the same for both markers (this prevents
missing splits when a file contains "..." followed by "---").

Comment thread gts-validator/src/format/markdown.rs Outdated
@Artifizer
Copy link
Copy Markdown
Contributor

@devjow , please review my LLM-powered feedback, you can probably copy-paste it into IDE and then update your PR:

🧠 LLM TASK: Improve gts-validator Safety, Correctness and Architecture

You are working on a CI-grade repository scanning and validation
library
.
This tool runs on untrusted repositories and must behave like a
security-sensitive static analysis tool.

The current implementation has blocker / critical / major issues
that must be fixed before the crate can be safely used.

Your task is to refactor the crate to address the issues below.


🚨 BLOCKER --- Silent error swallowing

Problem

The scanner frequently returns empty results when failures occur:

Examples: - File cannot be read → returns empty list - JSON parsing
fails → returns empty list - File too large → returns empty list

This causes CI to show green results when the validator did not
actually run
.

This defeats the purpose of the tool.

Required change

Introduce explicit scan error reporting and never silently ignore
failures.

Add new result model

pub struct ValidationReport {
    pub validation_errors: Vec<ValidationError>,
    pub scan_errors: Vec<ScanError>,
    pub scanned_files: usize,
    pub failed_files: usize,
}

Add scan error type:

pub enum ScanErrorKind {
    IoError,
    FileTooLarge,
    JsonParseError,
    YamlParseError,
    InvalidEncoding,
}

Scanner must: - collect validation errors - collect scan errors - NEVER
silently discard failures

CI must be able to fail if scan errors > 0.


🚨 BLOCKER --- Symlink traversal is unsafe

Problem

follow_links defaults to true.

This allows: - Escaping repository root - Traversing system
directories - Reading secrets in CI - Infinite recursion loops

This is a real security vulnerability.

Required change

1) Change default

follow_links: false

2) Enforce repository boundary

let root = root.canonicalize()?;
let candidate = candidate.canonicalize()?;
if !candidate.starts_with(&root) {
    return Err(ScanErrorKind::OutsideRepository);
}

🚨 BLOCKER --- TOCTOU + memory DoS via read_to_string

Problem

Reading files directly into memory is unsafe and race-condition
vulnerable.

Required change

Use bounded streaming reads:

let mut file = File::open(path)?;
let mut buffer = Vec::new();
file.take(max_size as u64 + 1).read_to_end(&mut buffer)?;

if buffer.len() > max_size {
    return Err(ScanErrorKind::FileTooLarge);
}

🚨 CRITICAL --- Invalid JSON/YAML treated as success

Invalid structured files must produce a ScanError, not silent success.


🚨 CRITICAL --- Missing repository threat model

Add protections: - Ignore devices, pipes, sockets - Prevent infinite
recursion - Add limits: - max directory depth - max total scanned
files - max total bytes scanned


🚨 CRITICAL --- Vendor policy redesign

Replace:

pub vendor: Option<String>

With:

pub enum VendorPolicy {
    Any,
    MustMatch(String),
    AllowList(Vec<String>),
}

⚠️ MAJOR --- Separate core library from CLI formatting

Core crate must be pure logic only.
Move colored output and human formatting to CLI/formatter crate.


⚠️ MAJOR --- Replace fragile heuristics

Introduce discovery modes:

pub enum DiscoveryMode {
    StrictSpecOnly,
    Heuristic,
}

⚠️ MAJOR --- Builder pattern for public config

Replace public fields + non_exhaustive with builder API.


⚠️ MAJOR --- Add fuzz testing

Add cargo-fuzz targets for: - JSON scanner - Markdown scanner - GTS ID
extraction


🎯 Expected outcome

Validator must become a CI-grade static analysis tool: - Never
silently ignore failures - Safe against malicious repositories -
Deterministic layered architecture - Explicit policies - Memory and
recursion safe

Copy link
Copy Markdown
Contributor

@Artifizer Artifizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comments above

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
gts-validator/src/lib.rs (2)

197-203: Sentinel vendor approach for AllowList is clever but fragile.

Using "\x00" as a sentinel relies on the assumption that no valid GTS vendor contains a null byte. While this is likely true per spec, the approach is non-obvious and could break if vendor validation rules change.

Consider adding a compile-time or runtime assertion, or documenting this invariant more prominently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/lib.rs` around lines 197 - 203, The Use of "\x00" as a
sentinel in effective_vendor_for_scanning (for VendorPolicy::AllowList) is
fragile; change this by defining a named constant (e.g., SENTINEL_VENDOR) and
add a runtime assertion inside effective_vendor_for_scanning (use debug_assert!
or a hard assert) that validates no legitimate vendor string may contain '\0'
before returning the sentinel, and/or call the existing vendor validation helper
(or add one) to prove vendors cannot contain NUL; ensure the assertion
references VendorPolicy::AllowList and SENTINEL_VENDOR so the invariant is
explicit and fails fast if vendor validation rules change.

158-166: YAML files with parse errors are counted as scanned AND failed.

When a YAML file has scan errors (lines 161-164), failed_files is incremented, but the code continues and scanned_files is incremented at line 170. This means a partially-parsed YAML file (some documents valid, some invalid) is counted in both scanned_files and failed_files.

This may be intentional behavior (the file was partially scanned), but consider whether the semantics should be:

  • Only increment scanned_files if all documents parsed successfully, or
  • Document this as expected behavior for multi-document YAML files
Option: Skip scanned_files increment if scan errors occurred
             Some(ContentFormat::Yaml) => {
                 let (val_errs, yaml_scan_errs) =
                     format::yaml::scan_yaml_content(&content, file_path, vendor, validation_config.scan_keys);
                 if !yaml_scan_errs.is_empty() {
                     failed_files += 1;
                     scan_errors.extend(yaml_scan_errs);
+                    // Don't count as scanned if any document failed to parse
+                    validation_errors.extend(apply_allow_list_filter(val_errs, &validation_config.vendor_policy));
+                    continue 'files;
                 }
                 val_errs
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/lib.rs` around lines 158 - 166, The current YAML branch
(ContentFormat::Yaml) calls format::yaml::scan_yaml_content and increments
failed_files when yaml_scan_errs is non-empty, but the caller still increments
scanned_files later, causing files with parse errors to be counted as both
scanned and failed; update the caller logic so that scanned_files is only
incremented when yaml_scan_errs.is_empty() (i.e., all documents parsed
successfully), or alternatively document the intended semantics—identify the
code paths using failed_files, scanned_files, ContentFormat::Yaml and the call
to format::yaml::scan_yaml_content and modify the post-call bookkeeping to skip
incrementing scanned_files when yaml_scan_errs contains errors.
gts-validator/src/format/json.rs (1)

83-90: Filename heuristic may have edge cases.

The heuristic contains("~.") && rfind('.') > rfind('~') aims to skip schema filenames like gts.x.core.type.v1~.schema.json, but could also match legitimate GTS IDs in unusual contexts (e.g., a string containing both a GTS ID and an unrelated tilde-dot sequence).

Consider documenting this trade-off or making the heuristic more specific (e.g., checking for common file extensions after the tilde).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/format/json.rs` around lines 83 - 90, The current heuristic
using looks_like_filename and candidate_str is too broad; update the check so
instead of just contains("~.") && rfind('.') > rfind('~'), verify that the
sequence after the last '~' corresponds to a known file extension (e.g., the
slice after candidate_str.rfind('~') starts with '.' and matches extensions like
json, schema.json, yaml, yml, etc.), and use that refined predicate in the same
conditional that checks candidate_str.starts_with("gts://gts.") ||
candidate_str.starts_with("gts."); also add a brief comment next to
looks_like_filename documenting the trade-off and which extensions are
considered to avoid false positives.
gts-validator/src/format/yaml.rs (1)

12-34: Manual YAML document splitting may miss edge cases.

The split_yaml_documents function splits on line.trim() == "---" but:

  1. YAML spec allows --- with trailing content (e.g., --- !tag)
  2. ... is also a valid document end marker that should not split
  3. --- inside block scalars or quoted strings should not trigger splits

Since this is only used as a fallback when serde_saphyr::from_multiple fails, the impact is limited to partial recovery scenarios. However, consider using a more robust approach or documenting the limitations.

Potential improvement
 fn split_yaml_documents(content: &str) -> Vec<String> {
+    // NOTE: This is a best-effort fallback for when serde_saphyr::from_multiple fails.
+    // It handles common cases but may not correctly handle:
+    // - Document markers with trailing content (--- !tag)
+    // - Document end markers (...)
+    // - Markers inside block scalars or quoted strings
     let mut documents = Vec::new();
     let mut current_doc: Vec<&str> = Vec::new();
 
     for line in content.lines() {
-        if line.trim() == "---" {
+        // Only split on standalone "---" (optionally with trailing whitespace)
+        if line.trim_start().starts_with("---") && line.trim().len() <= 3 {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gts-validator/src/format/yaml.rs` around lines 12 - 34, split_yaml_documents
currently splits on line.trim() == "---" which misses cases like trailing tags
(e.g., `--- !tag`), incorrectly treats `...` as a split, and can split inside
block scalars/quoted strings; update split_yaml_documents to be YAML-aware by
either (preferred) delegating to a YAML tokenizer/parser to split documents, or
(if keeping a fallback) replace the simple trim check with a regex anchored to
the line start to match `^---(?:\s|$)` (so `--- !tag` still matches) and add
minimal state tracking for block scalars and quoted strings (track when a line
opens a block scalar with `|` or `>` and skip splits until the scalar ends, and
toggle quote_state when encountering unescaped `'` or `"` characters) so that
function split_yaml_documents only splits on real document start markers and
ignores `...` and markers inside scalars/strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gts-validator/Cargo.toml`:
- Around line 37-38: Update the dev-dependency version for tempfile in
Cargo.toml to a non-vulnerable release (at least 3.15.0); replace the existing
tempfile = "3.8" entry with tempfile = "3.15" or a specific patch like "3.15.2"
to remediate the AIKIDO-2025-10003 DoS vulnerability and then run cargo update
to refresh the lockfile.

---

Duplicate comments:
In `@gts-validator/src/strategy/fs.rs`:
- Around line 81-85: The code silently drops invalid glob patterns when building
exclude_patterns; change the logic that uses Pattern::new for config.exclude so
you collect both valid patterns and invalid ones (e.g., map each pattern to
Result and partition into ok and err), then surface the invalid patterns—either
log a warning via the project's logger or return a descriptive error from the
surrounding function in fs.rs—so users are notified which patterns (like
"[invalid") failed to parse; reference the exclude_patterns variable and the use
of Pattern::new when locating the change.

In `@gts-validator/tests/validate_fs_tests.rs`:
- Around line 28-35: The test function test_validate_fs_nonexistent_path_errors
uses a hard-coded absolute path; replace that with a path under a temp directory
to ensure portability: create a temp dir or use std::env::temp_dir() plus a
randomized/nonexistent child name, pass that PathBuf into default_fs_config,
then call validate_fs(...) and assert the same error. Update references in the
test (test_validate_fs_nonexistent_path_errors, default_fs_config, validate_fs,
default_validation_config) so the test no longer relies on /nonexistent/... but
instead derives the non-existent path from a temp location.
- Around line 180-183: The inline test comment is misleading:
fs_config_with_exclude.exclude = vec!["test.md".to_owned()] excludes a file, not
a directory; update the comment above the test case (near
fs_config_with_exclude) to say something like "With exclude: excluded file
should be skipped, only included.md scanned" or, if you intended to exclude a
directory, replace the exclude pattern with a directory pattern (e.g.,
"excluded_dir/" or the actual directory name) and adjust the test expectations
accordingly.

---

Nitpick comments:
In `@gts-validator/src/format/json.rs`:
- Around line 83-90: The current heuristic using looks_like_filename and
candidate_str is too broad; update the check so instead of just contains("~.")
&& rfind('.') > rfind('~'), verify that the sequence after the last '~'
corresponds to a known file extension (e.g., the slice after
candidate_str.rfind('~') starts with '.' and matches extensions like json,
schema.json, yaml, yml, etc.), and use that refined predicate in the same
conditional that checks candidate_str.starts_with("gts://gts.") ||
candidate_str.starts_with("gts."); also add a brief comment next to
looks_like_filename documenting the trade-off and which extensions are
considered to avoid false positives.

In `@gts-validator/src/format/yaml.rs`:
- Around line 12-34: split_yaml_documents currently splits on line.trim() ==
"---" which misses cases like trailing tags (e.g., `--- !tag`), incorrectly
treats `...` as a split, and can split inside block scalars/quoted strings;
update split_yaml_documents to be YAML-aware by either (preferred) delegating to
a YAML tokenizer/parser to split documents, or (if keeping a fallback) replace
the simple trim check with a regex anchored to the line start to match
`^---(?:\s|$)` (so `--- !tag` still matches) and add minimal state tracking for
block scalars and quoted strings (track when a line opens a block scalar with
`|` or `>` and skip splits until the scalar ends, and toggle quote_state when
encountering unescaped `'` or `"` characters) so that function
split_yaml_documents only splits on real document start markers and ignores
`...` and markers inside scalars/strings.

In `@gts-validator/src/lib.rs`:
- Around line 197-203: The Use of "\x00" as a sentinel in
effective_vendor_for_scanning (for VendorPolicy::AllowList) is fragile; change
this by defining a named constant (e.g., SENTINEL_VENDOR) and add a runtime
assertion inside effective_vendor_for_scanning (use debug_assert! or a hard
assert) that validates no legitimate vendor string may contain '\0' before
returning the sentinel, and/or call the existing vendor validation helper (or
add one) to prove vendors cannot contain NUL; ensure the assertion references
VendorPolicy::AllowList and SENTINEL_VENDOR so the invariant is explicit and
fails fast if vendor validation rules change.
- Around line 158-166: The current YAML branch (ContentFormat::Yaml) calls
format::yaml::scan_yaml_content and increments failed_files when yaml_scan_errs
is non-empty, but the caller still increments scanned_files later, causing files
with parse errors to be counted as both scanned and failed; update the caller
logic so that scanned_files is only incremented when yaml_scan_errs.is_empty()
(i.e., all documents parsed successfully), or alternatively document the
intended semantics—identify the code paths using failed_files, scanned_files,
ContentFormat::Yaml and the call to format::yaml::scan_yaml_content and modify
the post-call bookkeeping to skip incrementing scanned_files when yaml_scan_errs
contains errors.

Comment thread gts-validator/Cargo.toml Outdated
@devjow devjow force-pushed the feat/gts-validator branch from 4bcb80f to 179b618 Compare February 19, 2026 19:14
Comment thread gts-validator/tests/validate_fs_tests.rs Outdated
Comment thread gts-validator/tests/validate_fs_tests.rs
Comment thread gts-validator/src/format/json.rs Outdated
Comment thread gts-validator/src/format/json.rs Outdated
@devjow devjow force-pushed the feat/gts-validator branch from 179b618 to 3c52e03 Compare February 20, 2026 11:11
Signed-off-by: devjow <me@devjow.com>
@devjow devjow force-pushed the feat/gts-validator branch from 3c52e03 to 80bf31d Compare February 20, 2026 17:34
Comment on lines +38 to +42
if (trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\''))
{
trimmed = &trimmed[1..trimmed.len() - 1];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panic on single-character quote strings. If the input is a single quote character like " or ', this code will panic with an invalid range.

When trimmed is ", the conditions trimmed.starts_with('"') and trimmed.ends_with('"') are both true, causing the slice operation &trimmed[1..trimmed.len() - 1] which becomes &trimmed[1..0] - an invalid range that panics.

Fix:

if trimmed.len() >= 2
    && ((trimmed.starts_with('"') && trimmed.ends_with('"'))
        || (trimmed.starts_with('\'') && trimmed.ends_with('\''')))
{
    trimmed = &trimmed[1..trimmed.len() - 1];
}
Suggested change
if (trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\''))
{
trimmed = &trimmed[1..trimmed.len() - 1];
}
if trimmed.len() >= 2
&& ((trimmed.starts_with('"') && trimmed.ends_with('"'))
|| (trimmed.starts_with('\'') && trimmed.ends_with('\'')))
{
trimmed = &trimmed[1..trimmed.len() - 1];
}

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment thread gts-validator/src/format/json.rs Outdated
Signed-off-by: devjow <me@devjow.com>
@devjow devjow force-pushed the feat/gts-validator branch from 80bf31d to 6cc2e36 Compare February 23, 2026 10:54
@Artifizer Artifizer merged commit fa7169c into GlobalTypeSystem:main Feb 23, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants