Skip to content

feat: Implement libmagic meta-type directives and format substitution (#42)#230

Merged
unclesp1d3r merged 16 commits into
mainfrom
42-parser-implement-default-clear-name-use-and-indirect-meta-types
Apr 24, 2026
Merged

feat: Implement libmagic meta-type directives and format substitution (#42)#230
unclesp1d3r merged 16 commits into
mainfrom
42-parser-implement-default-clear-name-use-and-indirect-meta-types

Conversation

@unclesp1d3r
Copy link
Copy Markdown
Member

Closes #42.

Summary

  • All six MetaType variants (Default, Clear, Name, Use, Indirect, Offset) are fully evaluated with GNU file-compatible semantics: default fires only when no sibling at the same level matched, clear resets the flag, name/use enable subroutines with endian-flip via RuleEnvironment, indirect re-applies the root rule set at the resolved offset bounded by max_recursion_depth, and offset reports the resolved file position as Value::Uint(pos).
  • Printf-style format substitution (src/output/format.rs::format_magic_message) supports %d, %i, %u, %x, %X, %o, %s, %c, %% with width/padding/length modifiers. Hex specifiers mask to TypeKind::bit_width() so signed-byte -1 renders as ff, not sign-extended ffffffffffffffff. Alt-form follows C printf: %#06x + 0xab0x00ab.
  • Continuation-sibling anchor reset and subroutine base_offset close the semantic gaps that blocked byte-for-byte parity with GNU file. Top-level siblings still chain per GOTCHAS S3.8 (regression-guarded).

What changed

Phase 1 (AST + parser + name/use + default/clear/indirect dispatch) and Phase 2 (this PR's close-out: offset evaluator, printf substitution, parser x-strip fix, sibling-anchor semantic fix, subroutine offset bias) are bundled because they're a single cohesive issue-#42 unit.

Area Files Notes
AST src/parser/ast.rs TypeKind::Meta(MetaType) enum with six variants; Offset doc comment updated to reflect live semantics
Parser src/parser/grammar/mod.rs, src/parser/name_table.rs, src/parser/loader.rs, src/parser/mod.rs, src/parser/types.rs, src/parser/codegen.rs strip_optional_x_operator helper stops x leaking into messages; extract_name_table hoists Meta::Name subtrees at load time; ParsedMagic { rules, name_table } replaces the old Vec<MagicRule> parser return
Evaluator src/evaluator/engine/mod.rs, src/evaluator/mod.rs, src/evaluator/offset/mod.rs, src/evaluator/strength.rs, src/evaluator/types/mod.rs All six Meta dispatches, is_child_sibling_list gate for continuation-sibling anchor reset, base_offset on EvaluationContext, new resolve_offset_with_base
Output src/output/format.rs (new), src/output/mod.rs, src/lib.rs format_magic_message pure function; MagicDatabase::concatenate_messages runs messages through it before the \b check (GOTCHAS S14.1)
Tests tests/meta_types_integration.rs (new), src/evaluator/engine/tests.rs, tests/* Un-ignored test_searchbug_matches_full_result_string; 18 format unit tests; 7 Offset dispatch tests; synthetic default/clear/indirect scenarios
Docs AGENTS.md, GOTCHAS.md, docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md, docs/src/*.md, ROADMAP.md, CHANGELOG.md S14.2 rewritten, new S3.10 (subroutine base offset), S3.9→S3.11 with S14.1 cross-ref fixed
Build .github/workflows/release.yml, build.rs, src/build_helpers.rs, src/error.rs dist generate refresh; Phase 1 plumbing for NameTable/RuleEnvironment

Net: 41 files changed, +4661 / -205 lines.

Test Plan

  • mise exec -- cargo nextest run — 1348/1348 pass, 5 pre-existing skipped
  • mise exec -- just ci-check — green (fmt, clippy -D warnings, cargo test, nextest, audit, deny, machete, mdformat)
  • test_searchbug_matches_full_result_string produces byte-for-byte parity with third_party/tests/searchbug.result: Testfmt (0) found_ABC followed_by 0x31 at_offset 11 (64) found_ABC followed_by 0x32 at_offset 75
  • No new unsafe, unwrap(), or expect() in library code
  • GOTCHAS S3.8 top-level sibling chaining preserved (relative_anchor_can_decrease_when_later_sibling_matches_at_lower_position still passes)
  • MagicRule serde back-compat: old serialized rules deserialize unchanged

Residual items (separate follow-up PRs)

Documented in .context/compound-engineering/ce-review/20260422-101802-31ab7e83/synthesis.md (local-only). Non-blocking:

  • COR-003 / COR-004: AnchorScope in Indirect arm and manual save/restore in evaluate_use_rule leave anchor state mutated on RecursionLimitExceeded error paths. Low-probability with default max_recursion_depth=20; worth RAII refactor in its own PR.
  • M-02: Child-evaluation graceful-error match block is duplicated across four dispatch arms. Extracting an evaluate_children_gracefully helper is worthwhile but risks subtle behavior drift.
  • Open question: does a dedicated max_indirect_depth cap (separate from max_recursion_depth) make sense for 1.0? Needs a pathological-fixture smoke test first.

Compatibility impact

Breaking behavior change (library consumers): rule messages containing % that were previously passed through verbatim will now be interpreted as format specifiers. Literal % in user-facing data must be escaped as %%. Documented in GOTCHAS S14.2 rewrite. Recommend calling out in the changelog for the next release.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…#42)

Complete issue #42 by landing the remaining Phase 2 work on top of the
prior Phase 1 AST + parser + `name`/`use` dispatch. Closes byte-for-byte
parity with GNU `file`'s `searchbug.result` fixture.

Meta-type directives
- `TypeKind::Meta(MetaType::{Default, Clear, Name, Use, Indirect, Offset})`
  all fully evaluated in `src/evaluator/engine/mod.rs`.
- `default` fires only when no sibling at the same level has matched;
  `clear` resets the per-level `sibling_matched` flag so later `default`
  rules can fire again.
- `name` subroutines are hoisted into a `NameTable` at load time
  (`parser::name_table::extract_name_table`). `use` invokes them via
  `RuleEnvironment` threaded through `EvaluationContext::rule_env`.
- `indirect` re-applies the root rule set at the resolved offset via
  `AnchorScope`, bounded by `EvaluationConfig::max_recursion_depth`.
- `offset` reports the resolved file offset as `Value::Uint(pos)` for
  printf-style substitution.

Printf-style format substitution
- New pure function `src/output/format.rs::format_magic_message`
  supporting `%d`, `%i`, `%u`, `%x`, `%X`, `%o`, `%s`, `%c`, `%%`, plus
  width/padding/length modifiers. Hex specifiers mask to the natural
  `TypeKind::bit_width()` so a signed byte carrying `-1` renders as
  `ff` rather than sign-extended `ffffffffffffffff`.
- Alt-form width follows C printf: zero-pad inserts between prefix and
  digits (`%#06x` + `0xab` -> `0x00ab`), space-pad goes before the
  prefix, left-align trails the digits.
- Wired into `MagicDatabase::concatenate_messages` before the existing
  `\b` backspace-suppression check (GOTCHAS S14.1).

Parser and evaluator semantics
- Parser consumes an optional `x` (AnyValue) token between a Meta
  keyword and its message so `offset x at_offset %lld` no longer
  leaks the operator into output.
- Subroutine `base_offset` on `EvaluationContext`: inside a `use` body,
  positive `OffsetSpec::Absolute(n)` resolves to `base_offset + n`,
  matching magic(5) semantics. Negative absolute, `FromEnd`,
  `Relative`, and `Indirect` are unaffected (new GOTCHAS S3.10).
- Continuation-sibling anchor reset: at `recursion_depth > 0`, each
  sibling resolves `&N` against the parent-level entry anchor rather
  than the previous sibling's advance. Top-level siblings (depth 0)
  keep chaining per GOTCHAS S3.8 (`relative_anchor_can_decrease_...`
  still passes).

Tests
- `test_searchbug_matches_full_result_string` un-ignored and passing
  against the GNU `file` `searchbug.testfile` fixture.
- 18 unit tests for `format_magic_message` including regression guards
  for alt-form zero-padding.
- 7 new engine tests for `MetaType::Offset` dispatch (match emission,
  offset-at-zero, out-of-bounds skip, non-`AnyValue` operator reject,
  children, anchor semantics, `sibling_matched` propagation).
- Integration test file `tests/meta_types_integration.rs` covers
  `default`/`clear`/`indirect` synthetic scenarios plus the searchbug
  round-trip.

Documentation
- AGENTS.md "Currently Implemented" expanded with the six meta-types
  and printf-style substitution; "Current Limitations" updated.
- GOTCHAS.md S14.2 rewritten (specifiers ARE substituted); new S3.10
  for subroutine base offset; S3.9 renumbered to S3.11 with S14.1
  cross-reference fixed.
- `docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md`
  documents the three-layer parse-time / `ParsedMagic` / optional
  `RuleEnvironment` pattern.

Build tooling
- `dist generate` regenerated `.github/workflows/release.yml` to
  resolve cargo-dist staleness vs recent `actions/upload-artifact`
  dependabot bumps.

Test results: 1348/1348 pass, 5 pre-existing skipped. `just ci-check`
green (clippy `-D warnings`, fmt, audit, deny).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 22, 2026 23:30
@unclesp1d3r unclesp1d3r linked an issue Apr 22, 2026 that may be closed by this pull request
7 tasks
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 22, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

  • #commits-behind <= 10

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

Summary by CodeRabbit

  • New Features

    • Added meta-type directives: default, clear, name, use, indirect, offset; and printf-style format substitution in rule messages.
  • Breaking Changes

    • Parser/load now returns rules plus a hoisted name-table; callers must destructure the returned value to access rules and named subroutines.
  • Documentation

    • Updated docs to document meta directives, evaluation behavior, name-hoisting, offset semantics, and message-formatting rules.
  • Tests

    • Expanded test suites covering meta dispatch, offsets, use/name subroutines, and message formatting.

Walkthrough

Implements meta-type directives (default, clear, name, use, indirect, offset), hoists/merges named subroutines into a parser NameTable via ParsedMagic, threads a RuleEnvironment through evaluation with base-offset/recursion guards, adds printf-style message formatting, and expands docs/tests accordingly.

Changes

Cohort / File(s) Summary
Docs & Guides
AGENTS.md, CHANGELOG.md, GOTCHAS.md, ROADMAP.md, docs/src/...
Documented meta-type directives, ParsedMagic { rules, name_table } return type, evaluator dispatch semantics, printf-style message formatting, and an "Adding a new meta-type" guide; updated roadmap and breaking-changes notes.
Build & Metadata
Cargo.toml, build.rs, src/build_helpers.rs
Updated package authors metadata; build script and helper capture ParsedMagic and forward parsed.rules to codegen.
Parser AST & Types
src/parser/ast.rs, src/parser/types.rs
Added MetaType enum and TypeKind::Meta(MetaType); bit_width() returns None for meta; type-keyword mapping updated for meta directives.
Parser Grammar & Tests
src/parser/grammar/..., src/parser/grammar/tests/..., src/parser/grammar/tests/meta_types.rs
Added & relative-offset parsing, dedicated meta parsers for name/use, meta-rule operand/ message parsing, and fallback bare-string parsing for string-family types; new and adjusted parser tests.
Parser API, Loader & NameTable
src/parser/mod.rs, src/parser/loader.rs, src/parser/name_table.rs, src/parser/codegen.rs
Introduced ParsedMagic { rules, name_table }, hoisted top-level name subroutines into NameTable, merging logic across files, and updated codegen to serialize/escape MetaType names.
Errors
src/error.rs
Added EvaluationError::UnknownName { name } and IndirectWithoutEnvironment plus constructor helper.
Evaluator Context & Env
src/evaluator/mod.rs
Added RuleEnvironment, EvaluationContext.rule_env: Option<Arc<...>>, base_offset, indirect_reentry flag and accessors/mutators; reset behavior updated.
Evaluator Engine & Meta Dispatch
src/evaluator/engine/mod.rs, src/evaluator/engine/tests/mod.rs
Refactored evaluate_rules to inline-dispatch meta directives; added evaluate_use_rule, evaluate_children_or_warn, recursion/anchor handling, base-offset propagation, and diagnostics; wired tests/helpers.
Evaluator Offset Resolution
src/evaluator/offset/mod.rs
Added resolve_offset_with_base to apply subroutine base_offset biasing for positive Absolute offsets; delegated existing resolver.
Evaluator Types & Strength
src/evaluator/types/mod.rs, src/evaluator/strength.rs
String read/consumption semantics adjusted to use literal length when max_length absent; meta-types handled as zero-byte; strength calculation extended to include meta-type axis.
Output Formatting
src/output/format.rs, src/output/mod.rs
New format_magic_message(template, value, type_kind) implementing a constrained printf-like substitution (flags, width, conversions, masking); exported via pub mod format.
Library Core & Tests
src/lib.rs, src/tests.rs, src/evaluator/engine/tests/..., src/evaluator/engine/tests/helpers/...
MagicDatabase and loading updated to store root_rules + name_table; evaluation constructs RuleEnvironment and threads it; extensive meta-type test suites and helpers added.
Misc
dist-workspace.toml, docs/examples/...
Pin small GH Actions update; docs and examples updated to use new ParsedMagic pattern.

Sequence Diagram(s)

sequenceDiagram
    participant Parser
    participant NameTable as NameTable (Extraction)
    participant ParsedMagic

    Parser->>Parser: parse_text_magic_file(input)
    Parser->>Parser: build rule hierarchy
    Parser->>NameTable: extract_name_table(rules)
    NameTable-->>Parser: rules (sans Name), name_table
    Parser->>ParsedMagic: return ParsedMagic { rules, name_table }
Loading
sequenceDiagram
    participant Evaluator
    participant RuleEnv as RuleEnvironment (name_table)
    participant Subrules as SubroutineRules
    participant Formatter as OutputFormatter

    Evaluator->>Evaluator: evaluate_rules(root_rules)
    Evaluator->>RuleEnv: lookup name for MetaType::Use
    RuleEnv-->>Evaluator: Arc<[MagicRule]> (subroutine)
    Evaluator->>Subrules: evaluate_use_rule(subroutine, base_offset)
    Subrules-->>Evaluator: matches + terminal anchor
    Evaluator->>Formatter: format_magic_message(template, value, type_kind)
    Formatter-->>Evaluator: formatted text
    Evaluator-->>Evaluator: integrate matches & return
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Suggested labels

parser, evaluator, output, testing

Poem

Meta rules whispered in the file, ✨
Names hoisted, subroutines compile, 🔁
Default guards where others fail, 🛡️
Indirect dives a nested trail, 🕳️
Formatted messages sew the tale. 🪄

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name Status Explanation
Title check ✅ Passed Title follows Conventional Commits with type 'feat', scope 'meta-type directives and format substitution', and issue reference (#42); accurately reflects the main implementation.
Description check ✅ Passed Description is detailed and directly related to the changeset, covering all six MetaType variants, printf-style substitution, semantic fixes, test results, and compatibility impact.
Linked Issues check ✅ Passed All core objectives from issue #42 are met: parser recognizes meta-types, default/clear/name/use/indirect are fully evaluated, unit and integration tests added [#42].
Out of Scope Changes check ✅ Passed All changes align with #42 objectives. Minor scope items (GitHub Actions version, Cargo.toml author field) are incidental metadata updates unrelated to meta-type implementation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 85.00%.
Memory-Safety-Check ✅ Passed Pull request introduces no unsafe code blocks and maintains strict memory safety throughout all new modules using idiomatic Rust patterns.
Libmagic-Compatibility-Check ✅ Passed PR implements all six meta-type directives with GNU file-compatible semantics verified through comprehensive test coverage.
Performance-Regression-Check ✅ Passed Arc sharing for rules and name-tables uses efficient zero-copy with clones only at context creation. Hot-path evaluation loops have no Vec allocations. Message formatting deferred post-evaluation with proper capacity pre-allocation. Memory-mapped I/O correctly used for buffer loading. No performance regressions detected.
Test-Coverage-Check ✅ Passed PR demonstrates strong test coverage with 60 test functions across ~2,003 lines covering all six MetaType variants, all printf-style format specifiers, error conditions, and boundary cases systematically.
Error-Handling-Check ✅ Passed PR demonstrates comprehensive error handling with Result-based APIs, thiserror derive macros, proper error propagation via ? operator, and safe arithmetic using checked operations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Warning

Review ran into problems

🔥 Problems

These MCP integrations need to be re-authenticated in the Integrations settings: Linear, Notion


Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added enhancement New feature or request evaluator Rule evaluation engine and logic output Result formatting and output generation parser Magic file parsing components and grammar labels Apr 22, 2026
@coderabbitai coderabbitai Bot added the testing Test infrastructure and coverage label Apr 22, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements GNU file-compatible meta-type directive parsing + evaluation and adds printf-style message substitution so real-world magic corpora (e.g. searchbug.magic) can round-trip end-to-end within libmagic-rs.

Changes:

  • Add TypeKind::Meta(MetaType) with evaluation semantics for default, clear, name/use, indirect, and offset, including parse-time name extraction into a NameTable and runtime dispatch via RuleEnvironment.
  • Introduce output::format::format_magic_message and wire it into MagicDatabase message concatenation for %d/%u/%x/%s/... substitutions.
  • Update parser API to return ParsedMagic { rules, name_table }, plus extensive tests/docs/bench updates to cover the new behavior.

Reviewed changes

Copilot reviewed 42 out of 43 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/regex_search_corpus_tests.rs Adapts tests to destructure ParsedMagic from parse_text_magic_file.
tests/property_tests.rs Extends generators and adds a meta-type “never panics” property test.
tests/property_tests.proptest-regressions Checks in proptest regression seeds for meta-type cases.
tests/parser_integration_tests.rs Updates loader tests for ParsedMagic and adds a name/use round-trip test.
tests/meta_types_integration.rs Adds end-to-end integration tests for meta-types and searchbug parity.
tests/directory_loading_tests.rs Updates directory loader tests to destructure ParsedMagic.
tests/compatibility_tests.rs Adds searchbug partial-match regression coverage.
src/parser/types.rs Adds meta-type keywords and maps keyword-only meta-types into TypeKind::Meta.
src/parser/name_table.rs Implements NameTable and extraction/scrubbing logic for name subroutines.
src/parser/mod.rs Introduces ParsedMagic and changes parse_text_magic_file to return it.
src/parser/loader.rs Updates directory/file loading to merge rule lists + name tables into ParsedMagic.
src/parser/grammar/tests/mod.rs Adds parsing tests for relative offsets, meta-types, and searchbug.magic fixture.
src/parser/grammar/mod.rs Adds &N relative offsets, meta-type parsing (name/use), x-strip, and bare-string fallback for string-family types.
src/parser/codegen.rs Adds codegen support for TypeKind::Meta(MetaType::...) and injection-escape test.
src/parser/ast.rs Adds MetaType and TypeKind::Meta, plus tests and bit_width() behavior.
src/output/mod.rs Exposes new output::format module.
src/output/format.rs Adds printf-style substitution implementation + unit tests.
src/lib.rs Threads RuleEnvironment through evaluation and applies message formatting during concatenation.
src/evaluator/types/mod.rs Updates string semantics to be pattern-length-based when applicable and handles TypeKind::Meta as unsupported for reads/compare.
src/evaluator/strength.rs Defines strength ordering behavior for meta-types and adds tests.
src/evaluator/offset/mod.rs Adds resolve_offset_with_base to support subroutine base offsets.
src/evaluator/mod.rs Adds RuleEnvironment and stores optional environment + base_offset on EvaluationContext.
src/evaluator/engine/mod.rs Implements meta-type dispatch (default/clear/use/indirect/offset), continuation-sibling anchor reset, and subroutine base-offset handling.
src/error.rs Adds reserved evaluation errors for unknown-name and indirect-without-environment.
src/build_helpers.rs Updates codegen path to use ParsedMagic.rules.
mise.lock Updates tool lock entries (python build-standalone artifacts, rust version, provenance fields).
docs/src/parser.md Documents ParsedMagic and meta-types (needs updating for offset implementation status).
docs/src/magic-format.md Adds a meta-types section (currently missing offset).
docs/src/evaluator.md Updates evaluator architecture docs and shows meta-type dispatch behavior.
docs/src/ast-structures.md Documents TypeKind::Meta (currently missing MetaType::Offset in snippet).
docs/src/architecture.md Updates architecture overview to include NameTable/ParsedMagic/RuleEnvironment.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md Adds an architecture decision record for meta-type dispatch + environments.
docs/MAGIC_FORMAT.md Mirrors meta-type documentation in the top-level docs (currently missing offset).
docs/ARCHITECTURE.md Updates high-level repository architecture and MagicDatabase fields.
build.rs Updates build-time codegen to use ParsedMagic.rules.
benches/evaluation_bench.rs Adds a benchmark for name/use subroutine dispatch overhead.
ROADMAP.md Marks meta-types (#42) as completed.
GOTCHAS.md Adds gotchas for meta-type dispatch, ParsedMagic, bare-string fallback, and format substitution.
Cargo.toml Updates authors metadata.
CHANGELOG.md Documents meta-type and ParsedMagic breaking change.
AGENTS.md Updates project structure and implemented-feature list to include meta-types + format substitution.
.github/workflows/release.yml Downgrades actions/upload-artifact version pins.

Comment thread docs/src/ast-structures.md
Comment thread src/output/format.rs
Comment thread src/output/format.rs
Comment thread src/evaluator/engine/mod.rs Outdated
Comment thread src/parser/ast.rs Outdated
Comment thread src/parser/types.rs Outdated
Comment thread docs/src/parser.md Outdated
Comment thread tests/meta_types_integration.rs Outdated
Comment thread docs/src/magic-format.md
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 22, 2026

@dosubot
Copy link
Copy Markdown
Contributor

dosubot Bot commented Apr 22, 2026

Related Documentation

2 document(s) may need updating based on files changed in this PR:

libMagic-rs

api-reference /libmagic-rs/blob/main/docs/src/api-reference.md — ⏳ Awaiting Merge
API_REFERENCE /libmagic-rs/blob/main/docs/API_REFERENCE.md — ⏳ Awaiting Merge

How did I do? Any feedback?

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/ast-structures.md`:
- Around line 455-491: The docs for MetaType are missing the Offset variant:
update the MetaType enum documentation and examples to include MetaType::Offset
(and describe its semantics) so the AST docs match the code; modify the enum
block under MetaType to add an Offset variant description and add corresponding
examples creating TypeKind::Meta(MetaType::Offset) alongside the existing
Default/Clear/Name/Use/Indirect cases.

In `@docs/src/parser.md`:
- Around line 452-457: Documentation for ParsedMagic is inaccurate: the struct
field name_table is declared as pub(crate) in the implementation but the docs
show it as pub. Either update the docs code block to match the implementation by
changing the declaration to pub(crate) name_table in the ParsedMagic snippet, or
if external access was intended, change the implementation's ParsedMagic struct
to pub name_table; ensure you update whichever of ParsedMagic or name_table you
modify so the docs and the implementation remain consistent.

In `@src/build_helpers.rs`:
- Around line 32-35: The build step currently parses builtin magic with
parse_text_magic_file and then calls generate_builtin_rules(&parsed.rules) but
discards parsed.name_table (extracted by extract_name_table), so name/use
subroutines are lost; either update the code generation to accept and serialize
the name_table (pass parsed.name_table into generate_builtin_rules and include
it in the generated BUILTIN_RULES static), or add a validation step after
parse_text_magic_file that inspects parsed.name_table and returns an error
rejecting any name/use directives in builtin magic (with a clear build-time
message); modify the call site around parse_text_magic_file and
generate_builtin_rules to implement one of these two fixes and ensure the
generated output includes or rejects parsed.name_table accordingly.

In `@src/evaluator/engine/mod.rs`:
- Around line 1124-1136: The debug-only panic must be converted into a proper
error return: in evaluate_rules_with_config, replace the debug_assert!
containing contains_indirect_rule(rules) with an explicit runtime check that
returns Err(...) when an env-less indirect rule is detected; use the existing
error constructor
(crate::error::EvaluationError::indirect_without_environment()) or map it into
the public error type the function exposes (e.g., LibmagicError) so callers
receive a Result::Err instead of panicking; keep original release behavior for
non-error paths and ensure the check references contains_indirect_rule and
crate::error::EvaluationError::indirect_without_environment() so the change is
easy to locate.
- Around line 546-577: Extract the duplicated child-evaluation block in
evaluate_rules into a helper (suggest name evaluate_children_or_warn) that
encapsulates RecursionGuard::enter(context) + evaluate_rules(&rule.children,
buffer, guard.context()) and implements the exact error handling: propagate
LibmagicError::Timeout, return Err for other errors except swallow and warn on
the specific EvaluationError variants (BufferOverrun, InvalidOffset,
TypeReadError::BufferOverrun/InvalidPStringLength) and LibmagicError::IoError
while logging the same message including rule.message and the error; update the
Default/Indirect/Offset/Use branches in evaluate_rules to call this helper and
use its returned matches (or empty/handled result) so behavior is identical but
the recursion/timeout/warn logic is centralized.
- Around line 264-281: The code saves and restores EvaluationContext's
last_match_end and base_offset only on the success path, so if
RecursionGuard::enter or evaluate_rules returns Err the context is left mutated;
fix by introducing an RAII scope guard (e.g. UseScope) that captures
saved_anchor = context.last_match_end() and saved_base = context.base_offset(),
sets the new offsets via set_last_match_end(absolute_offset) and
set_base_offset(absolute_offset) in its enter() constructor, exposes a context()
method for use with RecursionGuard::enter and evaluate_rules, and implements
Drop to always restore the saved_anchor and saved_base using set_last_match_end
and set_base_offset so restoration happens on every exit path (including errors)
instead of the current manual restore around
evaluate_rules/RecursionGuard::enter.

In `@src/output/format.rs`:
- Around line 270-277: Change the octal alternate-form prefix from "0o" to the
C-style "0" in the Conv::Octal branch: when spec.alt_form is true, pass "0" (not
"0o") into render_prefixed_int so %#o renders a single leading zero (e.g., 010).
Update any tests asserting "0o..." to expect the C-style "0" prefix and ensure
behavior remains governed by coerce_to_u64_masked(value, type_kind) and the
existing render_prefixed_int call.

In `@src/parser/ast.rs`:
- Around line 168-175: The rustdoc for TypeKind::Meta is out of date: it states
meta directives are “parsed and preserved in the AST but evaluated as silent
no-ops” and omits the `offset` keyword; update the comment for the
TypeKind::Meta variant to reflect the current implemented behavior (i.e., that
these directives are parsed into the AST and will be wired into later evaluator
phases rather than being silent no-ops) and include `offset` in the list of
control-flow keywords (`default`, `clear`, `name`, `use`, `indirect`, `offset`)
so the documentation matches the code. Ensure the same correction is applied to
the duplicate doc block referenced around lines 504–510.

In `@src/parser/codegen.rs`:
- Around line 522-552: Add a new unit test mirroring
test_serialize_meta_name_escapes_injection that targets the MetaType::Use match
arm: create a MagicRule with typ:
TypeKind::Meta(MetaType::Use(malicious.to_string())) using the same malicious
payload, call serialize_magic_rule(&rule, 0), and assert the generated string
does not contain injected Rust tokens (e.g., panic!("pwned-from-meta")) and does
contain the escaped quote sequence; this ensures MetaType::Use is escaped the
same way as MetaType::Name.

In `@src/parser/mod.rs`:
- Around line 145-146: Remove the unnecessary #[allow(dead_code)] on the
name_table module since parse_text_magic_file uses it; open the module
declaration pub(crate) mod name_table and delete the #[allow(dead_code)]
attribute so the module is exported without suppressing dead-code lints, then
run cargo check to confirm no new warnings surface (inspect functions referenced
from parse_text_magic_file to ensure they remain public).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 372d4029-2871-4188-bc87-756cb445414d

📥 Commits

Reviewing files that changed from the base of the PR and between f26676f and 6e4ba1a.

⛔ Files ignored due to path filters (13)
  • .github/workflows/release.yml is excluded by !.github/workflows/release.yml and included by .github/**
  • benches/evaluation_bench.rs is excluded by none and included by none
  • docs/ARCHITECTURE.md is excluded by none and included by none
  • docs/MAGIC_FORMAT.md is excluded by none and included by none
  • docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md is excluded by none and included by none
  • mise.lock is excluded by !**/*.lock and included by none
  • tests/compatibility_tests.rs is excluded by none and included by none
  • tests/directory_loading_tests.rs is excluded by none and included by none
  • tests/meta_types_integration.rs is excluded by none and included by none
  • tests/parser_integration_tests.rs is excluded by none and included by none
  • tests/property_tests.proptest-regressions is excluded by none and included by none
  • tests/property_tests.rs is excluded by none and included by none
  • tests/regex_search_corpus_tests.rs is excluded by none and included by none
📒 Files selected for processing (30)
  • AGENTS.md
  • CHANGELOG.md
  • Cargo.toml
  • GOTCHAS.md
  • ROADMAP.md
  • build.rs
  • docs/src/architecture.md
  • docs/src/ast-structures.md
  • docs/src/evaluator.md
  • docs/src/magic-format.md
  • docs/src/parser.md
  • src/build_helpers.rs
  • src/error.rs
  • src/evaluator/engine/mod.rs
  • src/evaluator/engine/tests.rs
  • src/evaluator/mod.rs
  • src/evaluator/offset/mod.rs
  • src/evaluator/strength.rs
  • src/evaluator/types/mod.rs
  • src/lib.rs
  • src/output/format.rs
  • src/output/mod.rs
  • src/parser/ast.rs
  • src/parser/codegen.rs
  • src/parser/grammar/mod.rs
  • src/parser/grammar/tests/mod.rs
  • src/parser/loader.rs
  • src/parser/mod.rs
  • src/parser/name_table.rs
  • src/parser/types.rs
👮 Files not reviewed due to content moderation or server errors (8)
  • docs/src/magic-format.md
  • src/error.rs
  • src/evaluator/strength.rs
  • CHANGELOG.md
  • src/parser/grammar/tests/mod.rs
  • docs/src/evaluator.md
  • docs/src/architecture.md
  • src/lib.rs

Comment thread docs/src/ast-structures.md
Comment thread docs/src/parser.md
Comment thread src/build_helpers.rs
Comment thread src/evaluator/engine/mod.rs Outdated
Comment thread src/evaluator/engine/mod.rs Outdated
Comment thread src/evaluator/engine/mod.rs Outdated
Comment thread src/output/format.rs
Comment thread src/parser/ast.rs Outdated
Comment thread src/parser/codegen.rs
Comment thread src/parser/mod.rs Outdated
@coderabbitai coderabbitai Bot removed parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation testing Test infrastructure and coverage labels Apr 22, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/api-reference.md`:
- Around line 545-547: Update the docs example that destructures ParsedMagic to
avoid accessing the non-public pub(crate) field name_table: change the example
to either destructure while ignoring the internal field (e.g., let ParsedMagic {
rules, .. } = parse_text_magic_file(...)?;) or bind the return value (let parsed
= parse_text_magic_file(...)?;) and then use parsed.rules; reference ParsedMagic
and parse_text_magic_file in the updated text.
- Line 435: Update the parenthetical note for the `message` field to explicitly
state that unescaped '%' characters are now treated as printf-style format
specifiers (e.g., %d, %s, %x) and that literal percent signs must be escaped as
'%%'; also mention this is a breaking change so callers must escape existing '%'
in rule messages. Use the `message` field description to add the escaping
requirement and the breaking-change warning.
- Around line 233-250: Add a "#### Examples" subsection under the MetaType
documentation mirroring other type sections: include small Rust examples that
import MetaType and TypeKind and use parser::grammar::parse_magic_rule to parse
rules and assert the resulting rule.typ matches
TypeKind::Meta(MetaType::Default), MetaType::Name("..."), MetaType::Use("..."),
and MetaType::Indirect; keep examples concise and consistent with the existing
Operator examples so readers can see how to parse/construct meta-type rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 65dfb59d-7bdf-4d58-a1a9-fedac2dd6ed6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e4ba1a and f2647f0.

⛔ Files ignored due to path filters (1)
  • docs/API_REFERENCE.md is excluded by none and included by none
📒 Files selected for processing (1)
  • docs/src/api-reference.md

Comment thread docs/src/api-reference.md Outdated
Comment thread docs/src/api-reference.md Outdated
Comment thread docs/src/api-reference.md Outdated
…format substitution

Critical: SubroutineScope RAII replaces manual save/restore in evaluate_use_rule so last_match_end and base_offset are restored on all error exit paths (RecursionLimitExceeded and Timeout previously leaked corrupted state). format_magic_message no longer mangles non-ASCII template bytes -- replaced byte-by-byte push with plain-run slice flush.

Hygiene: removed stale allow(dead_code) on AnchorScope::context().

Tests: 9 new cases -- 5 for resolve_offset_with_base bias invariants (positive Absolute biased, negative Absolute / FromEnd / Relative / Indirect not biased), 2 for Use subroutine with non-zero use-site (proves bias is active and Relative is not double-biased), 1 continuation-sibling reset regression guard with bytes-consumed first sibling, 1 non-ASCII template format test. 1357/1357 pass.
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 23, 2026 01:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements GNU file-compatible meta-type directives (default, clear, name, use, indirect, offset) across the parser/evaluator pipeline and adds printf-style message substitution so magic fixtures like searchbug.magic round-trip byte-for-byte through MagicDatabase.

Changes:

  • Introduces TypeKind::Meta(MetaType) + ParsedMagic { rules, name_table }, including load-time name hoisting into a NameTable.
  • Adds evaluator support for meta-type dispatch (subroutines, indirect re-entry, default/clear sibling state) and subroutine base-offset biasing.
  • Adds extensive tests, docs updates, and workflow/tooling refreshes to validate parity and document breaking changes.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/regex_search_corpus_tests.rs Updates parser call sites to destructure ParsedMagic and continue corpus checks.
tests/property_tests.rs Expands generators to include MetaType and adds a never-panics property for meta-type evaluation.
tests/property_tests.proptest-regressions Checks in proptest regression seeds related to new meta-type paths.
tests/parser_integration_tests.rs Updates directory/file loaders to ParsedMagic; adds end-to-end name/use round-trip test.
tests/meta_types_integration.rs New end-to-end smoke + parity tests for meta-types and searchbug fixture output.
tests/directory_loading_tests.rs Updates directory loader tests to ParsedMagic return type.
tests/compatibility_tests.rs Adds weaker searchbug partial-match regression guard.
src/parser/types.rs Adds meta keywords to parsing + maps keyword-only meta types in type_keyword_to_kind.
src/parser/name_table.rs New NameTable + extract_name_table hoisting logic for name subroutines.
src/parser/mod.rs Introduces ParsedMagic and updates parse_text_magic_file to hoist names.
src/parser/loader.rs Returns ParsedMagic from loaders and merges NameTable across directory loads.
src/parser/grammar/tests/mod.rs Adds grammar tests for relative offsets and meta-type parsing/roundtrip.
src/parser/grammar/mod.rs Adds &N relative offsets, name/use operand parsing, meta-rule x stripping, and bare-word string fallback.
src/parser/codegen.rs Serializes TypeKind::Meta and adds injection regression tests for Name/Use identifiers.
src/parser/ast.rs Defines MetaType and adds TypeKind::Meta(MetaType) with serde/tests.
src/output/mod.rs Exposes output::format module.
src/lib.rs Threads RuleEnvironment into evaluations and applies printf substitution during message concatenation.
src/evaluator/types/mod.rs Adjusts string semantics to pattern-length reads; explicitly rejects reading TypeKind::Meta as values/patterns.
src/evaluator/strength.rs Assigns strength defaults for meta-types + tests for ordering expectations.
src/evaluator/offset/mod.rs Adds resolve_offset_with_base for subroutine base_offset biasing.
src/evaluator/mod.rs Adds RuleEnvironment + EvaluationContext fields for rule_env and base_offset.
src/error.rs Adds reserved errors for strict-mode handling of missing env/unknown name.
src/build_helpers.rs Updates codegen helper to use parsed.rules.
mise.lock Toolchain lock refresh (Rust/Python artifacts and provenance fields).
docs/src/parser.md Documents meta-types and the ParsedMagic breaking-change return type.
docs/src/magic-format.md Documents meta-type syntax/semantics in the user-facing magic format doc.
docs/src/evaluator.md Documents meta-type dispatch behavior and updates examples for ParsedMagic.
docs/src/ast-structures.md Documents TypeKind::Meta / MetaType in AST reference.
docs/src/architecture.md Updates architecture overview to include NameTable, ParsedMagic, and evaluator environment threading.
docs/src/api-reference.md Adds MetaType docs and calls out breaking changes + printf substitution.
docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md New design note capturing the RAII scope-guard pattern for context save/restore.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md New design note describing the parse→env→dispatch architecture for meta-types.
docs/MAGIC_FORMAT.md Mirrors meta-type documentation in the non-mdbook doc.
docs/ARCHITECTURE.md Updates tree/struct documentation to include name table + root rule slice plumbing.
docs/API_REFERENCE.md Updates API reference for ParsedMagic, meta-types, and message substitution semantics.
build.rs Updates builtin rules generation to use parsed.rules.
benches/evaluation_bench.rs Adds a benchmark focused on name/use subroutine dispatch overhead.
ROADMAP.md Marks meta-types as complete (including offset).
GOTCHAS.md Updates gotchas for meta-types, bare-word string fallback, and printf substitution.
Cargo.toml Updates authors field.
CHANGELOG.md Adds feature + breaking-change notes for ParsedMagic and meta-type support.
AGENTS.md Updates repo guidance to include meta-types + printf substitution as implemented.
.github/workflows/release.yml Updates artifact upload action version usage.

Comment thread src/evaluator/mod.rs Outdated
Comment thread src/parser/grammar/mod.rs
Comment thread src/evaluator/offset/mod.rs Outdated
Comment thread .github/workflows/release.yml
…ect top-level re-entry

Bug 1 (PRRT...PrI): evaluate_use_rule now returns the subroutine's TERMINAL anchor (where the subroutine left last_match_end) instead of the use-site offset. Callers propagate that terminal anchor via context.set_last_match_end, so sibling rules after a successful 'use' resolve &N against the subroutine's final match position -- matching GNU file's inlining semantics. Previously siblings resolved against the use-site, silently breaking continuation chains.

Bug 2 (PRRT...PrL): MetaType::Indirect root re-entry is evaluated with TOP-LEVEL sibling semantics (anchor chains across siblings per GOTCHAS S3.8), not CONTINUATION semantics (anchor resets). Previously the RecursionGuard wrapping the indirect sub-evaluation forced recursion_depth > 0, which triggered the continuation-reset path and broke top-level siblings inside the re-entered database. Fixed via a new one-shot EvaluationContext::indirect_reentry flag: indirect dispatch sets it before evaluate_rules; evaluate_rules consumes it at entry so only the top-level iteration is affected -- children of matched rules inside the re-entry correctly fall back to recursion_depth > 0 for their own continuation semantics.

Deferred (PRRT...PrQ refactor suggestion): split engine tests file into focused modules. Out of scope for a correctness-focused review round; tracked as a follow-on maintenance task.

Test: 1364/1364 pass (searchbug byte-for-byte parity preserved through both semantic fixes); just ci-check green.
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 23, 2026
…offset docs

The real guard is named SubroutineScope (saves+restores both last_match_end and base_offset). The base_offset doc comment referenced a BaseOffsetScope that never existed -- leftover from design intent in an earlier session. Updated the two doc-comment sites.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 23, 2026 03:33
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 23, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements GNU file-compatible meta-type (control-flow) directives and adds printf-style message substitution so libmagic-rs can round-trip real-world magic corpora (e.g., searchbug.magic) with byte-for-byte output parity.

Changes:

  • Adds TypeKind::Meta(MetaType) (default/clear/name/use/indirect/offset) across AST, parser, evaluator dispatch, and strength ordering.
  • Introduces ParsedMagic { rules, name_table } parser return type with load-time name extraction into a NameTable for efficient use dispatch.
  • Adds output::format::format_magic_message and wires message rendering (and backspace handling) through it.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/regex_search_corpus_tests.rs Updates parser call sites to destructure ParsedMagic.
tests/property_tests.rs Extends generators/property coverage for meta-types and MetaType variants.
tests/property_tests.proptest-regressions Checks in proptest regression seeds for meta-type scenarios.
tests/parser_integration_tests.rs Adjusts loader return type usage; adds name/use round-trip integration test.
tests/meta_types_integration.rs Adds end-to-end integration tests for meta-type directives and searchbug parity.
tests/directory_loading_tests.rs Updates directory loader return type usage (ParsedMagic).
tests/compatibility_tests.rs Adds a weaker searchbug regression assertion for consumers that alter formatting layers.
src/parser/types.rs Recognizes meta keywords and maps keyword-only meta directives to TypeKind::Meta(...).
src/parser/name_table.rs New: extracts top-level name blocks into a NameTable and scrubs nested name rules.
src/parser/mod.rs Defines ParsedMagic and updates parse_text_magic_file to return it.
src/parser/loader.rs Updates directory/file loading to return ParsedMagic and merge name tables.
src/parser/grammar/tests/mod.rs Adds parsing tests for relative offsets and meta-type rules.
src/parser/grammar/mod.rs Adds &N relative offset parsing, name/use identifier parsing, meta-type message x-strip, and bare-word string fallback for string-family types.
src/parser/codegen.rs Serializes TypeKind::Meta(MetaType) and adds injection-escape regression tests for meta identifiers.
src/parser/ast.rs Adds MetaType and TypeKind::Meta(MetaType) with serde + unit tests.
src/output/mod.rs Exposes new output::format module.
src/output/format.rs New: printf-style format substitution for match messages.
src/lib.rs Wires name table + root rules into evaluation context; applies format substitution during message concatenation.
src/evaluator/types/mod.rs Adjusts string read semantics and defines behavior for meta-types in read paths/byte consumption.
src/evaluator/strength.rs Assigns strength defaults for meta-types and adds strength ordering tests.
src/evaluator/offset/mod.rs Adds resolve_offset_with_base for subroutine-relative absolute offsets; tests.
src/evaluator/mod.rs Introduces RuleEnvironment and extends EvaluationContext with env/base_offset/indirect-reentry state.
src/error.rs Adds reserved evaluation error variants for strict-mode unknown-name / indirect-without-env.
src/build_helpers.rs Updates build helper to use ParsedMagic.rules.
build.rs Updates builtin rule generation to use ParsedMagic.rules.
benches/evaluation_bench.rs Adds benchmark for name/use subroutine dispatch overhead.
docs/src/parser.md Documents meta-types and ParsedMagic breaking change.
docs/src/magic-format.md Documents meta-type directives in the magic format guide.
docs/src/evaluator.md Documents evaluator meta-type dispatch and RuleEnvironment threading.
docs/src/ast-structures.md Documents MetaType and TypeKind::Meta.
docs/src/architecture.md Updates architecture docs for name table and ParsedMagic pipeline.
docs/src/api-reference.md Documents MetaType and breaking changes (ParsedMagic + printf substitution).
docs/MAGIC_FORMAT.md Mirrors meta-type documentation in top-level docs.
docs/ARCHITECTURE.md Updates repo architecture doc for name_table + root_rules/name_table fields.
docs/API_REFERENCE.md Updates API reference doc with ParsedMagic and MetaType info.
docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md Adds a postmortem/solution doc on RAII scope guards for context save/restore.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md Adds an architecture pattern doc for name/use extraction + RuleEnvironment threading.
ROADMAP.md Marks meta-types as completed (including offset).
GOTCHAS.md Updates gotchas for meta-type dispatch, ParsedMagic return type, and printf substitution.
CHANGELOG.md Notes meta-type feature landing and parser breaking change.
AGENTS.md Updates contributor-facing implementation status for meta-types and formatting.
Cargo.toml Updates authors metadata.
mise.lock Updates tool lock entries (Rust/tooling versions and provenance fields).
.github/workflows/release.yml Adjusts actions/upload-artifact version pins.

Comment thread src/lib.rs Outdated
Comment thread src/lib.rs Outdated
Comment thread src/output/format.rs Outdated
Comment thread tests/property_tests.rs
…PR review)

Addresses PRRT...PrQ from the round-3 PR review: src/evaluator/engine/tests.rs grew to 3666 lines, well past the 500-600 line project guideline. Split into the three meta-type suites the reviewer called out.

Structure:

- src/evaluator/engine/tests/mod.rs (2777 lines): core evaluator tests (byte/short/long/string/pstring/relative-offset/recursion) plus centralized meta-type test helpers (make_context_with_env, use_rule, use_rule_at, build_name_table, default_rule, clear_rule, byte_eq_rule, indirect_rule, offset_rule) marked pub(super) so submodules can share them.

- src/evaluator/engine/tests/meta_use_tests.rs (405 lines): MetaType::Use dispatch, subroutine name-table resolution, and subroutine base_offset biasing tests.

- src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs (275 lines): MetaType::Default fallback, MetaType::Clear state-reset, and MetaType::Indirect root re-entry tests.

- src/evaluator/engine/tests/meta_offset_tests.rs (247 lines): MetaType::Offset dispatch and evaluate_children_or_warn helper coverage.

No test content changed -- each test's body is byte-identical to the pre-split version. The three new submodules each have their own SPDX/copyright header and module doc. Test count unchanged (1364/1364 pass, 5 pre-existing skipped).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@coderabbitai coderabbitai Bot removed parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation compatibility libmagic compatibility and migration testing Test infrastructure and coverage labels Apr 24, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/architecture.md`:
- Around line 175-176: The documentation list in docs/src/architecture.md omits
MetaType::Offset; update the inline-dispatch bullet to include
`MetaType::Offset` alongside `MetaType::Default`, `MetaType::Clear`,
`MetaType::Use`, and `MetaType::Indirect` (i.e., mention that `evaluate_rules`
performs inline dispatch for `MetaType::Default`, `MetaType::Clear`,
`MetaType::Use`, `MetaType::Indirect`, and `MetaType::Offset`) so it matches the
evaluator behavior described by `mod.rs` (functions `evaluate_single_rule`,
`evaluate_rules`, `evaluate_rules_with_config`) and the implementation in the
`evaluate_rules` loop.

In `@docs/src/evaluator.md`:
- Around line 529-537: The examples call evaluate_rules(&parsed.rules, &buffer)
but the real API requires passing an &mut EvaluationContext; update each example
(including the blocks using parsed from parse_text_magic_file) to construct a
mutable EvaluationContext (e.g., new or default) and call evaluate_rules(&mut
ctx, &parsed.rules, &buffer) instead; ensure you reference the same
parsed.name_table/MagicDatabase handling as before and apply this change
consistently to all listed example blocks (lines ~529, 555, 575, 598, 609).
- Line 631: Update the checklist entry to include the missing meta-type
`offset`: edit the line currently listing meta-type directives (which mentions
`default`, `clear`, `name`/`use`, and `indirect`) and add `offset` (e.g.,
include `offset` in backticks alongside the other directives) so the implemented
meta-type set in the docs matches the PR.
- Around line 372-373: The documentation wrongly states that MetaType::Offset
performs printf-style substitution during evaluator dispatch; update the
description for MetaType::Offset to say evaluate_rules records the resolved
offset as a raw Value (Value::Uint(resolved_offset)) and stores the raw message,
and that printf-style substitution (e.g., replacing %lld, %d) is performed later
during output/message assembly (via the formatting stage such as
format_magic_message) rather than inside the evaluator; reference
MetaType::Offset, evaluate_rules, and format_magic_message in the updated text.

In `@src/evaluator/engine/tests/mod.rs`:
- Around line 2634-2779: Extract the shared test helpers (make_context_with_env,
use_rule, use_rule_at, build_name_table, default_rule, clear_rule, byte_eq_rule,
indirect_rule, offset_rule) out of the large tests mod into a new dedicated test
helper module (e.g. tests/helpers/meta.rs): move the function definitions into
that file with the same signatures and keep their pub(super) visibility, then in
the original mod.rs replace the moved code with a module declaration
(#[cfg(test)] mod helpers; or pub(super) mod helpers;) and re-export or
reference the helpers as needed (e.g. use self::helpers::* or pub(super) use
helpers::*;) so existing test submodules (meta_default_clear_indirect_tests,
meta_offset_tests, meta_use_tests) continue to compile without other changes.

In `@src/output/format.rs`:
- Around line 269-313: The Str and Char conversions currently either drop
width/alignment or use pad_numeric (which applies zero-padding); implement a
text-padding helper (e.g., pad_text) that takes the rendered string, the
existing FormatSpec, and pads with spaces only while honoring width and
left-alignment and ignoring the '0' flag, then replace Conv::Str to call
render_string(value) and feed that to pad_text (instead of returning raw string)
and replace Conv::Char to convert the byte to a single-character String
(char::from(byte).to_string()) and feed that to pad_text instead of pad_numeric;
keep numeric conversions using pad_numeric unchanged and add tests for "%5s",
"%-5s", and "%03c" to assert space-padding behavior.

In `@src/parser/grammar/tests/mod.rs`:
- Around line 2505-2679: The four tests (test_parse_magic_rule_meta_types,
test_parse_magic_rule_meta_name_use_reject_malformed_identifiers,
test_parse_text_magic_file_meta_roundtrip,
test_parse_text_magic_file_searchbug_fixture) should be extracted from the huge
tests file into a new focused test module for meta-type parsing; create a new
sibling test module (e.g., meta_types tests module), move these test functions
there, ensure the new module imports crate::parser::parse_text_magic_file and
parse_magic_rule and any enums (TypeKind, MetaType) used by the assertions, and
update the parent tests module to reference the new module (pub mod ...) so the
tests run; keep test logic unchanged but remove them from the original file.

In `@src/parser/name_table.rs`:
- Line 112: The warning printed by scrub_nested_names uses the passed parent
level for messaging, but the recursive calls currently pass rule.level + 1 which
misreports the actual parent level; update the two recursive call sites that
invoke scrub_nested_names to pass rule.level (not rule.level + 1) so the warning
text reflects the true parent level (e.g., change
scrub_nested_names(rule.children, rule.level + 1) to
scrub_nested_names(rule.children, rule.level) for both occurrences).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f3b40de6-4239-4197-9f34-1c303e9bc165

📥 Commits

Reviewing files that changed from the base of the PR and between 840ac7d and e9d20a0.

⛔ Files ignored due to path filters (3)
  • docs/ARCHITECTURE.md is excluded by none and included by none
  • docs/MAGIC_FORMAT.md is excluded by none and included by none
  • tests/compatibility_tests.rs is excluded by none and included by none
📒 Files selected for processing (14)
  • ROADMAP.md
  • docs/src/api-reference.md
  • docs/src/architecture.md
  • docs/src/evaluator.md
  • src/evaluator/engine/mod.rs
  • src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs
  • src/evaluator/engine/tests/meta_offset_tests.rs
  • src/evaluator/engine/tests/meta_use_tests.rs
  • src/evaluator/engine/tests/mod.rs
  • src/evaluator/mod.rs
  • src/output/format.rs
  • src/parser/ast.rs
  • src/parser/grammar/tests/mod.rs
  • src/parser/name_table.rs

Comment thread docs/src/architecture.md Outdated
Comment thread docs/src/evaluator.md Outdated
Comment thread docs/src/evaluator.md Outdated
Comment thread docs/src/evaluator.md Outdated
Comment thread src/evaluator/engine/mod.rs
Comment thread src/evaluator/engine/tests/mod.rs Outdated
Comment thread src/output/format.rs
Comment thread src/parser/grammar/tests/mod.rs Outdated
Comment thread src/parser/name_table.rs Outdated
Sixteen unresolved review threads across correctness, performance,
documentation, and test organization:

- Evaluator: AnchorScope now saves/restores both last_match_end AND
  base_offset so `indirect` inside a `use` subroutine re-enters root
  rules with base_offset=0 (RFn1).
- Offset resolution: resolve_offset_with_base switches to checked
  arithmetic and maps overflow to InvalidOffset rather than saturating
  into a spurious BufferOverrun (RU0).
- MagicDatabase: drop the redundant rules Vec field; root_rules is the
  sole storage built via Arc::from(rules.into_boxed_slice()) to avoid
  Arc::from-slice clones (Vdy, VeD).
- Format: Conv::Char uses pad_non_numeric (space-only) so %03c follows
  POSIX instead of zero-padding; Conv::Char doc now says full 0x00-0xff
  range via Latin-1 (VeM, RFn6).
- Docs: MetaType::Offset added to architecture.md inline-dispatch list;
  evaluator.md dispatch wording separated from output formatting;
  evaluate_rules usage examples updated to pass &mut EvaluationContext;
  checklist bullet includes offset (RFnh, RFnl, RFnx, RFnz).
- Grammar: meta-type directives reject attached operators (`default&0xf`
  no longer silently drops the mask) with a regression test (RUs).
- Release workflow: dist-workspace.toml pin for actions/upload-artifact
  bumped from v7.0.0 to v7.0.1; release.yml regenerated via `dist
  generate` (RU7).
- Property test: timeout doc comment rewritten to reflect that the
  test constructs an env-less context, so MetaType::Indirect/Use take
  the silent no-op path rather than recursing (VeW).
- Name table: scrub_nested_names is now called with the actual parent
  level rather than parent.level + 1 (RFn-).
- Test layout: engine test helpers moved to
  src/evaluator/engine/tests/helpers/meta.rs (RFn3); grammar meta-type
  parser tests moved to src/parser/grammar/tests/meta_types.rs (RFn8).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 24, 2026 01:21
@coderabbitai coderabbitai Bot added parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation testing Test infrastructure and coverage labels Apr 24, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements GNU file-compatible evaluation for libmagic meta-type directives (default, clear, name, use, indirect, offset) and adds printf-style message substitution, wiring parse-time name-table extraction through MagicDatabase evaluation so real-world corpora (e.g. searchbug.magic) round-trip end-to-end.

Changes:

  • Introduces TypeKind::Meta(MetaType) + parse/load plumbing (ParsedMagic { rules, name_table }) and NameTable extraction/merging for name/use.
  • Wires evaluator dispatch for all meta-types (including indirect re-entry and subroutine base-offset) and applies printf-style formatting at output concatenation time.
  • Adds/updates unit + integration tests (parser, evaluator engine, corpus fixtures) and refreshes docs to describe the new architecture and breaking changes.

Reviewed changes

Copilot reviewed 52 out of 53 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/regex_search_corpus_tests.rs Updates tests to destructure ParsedMagic from parser API.
tests/property_tests.rs Extends generators + adds property test coverage for TypeKind::Meta(...) evaluation.
tests/property_tests.proptest-regressions Adds saved proptest regression seeds for meta-type cases.
tests/parser_integration_tests.rs Updates parser integration tests for ParsedMagic + adds name/use round-trip test.
tests/meta_types_integration.rs Adds end-to-end smoke/integration tests for all meta-types + searchbug parity.
tests/directory_loading_tests.rs Updates directory-loading tests to destructure ParsedMagic.
tests/compatibility_tests.rs Adds a weaker “partial match” regression test around the searchbug fixture.
src/tests.rs Adjusts built-in rules test to use root_rules storage.
src/parser/types.rs Teaches keyword parsing/conversion about meta-types, and name/use suffix handling.
src/parser/name_table.rs Adds load-time name extraction into a NameTable plus merge/sort utilities.
src/parser/mod.rs Changes parse_text_magic_file to return ParsedMagic { rules, name_table }.
src/parser/loader.rs Updates directory/file loaders to return ParsedMagic and merge name tables.
src/parser/grammar/tests/mod.rs Adds/extends grammar tests (notably relative &N offset parsing).
src/parser/grammar/tests/meta_types.rs Adds dedicated grammar tests for meta-type parsing + fixture parse check.
src/parser/grammar/mod.rs Adds name/use identifier parsing, optional x stripping for meta messages, bare-string fallback, and relative offsets.
src/parser/codegen.rs Serializes TypeKind::Meta(...) and adds identifier escaping regression tests.
src/parser/ast.rs Introduces MetaType enum and TypeKind::Meta(MetaType) with docs + tests.
src/output/mod.rs Exposes new output::format module.
src/lib.rs Stores root_rules + name_table in MagicDatabase, threads RuleEnvironment, and applies printf substitution during message concatenation.
src/evaluator/types/mod.rs Adjusts string-read semantics for pattern-length reads; treats meta-types as non-readable/zero-consume.
src/evaluator/strength.rs Assigns strength contributions for meta-types + adds ordering tests.
src/evaluator/offset/mod.rs Adds resolve_offset_with_base and tests for subroutine base-offset bias + overflow behavior.
src/evaluator/mod.rs Adds RuleEnvironment and extends EvaluationContext with env/base_offset/indirect-reentry plumbing.
src/evaluator/engine/tests/mod.rs Refactors test helpers into submodules and wires new meta-type test modules.
src/evaluator/engine/tests/meta_offset_tests.rs Adds unit tests for MetaType::Offset dispatch + child evaluation behavior.
src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs Adds unit tests for default/clear/indirect semantics + recursion limit behavior.
src/evaluator/engine/tests/helpers/mod.rs Adds shared helper module entry point for engine tests.
src/evaluator/engine/tests/helpers/meta.rs Adds builders/helpers for meta-type rule construction and contexts with env.
src/error.rs Adds reserved evaluation errors for strict-mode unknown-name/indirect-without-env cases.
src/build_helpers.rs Updates build helper to use ParsedMagic (rules-only for codegen).
docs/src/parser.md Documents meta-types and ParsedMagic breaking change.
docs/src/magic-format.md Documents meta-types in the magic(5)-format guide.
docs/src/evaluator.md Documents meta-type dispatch and updates examples for EvaluationContext usage.
docs/src/ast-structures.md Documents MetaType / TypeKind::Meta AST structures and semantics.
docs/src/architecture.md Updates architecture docs to include NameTable and ParsedMagic shape.
docs/src/api-reference.md Updates API reference to include meta-types and printf substitution behavior.
docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md Adds a postmortem/solution writeup for RAII scope guards and formatting pitfalls.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md Adds architecture writeup for name-table extraction + env-threading pattern.
docs/MAGIC_FORMAT.md Mirrors meta-type documentation in the top-level magic format doc.
docs/ARCHITECTURE.md Updates architecture doc for new database internals (needs correction per review).
docs/API_REFERENCE.md Updates API reference (needs correction per review).
dist-workspace.toml Bumps actions/upload-artifact pinned version.
build.rs Updates build-time parser call sites to ParsedMagic.
benches/evaluation_bench.rs Adds benchmark for name/use subroutine dispatch overhead.
ROADMAP.md Marks meta-types milestone complete (includes offset).
GOTCHAS.md Adds/updates gotchas for meta-types, bare string fallback, and format substitution.
Cargo.toml Updates crate authors list.
CHANGELOG.md Documents meta-types feature and ParsedMagic breaking change.
AGENTS.md Updates repo guidance with new meta-type + printf substitution support and module structure.

Comment thread docs/ARCHITECTURE.md
Comment on lines +237 to +239
rules: Vec<MagicRule>, // Parsed magic rules (top-level, strength-sorted)
name_table: Arc<NameTable>, // `name`/`use` subroutine dispatch table (Arc for cheap clone across evaluations)
root_rules: Arc<[MagicRule]>, // Shared immutable slice of top-level rules for `indirect` re-entry
Comment thread docs/API_REFERENCE.md

| Field (Internal) | Type | Description |
| ---------------- | ------------------ | ----------------------------------------------------------------- |
| `rules` | `Vec<MagicRule>` | Top-level magic rules |
Comment thread src/parser/grammar/mod.rs
Comment on lines +355 to +358
let (input, _) = space1(input)?;
let (after_id, id) =
take_while(|c: char| c.is_alphanumeric() || c == '_' || c == '-').parse(input)?;
if id.is_empty() {
Comment thread docs/src/architecture.md
- `numbers.rs`: Numeric type parsing (decimal/hex, signed/unsigned)
- `value.rs`: Value literal parsing (strings, floats, hex bytes)
- `mod.rs`: Parser interface, format detection, and hierarchical rule building (✅ Complete)
- `name_table.rs`: Load-time extraction of `name <id>` subroutine blocks into a `HashMap<String, Vec<MagicRule>>` (the `NameTable` type).
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib.rs (1)

248-256: ⚠️ Potential issue | 🟠 Major

Hoist builtin name blocks before storing the database.

load_from_file_with_config() now builds both root_rules and name_table, but the builtin path still hardcodes NameTable::empty(). If the generated builtin corpus contains any name/use directives, builtin mode will silently degrade: MetaType::Name stays in root_rules as a no-op and every use lookup misses.

Suggested fix
-        let mut rules = crate::builtin_rules::get_builtin_rules();
+        let rules = crate::builtin_rules::get_builtin_rules();
+        let (mut rules, mut name_table) = crate::parser::name_table::extract_name_table(rules);
         crate::evaluator::strength::sort_rules_by_strength_recursive(&mut rules);
+        name_table.sort_subroutines(|rules| {
+            crate::evaluator::strength::sort_rules_by_strength_recursive(rules);
+        });
         let root_rules: std::sync::Arc<[MagicRule]> =
             std::sync::Arc::from(rules.into_boxed_slice());
         Ok(Self {
-            name_table: std::sync::Arc::new(crate::parser::name_table::NameTable::empty()),
+            name_table: std::sync::Arc::new(name_table),
             root_rules,
             config,
             source_path: None,
             mime_mapper: mime::MimeMapper::new(),
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` around lines 248 - 256, The builtin constructor currently sets
name_table to NameTable::empty(), so any builtin `name`/`use` directives are
ignored; update with_builtin_rules_and_config to reuse the same
hoisting/name-table construction used in load_from_file_with_config: after
computing root_rules (the MagicRule array) run the hoist/name-table extraction
logic (the code path that builds the parser::name_table and hoists `name`
blocks) and set self.name_table to that resulting NameTable Arc instead of
NameTable::empty(), ensuring root_rules reflect any hoisted MetaType::Name
changes and lookups via use succeed.
♻️ Duplicate comments (1)
src/output/format.rs (1)

267-270: ⚠️ Potential issue | 🟠 Major

Apply the non-numeric padding path to %s as well.

%c now honors width/alignment via pad_non_numeric, but %s still returns the raw rendered string, so %5s / %-5s ignore field width entirely. That leaves the formatter only half-fixed for text conversions.

Suggested fix
-        Conv::Str => Some(render_string(value)),
+        Conv::Str => Some(pad_non_numeric(&render_string(value), spec)),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/output/format.rs` around lines 267 - 270, The Conv::Str branch in render
currently returns the raw rendered string (render_string(value)) so
width/alignment is ignored; update the Conv::Str match arm to pass the rendered
string through pad_non_numeric (the same way Conv::Char/%c does) using the
provided type_kind so formats like %5s and %-5s honor padding and alignment;
locate the render function and replace the Conv::Str return with a call that
applies pad_non_numeric(render_string(value), type_kind) (ensuring correct
ownership/borrowing to produce an Option<String>).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dist-workspace.toml`:
- Line 56: The workflow pins "actions/upload-artifact" to the mutable tag
"v7.0.1", which is insecure; locate the entry with the key
"actions/upload-artifact" and replace the tag value "v7.0.1" with the
corresponding immutable commit SHA (the full 40-char commit hash resolved from
the GitHub Actions marketplace or the action's repo) so the TOML contains the
exact commit SHA instead of the tag for reproducible CI.

In `@docs/src/evaluator.md`:
- Around line 374-393: The sequence diagram is incorrect: meta directives "Use"
and "Indirect" are handled inline in evaluate_rules, not routed through
evaluate_single_rule_with_anchor; update the diagram so that the arrows for Use
and Indirect go from ER (evaluate_rules) directly to the relevant components
(e.g., NameTable lookup for Use and cloning/using root_rules for Indirect) and
back to ER, remove or change any arrows that currently route through ESR
(evaluate_single_rule_with_anchor), and keep participants names (evaluate_rules,
evaluate_single_rule_with_anchor, NameTable, root_rules) so readers can trace
the correct inlined control flow.
- Line 520: The docs import is wrong: update the use statements so
evaluate_rules is imported from the evaluator module (e.g. use
libmagic_rs::evaluator::{evaluate_rules, EvaluationConfig, EvaluationContext};)
instead of from the crate root; change every occurrence that currently uses
libmagic_rs::{evaluate_rules, ...} (including the spots mentioned for lines 520,
548, 569, 590) to reference libmagic_rs::evaluator::evaluate_rules (and bring
EvaluationConfig and EvaluationContext into scope from the same evaluator
module) so the examples compile.

In `@src/evaluator/engine/mod.rs`:
- Around line 280-281: Update the helper contract comment that currently reads
"Returns `Ok((Some(absolute_offset), matches))` ..." to accurately state that
the function returns the subroutine's terminal anchor (e.g., "Returns
`Ok((Some(terminal_anchor), matches))`") rather than the resolved use-site
offset; mention that the returned terminal anchor is what callers use for
subsequent `&N` chaining and keep the `Ok((None, vec![]))` case unchanged,
ensuring the docstring for the helper in mod.rs reflects this behavior and
follows the project's documentation guidance for complex algorithm comments.

In `@src/evaluator/engine/tests/helpers/meta.rs`:
- Around line 69-138: Several tests repeat the same MagicRule struct literal;
extract the shared fields into a single constructor helper (e.g., a private fn
like build_magic_rule(offset: OffsetSpec, typ: TypeKind, op: Operator, value:
Value, message: String, children: Vec<MagicRule>) -> MagicRule) and have
default_rule, clear_rule, byte_eq_rule, indirect_rule, and offset_rule call that
helper with only the differing arguments (for byte_eq_rule provide
signed/type/value variations, for others pass Meta types and op). Update
references to MagicRule, OffsetSpec, TypeKind, Operator, Value, message and
children in the helper so future changes to the common fields need only one
update.

In `@src/parser/grammar/mod.rs`:
- Around line 198-208: The relative-offset branch currently accepts "&+-N"
because after stripping the '+' it still calls parse_number on a '-' prefix;
modify the '+' branch in the parser so that when rest.strip_prefix('+') returns
after_plus you first check that after_plus does not start with '-' and if it
does return a parse error (rather than calling parse_number), otherwise proceed
to call parse_number(after_plus); this change touches the branch that uses
rest.strip_prefix('+'), parse_number, multispace0 and returns
OffsetSpec::Relative.
- Around line 843-863: The branch for string-family types currently calls
parse_value first so bare tokens like "123" or "0x7f" get parsed as Uint/Bytes
instead of literal strings; change the logic in the is_string_family_type branch
to prefer parse_bare_string_value for unquoted single-token inputs (i.e., try
parse_bare_string_value(input) first and return that when it succeeds), falling
back to parse_value only if the bare-string parse fails, while keeping the
Operator::AnyValue path unchanged; update tests for
parse_value/parse_bare_string_value to include numeric-looking and hex-looking
bare tokens for TypeKind::String/PString/Regex/Search to ensure the rule matches
the literal text.

---

Outside diff comments:
In `@src/lib.rs`:
- Around line 248-256: The builtin constructor currently sets name_table to
NameTable::empty(), so any builtin `name`/`use` directives are ignored; update
with_builtin_rules_and_config to reuse the same hoisting/name-table construction
used in load_from_file_with_config: after computing root_rules (the MagicRule
array) run the hoist/name-table extraction logic (the code path that builds the
parser::name_table and hoists `name` blocks) and set self.name_table to that
resulting NameTable Arc instead of NameTable::empty(), ensuring root_rules
reflect any hoisted MetaType::Name changes and lookups via use succeed.

---

Duplicate comments:
In `@src/output/format.rs`:
- Around line 267-270: The Conv::Str branch in render currently returns the raw
rendered string (render_string(value)) so width/alignment is ignored; update the
Conv::Str match arm to pass the rendered string through pad_non_numeric (the
same way Conv::Char/%c does) using the provided type_kind so formats like %5s
and %-5s honor padding and alignment; locate the render function and replace the
Conv::Str return with a call that applies pad_non_numeric(render_string(value),
type_kind) (ensuring correct ownership/borrowing to produce an Option<String>).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24bb0714-2ff5-4601-862c-6abf0150e97e

📥 Commits

Reviewing files that changed from the base of the PR and between e9d20a0 and 4c576c7.

⛔ Files ignored due to path filters (1)
  • tests/property_tests.rs is excluded by none and included by none
📒 Files selected for processing (16)
  • dist-workspace.toml
  • docs/src/architecture.md
  • docs/src/evaluator.md
  • src/evaluator/engine/mod.rs
  • src/evaluator/engine/tests/helpers/meta.rs
  • src/evaluator/engine/tests/helpers/mod.rs
  • src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs
  • src/evaluator/engine/tests/mod.rs
  • src/evaluator/offset/mod.rs
  • src/lib.rs
  • src/output/format.rs
  • src/parser/grammar/mod.rs
  • src/parser/grammar/tests/meta_types.rs
  • src/parser/grammar/tests/mod.rs
  • src/parser/name_table.rs
  • src/tests.rs

Comment thread dist-workspace.toml
"actions/download-artifact" = "3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c"
"actions/attest-build-provenance" = "a2bbfa25375fe432b6a289bc6b6cd05ecd0c4c32"
"actions/upload-artifact" = "v7.0.0"
"actions/upload-artifact" = "v7.0.1"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo="actions/upload-artifact"
tag="v7.0.1"

ref_json="$(curl -fsSL "https://api.github.com/repos/${repo}/git/ref/tags/${tag}")"
obj_type="$(jq -r '.object.type' <<<"$ref_json")"
obj_sha="$(jq -r '.object.sha' <<<"$ref_json")"

if [[ "$obj_type" == "tag" ]]; then
  commit_sha="$(curl -fsSL "https://api.github.com/repos/${repo}/git/tags/${obj_sha}" | jq -r '.object.sha')"
else
  commit_sha="$obj_sha"
fi

echo "Resolved commit SHA for ${repo}@${tag}: ${commit_sha}"

Repository: EvilBit-Labs/libmagic-rs

Length of output: 164


Pin actions/upload-artifact to an immutable commit SHA, not a tag.

Using v7.0.1 leaves CI exposed to tag retargeting. Pin to the resolved commit SHA for reproducible and tamper-resistant builds.

Suggested change
-"actions/upload-artifact"         = "v7.0.1"
+"actions/upload-artifact"         = "043fb46d1a93c77aae656e7c1c64a875d1fc6a0a"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dist-workspace.toml` at line 56, The workflow pins "actions/upload-artifact"
to the mutable tag "v7.0.1", which is insecure; locate the entry with the key
"actions/upload-artifact" and replace the tag value "v7.0.1" with the
corresponding immutable commit SHA (the full 40-char commit hash resolved from
the GitHub Actions marketplace or the action's repo) so the TOML contains the
exact commit SHA instead of the tag for reproducible CI.

Comment thread docs/src/evaluator.md
Comment on lines +374 to +393
```mermaid
sequenceDiagram
participant ER as evaluate_rules
participant ESR as evaluate_single_rule_with_anchor
participant NT as NameTable
participant RR as root_rules

ER->>ESR: rule (Use "part2")
ESR->>NT: lookup("part2")
NT-->>ESR: Vec<MagicRule> (subroutine)
ESR->>ER: evaluate_rules(subroutine, buffer, ctx)
ER-->>ESR: subroutine matches
ESR-->>ER: Ok(Some(offset, value)) + merged matches

ER->>ESR: rule (Indirect)
ESR->>RR: clone root_rules
ESR->>ER: evaluate_rules(root_rules, sub_buffer, ctx)
ER-->>ESR: inner matches
ESR-->>ER: Ok(Some(offset, value)) + merged matches
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sequence diagram still shows the old dispatcher.

The prose above correctly says meta directives are handled inline in evaluate_rules, but this diagram routes Use and Indirect through evaluate_single_rule_with_anchor. That will send readers to the wrong place when tracing the control flow.

As per coding guidelines: "docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/evaluator.md` around lines 374 - 393, The sequence diagram is
incorrect: meta directives "Use" and "Indirect" are handled inline in
evaluate_rules, not routed through evaluate_single_rule_with_anchor; update the
diagram so that the arrows for Use and Indirect go from ER (evaluate_rules)
directly to the relevant components (e.g., NameTable lookup for Use and
cloning/using root_rules for Indirect) and back to ER, remove or change any
arrows that currently route through ESR (evaluate_single_rule_with_anchor), and
keep participants names (evaluate_rules, evaluate_single_rule_with_anchor,
NameTable, root_rules) so readers can trace the correct inlined control flow.

Comment thread docs/src/evaluator.md

```rust
use libmagic_rs::{evaluate_rules, EvaluationConfig};
use libmagic_rs::{evaluate_rules, EvaluationConfig, EvaluationContext};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Import evaluate_rules from libmagic_rs::evaluator.

These snippets currently use use libmagic_rs::{evaluate_rules, ...};, but the crate root does not re-export evaluate_rules. As written, the examples will not compile.

Suggested fix pattern
-use libmagic_rs::{evaluate_rules, EvaluationConfig, EvaluationContext};
+use libmagic_rs::{EvaluationConfig, EvaluationContext};
+use libmagic_rs::evaluator::evaluate_rules;

As per coding guidelines: "docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides."

Also applies to: 548-548, 569-569, 590-590

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/evaluator.md` at line 520, The docs import is wrong: update the use
statements so evaluate_rules is imported from the evaluator module (e.g. use
libmagic_rs::evaluator::{evaluate_rules, EvaluationConfig, EvaluationContext};)
instead of from the crate root; change every occurrence that currently uses
libmagic_rs::{evaluate_rules, ...} (including the spots mentioned for lines 520,
548, 569, 590) to reference libmagic_rs::evaluator::evaluate_rules (and bring
EvaluationConfig and EvaluationContext into scope from the same evaluator
module) so the examples compile.

Comment on lines +280 to +281
/// Returns `Ok((Some(absolute_offset), matches))` on a successful resolution
/// (even if the subroutine produced no matches), or `Ok((None, vec![]))`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update the helper contract to say it returns the terminal anchor.

The implementation below no longer returns the resolved use-site offset; it returns the subroutine's terminal anchor, and the caller relies on that for subsequent &N chaining. The stale Some(absolute_offset) wording is now misleading.

As per coding guidelines: "src/**/*.rs: Document magic file syntax support, libmagic compatibility, design decisions, and trade-offs in code comments for complex algorithms."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/evaluator/engine/mod.rs` around lines 280 - 281, Update the helper
contract comment that currently reads "Returns `Ok((Some(absolute_offset),
matches))` ..." to accurately state that the function returns the subroutine's
terminal anchor (e.g., "Returns `Ok((Some(terminal_anchor), matches))`") rather
than the resolved use-site offset; mention that the returned terminal anchor is
what callers use for subsequent `&N` chaining and keep the `Ok((None, vec![]))`
case unchanged, ensuring the docstring for the helper in mod.rs reflects this
behavior and follows the project's documentation guidance for complex algorithm
comments.

Comment on lines +69 to +138
pub fn default_rule(message: &str, children: Vec<MagicRule>) -> MagicRule {
MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::Meta(MetaType::Default),
op: Operator::Equal,
value: Value::Uint(0),
message: message.to_string(),
children,
level: 0,
strength_modifier: None,
}
}

/// Build a `Clear` rule. Carries no message in the magic file syntax, but the
/// AST requires a message field.
pub fn clear_rule() -> MagicRule {
MagicRule {
offset: OffsetSpec::Absolute(0),
typ: TypeKind::Meta(MetaType::Clear),
op: Operator::Equal,
value: Value::Uint(0),
message: String::new(),
children: vec![],
level: 0,
strength_modifier: None,
}
}

/// Build a single byte-equality rule at `offset` for `value`.
pub fn byte_eq_rule(offset: i64, value: u64, message: &str) -> MagicRule {
MagicRule {
offset: OffsetSpec::Absolute(offset),
typ: TypeKind::Byte { signed: false },
op: Operator::Equal,
value: Value::Uint(value),
message: message.to_string(),
children: vec![],
level: 0,
strength_modifier: None,
}
}

/// Build an `Indirect` rule at `offset` with optional children.
pub fn indirect_rule(offset: i64, message: &str, children: Vec<MagicRule>) -> MagicRule {
MagicRule {
offset: OffsetSpec::Absolute(offset),
typ: TypeKind::Meta(MetaType::Indirect),
op: Operator::Equal,
value: Value::Uint(0),
message: message.to_string(),
children,
level: 0,
strength_modifier: None,
}
}

/// Build an `Offset` rule at `offset` with an `x` (`AnyValue`) operator and
/// the given message. Mirrors `default_rule`/`indirect_rule` helpers.
pub fn offset_rule(offset: i64, message: &str, children: Vec<MagicRule>) -> MagicRule {
MagicRule {
offset: OffsetSpec::Absolute(offset),
typ: TypeKind::Meta(MetaType::Offset),
op: Operator::AnyValue,
value: Value::Uint(0),
message: message.to_string(),
children,
level: 0,
strength_modifier: None,
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider collapsing repeated MagicRule builder boilerplate.

The repeated literals are consistent today, but centralizing shared fields would reduce maintenance risk if MagicRule changes again.

Refactor sketch
+fn build_meta_rule(
+    offset: i64,
+    meta: MetaType,
+    op: Operator,
+    message: &str,
+    children: Vec<MagicRule>,
+) -> MagicRule {
+    MagicRule {
+        offset: OffsetSpec::Absolute(offset),
+        typ: TypeKind::Meta(meta),
+        op,
+        value: Value::Uint(0),
+        message: message.to_string(),
+        children,
+        level: 0,
+        strength_modifier: None,
+    }
+}
+
 pub fn default_rule(message: &str, children: Vec<MagicRule>) -> MagicRule {
-    MagicRule {
-        offset: OffsetSpec::Absolute(0),
-        typ: TypeKind::Meta(MetaType::Default),
-        op: Operator::Equal,
-        value: Value::Uint(0),
-        message: message.to_string(),
-        children,
-        level: 0,
-        strength_modifier: None,
-    }
+    build_meta_rule(0, MetaType::Default, Operator::Equal, message, children)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/evaluator/engine/tests/helpers/meta.rs` around lines 69 - 138, Several
tests repeat the same MagicRule struct literal; extract the shared fields into a
single constructor helper (e.g., a private fn like build_magic_rule(offset:
OffsetSpec, typ: TypeKind, op: Operator, value: Value, message: String,
children: Vec<MagicRule>) -> MagicRule) and have default_rule, clear_rule,
byte_eq_rule, indirect_rule, and offset_rule call that helper with only the
differing arguments (for byte_eq_rule provide signed/type/value variations, for
others pass Meta types and op). Update references to MagicRule, OffsetSpec,
TypeKind, Operator, Value, message and children in the helper so future changes
to the common fields need only one update.

Comment thread src/parser/grammar/mod.rs
Comment on lines +198 to +208
} else if let Some(rest) = input.strip_prefix('&') {
// Relative offset: `&N`, `&+N`, or `&-N`. `parse_number` handles the
// bare and `-`-prefixed cases natively; `+` is consumed manually
// (see the indirect-offset adjustment parser for the same pattern).
let (rest, value) = if let Some(after_plus) = rest.strip_prefix('+') {
parse_number(after_plus)?
} else {
parse_number(rest)?
};
let (rest, _) = multispace0(rest)?;
Ok((rest, OffsetSpec::Relative(value)))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject &+-N in the relative-offset parser.

After the explicit + is stripped, this branch still hands the remainder to parse_number, so &+-4 currently parses as OffsetSpec::Relative(-4) instead of failing. That lets malformed magic syntax through silently.

Proposed fix
     } else if let Some(rest) = input.strip_prefix('&') {
         // Relative offset: `&N`, `&+N`, or `&-N`. `parse_number` handles the
         // bare and `-`-prefixed cases natively; `+` is consumed manually
         // (see the indirect-offset adjustment parser for the same pattern).
         let (rest, value) = if let Some(after_plus) = rest.strip_prefix('+') {
+            if after_plus.starts_with(['+', '-']) {
+                return Err(nom::Err::Error(NomError::new(
+                    after_plus,
+                    nom::error::ErrorKind::Verify,
+                )));
+            }
             parse_number(after_plus)?
         } else {
             parse_number(rest)?
         };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/grammar/mod.rs` around lines 198 - 208, The relative-offset branch
currently accepts "&+-N" because after stripping the '+' it still calls
parse_number on a '-' prefix; modify the '+' branch in the parser so that when
rest.strip_prefix('+') returns after_plus you first check that after_plus does
not start with '-' and if it does return a parse error (rather than calling
parse_number), otherwise proceed to call parse_number(after_plus); this change
touches the branch that uses rest.strip_prefix('+'), parse_number, multispace0
and returns OffsetSpec::Relative.

Comment thread src/parser/grammar/mod.rs
Comment on lines +843 to +863
// For AnyValue (`x`), no operand is needed -- treat remaining text as message.
// For string-family types, fall back to a bare (unquoted) single-token
// literal if the strict `parse_value` alternatives all fail. magic(5)
// syntax permits writing `string TEST` or `search/12 ABC` without
// surrounding quotes, and this fallback supports that form without
// relaxing value parsing for non-string types (where `xyz` must
// still be rejected -- see `test_parse_value_invalid_input`).
let is_string_family_type = matches!(
typ,
TypeKind::String { .. }
| TypeKind::PString { .. }
| TypeKind::Regex { .. }
| TypeKind::Search { .. }
);
let (input, value) = if op == Operator::AnyValue {
(input, Value::Uint(0))
} else if is_string_family_type {
match parse_value(input) {
Ok(ok) => ok,
Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

String-family bare operands still lose to numeric/hex parsing.

This only falls back to parse_bare_string_value when parse_value fails, so bare operands like string 123, string 7f, or search/16 0x10 are still parsed as Uint/Bytes instead of literal text. Those are valid unquoted string/pattern operands, so the rule will never match as intended.

Proposed fix
-    } else if is_string_family_type {
-        match parse_value(input) {
-            Ok(ok) => ok,
-            Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?,
-        }
+    } else if is_string_family_type {
+        let trimmed = input.trim_start();
+        if trimmed.starts_with('"') || trimmed.starts_with('\\') {
+            parse_value(input)?
+        } else {
+            parse_bare_string_value(input)?
+        }
     } else {
         parse_value(input)?
     };

Please add regression cases for numeric-looking and hex-looking bare tokens once this branch is tightened. As per coding guidelines: “src/parser/**: Focus on magic file DSL parsing ... and robust parsing logic.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// For AnyValue (`x`), no operand is needed -- treat remaining text as message.
// For string-family types, fall back to a bare (unquoted) single-token
// literal if the strict `parse_value` alternatives all fail. magic(5)
// syntax permits writing `string TEST` or `search/12 ABC` without
// surrounding quotes, and this fallback supports that form without
// relaxing value parsing for non-string types (where `xyz` must
// still be rejected -- see `test_parse_value_invalid_input`).
let is_string_family_type = matches!(
typ,
TypeKind::String { .. }
| TypeKind::PString { .. }
| TypeKind::Regex { .. }
| TypeKind::Search { .. }
);
let (input, value) = if op == Operator::AnyValue {
(input, Value::Uint(0))
} else if is_string_family_type {
match parse_value(input) {
Ok(ok) => ok,
Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?,
}
// For AnyValue (`x`), no operand is needed -- treat remaining text as message.
// For string-family types, fall back to a bare (unquoted) single-token
// literal if the strict `parse_value` alternatives all fail. magic(5)
// syntax permits writing `string TEST` or `search/12 ABC` without
// surrounding quotes, and this fallback supports that form without
// relaxing value parsing for non-string types (where `xyz` must
// still be rejected -- see `test_parse_value_invalid_input`).
let is_string_family_type = matches!(
typ,
TypeKind::String { .. }
| TypeKind::PString { .. }
| TypeKind::Regex { .. }
| TypeKind::Search { .. }
);
let (input, value) = if op == Operator::AnyValue {
(input, Value::Uint(0))
} else if is_string_family_type {
let trimmed = input.trim_start();
if trimmed.starts_with('"') || trimmed.starts_with('\\') {
parse_value(input)?
} else {
parse_bare_string_value(input)?
}
} else {
parse_value(input)?
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/parser/grammar/mod.rs` around lines 843 - 863, The branch for
string-family types currently calls parse_value first so bare tokens like "123"
or "0x7f" get parsed as Uint/Bytes instead of literal strings; change the logic
in the is_string_family_type branch to prefer parse_bare_string_value for
unquoted single-token inputs (i.e., try parse_bare_string_value(input) first and
return that when it succeeds), falling back to parse_value only if the
bare-string parse fails, while keeping the Operator::AnyValue path unchanged;
update tests for parse_value/parse_bare_string_value to include numeric-looking
and hex-looking bare tokens for TypeKind::String/PString/Regex/Search to ensure
the rule matches the literal text.

@unclesp1d3r unclesp1d3r merged commit 95fe971 into main Apr 24, 2026
34 checks passed
@unclesp1d3r unclesp1d3r deleted the 42-parser-implement-default-clear-name-use-and-indirect-meta-types branch April 24, 2026 02:35
@github-actions github-actions Bot mentioned this pull request Apr 25, 2026
unclesp1d3r pushed a commit that referenced this pull request Apr 25, 2026
## 🤖 New release

* `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes)

### ⚠ `libmagic-rs` breaking changes

```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189

--- failure copy_impl_added: type now implements Copy ---

Description:
A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it.
        ref: rust-lang/rust#100905
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron

Failed in:
  libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98

--- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] ---

Description:
A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron

Failed in:
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148

--- failure enum_struct_variant_field_added: pub enum struct variant field added ---

Description:
An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant.
        ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron

Failed in:
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron

Failed in:
  function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025
  function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846
  function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683
  function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004
  function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810
  function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633
  function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060
  function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946
  function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779
  function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042
  function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902
  function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749
  function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron

Failed in:
  libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron

Failed in:
  FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130

--- failure module_missing: pub module removed or renamed ---

Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron

Failed in:
  mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4

--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---

Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron

Failed in:
  struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42
```

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.6.0] - 2026-04-25

### Features

- **parser**: Add Date and QDate types with serialization support
([#165](#165))
- **parser**: Implement pstring (Pascal string) type
([#170](#170))
- **parser**: Implement pstring multi-byte length prefix variants (/B,
/H, /h, /L, /l, /J)
([#183](#183))
- **evaluator**: Add debug-level tracing for skipped rules
([#184](#184))
- **evaluator**: Implement indirect offset resolution
([#37](#37))
([#199](#199))
- **evaluator**: Implement relative offset resolution
([#38](#38))
([#211](#211))
- **deps**: Add new skills to actionbook/rust-skills and
trailofbits/skills
- **evaluator**: Regex and search types (closes #39)
([#214](#214))
- Implement libmagic meta-type directives and format substitution
([#42](#42))
([#230](#230))

### Bug Fixes

- **regex**: PR #214 follow-up review findings
([#215](#215))
- Load and correctly evaluate /usr/share/file/magic/filesystems and
adjacent magic files
([#233](#233))

### Documentation

- **gotchas**: Clarify requirements for adding TypeKind variants

### Miscellaneous Tasks

- Rename .coderabbitai.yaml to .coderabbit.yaml
- **Mergify**: Configuration update
([#173](#173))
- Update .gitignore to exclude local AI assistant files
- **mergify**: Upgrade configuration to current format
([#205](#205))
- Resolve all pending TODO items
([#212](#212))
- **mergify**: Upgrade configuration to current format
([#231](#231))
<!-- generated by git-cliff -->

### Security

- **io**: Close TOCTOU race in `FileBuffer::new` metadata validation
(CWE-367). `validate_file_metadata` now uses `File::metadata()` on the
open descriptor instead of re-canonicalizing the path, so an attacker
cannot swap the path between `open_file` and validation. Error paths now
report the caller-supplied path rather than the canonicalized variant.
- **cli**: Remove relative-path fallbacks from `default_magic_file_path`
(CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the
`CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the
process cwd. CI pipelines must pass `--magic <path>` explicitly.
- **evaluator**: `build_regex` now bounds `size_limit` and
`dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject
compile-time DoS patterns (CWE-1333) from adversarial magic files.

### Features

- **parser**: Implement meta-type directives: `name`/`use` subroutines,
`default`/`clear` per-level fallback, and `indirect` re-evaluation.
`parse_text_magic_file` now returns `ParsedMagic { rules, name_table }`
(breaking change from `Vec<MagicRule>`). Named subroutines are hoisted
into `NameTable` at load time and dispatched via `RuleEnvironment` in
the evaluator. Recursion is bounded by
`EvaluationConfig::max_recursion_depth`. Resolves
[#42](#42).
- **evaluator**: Thread-local regex compile cache eliminates the
double-compile paid by every successful regex match.
`regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex`
instead of recompiling the pattern to derive the anchor advance. The
cache is reset at the start of every `evaluate_rules_with_config` call,
bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new
builder-style setters (`with_max_recursion_depth`,
`with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`,
`with_timeout_ms`) let external crates construct configurations without
struct literals.
- **parser**: `MagicRule::new()` smart constructor with
`::with_children()`, `::with_strength_modifier()`, `::with_level()`
builder methods and a `::validate()` method enforcing structural
invariants (non-empty message, `level <= MAX_LEVEL`, children nested
strictly deeper than parent). New `MagicRuleValidationError` error type.
- **parser**: `RegexFlags::with_case_insensitive()` and
`::with_start_offset()` builder methods.

### Refactor

- **engine**: Extract `evaluate_pattern_rule()` and
`evaluate_value_rule()` helpers from
`evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a
two-arm type-category split; each helper has focused rustdoc on
semantics and invariants.
- **types**: Replace the `_ =>` catch-all in
`bytes_consumed_with_pattern` with an explicit listing of the
fixed-width `TypeKind` variants. Adding a new variable-width variant
without updating this match is now a compile error instead of a silent
relative-offset anchor corruption in release builds.
- **parser**: Split the 185-line `type_keyword_to_kind` match into
per-family helpers (`byte_family`, `short_family`, `long_family`,
`quad_family`, `float_family`, `double_family`, `date_family`,
`qdate_family`, `string_family`). Drops the
`#[allow(clippy::too_many_lines)]` attribute.
- **main**: `main()` returns `std::process::ExitCode` instead of calling
`process::exit`, so destructors run on the happy path. Ctrl-C
`AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`.
- **grammar**: `parse_strength_directive` uses nom 8's `preceded` +
`Parser::map` instead of the legacy `map(pair(char(...), parse_number),
|(_, n)| ...)` pattern.
- **output**: Add `#[serde(skip_serializing_if = "Option::is_none",
default)]` to public `Option<T>` fields so JSON output no longer emits
`"field": null` for unset optional values.

### Documentation

- **lib**: Add `# Security` sections to
`MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`,
`::load_from_file`, and `::load_from_file_with_config` warning about the
unbounded default timeout and recommending
`EvaluationConfig::performance()` for untrusted input.
- **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning.
- **README**: Update `TypeKind` enum example to match the current AST,
add `regex` and `search/N` to the supported types table, add pre-1.0 API
stability warning, correct the roadmap to mark v0.2-v0.4 as shipped.
- **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current
Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases
section to reflect actual shipped scope.

### Testing

- Security regression tests for S-H1 (planted-magic-file in cwd), S-H2
(TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime),
S-L2 (codegen message escape round-trip), and GOTCHAS S13.1
(`EvaluationConfig::default()` unbounded timeout invariant).
- Backspace message concatenation regression tests for first-match,
consecutive, and empty-rest edge cases.
- `MagicRule::validate()` tests covering empty message, child level
invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.

### Breaking Changes

- **parser**: `parse_text_magic_file` return type changed from
`Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic,
ParseError>`. Callers must destructure `ParsedMagic { rules, name_table
}`. Low-level callers that only need the rule list can use
`parsed.rules`. `load_magic_file` and `load_magic_directory` return the
same new type.
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request evaluator Rule evaluation engine and logic output Result formatting and output generation parser Magic file parsing components and grammar size:XXL This PR changes 1000+ lines, ignoring generated files. testing Test infrastructure and coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parser: implement default, clear, name, use, and indirect meta-types

2 participants