feat: Implement libmagic meta-type directives and format substitution (#42) by unclesp1d3r · Pull Request #230 · EvilBit-Labs/libmagic-rs

unclesp1d3r · 2026-04-22T23:30:34Z

Closes #42.

Summary

All six MetaType variants (Default, Clear, Name, Use, Indirect, Offset) are fully evaluated with GNU file-compatible semantics: default fires only when no sibling at the same level matched, clear resets the flag, name/use enable subroutines with endian-flip via RuleEnvironment, indirect re-applies the root rule set at the resolved offset bounded by max_recursion_depth, and offset reports the resolved file position as Value::Uint(pos).
Printf-style format substitution (src/output/format.rs::format_magic_message) supports %d, %i, %u, %x, %X, %o, %s, %c, %% with width/padding/length modifiers. Hex specifiers mask to TypeKind::bit_width() so signed-byte -1 renders as ff, not sign-extended ffffffffffffffff. Alt-form follows C printf: %#06x + 0xab → 0x00ab.
Continuation-sibling anchor reset and subroutine base_offset close the semantic gaps that blocked byte-for-byte parity with GNU file. Top-level siblings still chain per GOTCHAS S3.8 (regression-guarded).

What changed

Phase 1 (AST + parser + name/use + default/clear/indirect dispatch) and Phase 2 (this PR's close-out: offset evaluator, printf substitution, parser x-strip fix, sibling-anchor semantic fix, subroutine offset bias) are bundled because they're a single cohesive issue-#42 unit.

Area	Files	Notes
AST	`src/parser/ast.rs`	`TypeKind::Meta(MetaType)` enum with six variants; `Offset` doc comment updated to reflect live semantics
Parser	`src/parser/grammar/mod.rs`, `src/parser/name_table.rs`, `src/parser/loader.rs`, `src/parser/mod.rs`, `src/parser/types.rs`, `src/parser/codegen.rs`	`strip_optional_x_operator` helper stops `x` leaking into messages; `extract_name_table` hoists `Meta::Name` subtrees at load time; `ParsedMagic { rules, name_table }` replaces the old `Vec<MagicRule>` parser return
Evaluator	`src/evaluator/engine/mod.rs`, `src/evaluator/mod.rs`, `src/evaluator/offset/mod.rs`, `src/evaluator/strength.rs`, `src/evaluator/types/mod.rs`	All six Meta dispatches, `is_child_sibling_list` gate for continuation-sibling anchor reset, `base_offset` on `EvaluationContext`, new `resolve_offset_with_base`
Output	`src/output/format.rs` (new), `src/output/mod.rs`, `src/lib.rs`	`format_magic_message` pure function; `MagicDatabase::concatenate_messages` runs messages through it before the `\b` check (GOTCHAS S14.1)
Tests	`tests/meta_types_integration.rs` (new), `src/evaluator/engine/tests.rs`, `tests/*`	Un-ignored `test_searchbug_matches_full_result_string`; 18 format unit tests; 7 Offset dispatch tests; synthetic `default`/`clear`/`indirect` scenarios
Docs	`AGENTS.md`, `GOTCHAS.md`, `docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md`, `docs/src/*.md`, `ROADMAP.md`, `CHANGELOG.md`	S14.2 rewritten, new S3.10 (subroutine base offset), S3.9→S3.11 with S14.1 cross-ref fixed
Build	`.github/workflows/release.yml`, `build.rs`, `src/build_helpers.rs`, `src/error.rs`	`dist generate` refresh; Phase 1 plumbing for `NameTable`/`RuleEnvironment`

Net: 41 files changed, +4661 / -205 lines.

Test Plan

mise exec -- cargo nextest run — 1348/1348 pass, 5 pre-existing skipped
mise exec -- just ci-check — green (fmt, clippy -D warnings, cargo test, nextest, audit, deny, machete, mdformat)
test_searchbug_matches_full_result_string produces byte-for-byte parity with third_party/tests/searchbug.result: Testfmt (0) found_ABC followed_by 0x31 at_offset 11 (64) found_ABC followed_by 0x32 at_offset 75
No new unsafe, unwrap(), or expect() in library code
GOTCHAS S3.8 top-level sibling chaining preserved (relative_anchor_can_decrease_when_later_sibling_matches_at_lower_position still passes)
MagicRule serde back-compat: old serialized rules deserialize unchanged

Residual items (separate follow-up PRs)

Documented in .context/compound-engineering/ce-review/20260422-101802-31ab7e83/synthesis.md (local-only). Non-blocking:

COR-003 / COR-004: AnchorScope in Indirect arm and manual save/restore in evaluate_use_rule leave anchor state mutated on RecursionLimitExceeded error paths. Low-probability with default max_recursion_depth=20; worth RAII refactor in its own PR.
M-02: Child-evaluation graceful-error match block is duplicated across four dispatch arms. Extracting an evaluate_children_gracefully helper is worthwhile but risks subtle behavior drift.
Open question: does a dedicated max_indirect_depth cap (separate from max_recursion_depth) make sense for 1.0? Needs a pathological-fixture smoke test first.

Compatibility impact

Breaking behavior change (library consumers): rule messages containing % that were previously passed through verbatim will now be interpreted as format specifiers. Literal % in user-facing data must be escaped as %%. Documented in GOTCHAS S14.2 rewrite. Recommend calling out in the changelog for the next release.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

…#42) Complete issue #42 by landing the remaining Phase 2 work on top of the prior Phase 1 AST + parser + `name`/`use` dispatch. Closes byte-for-byte parity with GNU `file`'s `searchbug.result` fixture. Meta-type directives - `TypeKind::Meta(MetaType::{Default, Clear, Name, Use, Indirect, Offset})` all fully evaluated in `src/evaluator/engine/mod.rs`. - `default` fires only when no sibling at the same level has matched; `clear` resets the per-level `sibling_matched` flag so later `default` rules can fire again. - `name` subroutines are hoisted into a `NameTable` at load time (`parser::name_table::extract_name_table`). `use` invokes them via `RuleEnvironment` threaded through `EvaluationContext::rule_env`. - `indirect` re-applies the root rule set at the resolved offset via `AnchorScope`, bounded by `EvaluationConfig::max_recursion_depth`. - `offset` reports the resolved file offset as `Value::Uint(pos)` for printf-style substitution. Printf-style format substitution - New pure function `src/output/format.rs::format_magic_message` supporting `%d`, `%i`, `%u`, `%x`, `%X`, `%o`, `%s`, `%c`, `%%`, plus width/padding/length modifiers. Hex specifiers mask to the natural `TypeKind::bit_width()` so a signed byte carrying `-1` renders as `ff` rather than sign-extended `ffffffffffffffff`. - Alt-form width follows C printf: zero-pad inserts between prefix and digits (`%#06x` + `0xab` -> `0x00ab`), space-pad goes before the prefix, left-align trails the digits. - Wired into `MagicDatabase::concatenate_messages` before the existing `\b` backspace-suppression check (GOTCHAS S14.1). Parser and evaluator semantics - Parser consumes an optional `x` (AnyValue) token between a Meta keyword and its message so `offset x at_offset %lld` no longer leaks the operator into output. - Subroutine `base_offset` on `EvaluationContext`: inside a `use` body, positive `OffsetSpec::Absolute(n)` resolves to `base_offset + n`, matching magic(5) semantics. Negative absolute, `FromEnd`, `Relative`, and `Indirect` are unaffected (new GOTCHAS S3.10). - Continuation-sibling anchor reset: at `recursion_depth > 0`, each sibling resolves `&N` against the parent-level entry anchor rather than the previous sibling's advance. Top-level siblings (depth 0) keep chaining per GOTCHAS S3.8 (`relative_anchor_can_decrease_...` still passes). Tests - `test_searchbug_matches_full_result_string` un-ignored and passing against the GNU `file` `searchbug.testfile` fixture. - 18 unit tests for `format_magic_message` including regression guards for alt-form zero-padding. - 7 new engine tests for `MetaType::Offset` dispatch (match emission, offset-at-zero, out-of-bounds skip, non-`AnyValue` operator reject, children, anchor semantics, `sibling_matched` propagation). - Integration test file `tests/meta_types_integration.rs` covers `default`/`clear`/`indirect` synthetic scenarios plus the searchbug round-trip. Documentation - AGENTS.md "Currently Implemented" expanded with the six meta-types and printf-style substitution; "Current Limitations" updated. - GOTCHAS.md S14.2 rewritten (specifiers ARE substituted); new S3.10 for subroutine base offset; S3.9 renumbered to S3.11 with S14.1 cross-reference fixed. - `docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md` documents the three-layer parse-time / `ParsedMagic` / optional `RuleEnvironment` pattern. Build tooling - `dist generate` regenerated `.github/workflows/release.yml` to resolve cargo-dist staleness vs recent `actions/upload-artifact` dependabot bumps. Test results: 1348/1348 pass, 5 pre-existing skipped. `just ci-check` green (clippy `-D warnings`, fmt, audit, deny). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

mergify · 2026-04-22T23:31:11Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:$.+$)?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

check-success = coverage
check-success = quality
check-success = test
check-success = test-cross-platform (macos-latest, macOS)
check-success = test-cross-platform (ubuntu-22.04, Linux)
check-success = test-cross-platform (ubuntu-latest, Linux)
check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

#commits-behind <= 10

coderabbitai · 2026-04-22T23:31:18Z

Summary by CodeRabbit

New Features
- Added meta-type directives: default, clear, name, use, indirect, offset; and printf-style format substitution in rule messages.
Breaking Changes
- Parser/load now returns rules plus a hoisted name-table; callers must destructure the returned value to access rules and named subroutines.
Documentation
- Updated docs to document meta directives, evaluation behavior, name-hoisting, offset semantics, and message-formatting rules.
Tests
- Expanded test suites covering meta dispatch, offsets, use/name subroutines, and message formatting.

Walkthrough

Implements meta-type directives (default, clear, name, use, indirect, offset), hoists/merges named subroutines into a parser NameTable via ParsedMagic, threads a RuleEnvironment through evaluation with base-offset/recursion guards, adds printf-style message formatting, and expands docs/tests accordingly.

Changes

Cohort / File(s)	Summary
Docs & Guides `AGENTS.md`, `CHANGELOG.md`, `GOTCHAS.md`, `ROADMAP.md`, `docs/src/...`	Documented meta-type directives, `ParsedMagic { rules, name_table }` return type, evaluator dispatch semantics, printf-style message formatting, and an "Adding a new meta-type" guide; updated roadmap and breaking-changes notes.
Build & Metadata `Cargo.toml`, `build.rs`, `src/build_helpers.rs`	Updated package authors metadata; build script and helper capture `ParsedMagic` and forward `parsed.rules` to codegen.
Parser AST & Types `src/parser/ast.rs`, `src/parser/types.rs`	Added `MetaType` enum and `TypeKind::Meta(MetaType)`; `bit_width()` returns `None` for meta; type-keyword mapping updated for meta directives.
Parser Grammar & Tests `src/parser/grammar/...`, `src/parser/grammar/tests/...`, `src/parser/grammar/tests/meta_types.rs`	Added `&` relative-offset parsing, dedicated meta parsers for `name`/`use`, meta-rule operand/ message parsing, and fallback bare-string parsing for string-family types; new and adjusted parser tests.
Parser API, Loader & NameTable `src/parser/mod.rs`, `src/parser/loader.rs`, `src/parser/name_table.rs`, `src/parser/codegen.rs`	Introduced `ParsedMagic { rules, name_table }`, hoisted top-level `name` subroutines into `NameTable`, merging logic across files, and updated codegen to serialize/escape `MetaType` names.
Errors `src/error.rs`	Added `EvaluationError::UnknownName { name }` and `IndirectWithoutEnvironment` plus constructor helper.
Evaluator Context & Env `src/evaluator/mod.rs`	Added `RuleEnvironment`, `EvaluationContext.rule_env: Option<Arc<...>>`, `base_offset`, `indirect_reentry` flag and accessors/mutators; reset behavior updated.
Evaluator Engine & Meta Dispatch `src/evaluator/engine/mod.rs`, `src/evaluator/engine/tests/mod.rs`	Refactored `evaluate_rules` to inline-dispatch meta directives; added `evaluate_use_rule`, `evaluate_children_or_warn`, recursion/anchor handling, base-offset propagation, and diagnostics; wired tests/helpers.
Evaluator Offset Resolution `src/evaluator/offset/mod.rs`	Added `resolve_offset_with_base` to apply subroutine `base_offset` biasing for positive `Absolute` offsets; delegated existing resolver.
Evaluator Types & Strength `src/evaluator/types/mod.rs`, `src/evaluator/strength.rs`	String read/consumption semantics adjusted to use literal length when `max_length` absent; meta-types handled as zero-byte; strength calculation extended to include meta-type axis.
Output Formatting `src/output/format.rs`, `src/output/mod.rs`	New `format_magic_message(template, value, type_kind)` implementing a constrained printf-like substitution (flags, width, conversions, masking); exported via `pub mod format`.
Library Core & Tests `src/lib.rs`, `src/tests.rs`, `src/evaluator/engine/tests/...`, `src/evaluator/engine/tests/helpers/...`	`MagicDatabase` and loading updated to store `root_rules` + `name_table`; evaluation constructs `RuleEnvironment` and threads it; extensive meta-type test suites and helpers added.
Misc `dist-workspace.toml`, docs/examples/...	Pin small GH Actions update; docs and examples updated to use new `ParsedMagic` pattern.

Sequence Diagram(s)

sequenceDiagram
    participant Parser
    participant NameTable as NameTable (Extraction)
    participant ParsedMagic

    Parser->>Parser: parse_text_magic_file(input)
    Parser->>Parser: build rule hierarchy
    Parser->>NameTable: extract_name_table(rules)
    NameTable-->>Parser: rules (sans Name), name_table
    Parser->>ParsedMagic: return ParsedMagic { rules, name_table }

sequenceDiagram
    participant Evaluator
    participant RuleEnv as RuleEnvironment (name_table)
    participant Subrules as SubroutineRules
    participant Formatter as OutputFormatter

    Evaluator->>Evaluator: evaluate_rules(root_rules)
    Evaluator->>RuleEnv: lookup name for MetaType::Use
    RuleEnv-->>Evaluator: Arc<[MagicRule]> (subroutine)
    Evaluator->>Subrules: evaluate_use_rule(subroutine, base_offset)
    Subrules-->>Evaluator: matches + terminal anchor
    Evaluator->>Formatter: format_magic_message(template, value, type_kind)
    Formatter-->>Evaluator: formatted text
    Evaluator-->>Evaluator: integrate matches & return

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

refactor(evaluator): split mod.rs into focused submodules #153 — evaluator module refactor extracting engine logic; overlaps evaluator engine reorganization and context threading.
feat: Implement text magic parser (issue #11) #16 — foundational text-magic parser work; this PR changes parser output to ParsedMagic and hoists name tables on top of that parsing surface.
chore: resolve all pending TODO items #212 — parser/grammar and evaluator refactors that intersect with grammar splitting and evaluator recursion/strength logic modified here.

Suggested labels

parser, evaluator, output, testing

Poem

Meta rules whispered in the file, ✨
Names hoisted, subroutines compile, 🔁
Default guards where others fail, 🛡️
Indirect dives a nested trail, 🕳️
Formatted messages sew the tale. 🪄

🚥 Pre-merge checks | ✅ 10

✅ Passed checks (10 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title follows Conventional Commits with type 'feat', scope 'meta-type directives and format substitution', and issue reference (`#42`); accurately reflects the main implementation.
Description check	✅ Passed	Description is detailed and directly related to the changeset, covering all six MetaType variants, printf-style substitution, semantic fixes, test results, and compatibility impact.
Linked Issues check	✅ Passed	All core objectives from issue `#42` are met: parser recognizes meta-types, default/clear/name/use/indirect are fully evaluated, unit and integration tests added [`#42`].
Out of Scope Changes check	✅ Passed	All changes align with `#42` objectives. Minor scope items (GitHub Actions version, Cargo.toml author field) are incidental metadata updates unrelated to meta-type implementation.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 85.00%.
Memory-Safety-Check	✅ Passed	Pull request introduces no unsafe code blocks and maintains strict memory safety throughout all new modules using idiomatic Rust patterns.
Libmagic-Compatibility-Check	✅ Passed	PR implements all six meta-type directives with GNU file-compatible semantics verified through comprehensive test coverage.
Performance-Regression-Check	✅ Passed	Arc sharing for rules and name-tables uses efficient zero-copy with clones only at context creation. Hot-path evaluation loops have no Vec allocations. Message formatting deferred post-evaluation with proper capacity pre-allocation. Memory-mapped I/O correctly used for buffer loading. No performance regressions detected.
Test-Coverage-Check	✅ Passed	PR demonstrates strong test coverage with 60 test functions across ~2,003 lines covering all six MetaType variants, all printf-style format specifiers, error conditions, and boundary cases systematically.
Error-Handling-Check	✅ Passed	PR demonstrates comprehensive error handling with Result-based APIs, thiserror derive macros, proper error propagation via ? operator, and safe arithmetic using checked operations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

Warning

Review ran into problems

🔥 Problems

These MCP integrations need to be re-authenticated in the Integrations settings: Linear, Notion

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Implements GNU file-compatible meta-type directive parsing + evaluation and adds printf-style message substitution so real-world magic corpora (e.g. searchbug.magic) can round-trip end-to-end within libmagic-rs.

Changes:

Add TypeKind::Meta(MetaType) with evaluation semantics for default, clear, name/use, indirect, and offset, including parse-time name extraction into a NameTable and runtime dispatch via RuleEnvironment.
Introduce output::format::format_magic_message and wire it into MagicDatabase message concatenation for %d/%u/%x/%s/... substitutions.
Update parser API to return ParsedMagic { rules, name_table }, plus extensive tests/docs/bench updates to cover the new behavior.

Reviewed changes

Copilot reviewed 42 out of 43 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/regex_search_corpus_tests.rs	Adapts tests to destructure `ParsedMagic` from `parse_text_magic_file`.
tests/property_tests.rs	Extends generators and adds a meta-type “never panics” property test.
tests/property_tests.proptest-regressions	Checks in proptest regression seeds for meta-type cases.
tests/parser_integration_tests.rs	Updates loader tests for `ParsedMagic` and adds a `name`/`use` round-trip test.
tests/meta_types_integration.rs	Adds end-to-end integration tests for meta-types and `searchbug` parity.
tests/directory_loading_tests.rs	Updates directory loader tests to destructure `ParsedMagic`.
tests/compatibility_tests.rs	Adds `searchbug` partial-match regression coverage.
src/parser/types.rs	Adds meta-type keywords and maps keyword-only meta-types into `TypeKind::Meta`.
src/parser/name_table.rs	Implements `NameTable` and extraction/scrubbing logic for `name` subroutines.
src/parser/mod.rs	Introduces `ParsedMagic` and changes `parse_text_magic_file` to return it.
src/parser/loader.rs	Updates directory/file loading to merge rule lists + name tables into `ParsedMagic`.
src/parser/grammar/tests/mod.rs	Adds parsing tests for relative offsets, meta-types, and `searchbug.magic` fixture.
src/parser/grammar/mod.rs	Adds `&N` relative offsets, meta-type parsing (`name`/`use`), x-strip, and bare-string fallback for string-family types.
src/parser/codegen.rs	Adds codegen support for `TypeKind::Meta(MetaType::...)` and injection-escape test.
src/parser/ast.rs	Adds `MetaType` and `TypeKind::Meta`, plus tests and `bit_width()` behavior.
src/output/mod.rs	Exposes new `output::format` module.
src/output/format.rs	Adds printf-style substitution implementation + unit tests.
src/lib.rs	Threads `RuleEnvironment` through evaluation and applies message formatting during concatenation.
src/evaluator/types/mod.rs	Updates string semantics to be pattern-length-based when applicable and handles `TypeKind::Meta` as unsupported for reads/compare.
src/evaluator/strength.rs	Defines strength ordering behavior for meta-types and adds tests.
src/evaluator/offset/mod.rs	Adds `resolve_offset_with_base` to support subroutine base offsets.
src/evaluator/mod.rs	Adds `RuleEnvironment` and stores optional environment + `base_offset` on `EvaluationContext`.
src/evaluator/engine/mod.rs	Implements meta-type dispatch (`default/clear/use/indirect/offset`), continuation-sibling anchor reset, and subroutine base-offset handling.
src/error.rs	Adds reserved evaluation errors for unknown-name and indirect-without-environment.
src/build_helpers.rs	Updates codegen path to use `ParsedMagic.rules`.
mise.lock	Updates tool lock entries (python build-standalone artifacts, rust version, provenance fields).
docs/src/parser.md	Documents `ParsedMagic` and meta-types (needs updating for `offset` implementation status).
docs/src/magic-format.md	Adds a meta-types section (currently missing `offset`).
docs/src/evaluator.md	Updates evaluator architecture docs and shows meta-type dispatch behavior.
docs/src/ast-structures.md	Documents `TypeKind::Meta` (currently missing `MetaType::Offset` in snippet).
docs/src/architecture.md	Updates architecture overview to include `NameTable`/`ParsedMagic`/`RuleEnvironment`.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md	Adds an architecture decision record for meta-type dispatch + environments.
docs/MAGIC_FORMAT.md	Mirrors meta-type documentation in the top-level docs (currently missing `offset`).
docs/ARCHITECTURE.md	Updates high-level repository architecture and MagicDatabase fields.
build.rs	Updates build-time codegen to use `ParsedMagic.rules`.
benches/evaluation_bench.rs	Adds a benchmark for `name`/`use` subroutine dispatch overhead.
ROADMAP.md	Marks meta-types (#42) as completed.
GOTCHAS.md	Adds gotchas for meta-type dispatch, ParsedMagic, bare-string fallback, and format substitution.
Cargo.toml	Updates authors metadata.
CHANGELOG.md	Documents meta-type and ParsedMagic breaking change.
AGENTS.md	Updates project structure and implemented-feature list to include meta-types + format substitution.
.github/workflows/release.yml	Downgrades `actions/upload-artifact` version pins.

codecov · 2026-04-22T23:38:08Z

Codecov Report

❌ Patch coverage is 94.26415% with 76 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/evaluator/engine/mod.rs	90.80%	23 Missing ⚠️
src/output/format.rs	95.50%	17 Missing ⚠️
src/evaluator/offset/mod.rs	86.66%	10 Missing ⚠️
src/parser/grammar/mod.rs	90.72%	9 Missing ⚠️
src/evaluator/types/mod.rs	70.37%	8 Missing ⚠️
src/parser/codegen.rs	91.30%	4 Missing ⚠️
src/error.rs	0.00%	3 Missing ⚠️
src/parser/name_table.rs	98.95%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

dosubot · 2026-04-22T23:39:51Z

Related Documentation

2 document(s) may need updating based on files changed in this PR:

libMagic-rs

api-reference `/libmagic-rs/blob/main/docs/src/api-reference.md` — ⏳ Awaiting Merge

API_REFERENCE `/libmagic-rs/blob/main/docs/API_REFERENCE.md` — ⏳ Awaiting Merge

^{How did I do? Any feedback?}

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/ast-structures.md`:
- Around line 455-491: The docs for MetaType are missing the Offset variant:
update the MetaType enum documentation and examples to include MetaType::Offset
(and describe its semantics) so the AST docs match the code; modify the enum
block under MetaType to add an Offset variant description and add corresponding
examples creating TypeKind::Meta(MetaType::Offset) alongside the existing
Default/Clear/Name/Use/Indirect cases.

In `@docs/src/parser.md`:
- Around line 452-457: Documentation for ParsedMagic is inaccurate: the struct
field name_table is declared as pub(crate) in the implementation but the docs
show it as pub. Either update the docs code block to match the implementation by
changing the declaration to pub(crate) name_table in the ParsedMagic snippet, or
if external access was intended, change the implementation's ParsedMagic struct
to pub name_table; ensure you update whichever of ParsedMagic or name_table you
modify so the docs and the implementation remain consistent.

In `@src/build_helpers.rs`:
- Around line 32-35: The build step currently parses builtin magic with
parse_text_magic_file and then calls generate_builtin_rules(&parsed.rules) but
discards parsed.name_table (extracted by extract_name_table), so name/use
subroutines are lost; either update the code generation to accept and serialize
the name_table (pass parsed.name_table into generate_builtin_rules and include
it in the generated BUILTIN_RULES static), or add a validation step after
parse_text_magic_file that inspects parsed.name_table and returns an error
rejecting any name/use directives in builtin magic (with a clear build-time
message); modify the call site around parse_text_magic_file and
generate_builtin_rules to implement one of these two fixes and ensure the
generated output includes or rejects parsed.name_table accordingly.

In `@src/evaluator/engine/mod.rs`:
- Around line 1124-1136: The debug-only panic must be converted into a proper
error return: in evaluate_rules_with_config, replace the debug_assert!
containing contains_indirect_rule(rules) with an explicit runtime check that
returns Err(...) when an env-less indirect rule is detected; use the existing
error constructor
(crate::error::EvaluationError::indirect_without_environment()) or map it into
the public error type the function exposes (e.g., LibmagicError) so callers
receive a Result::Err instead of panicking; keep original release behavior for
non-error paths and ensure the check references contains_indirect_rule and
crate::error::EvaluationError::indirect_without_environment() so the change is
easy to locate.
- Around line 546-577: Extract the duplicated child-evaluation block in
evaluate_rules into a helper (suggest name evaluate_children_or_warn) that
encapsulates RecursionGuard::enter(context) + evaluate_rules(&rule.children,
buffer, guard.context()) and implements the exact error handling: propagate
LibmagicError::Timeout, return Err for other errors except swallow and warn on
the specific EvaluationError variants (BufferOverrun, InvalidOffset,
TypeReadError::BufferOverrun/InvalidPStringLength) and LibmagicError::IoError
while logging the same message including rule.message and the error; update the
Default/Indirect/Offset/Use branches in evaluate_rules to call this helper and
use its returned matches (or empty/handled result) so behavior is identical but
the recursion/timeout/warn logic is centralized.
- Around line 264-281: The code saves and restores EvaluationContext's
last_match_end and base_offset only on the success path, so if
RecursionGuard::enter or evaluate_rules returns Err the context is left mutated;
fix by introducing an RAII scope guard (e.g. UseScope) that captures
saved_anchor = context.last_match_end() and saved_base = context.base_offset(),
sets the new offsets via set_last_match_end(absolute_offset) and
set_base_offset(absolute_offset) in its enter() constructor, exposes a context()
method for use with RecursionGuard::enter and evaluate_rules, and implements
Drop to always restore the saved_anchor and saved_base using set_last_match_end
and set_base_offset so restoration happens on every exit path (including errors)
instead of the current manual restore around
evaluate_rules/RecursionGuard::enter.

In `@src/output/format.rs`:
- Around line 270-277: Change the octal alternate-form prefix from "0o" to the
C-style "0" in the Conv::Octal branch: when spec.alt_form is true, pass "0" (not
"0o") into render_prefixed_int so %#o renders a single leading zero (e.g., 010).
Update any tests asserting "0o..." to expect the C-style "0" prefix and ensure
behavior remains governed by coerce_to_u64_masked(value, type_kind) and the
existing render_prefixed_int call.

In `@src/parser/ast.rs`:
- Around line 168-175: The rustdoc for TypeKind::Meta is out of date: it states
meta directives are “parsed and preserved in the AST but evaluated as silent
no-ops” and omits the `offset` keyword; update the comment for the
TypeKind::Meta variant to reflect the current implemented behavior (i.e., that
these directives are parsed into the AST and will be wired into later evaluator
phases rather than being silent no-ops) and include `offset` in the list of
control-flow keywords (`default`, `clear`, `name`, `use`, `indirect`, `offset`)
so the documentation matches the code. Ensure the same correction is applied to
the duplicate doc block referenced around lines 504–510.

In `@src/parser/codegen.rs`:
- Around line 522-552: Add a new unit test mirroring
test_serialize_meta_name_escapes_injection that targets the MetaType::Use match
arm: create a MagicRule with typ:
TypeKind::Meta(MetaType::Use(malicious.to_string())) using the same malicious
payload, call serialize_magic_rule(&rule, 0), and assert the generated string
does not contain injected Rust tokens (e.g., panic!("pwned-from-meta")) and does
contain the escaped quote sequence; this ensures MetaType::Use is escaped the
same way as MetaType::Name.

In `@src/parser/mod.rs`:
- Around line 145-146: Remove the unnecessary #[allow(dead_code)] on the
name_table module since parse_text_magic_file uses it; open the module
declaration pub(crate) mod name_table and delete the #[allow(dead_code)]
attribute so the module is exported without suppressing dead-code lints, then
run cargo check to confirm no new warnings surface (inspect functions referenced
from parse_text_magic_file to ensure they remain public).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 372d4029-2871-4188-bc87-756cb445414d

📥 Commits

Reviewing files that changed from the base of the PR and between f26676f and 6e4ba1a.

⛔ Files ignored due to path filters (13)

.github/workflows/release.yml is excluded by !.github/workflows/release.yml and included by .github/**
benches/evaluation_bench.rs is excluded by none and included by none
docs/ARCHITECTURE.md is excluded by none and included by none
docs/MAGIC_FORMAT.md is excluded by none and included by none
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md is excluded by none and included by none
mise.lock is excluded by !**/*.lock and included by none
tests/compatibility_tests.rs is excluded by none and included by none
tests/directory_loading_tests.rs is excluded by none and included by none
tests/meta_types_integration.rs is excluded by none and included by none
tests/parser_integration_tests.rs is excluded by none and included by none
tests/property_tests.proptest-regressions is excluded by none and included by none
tests/property_tests.rs is excluded by none and included by none
tests/regex_search_corpus_tests.rs is excluded by none and included by none

📒 Files selected for processing (30)

AGENTS.md
CHANGELOG.md
Cargo.toml
GOTCHAS.md
ROADMAP.md
build.rs
docs/src/architecture.md
docs/src/ast-structures.md
docs/src/evaluator.md
docs/src/magic-format.md
docs/src/parser.md
src/build_helpers.rs
src/error.rs
src/evaluator/engine/mod.rs
src/evaluator/engine/tests.rs
src/evaluator/mod.rs
src/evaluator/offset/mod.rs
src/evaluator/strength.rs
src/evaluator/types/mod.rs
src/lib.rs
src/output/format.rs
src/output/mod.rs
src/parser/ast.rs
src/parser/codegen.rs
src/parser/grammar/mod.rs
src/parser/grammar/tests/mod.rs
src/parser/loader.rs
src/parser/mod.rs
src/parser/name_table.rs
src/parser/types.rs

👮 Files not reviewed due to content moderation or server errors (8)

docs/src/magic-format.md
src/error.rs
src/evaluator/strength.rs
CHANGELOG.md
src/parser/grammar/tests/mod.rs
docs/src/evaluator.md
docs/src/architecture.md
src/lib.rs

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/api-reference.md`:
- Around line 545-547: Update the docs example that destructures ParsedMagic to
avoid accessing the non-public pub(crate) field name_table: change the example
to either destructure while ignoring the internal field (e.g., let ParsedMagic {
rules, .. } = parse_text_magic_file(...)?;) or bind the return value (let parsed
= parse_text_magic_file(...)?;) and then use parsed.rules; reference ParsedMagic
and parse_text_magic_file in the updated text.
- Line 435: Update the parenthetical note for the `message` field to explicitly
state that unescaped '%' characters are now treated as printf-style format
specifiers (e.g., %d, %s, %x) and that literal percent signs must be escaped as
'%%'; also mention this is a breaking change so callers must escape existing '%'
in rule messages. Use the `message` field description to add the escaping
requirement and the breaking-change warning.
- Around line 233-250: Add a "#### Examples" subsection under the MetaType
documentation mirroring other type sections: include small Rust examples that
import MetaType and TypeKind and use parser::grammar::parse_magic_rule to parse
rules and assert the resulting rule.typ matches
TypeKind::Meta(MetaType::Default), MetaType::Name("..."), MetaType::Use("..."),
and MetaType::Indirect; keep examples concise and consistent with the existing
Operator examples so readers can see how to parse/construct meta-type rules.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 65dfb59d-7bdf-4d58-a1a9-fedac2dd6ed6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e4ba1a and f2647f0.

⛔ Files ignored due to path filters (1)

docs/API_REFERENCE.md is excluded by none and included by none

📒 Files selected for processing (1)

docs/src/api-reference.md

…format substitution Critical: SubroutineScope RAII replaces manual save/restore in evaluate_use_rule so last_match_end and base_offset are restored on all error exit paths (RecursionLimitExceeded and Timeout previously leaked corrupted state). format_magic_message no longer mangles non-ASCII template bytes -- replaced byte-by-byte push with plain-run slice flush. Hygiene: removed stale allow(dead_code) on AnchorScope::context(). Tests: 9 new cases -- 5 for resolve_offset_with_base bias invariants (positive Absolute biased, negative Absolute / FromEnd / Relative / Indirect not biased), 2 for Use subroutine with non-zero use-site (proves bias is active and Relative is not double-biased), 1 continuation-sibling reset regression guard with bytes-consumed first sibling, 1 non-ASCII template format test. 1357/1357 pass. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Copilot

Pull request overview

Implements GNU file-compatible meta-type directives (default, clear, name, use, indirect, offset) across the parser/evaluator pipeline and adds printf-style message substitution so magic fixtures like searchbug.magic round-trip byte-for-byte through MagicDatabase.

Changes:

Introduces TypeKind::Meta(MetaType) + ParsedMagic { rules, name_table }, including load-time name hoisting into a NameTable.
Adds evaluator support for meta-type dispatch (subroutines, indirect re-entry, default/clear sibling state) and subroutine base-offset biasing.
Adds extensive tests, docs updates, and workflow/tooling refreshes to validate parity and document breaking changes.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/regex_search_corpus_tests.rs	Updates parser call sites to destructure `ParsedMagic` and continue corpus checks.
tests/property_tests.rs	Expands generators to include `MetaType` and adds a never-panics property for meta-type evaluation.
tests/property_tests.proptest-regressions	Checks in proptest regression seeds related to new meta-type paths.
tests/parser_integration_tests.rs	Updates directory/file loaders to `ParsedMagic`; adds end-to-end name/use round-trip test.
tests/meta_types_integration.rs	New end-to-end smoke + parity tests for meta-types and `searchbug` fixture output.
tests/directory_loading_tests.rs	Updates directory loader tests to `ParsedMagic` return type.
tests/compatibility_tests.rs	Adds weaker `searchbug` partial-match regression guard.
src/parser/types.rs	Adds meta keywords to parsing + maps keyword-only meta types in `type_keyword_to_kind`.
src/parser/name_table.rs	New `NameTable` + `extract_name_table` hoisting logic for `name` subroutines.
src/parser/mod.rs	Introduces `ParsedMagic` and updates `parse_text_magic_file` to hoist names.
src/parser/loader.rs	Returns `ParsedMagic` from loaders and merges `NameTable` across directory loads.
src/parser/grammar/tests/mod.rs	Adds grammar tests for relative offsets and meta-type parsing/roundtrip.
src/parser/grammar/mod.rs	Adds `&N` relative offsets, `name/use` operand parsing, meta-rule `x` stripping, and bare-word string fallback.
src/parser/codegen.rs	Serializes `TypeKind::Meta` and adds injection regression tests for Name/Use identifiers.
src/parser/ast.rs	Defines `MetaType` and adds `TypeKind::Meta(MetaType)` with serde/tests.
src/output/mod.rs	Exposes `output::format` module.
src/lib.rs	Threads `RuleEnvironment` into evaluations and applies printf substitution during message concatenation.
src/evaluator/types/mod.rs	Adjusts `string` semantics to pattern-length reads; explicitly rejects reading `TypeKind::Meta` as values/patterns.
src/evaluator/strength.rs	Assigns strength defaults for meta-types + tests for ordering expectations.
src/evaluator/offset/mod.rs	Adds `resolve_offset_with_base` for subroutine `base_offset` biasing.
src/evaluator/mod.rs	Adds `RuleEnvironment` + `EvaluationContext` fields for `rule_env` and `base_offset`.
src/error.rs	Adds reserved errors for strict-mode handling of missing env/unknown name.
src/build_helpers.rs	Updates codegen helper to use `parsed.rules`.
mise.lock	Toolchain lock refresh (Rust/Python artifacts and provenance fields).
docs/src/parser.md	Documents meta-types and the `ParsedMagic` breaking-change return type.
docs/src/magic-format.md	Documents meta-type syntax/semantics in the user-facing magic format doc.
docs/src/evaluator.md	Documents meta-type dispatch behavior and updates examples for `ParsedMagic`.
docs/src/ast-structures.md	Documents `TypeKind::Meta` / `MetaType` in AST reference.
docs/src/architecture.md	Updates architecture overview to include `NameTable`, `ParsedMagic`, and evaluator environment threading.
docs/src/api-reference.md	Adds `MetaType` docs and calls out breaking changes + printf substitution.
docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md	New design note capturing the RAII scope-guard pattern for context save/restore.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md	New design note describing the parse→env→dispatch architecture for meta-types.
docs/MAGIC_FORMAT.md	Mirrors meta-type documentation in the non-mdbook doc.
docs/ARCHITECTURE.md	Updates tree/struct documentation to include name table + root rule slice plumbing.
docs/API_REFERENCE.md	Updates API reference for `ParsedMagic`, meta-types, and message substitution semantics.
build.rs	Updates builtin rules generation to use `parsed.rules`.
benches/evaluation_bench.rs	Adds a benchmark focused on `name/use` subroutine dispatch overhead.
ROADMAP.md	Marks meta-types as complete (including `offset`).
GOTCHAS.md	Updates gotchas for meta-types, bare-word string fallback, and printf substitution.
Cargo.toml	Updates `authors` field.
CHANGELOG.md	Adds feature + breaking-change notes for `ParsedMagic` and meta-type support.
AGENTS.md	Updates repo guidance to include meta-types + printf substitution as implemented.
.github/workflows/release.yml	Updates artifact upload action version usage.

…ect top-level re-entry Bug 1 (PRRT...PrI): evaluate_use_rule now returns the subroutine's TERMINAL anchor (where the subroutine left last_match_end) instead of the use-site offset. Callers propagate that terminal anchor via context.set_last_match_end, so sibling rules after a successful 'use' resolve &N against the subroutine's final match position -- matching GNU file's inlining semantics. Previously siblings resolved against the use-site, silently breaking continuation chains. Bug 2 (PRRT...PrL): MetaType::Indirect root re-entry is evaluated with TOP-LEVEL sibling semantics (anchor chains across siblings per GOTCHAS S3.8), not CONTINUATION semantics (anchor resets). Previously the RecursionGuard wrapping the indirect sub-evaluation forced recursion_depth > 0, which triggered the continuation-reset path and broke top-level siblings inside the re-entered database. Fixed via a new one-shot EvaluationContext::indirect_reentry flag: indirect dispatch sets it before evaluate_rules; evaluate_rules consumes it at entry so only the top-level iteration is affected -- children of matched rules inside the re-entry correctly fall back to recursion_depth > 0 for their own continuation semantics. Deferred (PRRT...PrQ refactor suggestion): split engine tests file into focused modules. Out of scope for a correctness-focused review round; tracked as a follow-on maintenance task. Test: 1364/1364 pass (searchbug byte-for-byte parity preserved through both semantic fixes); just ci-check green. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

…offset docs The real guard is named SubroutineScope (saves+restores both last_match_end and base_offset). The base_offset doc comment referenced a BaseOffsetScope that never existed -- leftover from design intent in an earlier session. Updated the two doc-comment sites. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Copilot

Pull request overview

Implements GNU file-compatible meta-type (control-flow) directives and adds printf-style message substitution so libmagic-rs can round-trip real-world magic corpora (e.g., searchbug.magic) with byte-for-byte output parity.

Changes:

Adds TypeKind::Meta(MetaType) (default/clear/name/use/indirect/offset) across AST, parser, evaluator dispatch, and strength ordering.
Introduces ParsedMagic { rules, name_table } parser return type with load-time name extraction into a NameTable for efficient use dispatch.
Adds output::format::format_magic_message and wires message rendering (and backspace handling) through it.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/regex_search_corpus_tests.rs	Updates parser call sites to destructure `ParsedMagic`.
tests/property_tests.rs	Extends generators/property coverage for meta-types and `MetaType` variants.
tests/property_tests.proptest-regressions	Checks in proptest regression seeds for meta-type scenarios.
tests/parser_integration_tests.rs	Adjusts loader return type usage; adds name/use round-trip integration test.
tests/meta_types_integration.rs	Adds end-to-end integration tests for meta-type directives and `searchbug` parity.
tests/directory_loading_tests.rs	Updates directory loader return type usage (`ParsedMagic`).
tests/compatibility_tests.rs	Adds a weaker `searchbug` regression assertion for consumers that alter formatting layers.
src/parser/types.rs	Recognizes meta keywords and maps keyword-only meta directives to `TypeKind::Meta(...)`.
src/parser/name_table.rs	New: extracts top-level `name` blocks into a `NameTable` and scrubs nested `name` rules.
src/parser/mod.rs	Defines `ParsedMagic` and updates `parse_text_magic_file` to return it.
src/parser/loader.rs	Updates directory/file loading to return `ParsedMagic` and merge name tables.
src/parser/grammar/tests/mod.rs	Adds parsing tests for relative offsets and meta-type rules.
src/parser/grammar/mod.rs	Adds `&N` relative offset parsing, `name/use` identifier parsing, meta-type message `x`-strip, and bare-word string fallback for string-family types.
src/parser/codegen.rs	Serializes `TypeKind::Meta(MetaType)` and adds injection-escape regression tests for meta identifiers.
src/parser/ast.rs	Adds `MetaType` and `TypeKind::Meta(MetaType)` with serde + unit tests.
src/output/mod.rs	Exposes new `output::format` module.
src/output/format.rs	New: printf-style format substitution for match messages.
src/lib.rs	Wires name table + root rules into evaluation context; applies format substitution during message concatenation.
src/evaluator/types/mod.rs	Adjusts string read semantics and defines behavior for meta-types in read paths/byte consumption.
src/evaluator/strength.rs	Assigns strength defaults for meta-types and adds strength ordering tests.
src/evaluator/offset/mod.rs	Adds `resolve_offset_with_base` for subroutine-relative absolute offsets; tests.
src/evaluator/mod.rs	Introduces `RuleEnvironment` and extends `EvaluationContext` with env/base_offset/indirect-reentry state.
src/error.rs	Adds reserved evaluation error variants for strict-mode unknown-name / indirect-without-env.
src/build_helpers.rs	Updates build helper to use `ParsedMagic.rules`.
build.rs	Updates builtin rule generation to use `ParsedMagic.rules`.
benches/evaluation_bench.rs	Adds benchmark for name/use subroutine dispatch overhead.
docs/src/parser.md	Documents meta-types and `ParsedMagic` breaking change.
docs/src/magic-format.md	Documents meta-type directives in the magic format guide.
docs/src/evaluator.md	Documents evaluator meta-type dispatch and RuleEnvironment threading.
docs/src/ast-structures.md	Documents `MetaType` and `TypeKind::Meta`.
docs/src/architecture.md	Updates architecture docs for name table and ParsedMagic pipeline.
docs/src/api-reference.md	Documents MetaType and breaking changes (ParsedMagic + printf substitution).
docs/MAGIC_FORMAT.md	Mirrors meta-type documentation in top-level docs.
docs/ARCHITECTURE.md	Updates repo architecture doc for name_table + root_rules/name_table fields.
docs/API_REFERENCE.md	Updates API reference doc with ParsedMagic and MetaType info.
docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md	Adds a postmortem/solution doc on RAII scope guards for context save/restore.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md	Adds an architecture pattern doc for name/use extraction + RuleEnvironment threading.
ROADMAP.md	Marks meta-types as completed (including `offset`).
GOTCHAS.md	Updates gotchas for meta-type dispatch, ParsedMagic return type, and printf substitution.
CHANGELOG.md	Notes meta-type feature landing and parser breaking change.
AGENTS.md	Updates contributor-facing implementation status for meta-types and formatting.
Cargo.toml	Updates authors metadata.
mise.lock	Updates tool lock entries (Rust/tooling versions and provenance fields).
.github/workflows/release.yml	Adjusts `actions/upload-artifact` version pins.

…PR review) Addresses PRRT...PrQ from the round-3 PR review: src/evaluator/engine/tests.rs grew to 3666 lines, well past the 500-600 line project guideline. Split into the three meta-type suites the reviewer called out. Structure: - src/evaluator/engine/tests/mod.rs (2777 lines): core evaluator tests (byte/short/long/string/pstring/relative-offset/recursion) plus centralized meta-type test helpers (make_context_with_env, use_rule, use_rule_at, build_name_table, default_rule, clear_rule, byte_eq_rule, indirect_rule, offset_rule) marked pub(super) so submodules can share them. - src/evaluator/engine/tests/meta_use_tests.rs (405 lines): MetaType::Use dispatch, subroutine name-table resolution, and subroutine base_offset biasing tests. - src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs (275 lines): MetaType::Default fallback, MetaType::Clear state-reset, and MetaType::Indirect root re-entry tests. - src/evaluator/engine/tests/meta_offset_tests.rs (247 lines): MetaType::Offset dispatch and evaluate_children_or_warn helper coverage. No test content changed -- each test's body is byte-identical to the pre-split version. The three new submodules each have their own SPDX/copyright header and module doc. Test count unchanged (1364/1364 pass, 5 pre-existing skipped). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/architecture.md`:
- Around line 175-176: The documentation list in docs/src/architecture.md omits
MetaType::Offset; update the inline-dispatch bullet to include
`MetaType::Offset` alongside `MetaType::Default`, `MetaType::Clear`,
`MetaType::Use`, and `MetaType::Indirect` (i.e., mention that `evaluate_rules`
performs inline dispatch for `MetaType::Default`, `MetaType::Clear`,
`MetaType::Use`, `MetaType::Indirect`, and `MetaType::Offset`) so it matches the
evaluator behavior described by `mod.rs` (functions `evaluate_single_rule`,
`evaluate_rules`, `evaluate_rules_with_config`) and the implementation in the
`evaluate_rules` loop.

In `@docs/src/evaluator.md`:
- Around line 529-537: The examples call evaluate_rules(&parsed.rules, &buffer)
but the real API requires passing an &mut EvaluationContext; update each example
(including the blocks using parsed from parse_text_magic_file) to construct a
mutable EvaluationContext (e.g., new or default) and call evaluate_rules(&mut
ctx, &parsed.rules, &buffer) instead; ensure you reference the same
parsed.name_table/MagicDatabase handling as before and apply this change
consistently to all listed example blocks (lines ~529, 555, 575, 598, 609).
- Line 631: Update the checklist entry to include the missing meta-type
`offset`: edit the line currently listing meta-type directives (which mentions
`default`, `clear`, `name`/`use`, and `indirect`) and add `offset` (e.g.,
include `offset` in backticks alongside the other directives) so the implemented
meta-type set in the docs matches the PR.
- Around line 372-373: The documentation wrongly states that MetaType::Offset
performs printf-style substitution during evaluator dispatch; update the
description for MetaType::Offset to say evaluate_rules records the resolved
offset as a raw Value (Value::Uint(resolved_offset)) and stores the raw message,
and that printf-style substitution (e.g., replacing %lld, %d) is performed later
during output/message assembly (via the formatting stage such as
format_magic_message) rather than inside the evaluator; reference
MetaType::Offset, evaluate_rules, and format_magic_message in the updated text.

In `@src/evaluator/engine/tests/mod.rs`:
- Around line 2634-2779: Extract the shared test helpers (make_context_with_env,
use_rule, use_rule_at, build_name_table, default_rule, clear_rule, byte_eq_rule,
indirect_rule, offset_rule) out of the large tests mod into a new dedicated test
helper module (e.g. tests/helpers/meta.rs): move the function definitions into
that file with the same signatures and keep their pub(super) visibility, then in
the original mod.rs replace the moved code with a module declaration
(#[cfg(test)] mod helpers; or pub(super) mod helpers;) and re-export or
reference the helpers as needed (e.g. use self::helpers::* or pub(super) use
helpers::*;) so existing test submodules (meta_default_clear_indirect_tests,
meta_offset_tests, meta_use_tests) continue to compile without other changes.

In `@src/output/format.rs`:
- Around line 269-313: The Str and Char conversions currently either drop
width/alignment or use pad_numeric (which applies zero-padding); implement a
text-padding helper (e.g., pad_text) that takes the rendered string, the
existing FormatSpec, and pads with spaces only while honoring width and
left-alignment and ignoring the '0' flag, then replace Conv::Str to call
render_string(value) and feed that to pad_text (instead of returning raw string)
and replace Conv::Char to convert the byte to a single-character String
(char::from(byte).to_string()) and feed that to pad_text instead of pad_numeric;
keep numeric conversions using pad_numeric unchanged and add tests for "%5s",
"%-5s", and "%03c" to assert space-padding behavior.

In `@src/parser/grammar/tests/mod.rs`:
- Around line 2505-2679: The four tests (test_parse_magic_rule_meta_types,
test_parse_magic_rule_meta_name_use_reject_malformed_identifiers,
test_parse_text_magic_file_meta_roundtrip,
test_parse_text_magic_file_searchbug_fixture) should be extracted from the huge
tests file into a new focused test module for meta-type parsing; create a new
sibling test module (e.g., meta_types tests module), move these test functions
there, ensure the new module imports crate::parser::parse_text_magic_file and
parse_magic_rule and any enums (TypeKind, MetaType) used by the assertions, and
update the parent tests module to reference the new module (pub mod ...) so the
tests run; keep test logic unchanged but remove them from the original file.

In `@src/parser/name_table.rs`:
- Line 112: The warning printed by scrub_nested_names uses the passed parent
level for messaging, but the recursive calls currently pass rule.level + 1 which
misreports the actual parent level; update the two recursive call sites that
invoke scrub_nested_names to pass rule.level (not rule.level + 1) so the warning
text reflects the true parent level (e.g., change
scrub_nested_names(rule.children, rule.level + 1) to
scrub_nested_names(rule.children, rule.level) for both occurrences).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f3b40de6-4239-4197-9f34-1c303e9bc165

📥 Commits

Reviewing files that changed from the base of the PR and between 840ac7d and e9d20a0.

⛔ Files ignored due to path filters (3)

docs/ARCHITECTURE.md is excluded by none and included by none
docs/MAGIC_FORMAT.md is excluded by none and included by none
tests/compatibility_tests.rs is excluded by none and included by none

📒 Files selected for processing (14)

ROADMAP.md
docs/src/api-reference.md
docs/src/architecture.md
docs/src/evaluator.md
src/evaluator/engine/mod.rs
src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs
src/evaluator/engine/tests/meta_offset_tests.rs
src/evaluator/engine/tests/meta_use_tests.rs
src/evaluator/engine/tests/mod.rs
src/evaluator/mod.rs
src/output/format.rs
src/parser/ast.rs
src/parser/grammar/tests/mod.rs
src/parser/name_table.rs

Sixteen unresolved review threads across correctness, performance, documentation, and test organization: - Evaluator: AnchorScope now saves/restores both last_match_end AND base_offset so `indirect` inside a `use` subroutine re-enters root rules with base_offset=0 (RFn1). - Offset resolution: resolve_offset_with_base switches to checked arithmetic and maps overflow to InvalidOffset rather than saturating into a spurious BufferOverrun (RU0). - MagicDatabase: drop the redundant rules Vec field; root_rules is the sole storage built via Arc::from(rules.into_boxed_slice()) to avoid Arc::from-slice clones (Vdy, VeD). - Format: Conv::Char uses pad_non_numeric (space-only) so %03c follows POSIX instead of zero-padding; Conv::Char doc now says full 0x00-0xff range via Latin-1 (VeM, RFn6). - Docs: MetaType::Offset added to architecture.md inline-dispatch list; evaluator.md dispatch wording separated from output formatting; evaluate_rules usage examples updated to pass &mut EvaluationContext; checklist bullet includes offset (RFnh, RFnl, RFnx, RFnz). - Grammar: meta-type directives reject attached operators (`default&0xf` no longer silently drops the mask) with a regression test (RUs). - Release workflow: dist-workspace.toml pin for actions/upload-artifact bumped from v7.0.0 to v7.0.1; release.yml regenerated via `dist generate` (RU7). - Property test: timeout doc comment rewritten to reflect that the test constructs an env-less context, so MetaType::Indirect/Use take the silent no-op path rather than recursing (VeW). - Name table: scrub_nested_names is now called with the actual parent level rather than parent.level + 1 (RFn-). - Test layout: engine test helpers moved to src/evaluator/engine/tests/helpers/meta.rs (RFn3); grammar meta-type parser tests moved to src/parser/grammar/tests/meta_types.rs (RFn8). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Copilot

Pull request overview

Implements GNU file-compatible evaluation for libmagic meta-type directives (default, clear, name, use, indirect, offset) and adds printf-style message substitution, wiring parse-time name-table extraction through MagicDatabase evaluation so real-world corpora (e.g. searchbug.magic) round-trip end-to-end.

Changes:

Introduces TypeKind::Meta(MetaType) + parse/load plumbing (ParsedMagic { rules, name_table }) and NameTable extraction/merging for name/use.
Wires evaluator dispatch for all meta-types (including indirect re-entry and subroutine base-offset) and applies printf-style formatting at output concatenation time.
Adds/updates unit + integration tests (parser, evaluator engine, corpus fixtures) and refreshes docs to describe the new architecture and breaking changes.

Reviewed changes

Copilot reviewed 52 out of 53 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/regex_search_corpus_tests.rs	Updates tests to destructure `ParsedMagic` from parser API.
tests/property_tests.rs	Extends generators + adds property test coverage for `TypeKind::Meta(...)` evaluation.
tests/property_tests.proptest-regressions	Adds saved proptest regression seeds for meta-type cases.
tests/parser_integration_tests.rs	Updates parser integration tests for `ParsedMagic` + adds name/use round-trip test.
tests/meta_types_integration.rs	Adds end-to-end smoke/integration tests for all meta-types + searchbug parity.
tests/directory_loading_tests.rs	Updates directory-loading tests to destructure `ParsedMagic`.
tests/compatibility_tests.rs	Adds a weaker “partial match” regression test around the searchbug fixture.
src/tests.rs	Adjusts built-in rules test to use `root_rules` storage.
src/parser/types.rs	Teaches keyword parsing/conversion about meta-types, and name/use suffix handling.
src/parser/name_table.rs	Adds load-time `name` extraction into a `NameTable` plus merge/sort utilities.
src/parser/mod.rs	Changes `parse_text_magic_file` to return `ParsedMagic { rules, name_table }`.
src/parser/loader.rs	Updates directory/file loaders to return `ParsedMagic` and merge name tables.
src/parser/grammar/tests/mod.rs	Adds/extends grammar tests (notably relative `&N` offset parsing).
src/parser/grammar/tests/meta_types.rs	Adds dedicated grammar tests for meta-type parsing + fixture parse check.
src/parser/grammar/mod.rs	Adds name/use identifier parsing, optional `x` stripping for meta messages, bare-string fallback, and relative offsets.
src/parser/codegen.rs	Serializes `TypeKind::Meta(...)` and adds identifier escaping regression tests.
src/parser/ast.rs	Introduces `MetaType` enum and `TypeKind::Meta(MetaType)` with docs + tests.
src/output/mod.rs	Exposes new `output::format` module.
src/lib.rs	Stores `root_rules` + `name_table` in `MagicDatabase`, threads `RuleEnvironment`, and applies printf substitution during message concatenation.
src/evaluator/types/mod.rs	Adjusts string-read semantics for pattern-length reads; treats meta-types as non-readable/zero-consume.
src/evaluator/strength.rs	Assigns strength contributions for meta-types + adds ordering tests.
src/evaluator/offset/mod.rs	Adds `resolve_offset_with_base` and tests for subroutine base-offset bias + overflow behavior.
src/evaluator/mod.rs	Adds `RuleEnvironment` and extends `EvaluationContext` with env/base_offset/indirect-reentry plumbing.
src/evaluator/engine/tests/mod.rs	Refactors test helpers into submodules and wires new meta-type test modules.
src/evaluator/engine/tests/meta_offset_tests.rs	Adds unit tests for `MetaType::Offset` dispatch + child evaluation behavior.
src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs	Adds unit tests for `default`/`clear`/`indirect` semantics + recursion limit behavior.
src/evaluator/engine/tests/helpers/mod.rs	Adds shared helper module entry point for engine tests.
src/evaluator/engine/tests/helpers/meta.rs	Adds builders/helpers for meta-type rule construction and contexts with env.
src/error.rs	Adds reserved evaluation errors for strict-mode unknown-name/indirect-without-env cases.
src/build_helpers.rs	Updates build helper to use `ParsedMagic` (rules-only for codegen).
docs/src/parser.md	Documents meta-types and `ParsedMagic` breaking change.
docs/src/magic-format.md	Documents meta-types in the magic(5)-format guide.
docs/src/evaluator.md	Documents meta-type dispatch and updates examples for `EvaluationContext` usage.
docs/src/ast-structures.md	Documents `MetaType` / `TypeKind::Meta` AST structures and semantics.
docs/src/architecture.md	Updates architecture docs to include NameTable and `ParsedMagic` shape.
docs/src/api-reference.md	Updates API reference to include meta-types and printf substitution behavior.
docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md	Adds a postmortem/solution writeup for RAII scope guards and formatting pitfalls.
docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md	Adds architecture writeup for name-table extraction + env-threading pattern.
docs/MAGIC_FORMAT.md	Mirrors meta-type documentation in the top-level magic format doc.
docs/ARCHITECTURE.md	Updates architecture doc for new database internals (needs correction per review).
docs/API_REFERENCE.md	Updates API reference (needs correction per review).
dist-workspace.toml	Bumps `actions/upload-artifact` pinned version.
build.rs	Updates build-time parser call sites to `ParsedMagic`.
benches/evaluation_bench.rs	Adds benchmark for name/use subroutine dispatch overhead.
ROADMAP.md	Marks meta-types milestone complete (includes `offset`).
GOTCHAS.md	Adds/updates gotchas for meta-types, bare string fallback, and format substitution.
Cargo.toml	Updates crate authors list.
CHANGELOG.md	Documents meta-types feature and `ParsedMagic` breaking change.
AGENTS.md	Updates repo guidance with new meta-type + printf substitution support and module structure.

+    rules: Vec<MagicRule>,        // Parsed magic rules (top-level, strength-sorted)
+    name_table: Arc<NameTable>,   // `name`/`use` subroutine dispatch table (Arc for cheap clone across evaluations)
+    root_rules: Arc<[MagicRule]>, // Shared immutable slice of top-level rules for `indirect` re-entry


+
+| Field (Internal) | Type               | Description                                                       |
+| ---------------- | ------------------ | ----------------------------------------------------------------- |
+| `rules`          | `Vec<MagicRule>`   | Top-level magic rules                                             |


+    let (input, _) = space1(input)?;
+    let (after_id, id) =
+        take_while(|c: char| c.is_alphanumeric() || c == '_' || c == '-').parse(input)?;
+    if id.is_empty() {


  - `numbers.rs`: Numeric type parsing (decimal/hex, signed/unsigned)
  - `value.rs`: Value literal parsing (strings, floats, hex bytes)
- `mod.rs`: Parser interface, format detection, and hierarchical rule building (✅ Complete)
+- `name_table.rs`: Load-time extraction of `name <id>` subroutine blocks into a `HashMap<String, Vec<MagicRule>>` (the `NameTable` type).


coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib.rs (1)

248-256: ⚠️ Potential issue | 🟠 Major

Hoist builtin name blocks before storing the database.

load_from_file_with_config() now builds both root_rules and name_table, but the builtin path still hardcodes NameTable::empty(). If the generated builtin corpus contains any name/use directives, builtin mode will silently degrade: MetaType::Name stays in root_rules as a no-op and every use lookup misses.

Suggested fix

-        let mut rules = crate::builtin_rules::get_builtin_rules();
+        let rules = crate::builtin_rules::get_builtin_rules();
+        let (mut rules, mut name_table) = crate::parser::name_table::extract_name_table(rules);
         crate::evaluator::strength::sort_rules_by_strength_recursive(&mut rules);
+        name_table.sort_subroutines(|rules| {
+            crate::evaluator::strength::sort_rules_by_strength_recursive(rules);
+        });
         let root_rules: std::sync::Arc<[MagicRule]> =
             std::sync::Arc::from(rules.into_boxed_slice());
         Ok(Self {
-            name_table: std::sync::Arc::new(crate::parser::name_table::NameTable::empty()),
+            name_table: std::sync::Arc::new(name_table),
             root_rules,
             config,
             source_path: None,
             mime_mapper: mime::MimeMapper::new(),
         })

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib.rs` around lines 248 - 256, The builtin constructor currently sets
name_table to NameTable::empty(), so any builtin `name`/`use` directives are
ignored; update with_builtin_rules_and_config to reuse the same
hoisting/name-table construction used in load_from_file_with_config: after
computing root_rules (the MagicRule array) run the hoist/name-table extraction
logic (the code path that builds the parser::name_table and hoists `name`
blocks) and set self.name_table to that resulting NameTable Arc instead of
NameTable::empty(), ensuring root_rules reflect any hoisted MetaType::Name
changes and lookups via use succeed.

♻️ Duplicate comments (1)

src/output/format.rs (1)

267-270: ⚠️ Potential issue | 🟠 Major

Apply the non-numeric padding path to %s as well.

%c now honors width/alignment via pad_non_numeric, but %s still returns the raw rendered string, so %5s / %-5s ignore field width entirely. That leaves the formatter only half-fixed for text conversions.

Suggested fix

-        Conv::Str => Some(render_string(value)),
+        Conv::Str => Some(pad_non_numeric(&render_string(value), spec)),

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/output/format.rs` around lines 267 - 270, The Conv::Str branch in render
currently returns the raw rendered string (render_string(value)) so
width/alignment is ignored; update the Conv::Str match arm to pass the rendered
string through pad_non_numeric (the same way Conv::Char/%c does) using the
provided type_kind so formats like %5s and %-5s honor padding and alignment;
locate the render function and replace the Conv::Str return with a call that
applies pad_non_numeric(render_string(value), type_kind) (ensuring correct
ownership/borrowing to produce an Option<String>).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dist-workspace.toml`:
- Line 56: The workflow pins "actions/upload-artifact" to the mutable tag
"v7.0.1", which is insecure; locate the entry with the key
"actions/upload-artifact" and replace the tag value "v7.0.1" with the
corresponding immutable commit SHA (the full 40-char commit hash resolved from
the GitHub Actions marketplace or the action's repo) so the TOML contains the
exact commit SHA instead of the tag for reproducible CI.

In `@docs/src/evaluator.md`:
- Around line 374-393: The sequence diagram is incorrect: meta directives "Use"
and "Indirect" are handled inline in evaluate_rules, not routed through
evaluate_single_rule_with_anchor; update the diagram so that the arrows for Use
and Indirect go from ER (evaluate_rules) directly to the relevant components
(e.g., NameTable lookup for Use and cloning/using root_rules for Indirect) and
back to ER, remove or change any arrows that currently route through ESR
(evaluate_single_rule_with_anchor), and keep participants names (evaluate_rules,
evaluate_single_rule_with_anchor, NameTable, root_rules) so readers can trace
the correct inlined control flow.
- Line 520: The docs import is wrong: update the use statements so
evaluate_rules is imported from the evaluator module (e.g. use
libmagic_rs::evaluator::{evaluate_rules, EvaluationConfig, EvaluationContext};)
instead of from the crate root; change every occurrence that currently uses
libmagic_rs::{evaluate_rules, ...} (including the spots mentioned for lines 520,
548, 569, 590) to reference libmagic_rs::evaluator::evaluate_rules (and bring
EvaluationConfig and EvaluationContext into scope from the same evaluator
module) so the examples compile.

In `@src/evaluator/engine/mod.rs`:
- Around line 280-281: Update the helper contract comment that currently reads
"Returns `Ok((Some(absolute_offset), matches))` ..." to accurately state that
the function returns the subroutine's terminal anchor (e.g., "Returns
`Ok((Some(terminal_anchor), matches))`") rather than the resolved use-site
offset; mention that the returned terminal anchor is what callers use for
subsequent `&N` chaining and keep the `Ok((None, vec![]))` case unchanged,
ensuring the docstring for the helper in mod.rs reflects this behavior and
follows the project's documentation guidance for complex algorithm comments.

In `@src/evaluator/engine/tests/helpers/meta.rs`:
- Around line 69-138: Several tests repeat the same MagicRule struct literal;
extract the shared fields into a single constructor helper (e.g., a private fn
like build_magic_rule(offset: OffsetSpec, typ: TypeKind, op: Operator, value:
Value, message: String, children: Vec<MagicRule>) -> MagicRule) and have
default_rule, clear_rule, byte_eq_rule, indirect_rule, and offset_rule call that
helper with only the differing arguments (for byte_eq_rule provide
signed/type/value variations, for others pass Meta types and op). Update
references to MagicRule, OffsetSpec, TypeKind, Operator, Value, message and
children in the helper so future changes to the common fields need only one
update.

In `@src/parser/grammar/mod.rs`:
- Around line 198-208: The relative-offset branch currently accepts "&+-N"
because after stripping the '+' it still calls parse_number on a '-' prefix;
modify the '+' branch in the parser so that when rest.strip_prefix('+') returns
after_plus you first check that after_plus does not start with '-' and if it
does return a parse error (rather than calling parse_number), otherwise proceed
to call parse_number(after_plus); this change touches the branch that uses
rest.strip_prefix('+'), parse_number, multispace0 and returns
OffsetSpec::Relative.
- Around line 843-863: The branch for string-family types currently calls
parse_value first so bare tokens like "123" or "0x7f" get parsed as Uint/Bytes
instead of literal strings; change the logic in the is_string_family_type branch
to prefer parse_bare_string_value for unquoted single-token inputs (i.e., try
parse_bare_string_value(input) first and return that when it succeeds), falling
back to parse_value only if the bare-string parse fails, while keeping the
Operator::AnyValue path unchanged; update tests for
parse_value/parse_bare_string_value to include numeric-looking and hex-looking
bare tokens for TypeKind::String/PString/Regex/Search to ensure the rule matches
the literal text.

---

Outside diff comments:
In `@src/lib.rs`:
- Around line 248-256: The builtin constructor currently sets name_table to
NameTable::empty(), so any builtin `name`/`use` directives are ignored; update
with_builtin_rules_and_config to reuse the same hoisting/name-table construction
used in load_from_file_with_config: after computing root_rules (the MagicRule
array) run the hoist/name-table extraction logic (the code path that builds the
parser::name_table and hoists `name` blocks) and set self.name_table to that
resulting NameTable Arc instead of NameTable::empty(), ensuring root_rules
reflect any hoisted MetaType::Name changes and lookups via use succeed.

---

Duplicate comments:
In `@src/output/format.rs`:
- Around line 267-270: The Conv::Str branch in render currently returns the raw
rendered string (render_string(value)) so width/alignment is ignored; update the
Conv::Str match arm to pass the rendered string through pad_non_numeric (the
same way Conv::Char/%c does) using the provided type_kind so formats like %5s
and %-5s honor padding and alignment; locate the render function and replace the
Conv::Str return with a call that applies pad_non_numeric(render_string(value),
type_kind) (ensuring correct ownership/borrowing to produce an Option<String>).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24bb0714-2ff5-4601-862c-6abf0150e97e

📥 Commits

Reviewing files that changed from the base of the PR and between e9d20a0 and 4c576c7.

⛔ Files ignored due to path filters (1)

tests/property_tests.rs is excluded by none and included by none

📒 Files selected for processing (16)

dist-workspace.toml
docs/src/architecture.md
docs/src/evaluator.md
src/evaluator/engine/mod.rs
src/evaluator/engine/tests/helpers/meta.rs
src/evaluator/engine/tests/helpers/mod.rs
src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs
src/evaluator/engine/tests/mod.rs
src/evaluator/offset/mod.rs
src/lib.rs
src/output/format.rs
src/parser/grammar/mod.rs
src/parser/grammar/tests/meta_types.rs
src/parser/grammar/tests/mod.rs
src/parser/name_table.rs
src/tests.rs

coderabbitai · 2026-04-24T01:31:19Z

 "actions/download-artifact"       = "3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c"
 "actions/attest-build-provenance" = "a2bbfa25375fe432b6a289bc6b6cd05ecd0c4c32"
-"actions/upload-artifact"         = "v7.0.0"
+"actions/upload-artifact"         = "v7.0.1"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail repo="actions/upload-artifact" tag="v7.0.1" ref_json="$(curl -fsSL "https://api.github.com/repos/${repo}/git/ref/tags/${tag}")" obj_type="$(jq -r '.object.type' <<<"$ref_json")" obj_sha="$(jq -r '.object.sha' <<<"$ref_json")" if [[ "$obj_type" == "tag" ]]; then commit_sha="$(curl -fsSL "https://api.github.com/repos/${repo}/git/tags/${obj_sha}" | jq -r '.object.sha')" else commit_sha="$obj_sha" fi echo "Resolved commit SHA for ${repo}@${tag}: ${commit_sha}"

Repository: EvilBit-Labs/libmagic-rs

Length of output: 164

Pin actions/upload-artifact to an immutable commit SHA, not a tag.

Using v7.0.1 leaves CI exposed to tag retargeting. Pin to the resolved commit SHA for reproducible and tamper-resistant builds.

Suggested change

-"actions/upload-artifact" = "v7.0.1" +"actions/upload-artifact" = "043fb46d1a93c77aae656e7c1c64a875d1fc6a0a"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dist-workspace.toml` at line 56, The workflow pins "actions/upload-artifact" to the mutable tag "v7.0.1", which is insecure; locate the entry with the key "actions/upload-artifact" and replace the tag value "v7.0.1" with the corresponding immutable commit SHA (the full 40-char commit hash resolved from the GitHub Actions marketplace or the action's repo) so the TOML contains the exact commit SHA instead of the tag for reproducible CI.

coderabbitai · 2026-04-24T01:31:19Z

+```mermaid
+sequenceDiagram
+    participant ER as evaluate_rules
+    participant ESR as evaluate_single_rule_with_anchor
+    participant NT as NameTable
+    participant RR as root_rules
+
+    ER->>ESR: rule (Use "part2")
+    ESR->>NT: lookup("part2")
+    NT-->>ESR: Vec<MagicRule> (subroutine)
+    ESR->>ER: evaluate_rules(subroutine, buffer, ctx)
+    ER-->>ESR: subroutine matches
+    ESR-->>ER: Ok(Some(offset, value)) + merged matches
+
+    ER->>ESR: rule (Indirect)
+    ESR->>RR: clone root_rules
+    ESR->>ER: evaluate_rules(root_rules, sub_buffer, ctx)
+    ER-->>ESR: inner matches
+    ESR-->>ER: Ok(Some(offset, value)) + merged matches
+```


⚠️ Potential issue | 🟡 Minor

Sequence diagram still shows the old dispatcher.

The prose above correctly says meta directives are handled inline in evaluate_rules, but this diagram routes Use and Indirect through evaluate_single_rule_with_anchor. That will send readers to the wrong place when tracing the control flow.

As per coding guidelines: "docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/src/evaluator.md` around lines 374 - 393, The sequence diagram is incorrect: meta directives "Use" and "Indirect" are handled inline in evaluate_rules, not routed through evaluate_single_rule_with_anchor; update the diagram so that the arrows for Use and Indirect go from ER (evaluate_rules) directly to the relevant components (e.g., NameTable lookup for Use and cloning/using root_rules for Indirect) and back to ER, remove or change any arrows that currently route through ESR (evaluate_single_rule_with_anchor), and keep participants names (evaluate_rules, evaluate_single_rule_with_anchor, NameTable, root_rules) so readers can trace the correct inlined control flow.

coderabbitai · 2026-04-24T01:31:19Z


 ```rust
-use libmagic_rs::{evaluate_rules, EvaluationConfig};
+use libmagic_rs::{evaluate_rules, EvaluationConfig, EvaluationContext};


⚠️ Potential issue | 🟠 Major

Import evaluate_rules from libmagic_rs::evaluator.

These snippets currently use use libmagic_rs::{evaluate_rules, ...};, but the crate root does not re-export evaluate_rules. As written, the examples will not compile.

Suggested fix pattern

-use libmagic_rs::{evaluate_rules, EvaluationConfig, EvaluationContext}; +use libmagic_rs::{EvaluationConfig, EvaluationContext}; +use libmagic_rs::evaluator::evaluate_rules;

As per coding guidelines: "docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides."

Also applies to: 548-548, 569-569, 590-590

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/src/evaluator.md` at line 520, The docs import is wrong: update the use statements so evaluate_rules is imported from the evaluator module (e.g. use libmagic_rs::evaluator::{evaluate_rules, EvaluationConfig, EvaluationContext};) instead of from the crate root; change every occurrence that currently uses libmagic_rs::{evaluate_rules, ...} (including the spots mentioned for lines 520, 548, 569, 590) to reference libmagic_rs::evaluator::evaluate_rules (and bring EvaluationConfig and EvaluationContext into scope from the same evaluator module) so the examples compile.

coderabbitai · 2026-04-24T01:31:20Z

+/// Returns `Ok((Some(absolute_offset), matches))` on a successful resolution
+/// (even if the subroutine produced no matches), or `Ok((None, vec![]))`


⚠️ Potential issue | 🟡 Minor

Update the helper contract to say it returns the terminal anchor.

The implementation below no longer returns the resolved use-site offset; it returns the subroutine's terminal anchor, and the caller relies on that for subsequent &N chaining. The stale Some(absolute_offset) wording is now misleading.

As per coding guidelines: "src/**/*.rs: Document magic file syntax support, libmagic compatibility, design decisions, and trade-offs in code comments for complex algorithms."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/evaluator/engine/mod.rs` around lines 280 - 281, Update the helper contract comment that currently reads "Returns `Ok((Some(absolute_offset), matches))` ..." to accurately state that the function returns the subroutine's terminal anchor (e.g., "Returns `Ok((Some(terminal_anchor), matches))`") rather than the resolved use-site offset; mention that the returned terminal anchor is what callers use for subsequent `&N` chaining and keep the `Ok((None, vec![]))` case unchanged, ensuring the docstring for the helper in mod.rs reflects this behavior and follows the project's documentation guidance for complex algorithm comments.

coderabbitai · 2026-04-24T01:31:20Z

+pub fn default_rule(message: &str, children: Vec<MagicRule>) -> MagicRule {
+    MagicRule {
+        offset: OffsetSpec::Absolute(0),
+        typ: TypeKind::Meta(MetaType::Default),
+        op: Operator::Equal,
+        value: Value::Uint(0),
+        message: message.to_string(),
+        children,
+        level: 0,
+        strength_modifier: None,
+    }
+}
+
+/// Build a `Clear` rule. Carries no message in the magic file syntax, but the
+/// AST requires a message field.
+pub fn clear_rule() -> MagicRule {
+    MagicRule {
+        offset: OffsetSpec::Absolute(0),
+        typ: TypeKind::Meta(MetaType::Clear),
+        op: Operator::Equal,
+        value: Value::Uint(0),
+        message: String::new(),
+        children: vec![],
+        level: 0,
+        strength_modifier: None,
+    }
+}
+
+/// Build a single byte-equality rule at `offset` for `value`.
+pub fn byte_eq_rule(offset: i64, value: u64, message: &str) -> MagicRule {
+    MagicRule {
+        offset: OffsetSpec::Absolute(offset),
+        typ: TypeKind::Byte { signed: false },
+        op: Operator::Equal,
+        value: Value::Uint(value),
+        message: message.to_string(),
+        children: vec![],
+        level: 0,
+        strength_modifier: None,
+    }
+}
+
+/// Build an `Indirect` rule at `offset` with optional children.
+pub fn indirect_rule(offset: i64, message: &str, children: Vec<MagicRule>) -> MagicRule {
+    MagicRule {
+        offset: OffsetSpec::Absolute(offset),
+        typ: TypeKind::Meta(MetaType::Indirect),
+        op: Operator::Equal,
+        value: Value::Uint(0),
+        message: message.to_string(),
+        children,
+        level: 0,
+        strength_modifier: None,
+    }
+}
+
+/// Build an `Offset` rule at `offset` with an `x` (`AnyValue`) operator and
+/// the given message. Mirrors `default_rule`/`indirect_rule` helpers.
+pub fn offset_rule(offset: i64, message: &str, children: Vec<MagicRule>) -> MagicRule {
+    MagicRule {
+        offset: OffsetSpec::Absolute(offset),
+        typ: TypeKind::Meta(MetaType::Offset),
+        op: Operator::AnyValue,
+        value: Value::Uint(0),
+        message: message.to_string(),
+        children,
+        level: 0,
+        strength_modifier: None,
+    }
+}


🧹 Nitpick | 🔵 Trivial

Consider collapsing repeated MagicRule builder boilerplate.

The repeated literals are consistent today, but centralizing shared fields would reduce maintenance risk if MagicRule changes again.

Refactor sketch

+fn build_meta_rule( + offset: i64, + meta: MetaType, + op: Operator, + message: &str, + children: Vec<MagicRule>, +) -> MagicRule { + MagicRule { + offset: OffsetSpec::Absolute(offset), + typ: TypeKind::Meta(meta), + op, + value: Value::Uint(0), + message: message.to_string(), + children, + level: 0, + strength_modifier: None, + } +} + pub fn default_rule(message: &str, children: Vec<MagicRule>) -> MagicRule { - MagicRule { - offset: OffsetSpec::Absolute(0), - typ: TypeKind::Meta(MetaType::Default), - op: Operator::Equal, - value: Value::Uint(0), - message: message.to_string(), - children, - level: 0, - strength_modifier: None, - } + build_meta_rule(0, MetaType::Default, Operator::Equal, message, children) }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/evaluator/engine/tests/helpers/meta.rs` around lines 69 - 138, Several tests repeat the same MagicRule struct literal; extract the shared fields into a single constructor helper (e.g., a private fn like build_magic_rule(offset: OffsetSpec, typ: TypeKind, op: Operator, value: Value, message: String, children: Vec<MagicRule>) -> MagicRule) and have default_rule, clear_rule, byte_eq_rule, indirect_rule, and offset_rule call that helper with only the differing arguments (for byte_eq_rule provide signed/type/value variations, for others pass Meta types and op). Update references to MagicRule, OffsetSpec, TypeKind, Operator, Value, message and children in the helper so future changes to the common fields need only one update.

coderabbitai · 2026-04-24T01:31:20Z

+    } else if let Some(rest) = input.strip_prefix('&') {
+        // Relative offset: `&N`, `&+N`, or `&-N`. `parse_number` handles the
+        // bare and `-`-prefixed cases natively; `+` is consumed manually
+        // (see the indirect-offset adjustment parser for the same pattern).
+        let (rest, value) = if let Some(after_plus) = rest.strip_prefix('+') {
+            parse_number(after_plus)?
+        } else {
+            parse_number(rest)?
+        };
+        let (rest, _) = multispace0(rest)?;
+        Ok((rest, OffsetSpec::Relative(value)))


⚠️ Potential issue | 🟡 Minor

Reject &+-N in the relative-offset parser.

After the explicit + is stripped, this branch still hands the remainder to parse_number, so &+-4 currently parses as OffsetSpec::Relative(-4) instead of failing. That lets malformed magic syntax through silently.

Proposed fix

} else if let Some(rest) = input.strip_prefix('&') { // Relative offset: `&N`, `&+N`, or `&-N`. `parse_number` handles the // bare and `-`-prefixed cases natively; `+` is consumed manually // (see the indirect-offset adjustment parser for the same pattern). let (rest, value) = if let Some(after_plus) = rest.strip_prefix('+') { + if after_plus.starts_with(['+', '-']) { + return Err(nom::Err::Error(NomError::new( + after_plus, + nom::error::ErrorKind::Verify, + ))); + } parse_number(after_plus)? } else { parse_number(rest)? };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/parser/grammar/mod.rs` around lines 198 - 208, The relative-offset branch currently accepts "&+-N" because after stripping the '+' it still calls parse_number on a '-' prefix; modify the '+' branch in the parser so that when rest.strip_prefix('+') returns after_plus you first check that after_plus does not start with '-' and if it does return a parse error (rather than calling parse_number), otherwise proceed to call parse_number(after_plus); this change touches the branch that uses rest.strip_prefix('+'), parse_number, multispace0 and returns OffsetSpec::Relative.

coderabbitai · 2026-04-24T01:31:20Z

+    // For AnyValue (`x`), no operand is needed -- treat remaining text as message.
+    // For string-family types, fall back to a bare (unquoted) single-token
+    // literal if the strict `parse_value` alternatives all fail. magic(5)
+    // syntax permits writing `string TEST` or `search/12 ABC` without
+    // surrounding quotes, and this fallback supports that form without
+    // relaxing value parsing for non-string types (where `xyz` must
+    // still be rejected -- see `test_parse_value_invalid_input`).
+    let is_string_family_type = matches!(
+        typ,
+        TypeKind::String { .. }
+            | TypeKind::PString { .. }
+            | TypeKind::Regex { .. }
+            | TypeKind::Search { .. }
+    );
    let (input, value) = if op == Operator::AnyValue {
        (input, Value::Uint(0))
+    } else if is_string_family_type {
+        match parse_value(input) {
+            Ok(ok) => ok,
+            Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?,
+        }


⚠️ Potential issue | 🟠 Major

String-family bare operands still lose to numeric/hex parsing.

This only falls back to parse_bare_string_value when parse_value fails, so bare operands like string 123, string 7f, or search/16 0x10 are still parsed as Uint/Bytes instead of literal text. Those are valid unquoted string/pattern operands, so the rule will never match as intended.

Proposed fix

- } else if is_string_family_type { - match parse_value(input) { - Ok(ok) => ok, - Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?, - } + } else if is_string_family_type { + let trimmed = input.trim_start(); + if trimmed.starts_with('"') || trimmed.starts_with('\\') { + parse_value(input)? + } else { + parse_bare_string_value(input)? + } } else { parse_value(input)? };

Please add regression cases for numeric-looking and hex-looking bare tokens once this branch is tightened. As per coding guidelines: “src/parser/**: Focus on magic file DSL parsing ... and robust parsing logic.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// For AnyValue (`x`), no operand is needed -- treat remaining text as message.

// For string-family types, fall back to a bare (unquoted) single-token

// literal if the strict `parse_value` alternatives all fail. magic(5)

// syntax permits writing `string TEST` or `search/12 ABC` without

// surrounding quotes, and this fallback supports that form without

// relaxing value parsing for non-string types (where `xyz` must

// still be rejected -- see `test_parse_value_invalid_input`).

let is_string_family_type = matches!(

typ,

TypeKind::String { .. }

| TypeKind::PString { .. }

| TypeKind::Regex { .. }

| TypeKind::Search { .. }

);

let (input, value) = if op == Operator::AnyValue {

(input, Value::Uint(0))

} else if is_string_family_type {

match parse_value(input) {

Ok(ok) => ok,

Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?,

}

// For AnyValue (`x`), no operand is needed -- treat remaining text as message.

// For string-family types, fall back to a bare (unquoted) single-token

// literal if the strict `parse_value` alternatives all fail. magic(5)

// syntax permits writing `string TEST` or `search/12 ABC` without

// surrounding quotes, and this fallback supports that form without

// relaxing value parsing for non-string types (where `xyz` must

// still be rejected -- see `test_parse_value_invalid_input`).

let is_string_family_type = matches!(

typ,

TypeKind::String { .. }

| TypeKind::PString { .. }

| TypeKind::Regex { .. }

| TypeKind::Search { .. }

);

let (input, value) = if op == Operator::AnyValue {

(input, Value::Uint(0))

} else if is_string_family_type {

let trimmed = input.trim_start();

if trimmed.starts_with('"') || trimmed.starts_with('\\') {

parse_value(input)?

} else {

parse_bare_string_value(input)?

}

} else {

parse_value(input)?

};

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/parser/grammar/mod.rs` around lines 843 - 863, The branch for string-family types currently calls parse_value first so bare tokens like "123" or "0x7f" get parsed as Uint/Bytes instead of literal strings; change the logic in the is_string_family_type branch to prefer parse_bare_string_value for unquoted single-token inputs (i.e., try parse_bare_string_value(input) first and return that when it succeeds), falling back to parse_value only if the bare-string parse fails, while keeping the Operator::AnyValue path unchanged; update tests for parse_value/parse_bare_string_value to include numeric-looking and hex-looking bare tokens for TypeKind::String/PString/Regex/Search to ensure the rule matches the literal text.

## 🤖 New release * `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes) ### ⚠ `libmagic-rs` breaking changes ```text --- failure constructible_struct_adds_field: externally-constructible struct adds field --- Description: A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field. ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron Failed in: field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189 field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189 field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189 --- failure copy_impl_added: type now implements Copy --- Description: A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it. ref: rust-lang/rust#100905 impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron Failed in: libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98 --- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] --- Description: A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile. ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron Failed in: enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198 enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198 enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198 enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15 enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15 enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26 enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838 enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838 enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838 enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56 enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74 enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74 enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965 enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965 enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965 enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398 enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398 enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398 enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148 enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148 --- failure enum_struct_variant_field_added: pub enum struct variant field added --- Description: An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant. ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron Failed in: field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251 field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266 field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272 field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251 field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266 field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272 field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251 field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266 field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272 --- failure function_missing: pub fn removed or renamed --- Description: A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron Failed in: function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025 function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846 function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683 function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179 function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179 function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004 function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810 function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633 function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133 function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133 function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060 function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946 function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779 function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042 function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902 function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749 function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227 --- failure function_parameter_count_changed: pub fn parameter count changed --- Description: A publicly-visible function now takes a different number of parameters. ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron Failed in: libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196 --- failure inherent_method_missing: pub method removed or renamed --- Description: A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron Failed in: FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326 EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114 EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130 EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114 EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130 --- failure module_missing: pub module removed or renamed --- Description: A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public. ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron Failed in: mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4 --- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] --- Description: A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate. ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron Failed in: struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42 ``` <details><summary>Changelog</summary> <blockquote> ## [0.6.0] - 2026-04-25 ### Features - **parser**: Add Date and QDate types with serialization support ([#165](#165)) - **parser**: Implement pstring (Pascal string) type ([#170](#170)) - **parser**: Implement pstring multi-byte length prefix variants (/B, /H, /h, /L, /l, /J) ([#183](#183)) - **evaluator**: Add debug-level tracing for skipped rules ([#184](#184)) - **evaluator**: Implement indirect offset resolution ([#37](#37)) ([#199](#199)) - **evaluator**: Implement relative offset resolution ([#38](#38)) ([#211](#211)) - **deps**: Add new skills to actionbook/rust-skills and trailofbits/skills - **evaluator**: Regex and search types (closes #39) ([#214](#214)) - Implement libmagic meta-type directives and format substitution ([#42](#42)) ([#230](#230)) ### Bug Fixes - **regex**: PR #214 follow-up review findings ([#215](#215)) - Load and correctly evaluate /usr/share/file/magic/filesystems and adjacent magic files ([#233](#233)) ### Documentation - **gotchas**: Clarify requirements for adding TypeKind variants ### Miscellaneous Tasks - Rename .coderabbitai.yaml to .coderabbit.yaml - **Mergify**: Configuration update ([#173](#173)) - Update .gitignore to exclude local AI assistant files - **mergify**: Upgrade configuration to current format ([#205](#205)) - Resolve all pending TODO items ([#212](#212)) - **mergify**: Upgrade configuration to current format ([#231](#231))  ### Security - **io**: Close TOCTOU race in `FileBuffer::new` metadata validation (CWE-367). `validate_file_metadata` now uses `File::metadata()` on the open descriptor instead of re-canonicalizing the path, so an attacker cannot swap the path between `open_file` and validation. Error paths now report the caller-supplied path rather than the canonicalized variant. - **cli**: Remove relative-path fallbacks from `default_magic_file_path` (CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the `CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the process cwd. CI pipelines must pass `--magic <path>` explicitly. - **evaluator**: `build_regex` now bounds `size_limit` and `dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject compile-time DoS patterns (CWE-1333) from adversarial magic files. ### Features - **parser**: Implement meta-type directives: `name`/`use` subroutines, `default`/`clear` per-level fallback, and `indirect` re-evaluation. `parse_text_magic_file` now returns `ParsedMagic { rules, name_table }` (breaking change from `Vec<MagicRule>`). Named subroutines are hoisted into `NameTable` at load time and dispatched via `RuleEnvironment` in the evaluator. Recursion is bounded by `EvaluationConfig::max_recursion_depth`. Resolves [#42](#42). - **evaluator**: Thread-local regex compile cache eliminates the double-compile paid by every successful regex match. `regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex` instead of recompiling the pattern to derive the anchor advance. The cache is reset at the start of every `evaluate_rules_with_config` call, bounding memory to one evaluation. - **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new builder-style setters (`with_max_recursion_depth`, `with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`, `with_timeout_ms`) let external crates construct configurations without struct literals. - **parser**: `MagicRule::new()` smart constructor with `::with_children()`, `::with_strength_modifier()`, `::with_level()` builder methods and a `::validate()` method enforcing structural invariants (non-empty message, `level <= MAX_LEVEL`, children nested strictly deeper than parent). New `MagicRuleValidationError` error type. - **parser**: `RegexFlags::with_case_insensitive()` and `::with_start_offset()` builder methods. ### Refactor - **engine**: Extract `evaluate_pattern_rule()` and `evaluate_value_rule()` helpers from `evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a two-arm type-category split; each helper has focused rustdoc on semantics and invariants. - **types**: Replace the `_ =>` catch-all in `bytes_consumed_with_pattern` with an explicit listing of the fixed-width `TypeKind` variants. Adding a new variable-width variant without updating this match is now a compile error instead of a silent relative-offset anchor corruption in release builds. - **parser**: Split the 185-line `type_keyword_to_kind` match into per-family helpers (`byte_family`, `short_family`, `long_family`, `quad_family`, `float_family`, `double_family`, `date_family`, `qdate_family`, `string_family`). Drops the `#[allow(clippy::too_many_lines)]` attribute. - **main**: `main()` returns `std::process::ExitCode` instead of calling `process::exit`, so destructors run on the happy path. Ctrl-C `AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`. - **grammar**: `parse_strength_directive` uses nom 8's `preceded` + `Parser::map` instead of the legacy `map(pair(char(...), parse_number), |(_, n)| ...)` pattern. - **output**: Add `#[serde(skip_serializing_if = "Option::is_none", default)]` to public `Option<T>` fields so JSON output no longer emits `"field": null` for unset optional values. ### Documentation - **lib**: Add `# Security` sections to `MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`, `::load_from_file`, and `::load_from_file_with_config` warning about the unbounded default timeout and recommending `EvaluationConfig::performance()` for untrusted input. - **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning. - **README**: Update `TypeKind` enum example to match the current AST, add `regex` and `search/N` to the supported types table, add pre-1.0 API stability warning, correct the roadmap to mark v0.2-v0.4 as shipped. - **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases section to reflect actual shipped scope. ### Testing - Security regression tests for S-H1 (planted-magic-file in cwd), S-H2 (TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime), S-L2 (codegen message escape round-trip), and GOTCHAS S13.1 (`EvaluationConfig::default()` unbounded timeout invariant). - Backspace message concatenation regression tests for first-match, consecutive, and empty-rest edge cases. - `MagicRule::validate()` tests covering empty message, child level invariant, and max-depth rejection. - `RegexCache` population/clear/reuse tests. ### Breaking Changes - **parser**: `parse_text_magic_file` return type changed from `Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic, ParseError>`. Callers must destructure `ParsedMagic { rules, name_table }`. Low-level callers that only need the rule list can use `parsed.rules`. `load_magic_file` and `load_magic_directory` return the same new type. </blockquote> </details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

unclesp1d3r added 6 commits April 21, 2026 23:27

chore(lib): remove duplicate author entry in Cargo.toml

0c3b398

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

chore(lib): add provenance information for actionlint tools

2e5a801

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

refactor(parser): reorganize grammar and types modules into submodules

d155a3b

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

docs(agents): update magic file compatibility and limitations for v0.5.x

6f8ae2c

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

docs(gotchas): document requirements for adding MetaType variants

33b6fc8

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Copilot AI review requested due to automatic review settings April 22, 2026 23:30

unclesp1d3r linked an issue Apr 22, 2026 that may be closed by this pull request

Parser: implement default, clear, name, use, and indirect meta-types #42

Closed

7 tasks

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 22, 2026

Copilot started reviewing on behalf of unclesp1d3r April 22, 2026 23:31 View session

dosubot Bot added enhancement New feature or request evaluator Rule evaluation engine and logic output Result formatting and output generation parser Magic file parsing components and grammar labels Apr 22, 2026

coderabbitai Bot added the testing Test infrastructure and coverage label Apr 22, 2026

Copilot AI reviewed Apr 22, 2026

View reviewed changes

docs: Dosu updates for PR #230

f2647f0

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

coderabbitai Bot removed parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation testing Test infrastructure and coverage labels Apr 22, 2026

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

Comment thread docs/src/api-reference.md Outdated

Comment thread docs/src/api-reference.md Outdated

Comment thread docs/src/api-reference.md Outdated

Copilot AI review requested due to automatic review settings April 23, 2026 01:16

Copilot started reviewing on behalf of unclesp1d3r April 23, 2026 01:17 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread src/evaluator/mod.rs Outdated

Comment thread src/parser/grammar/mod.rs

Comment thread src/evaluator/offset/mod.rs Outdated

Comment thread .github/workflows/release.yml

coderabbitai Bot previously approved these changes Apr 23, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings April 23, 2026 03:33

unclesp1d3r dismissed coderabbitai[bot]’s stale review via 5efb0de April 23, 2026 03:33

coderabbitai Bot previously approved these changes Apr 23, 2026

View reviewed changes

Copilot started reviewing on behalf of unclesp1d3r April 23, 2026 03:34 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread src/lib.rs Outdated

Comment thread src/lib.rs Outdated

Comment thread src/output/format.rs Outdated

Comment thread tests/property_tests.rs

unclesp1d3r dismissed coderabbitai[bot]’s stale review via e9d20a0 April 23, 2026 04:35

coderabbitai Bot removed parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation compatibility libmagic compatibility and migration testing Test infrastructure and coverage labels Apr 24, 2026

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings April 24, 2026 01:21

Copilot started reviewing on behalf of unclesp1d3r April 24, 2026 01:21 View session

coderabbitai Bot added parser Magic file parsing components and grammar evaluator Rule evaluation engine and logic output Result formatting and output generation testing Test infrastructure and coverage labels Apr 24, 2026

Copilot AI reviewed Apr 24, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

unclesp1d3r merged commit 95fe971 into main Apr 24, 2026
34 checks passed

unclesp1d3r deleted the 42-parser-implement-default-clear-name-use-and-indirect-meta-types branch April 24, 2026 02:35

github-actions Bot mentioned this pull request Apr 25, 2026

chore: release v0.6.0 #240

Merged

		/// Returns `Ok((Some(absolute_offset), matches))` on a successful resolution
		/// (even if the subroutine produced no matches), or `Ok((None, vec![]))`

Uh oh!

Conversation

unclesp1d3r commented Apr 22, 2026

Summary

What changed

Test Plan

Residual items (separate follow-up PRs)

Compatibility impact

Uh oh!

mergify Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge Protections

🟢 Enforce conventional commit

🟢 Full CI must pass

🟢 Do not merge outdated PRs

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

Review ran into problems

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dosubot Bot commented Apr 22, 2026

api-reference /libmagic-rs/blob/main/docs/src/api-reference.md — ⏳ Awaiting Merge

API_REFERENCE /libmagic-rs/blob/main/docs/API_REFERENCE.md — ⏳ Awaiting Merge

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

mergify Bot commented Apr 22, 2026 •

edited

Loading

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

codecov Bot commented Apr 22, 2026 •

edited

Loading

api-reference `/libmagic-rs/blob/main/docs/src/api-reference.md` — ⏳ Awaiting Merge

API_REFERENCE `/libmagic-rs/blob/main/docs/API_REFERENCE.md` — ⏳ Awaiting Merge