feat: Implement libmagic meta-type directives and format substitution (#42)#230
Conversation
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…#42) Complete issue #42 by landing the remaining Phase 2 work on top of the prior Phase 1 AST + parser + `name`/`use` dispatch. Closes byte-for-byte parity with GNU `file`'s `searchbug.result` fixture. Meta-type directives - `TypeKind::Meta(MetaType::{Default, Clear, Name, Use, Indirect, Offset})` all fully evaluated in `src/evaluator/engine/mod.rs`. - `default` fires only when no sibling at the same level has matched; `clear` resets the per-level `sibling_matched` flag so later `default` rules can fire again. - `name` subroutines are hoisted into a `NameTable` at load time (`parser::name_table::extract_name_table`). `use` invokes them via `RuleEnvironment` threaded through `EvaluationContext::rule_env`. - `indirect` re-applies the root rule set at the resolved offset via `AnchorScope`, bounded by `EvaluationConfig::max_recursion_depth`. - `offset` reports the resolved file offset as `Value::Uint(pos)` for printf-style substitution. Printf-style format substitution - New pure function `src/output/format.rs::format_magic_message` supporting `%d`, `%i`, `%u`, `%x`, `%X`, `%o`, `%s`, `%c`, `%%`, plus width/padding/length modifiers. Hex specifiers mask to the natural `TypeKind::bit_width()` so a signed byte carrying `-1` renders as `ff` rather than sign-extended `ffffffffffffffff`. - Alt-form width follows C printf: zero-pad inserts between prefix and digits (`%#06x` + `0xab` -> `0x00ab`), space-pad goes before the prefix, left-align trails the digits. - Wired into `MagicDatabase::concatenate_messages` before the existing `\b` backspace-suppression check (GOTCHAS S14.1). Parser and evaluator semantics - Parser consumes an optional `x` (AnyValue) token between a Meta keyword and its message so `offset x at_offset %lld` no longer leaks the operator into output. - Subroutine `base_offset` on `EvaluationContext`: inside a `use` body, positive `OffsetSpec::Absolute(n)` resolves to `base_offset + n`, matching magic(5) semantics. Negative absolute, `FromEnd`, `Relative`, and `Indirect` are unaffected (new GOTCHAS S3.10). - Continuation-sibling anchor reset: at `recursion_depth > 0`, each sibling resolves `&N` against the parent-level entry anchor rather than the previous sibling's advance. Top-level siblings (depth 0) keep chaining per GOTCHAS S3.8 (`relative_anchor_can_decrease_...` still passes). Tests - `test_searchbug_matches_full_result_string` un-ignored and passing against the GNU `file` `searchbug.testfile` fixture. - 18 unit tests for `format_magic_message` including regression guards for alt-form zero-padding. - 7 new engine tests for `MetaType::Offset` dispatch (match emission, offset-at-zero, out-of-bounds skip, non-`AnyValue` operator reject, children, anchor semantics, `sibling_matched` propagation). - Integration test file `tests/meta_types_integration.rs` covers `default`/`clear`/`indirect` synthetic scenarios plus the searchbug round-trip. Documentation - AGENTS.md "Currently Implemented" expanded with the six meta-types and printf-style substitution; "Current Limitations" updated. - GOTCHAS.md S14.2 rewritten (specifiers ARE substituted); new S3.10 for subroutine base offset; S3.9 renumbered to S3.11 with S14.1 cross-reference fixed. - `docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md` documents the three-layer parse-time / `ParsedMagic` / optional `RuleEnvironment` pattern. Build tooling - `dist generate` regenerated `.github/workflows/release.yml` to resolve cargo-dist staleness vs recent `actions/upload-artifact` dependabot bumps. Test results: 1348/1348 pass, 5 pre-existing skipped. `just ci-check` green (clippy `-D warnings`, fmt, audit, deny). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.
🟢 Full CI must passWonderful, this rule succeeded.All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.
🟢 Do not merge outdated PRsWonderful, this rule succeeded.Make sure PRs are within 10 commits of the base branch before merging
|
Summary by CodeRabbit
WalkthroughImplements meta-type directives ( Changes
Sequence Diagram(s)sequenceDiagram
participant Parser
participant NameTable as NameTable (Extraction)
participant ParsedMagic
Parser->>Parser: parse_text_magic_file(input)
Parser->>Parser: build rule hierarchy
Parser->>NameTable: extract_name_table(rules)
NameTable-->>Parser: rules (sans Name), name_table
Parser->>ParsedMagic: return ParsedMagic { rules, name_table }
sequenceDiagram
participant Evaluator
participant RuleEnv as RuleEnvironment (name_table)
participant Subrules as SubroutineRules
participant Formatter as OutputFormatter
Evaluator->>Evaluator: evaluate_rules(root_rules)
Evaluator->>RuleEnv: lookup name for MetaType::Use
RuleEnv-->>Evaluator: Arc<[MagicRule]> (subroutine)
Evaluator->>Subrules: evaluate_use_rule(subroutine, base_offset)
Subrules-->>Evaluator: matches + terminal anchor
Evaluator->>Formatter: format_magic_message(template, value, type_kind)
Formatter-->>Evaluator: formatted text
Evaluator-->>Evaluator: integrate matches & return
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 10✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Warning Review ran into problems🔥 ProblemsThese MCP integrations need to be re-authenticated in the Integrations settings: Linear, Notion Comment |
There was a problem hiding this comment.
Pull request overview
Implements GNU file-compatible meta-type directive parsing + evaluation and adds printf-style message substitution so real-world magic corpora (e.g. searchbug.magic) can round-trip end-to-end within libmagic-rs.
Changes:
- Add
TypeKind::Meta(MetaType)with evaluation semantics fordefault,clear,name/use,indirect, andoffset, including parse-timenameextraction into aNameTableand runtime dispatch viaRuleEnvironment. - Introduce
output::format::format_magic_messageand wire it intoMagicDatabasemessage concatenation for%d/%u/%x/%s/...substitutions. - Update parser API to return
ParsedMagic { rules, name_table }, plus extensive tests/docs/bench updates to cover the new behavior.
Reviewed changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/regex_search_corpus_tests.rs | Adapts tests to destructure ParsedMagic from parse_text_magic_file. |
| tests/property_tests.rs | Extends generators and adds a meta-type “never panics” property test. |
| tests/property_tests.proptest-regressions | Checks in proptest regression seeds for meta-type cases. |
| tests/parser_integration_tests.rs | Updates loader tests for ParsedMagic and adds a name/use round-trip test. |
| tests/meta_types_integration.rs | Adds end-to-end integration tests for meta-types and searchbug parity. |
| tests/directory_loading_tests.rs | Updates directory loader tests to destructure ParsedMagic. |
| tests/compatibility_tests.rs | Adds searchbug partial-match regression coverage. |
| src/parser/types.rs | Adds meta-type keywords and maps keyword-only meta-types into TypeKind::Meta. |
| src/parser/name_table.rs | Implements NameTable and extraction/scrubbing logic for name subroutines. |
| src/parser/mod.rs | Introduces ParsedMagic and changes parse_text_magic_file to return it. |
| src/parser/loader.rs | Updates directory/file loading to merge rule lists + name tables into ParsedMagic. |
| src/parser/grammar/tests/mod.rs | Adds parsing tests for relative offsets, meta-types, and searchbug.magic fixture. |
| src/parser/grammar/mod.rs | Adds &N relative offsets, meta-type parsing (name/use), x-strip, and bare-string fallback for string-family types. |
| src/parser/codegen.rs | Adds codegen support for TypeKind::Meta(MetaType::...) and injection-escape test. |
| src/parser/ast.rs | Adds MetaType and TypeKind::Meta, plus tests and bit_width() behavior. |
| src/output/mod.rs | Exposes new output::format module. |
| src/output/format.rs | Adds printf-style substitution implementation + unit tests. |
| src/lib.rs | Threads RuleEnvironment through evaluation and applies message formatting during concatenation. |
| src/evaluator/types/mod.rs | Updates string semantics to be pattern-length-based when applicable and handles TypeKind::Meta as unsupported for reads/compare. |
| src/evaluator/strength.rs | Defines strength ordering behavior for meta-types and adds tests. |
| src/evaluator/offset/mod.rs | Adds resolve_offset_with_base to support subroutine base offsets. |
| src/evaluator/mod.rs | Adds RuleEnvironment and stores optional environment + base_offset on EvaluationContext. |
| src/evaluator/engine/mod.rs | Implements meta-type dispatch (default/clear/use/indirect/offset), continuation-sibling anchor reset, and subroutine base-offset handling. |
| src/error.rs | Adds reserved evaluation errors for unknown-name and indirect-without-environment. |
| src/build_helpers.rs | Updates codegen path to use ParsedMagic.rules. |
| mise.lock | Updates tool lock entries (python build-standalone artifacts, rust version, provenance fields). |
| docs/src/parser.md | Documents ParsedMagic and meta-types (needs updating for offset implementation status). |
| docs/src/magic-format.md | Adds a meta-types section (currently missing offset). |
| docs/src/evaluator.md | Updates evaluator architecture docs and shows meta-type dispatch behavior. |
| docs/src/ast-structures.md | Documents TypeKind::Meta (currently missing MetaType::Offset in snippet). |
| docs/src/architecture.md | Updates architecture overview to include NameTable/ParsedMagic/RuleEnvironment. |
| docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md | Adds an architecture decision record for meta-type dispatch + environments. |
| docs/MAGIC_FORMAT.md | Mirrors meta-type documentation in the top-level docs (currently missing offset). |
| docs/ARCHITECTURE.md | Updates high-level repository architecture and MagicDatabase fields. |
| build.rs | Updates build-time codegen to use ParsedMagic.rules. |
| benches/evaluation_bench.rs | Adds a benchmark for name/use subroutine dispatch overhead. |
| ROADMAP.md | Marks meta-types (#42) as completed. |
| GOTCHAS.md | Adds gotchas for meta-type dispatch, ParsedMagic, bare-string fallback, and format substitution. |
| Cargo.toml | Updates authors metadata. |
| CHANGELOG.md | Documents meta-type and ParsedMagic breaking change. |
| AGENTS.md | Updates project structure and implemented-feature list to include meta-types + format substitution. |
| .github/workflows/release.yml | Downgrades actions/upload-artifact version pins. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Related Documentation 2 document(s) may need updating based on files changed in this PR: libMagic-rs api-reference
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/ast-structures.md`:
- Around line 455-491: The docs for MetaType are missing the Offset variant:
update the MetaType enum documentation and examples to include MetaType::Offset
(and describe its semantics) so the AST docs match the code; modify the enum
block under MetaType to add an Offset variant description and add corresponding
examples creating TypeKind::Meta(MetaType::Offset) alongside the existing
Default/Clear/Name/Use/Indirect cases.
In `@docs/src/parser.md`:
- Around line 452-457: Documentation for ParsedMagic is inaccurate: the struct
field name_table is declared as pub(crate) in the implementation but the docs
show it as pub. Either update the docs code block to match the implementation by
changing the declaration to pub(crate) name_table in the ParsedMagic snippet, or
if external access was intended, change the implementation's ParsedMagic struct
to pub name_table; ensure you update whichever of ParsedMagic or name_table you
modify so the docs and the implementation remain consistent.
In `@src/build_helpers.rs`:
- Around line 32-35: The build step currently parses builtin magic with
parse_text_magic_file and then calls generate_builtin_rules(&parsed.rules) but
discards parsed.name_table (extracted by extract_name_table), so name/use
subroutines are lost; either update the code generation to accept and serialize
the name_table (pass parsed.name_table into generate_builtin_rules and include
it in the generated BUILTIN_RULES static), or add a validation step after
parse_text_magic_file that inspects parsed.name_table and returns an error
rejecting any name/use directives in builtin magic (with a clear build-time
message); modify the call site around parse_text_magic_file and
generate_builtin_rules to implement one of these two fixes and ensure the
generated output includes or rejects parsed.name_table accordingly.
In `@src/evaluator/engine/mod.rs`:
- Around line 1124-1136: The debug-only panic must be converted into a proper
error return: in evaluate_rules_with_config, replace the debug_assert!
containing contains_indirect_rule(rules) with an explicit runtime check that
returns Err(...) when an env-less indirect rule is detected; use the existing
error constructor
(crate::error::EvaluationError::indirect_without_environment()) or map it into
the public error type the function exposes (e.g., LibmagicError) so callers
receive a Result::Err instead of panicking; keep original release behavior for
non-error paths and ensure the check references contains_indirect_rule and
crate::error::EvaluationError::indirect_without_environment() so the change is
easy to locate.
- Around line 546-577: Extract the duplicated child-evaluation block in
evaluate_rules into a helper (suggest name evaluate_children_or_warn) that
encapsulates RecursionGuard::enter(context) + evaluate_rules(&rule.children,
buffer, guard.context()) and implements the exact error handling: propagate
LibmagicError::Timeout, return Err for other errors except swallow and warn on
the specific EvaluationError variants (BufferOverrun, InvalidOffset,
TypeReadError::BufferOverrun/InvalidPStringLength) and LibmagicError::IoError
while logging the same message including rule.message and the error; update the
Default/Indirect/Offset/Use branches in evaluate_rules to call this helper and
use its returned matches (or empty/handled result) so behavior is identical but
the recursion/timeout/warn logic is centralized.
- Around line 264-281: The code saves and restores EvaluationContext's
last_match_end and base_offset only on the success path, so if
RecursionGuard::enter or evaluate_rules returns Err the context is left mutated;
fix by introducing an RAII scope guard (e.g. UseScope) that captures
saved_anchor = context.last_match_end() and saved_base = context.base_offset(),
sets the new offsets via set_last_match_end(absolute_offset) and
set_base_offset(absolute_offset) in its enter() constructor, exposes a context()
method for use with RecursionGuard::enter and evaluate_rules, and implements
Drop to always restore the saved_anchor and saved_base using set_last_match_end
and set_base_offset so restoration happens on every exit path (including errors)
instead of the current manual restore around
evaluate_rules/RecursionGuard::enter.
In `@src/output/format.rs`:
- Around line 270-277: Change the octal alternate-form prefix from "0o" to the
C-style "0" in the Conv::Octal branch: when spec.alt_form is true, pass "0" (not
"0o") into render_prefixed_int so %#o renders a single leading zero (e.g., 010).
Update any tests asserting "0o..." to expect the C-style "0" prefix and ensure
behavior remains governed by coerce_to_u64_masked(value, type_kind) and the
existing render_prefixed_int call.
In `@src/parser/ast.rs`:
- Around line 168-175: The rustdoc for TypeKind::Meta is out of date: it states
meta directives are “parsed and preserved in the AST but evaluated as silent
no-ops” and omits the `offset` keyword; update the comment for the
TypeKind::Meta variant to reflect the current implemented behavior (i.e., that
these directives are parsed into the AST and will be wired into later evaluator
phases rather than being silent no-ops) and include `offset` in the list of
control-flow keywords (`default`, `clear`, `name`, `use`, `indirect`, `offset`)
so the documentation matches the code. Ensure the same correction is applied to
the duplicate doc block referenced around lines 504–510.
In `@src/parser/codegen.rs`:
- Around line 522-552: Add a new unit test mirroring
test_serialize_meta_name_escapes_injection that targets the MetaType::Use match
arm: create a MagicRule with typ:
TypeKind::Meta(MetaType::Use(malicious.to_string())) using the same malicious
payload, call serialize_magic_rule(&rule, 0), and assert the generated string
does not contain injected Rust tokens (e.g., panic!("pwned-from-meta")) and does
contain the escaped quote sequence; this ensures MetaType::Use is escaped the
same way as MetaType::Name.
In `@src/parser/mod.rs`:
- Around line 145-146: Remove the unnecessary #[allow(dead_code)] on the
name_table module since parse_text_magic_file uses it; open the module
declaration pub(crate) mod name_table and delete the #[allow(dead_code)]
attribute so the module is exported without suppressing dead-code lints, then
run cargo check to confirm no new warnings surface (inspect functions referenced
from parse_text_magic_file to ensure they remain public).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 372d4029-2871-4188-bc87-756cb445414d
⛔ Files ignored due to path filters (13)
.github/workflows/release.ymlis excluded by!.github/workflows/release.ymland included by.github/**benches/evaluation_bench.rsis excluded by none and included by nonedocs/ARCHITECTURE.mdis excluded by none and included by nonedocs/MAGIC_FORMAT.mdis excluded by none and included by nonedocs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.mdis excluded by none and included by nonemise.lockis excluded by!**/*.lockand included by nonetests/compatibility_tests.rsis excluded by none and included by nonetests/directory_loading_tests.rsis excluded by none and included by nonetests/meta_types_integration.rsis excluded by none and included by nonetests/parser_integration_tests.rsis excluded by none and included by nonetests/property_tests.proptest-regressionsis excluded by none and included by nonetests/property_tests.rsis excluded by none and included by nonetests/regex_search_corpus_tests.rsis excluded by none and included by none
📒 Files selected for processing (30)
AGENTS.mdCHANGELOG.mdCargo.tomlGOTCHAS.mdROADMAP.mdbuild.rsdocs/src/architecture.mddocs/src/ast-structures.mddocs/src/evaluator.mddocs/src/magic-format.mddocs/src/parser.mdsrc/build_helpers.rssrc/error.rssrc/evaluator/engine/mod.rssrc/evaluator/engine/tests.rssrc/evaluator/mod.rssrc/evaluator/offset/mod.rssrc/evaluator/strength.rssrc/evaluator/types/mod.rssrc/lib.rssrc/output/format.rssrc/output/mod.rssrc/parser/ast.rssrc/parser/codegen.rssrc/parser/grammar/mod.rssrc/parser/grammar/tests/mod.rssrc/parser/loader.rssrc/parser/mod.rssrc/parser/name_table.rssrc/parser/types.rs
👮 Files not reviewed due to content moderation or server errors (8)
- docs/src/magic-format.md
- src/error.rs
- src/evaluator/strength.rs
- CHANGELOG.md
- src/parser/grammar/tests/mod.rs
- docs/src/evaluator.md
- docs/src/architecture.md
- src/lib.rs
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/api-reference.md`:
- Around line 545-547: Update the docs example that destructures ParsedMagic to
avoid accessing the non-public pub(crate) field name_table: change the example
to either destructure while ignoring the internal field (e.g., let ParsedMagic {
rules, .. } = parse_text_magic_file(...)?;) or bind the return value (let parsed
= parse_text_magic_file(...)?;) and then use parsed.rules; reference ParsedMagic
and parse_text_magic_file in the updated text.
- Line 435: Update the parenthetical note for the `message` field to explicitly
state that unescaped '%' characters are now treated as printf-style format
specifiers (e.g., %d, %s, %x) and that literal percent signs must be escaped as
'%%'; also mention this is a breaking change so callers must escape existing '%'
in rule messages. Use the `message` field description to add the escaping
requirement and the breaking-change warning.
- Around line 233-250: Add a "#### Examples" subsection under the MetaType
documentation mirroring other type sections: include small Rust examples that
import MetaType and TypeKind and use parser::grammar::parse_magic_rule to parse
rules and assert the resulting rule.typ matches
TypeKind::Meta(MetaType::Default), MetaType::Name("..."), MetaType::Use("..."),
and MetaType::Indirect; keep examples concise and consistent with the existing
Operator examples so readers can see how to parse/construct meta-type rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 65dfb59d-7bdf-4d58-a1a9-fedac2dd6ed6
⛔ Files ignored due to path filters (1)
docs/API_REFERENCE.mdis excluded by none and included by none
📒 Files selected for processing (1)
docs/src/api-reference.md
…format substitution Critical: SubroutineScope RAII replaces manual save/restore in evaluate_use_rule so last_match_end and base_offset are restored on all error exit paths (RecursionLimitExceeded and Timeout previously leaked corrupted state). format_magic_message no longer mangles non-ASCII template bytes -- replaced byte-by-byte push with plain-run slice flush. Hygiene: removed stale allow(dead_code) on AnchorScope::context(). Tests: 9 new cases -- 5 for resolve_offset_with_base bias invariants (positive Absolute biased, negative Absolute / FromEnd / Relative / Indirect not biased), 2 for Use subroutine with non-zero use-site (proves bias is active and Relative is not double-biased), 1 continuation-sibling reset regression guard with bytes-consumed first sibling, 1 non-ASCII template format test. 1357/1357 pass. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
There was a problem hiding this comment.
Pull request overview
Implements GNU file-compatible meta-type directives (default, clear, name, use, indirect, offset) across the parser/evaluator pipeline and adds printf-style message substitution so magic fixtures like searchbug.magic round-trip byte-for-byte through MagicDatabase.
Changes:
- Introduces
TypeKind::Meta(MetaType)+ParsedMagic { rules, name_table }, including load-timenamehoisting into aNameTable. - Adds evaluator support for meta-type dispatch (subroutines, indirect re-entry, default/clear sibling state) and subroutine base-offset biasing.
- Adds extensive tests, docs updates, and workflow/tooling refreshes to validate parity and document breaking changes.
Reviewed changes
Copilot reviewed 45 out of 46 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/regex_search_corpus_tests.rs | Updates parser call sites to destructure ParsedMagic and continue corpus checks. |
| tests/property_tests.rs | Expands generators to include MetaType and adds a never-panics property for meta-type evaluation. |
| tests/property_tests.proptest-regressions | Checks in proptest regression seeds related to new meta-type paths. |
| tests/parser_integration_tests.rs | Updates directory/file loaders to ParsedMagic; adds end-to-end name/use round-trip test. |
| tests/meta_types_integration.rs | New end-to-end smoke + parity tests for meta-types and searchbug fixture output. |
| tests/directory_loading_tests.rs | Updates directory loader tests to ParsedMagic return type. |
| tests/compatibility_tests.rs | Adds weaker searchbug partial-match regression guard. |
| src/parser/types.rs | Adds meta keywords to parsing + maps keyword-only meta types in type_keyword_to_kind. |
| src/parser/name_table.rs | New NameTable + extract_name_table hoisting logic for name subroutines. |
| src/parser/mod.rs | Introduces ParsedMagic and updates parse_text_magic_file to hoist names. |
| src/parser/loader.rs | Returns ParsedMagic from loaders and merges NameTable across directory loads. |
| src/parser/grammar/tests/mod.rs | Adds grammar tests for relative offsets and meta-type parsing/roundtrip. |
| src/parser/grammar/mod.rs | Adds &N relative offsets, name/use operand parsing, meta-rule x stripping, and bare-word string fallback. |
| src/parser/codegen.rs | Serializes TypeKind::Meta and adds injection regression tests for Name/Use identifiers. |
| src/parser/ast.rs | Defines MetaType and adds TypeKind::Meta(MetaType) with serde/tests. |
| src/output/mod.rs | Exposes output::format module. |
| src/lib.rs | Threads RuleEnvironment into evaluations and applies printf substitution during message concatenation. |
| src/evaluator/types/mod.rs | Adjusts string semantics to pattern-length reads; explicitly rejects reading TypeKind::Meta as values/patterns. |
| src/evaluator/strength.rs | Assigns strength defaults for meta-types + tests for ordering expectations. |
| src/evaluator/offset/mod.rs | Adds resolve_offset_with_base for subroutine base_offset biasing. |
| src/evaluator/mod.rs | Adds RuleEnvironment + EvaluationContext fields for rule_env and base_offset. |
| src/error.rs | Adds reserved errors for strict-mode handling of missing env/unknown name. |
| src/build_helpers.rs | Updates codegen helper to use parsed.rules. |
| mise.lock | Toolchain lock refresh (Rust/Python artifacts and provenance fields). |
| docs/src/parser.md | Documents meta-types and the ParsedMagic breaking-change return type. |
| docs/src/magic-format.md | Documents meta-type syntax/semantics in the user-facing magic format doc. |
| docs/src/evaluator.md | Documents meta-type dispatch behavior and updates examples for ParsedMagic. |
| docs/src/ast-structures.md | Documents TypeKind::Meta / MetaType in AST reference. |
| docs/src/architecture.md | Updates architecture overview to include NameTable, ParsedMagic, and evaluator environment threading. |
| docs/src/api-reference.md | Adds MetaType docs and calls out breaking changes + printf substitution. |
| docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md | New design note capturing the RAII scope-guard pattern for context save/restore. |
| docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md | New design note describing the parse→env→dispatch architecture for meta-types. |
| docs/MAGIC_FORMAT.md | Mirrors meta-type documentation in the non-mdbook doc. |
| docs/ARCHITECTURE.md | Updates tree/struct documentation to include name table + root rule slice plumbing. |
| docs/API_REFERENCE.md | Updates API reference for ParsedMagic, meta-types, and message substitution semantics. |
| build.rs | Updates builtin rules generation to use parsed.rules. |
| benches/evaluation_bench.rs | Adds a benchmark focused on name/use subroutine dispatch overhead. |
| ROADMAP.md | Marks meta-types as complete (including offset). |
| GOTCHAS.md | Updates gotchas for meta-types, bare-word string fallback, and printf substitution. |
| Cargo.toml | Updates authors field. |
| CHANGELOG.md | Adds feature + breaking-change notes for ParsedMagic and meta-type support. |
| AGENTS.md | Updates repo guidance to include meta-types + printf substitution as implemented. |
| .github/workflows/release.yml | Updates artifact upload action version usage. |
…ect top-level re-entry Bug 1 (PRRT...PrI): evaluate_use_rule now returns the subroutine's TERMINAL anchor (where the subroutine left last_match_end) instead of the use-site offset. Callers propagate that terminal anchor via context.set_last_match_end, so sibling rules after a successful 'use' resolve &N against the subroutine's final match position -- matching GNU file's inlining semantics. Previously siblings resolved against the use-site, silently breaking continuation chains. Bug 2 (PRRT...PrL): MetaType::Indirect root re-entry is evaluated with TOP-LEVEL sibling semantics (anchor chains across siblings per GOTCHAS S3.8), not CONTINUATION semantics (anchor resets). Previously the RecursionGuard wrapping the indirect sub-evaluation forced recursion_depth > 0, which triggered the continuation-reset path and broke top-level siblings inside the re-entered database. Fixed via a new one-shot EvaluationContext::indirect_reentry flag: indirect dispatch sets it before evaluate_rules; evaluate_rules consumes it at entry so only the top-level iteration is affected -- children of matched rules inside the re-entry correctly fall back to recursion_depth > 0 for their own continuation semantics. Deferred (PRRT...PrQ refactor suggestion): split engine tests file into focused modules. Out of scope for a correctness-focused review round; tracked as a follow-on maintenance task. Test: 1364/1364 pass (searchbug byte-for-byte parity preserved through both semantic fixes); just ci-check green. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…offset docs The real guard is named SubroutineScope (saves+restores both last_match_end and base_offset). The base_offset doc comment referenced a BaseOffsetScope that never existed -- leftover from design intent in an earlier session. Updated the two doc-comment sites. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
There was a problem hiding this comment.
Pull request overview
Implements GNU file-compatible meta-type (control-flow) directives and adds printf-style message substitution so libmagic-rs can round-trip real-world magic corpora (e.g., searchbug.magic) with byte-for-byte output parity.
Changes:
- Adds
TypeKind::Meta(MetaType)(default/clear/name/use/indirect/offset) across AST, parser, evaluator dispatch, and strength ordering. - Introduces
ParsedMagic { rules, name_table }parser return type with load-timenameextraction into aNameTablefor efficientusedispatch. - Adds
output::format::format_magic_messageand wires message rendering (and backspace handling) through it.
Reviewed changes
Copilot reviewed 45 out of 46 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/regex_search_corpus_tests.rs | Updates parser call sites to destructure ParsedMagic. |
| tests/property_tests.rs | Extends generators/property coverage for meta-types and MetaType variants. |
| tests/property_tests.proptest-regressions | Checks in proptest regression seeds for meta-type scenarios. |
| tests/parser_integration_tests.rs | Adjusts loader return type usage; adds name/use round-trip integration test. |
| tests/meta_types_integration.rs | Adds end-to-end integration tests for meta-type directives and searchbug parity. |
| tests/directory_loading_tests.rs | Updates directory loader return type usage (ParsedMagic). |
| tests/compatibility_tests.rs | Adds a weaker searchbug regression assertion for consumers that alter formatting layers. |
| src/parser/types.rs | Recognizes meta keywords and maps keyword-only meta directives to TypeKind::Meta(...). |
| src/parser/name_table.rs | New: extracts top-level name blocks into a NameTable and scrubs nested name rules. |
| src/parser/mod.rs | Defines ParsedMagic and updates parse_text_magic_file to return it. |
| src/parser/loader.rs | Updates directory/file loading to return ParsedMagic and merge name tables. |
| src/parser/grammar/tests/mod.rs | Adds parsing tests for relative offsets and meta-type rules. |
| src/parser/grammar/mod.rs | Adds &N relative offset parsing, name/use identifier parsing, meta-type message x-strip, and bare-word string fallback for string-family types. |
| src/parser/codegen.rs | Serializes TypeKind::Meta(MetaType) and adds injection-escape regression tests for meta identifiers. |
| src/parser/ast.rs | Adds MetaType and TypeKind::Meta(MetaType) with serde + unit tests. |
| src/output/mod.rs | Exposes new output::format module. |
| src/output/format.rs | New: printf-style format substitution for match messages. |
| src/lib.rs | Wires name table + root rules into evaluation context; applies format substitution during message concatenation. |
| src/evaluator/types/mod.rs | Adjusts string read semantics and defines behavior for meta-types in read paths/byte consumption. |
| src/evaluator/strength.rs | Assigns strength defaults for meta-types and adds strength ordering tests. |
| src/evaluator/offset/mod.rs | Adds resolve_offset_with_base for subroutine-relative absolute offsets; tests. |
| src/evaluator/mod.rs | Introduces RuleEnvironment and extends EvaluationContext with env/base_offset/indirect-reentry state. |
| src/error.rs | Adds reserved evaluation error variants for strict-mode unknown-name / indirect-without-env. |
| src/build_helpers.rs | Updates build helper to use ParsedMagic.rules. |
| build.rs | Updates builtin rule generation to use ParsedMagic.rules. |
| benches/evaluation_bench.rs | Adds benchmark for name/use subroutine dispatch overhead. |
| docs/src/parser.md | Documents meta-types and ParsedMagic breaking change. |
| docs/src/magic-format.md | Documents meta-type directives in the magic format guide. |
| docs/src/evaluator.md | Documents evaluator meta-type dispatch and RuleEnvironment threading. |
| docs/src/ast-structures.md | Documents MetaType and TypeKind::Meta. |
| docs/src/architecture.md | Updates architecture docs for name table and ParsedMagic pipeline. |
| docs/src/api-reference.md | Documents MetaType and breaking changes (ParsedMagic + printf substitution). |
| docs/MAGIC_FORMAT.md | Mirrors meta-type documentation in top-level docs. |
| docs/ARCHITECTURE.md | Updates repo architecture doc for name_table + root_rules/name_table fields. |
| docs/API_REFERENCE.md | Updates API reference doc with ParsedMagic and MetaType info. |
| docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md | Adds a postmortem/solution doc on RAII scope guards for context save/restore. |
| docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md | Adds an architecture pattern doc for name/use extraction + RuleEnvironment threading. |
| ROADMAP.md | Marks meta-types as completed (including offset). |
| GOTCHAS.md | Updates gotchas for meta-type dispatch, ParsedMagic return type, and printf substitution. |
| CHANGELOG.md | Notes meta-type feature landing and parser breaking change. |
| AGENTS.md | Updates contributor-facing implementation status for meta-types and formatting. |
| Cargo.toml | Updates authors metadata. |
| mise.lock | Updates tool lock entries (Rust/tooling versions and provenance fields). |
| .github/workflows/release.yml | Adjusts actions/upload-artifact version pins. |
…PR review) Addresses PRRT...PrQ from the round-3 PR review: src/evaluator/engine/tests.rs grew to 3666 lines, well past the 500-600 line project guideline. Split into the three meta-type suites the reviewer called out. Structure: - src/evaluator/engine/tests/mod.rs (2777 lines): core evaluator tests (byte/short/long/string/pstring/relative-offset/recursion) plus centralized meta-type test helpers (make_context_with_env, use_rule, use_rule_at, build_name_table, default_rule, clear_rule, byte_eq_rule, indirect_rule, offset_rule) marked pub(super) so submodules can share them. - src/evaluator/engine/tests/meta_use_tests.rs (405 lines): MetaType::Use dispatch, subroutine name-table resolution, and subroutine base_offset biasing tests. - src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs (275 lines): MetaType::Default fallback, MetaType::Clear state-reset, and MetaType::Indirect root re-entry tests. - src/evaluator/engine/tests/meta_offset_tests.rs (247 lines): MetaType::Offset dispatch and evaluate_children_or_warn helper coverage. No test content changed -- each test's body is byte-identical to the pre-split version. The three new submodules each have their own SPDX/copyright header and module doc. Test count unchanged (1364/1364 pass, 5 pre-existing skipped). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/architecture.md`:
- Around line 175-176: The documentation list in docs/src/architecture.md omits
MetaType::Offset; update the inline-dispatch bullet to include
`MetaType::Offset` alongside `MetaType::Default`, `MetaType::Clear`,
`MetaType::Use`, and `MetaType::Indirect` (i.e., mention that `evaluate_rules`
performs inline dispatch for `MetaType::Default`, `MetaType::Clear`,
`MetaType::Use`, `MetaType::Indirect`, and `MetaType::Offset`) so it matches the
evaluator behavior described by `mod.rs` (functions `evaluate_single_rule`,
`evaluate_rules`, `evaluate_rules_with_config`) and the implementation in the
`evaluate_rules` loop.
In `@docs/src/evaluator.md`:
- Around line 529-537: The examples call evaluate_rules(&parsed.rules, &buffer)
but the real API requires passing an &mut EvaluationContext; update each example
(including the blocks using parsed from parse_text_magic_file) to construct a
mutable EvaluationContext (e.g., new or default) and call evaluate_rules(&mut
ctx, &parsed.rules, &buffer) instead; ensure you reference the same
parsed.name_table/MagicDatabase handling as before and apply this change
consistently to all listed example blocks (lines ~529, 555, 575, 598, 609).
- Line 631: Update the checklist entry to include the missing meta-type
`offset`: edit the line currently listing meta-type directives (which mentions
`default`, `clear`, `name`/`use`, and `indirect`) and add `offset` (e.g.,
include `offset` in backticks alongside the other directives) so the implemented
meta-type set in the docs matches the PR.
- Around line 372-373: The documentation wrongly states that MetaType::Offset
performs printf-style substitution during evaluator dispatch; update the
description for MetaType::Offset to say evaluate_rules records the resolved
offset as a raw Value (Value::Uint(resolved_offset)) and stores the raw message,
and that printf-style substitution (e.g., replacing %lld, %d) is performed later
during output/message assembly (via the formatting stage such as
format_magic_message) rather than inside the evaluator; reference
MetaType::Offset, evaluate_rules, and format_magic_message in the updated text.
In `@src/evaluator/engine/tests/mod.rs`:
- Around line 2634-2779: Extract the shared test helpers (make_context_with_env,
use_rule, use_rule_at, build_name_table, default_rule, clear_rule, byte_eq_rule,
indirect_rule, offset_rule) out of the large tests mod into a new dedicated test
helper module (e.g. tests/helpers/meta.rs): move the function definitions into
that file with the same signatures and keep their pub(super) visibility, then in
the original mod.rs replace the moved code with a module declaration
(#[cfg(test)] mod helpers; or pub(super) mod helpers;) and re-export or
reference the helpers as needed (e.g. use self::helpers::* or pub(super) use
helpers::*;) so existing test submodules (meta_default_clear_indirect_tests,
meta_offset_tests, meta_use_tests) continue to compile without other changes.
In `@src/output/format.rs`:
- Around line 269-313: The Str and Char conversions currently either drop
width/alignment or use pad_numeric (which applies zero-padding); implement a
text-padding helper (e.g., pad_text) that takes the rendered string, the
existing FormatSpec, and pads with spaces only while honoring width and
left-alignment and ignoring the '0' flag, then replace Conv::Str to call
render_string(value) and feed that to pad_text (instead of returning raw string)
and replace Conv::Char to convert the byte to a single-character String
(char::from(byte).to_string()) and feed that to pad_text instead of pad_numeric;
keep numeric conversions using pad_numeric unchanged and add tests for "%5s",
"%-5s", and "%03c" to assert space-padding behavior.
In `@src/parser/grammar/tests/mod.rs`:
- Around line 2505-2679: The four tests (test_parse_magic_rule_meta_types,
test_parse_magic_rule_meta_name_use_reject_malformed_identifiers,
test_parse_text_magic_file_meta_roundtrip,
test_parse_text_magic_file_searchbug_fixture) should be extracted from the huge
tests file into a new focused test module for meta-type parsing; create a new
sibling test module (e.g., meta_types tests module), move these test functions
there, ensure the new module imports crate::parser::parse_text_magic_file and
parse_magic_rule and any enums (TypeKind, MetaType) used by the assertions, and
update the parent tests module to reference the new module (pub mod ...) so the
tests run; keep test logic unchanged but remove them from the original file.
In `@src/parser/name_table.rs`:
- Line 112: The warning printed by scrub_nested_names uses the passed parent
level for messaging, but the recursive calls currently pass rule.level + 1 which
misreports the actual parent level; update the two recursive call sites that
invoke scrub_nested_names to pass rule.level (not rule.level + 1) so the warning
text reflects the true parent level (e.g., change
scrub_nested_names(rule.children, rule.level + 1) to
scrub_nested_names(rule.children, rule.level) for both occurrences).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f3b40de6-4239-4197-9f34-1c303e9bc165
⛔ Files ignored due to path filters (3)
docs/ARCHITECTURE.mdis excluded by none and included by nonedocs/MAGIC_FORMAT.mdis excluded by none and included by nonetests/compatibility_tests.rsis excluded by none and included by none
📒 Files selected for processing (14)
ROADMAP.mddocs/src/api-reference.mddocs/src/architecture.mddocs/src/evaluator.mdsrc/evaluator/engine/mod.rssrc/evaluator/engine/tests/meta_default_clear_indirect_tests.rssrc/evaluator/engine/tests/meta_offset_tests.rssrc/evaluator/engine/tests/meta_use_tests.rssrc/evaluator/engine/tests/mod.rssrc/evaluator/mod.rssrc/output/format.rssrc/parser/ast.rssrc/parser/grammar/tests/mod.rssrc/parser/name_table.rs
Sixteen unresolved review threads across correctness, performance, documentation, and test organization: - Evaluator: AnchorScope now saves/restores both last_match_end AND base_offset so `indirect` inside a `use` subroutine re-enters root rules with base_offset=0 (RFn1). - Offset resolution: resolve_offset_with_base switches to checked arithmetic and maps overflow to InvalidOffset rather than saturating into a spurious BufferOverrun (RU0). - MagicDatabase: drop the redundant rules Vec field; root_rules is the sole storage built via Arc::from(rules.into_boxed_slice()) to avoid Arc::from-slice clones (Vdy, VeD). - Format: Conv::Char uses pad_non_numeric (space-only) so %03c follows POSIX instead of zero-padding; Conv::Char doc now says full 0x00-0xff range via Latin-1 (VeM, RFn6). - Docs: MetaType::Offset added to architecture.md inline-dispatch list; evaluator.md dispatch wording separated from output formatting; evaluate_rules usage examples updated to pass &mut EvaluationContext; checklist bullet includes offset (RFnh, RFnl, RFnx, RFnz). - Grammar: meta-type directives reject attached operators (`default&0xf` no longer silently drops the mask) with a regression test (RUs). - Release workflow: dist-workspace.toml pin for actions/upload-artifact bumped from v7.0.0 to v7.0.1; release.yml regenerated via `dist generate` (RU7). - Property test: timeout doc comment rewritten to reflect that the test constructs an env-less context, so MetaType::Indirect/Use take the silent no-op path rather than recursing (VeW). - Name table: scrub_nested_names is now called with the actual parent level rather than parent.level + 1 (RFn-). - Test layout: engine test helpers moved to src/evaluator/engine/tests/helpers/meta.rs (RFn3); grammar meta-type parser tests moved to src/parser/grammar/tests/meta_types.rs (RFn8). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
There was a problem hiding this comment.
Pull request overview
Implements GNU file-compatible evaluation for libmagic meta-type directives (default, clear, name, use, indirect, offset) and adds printf-style message substitution, wiring parse-time name-table extraction through MagicDatabase evaluation so real-world corpora (e.g. searchbug.magic) round-trip end-to-end.
Changes:
- Introduces
TypeKind::Meta(MetaType)+ parse/load plumbing (ParsedMagic { rules, name_table }) andNameTableextraction/merging forname/use. - Wires evaluator dispatch for all meta-types (including indirect re-entry and subroutine base-offset) and applies printf-style formatting at output concatenation time.
- Adds/updates unit + integration tests (parser, evaluator engine, corpus fixtures) and refreshes docs to describe the new architecture and breaking changes.
Reviewed changes
Copilot reviewed 52 out of 53 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/regex_search_corpus_tests.rs | Updates tests to destructure ParsedMagic from parser API. |
| tests/property_tests.rs | Extends generators + adds property test coverage for TypeKind::Meta(...) evaluation. |
| tests/property_tests.proptest-regressions | Adds saved proptest regression seeds for meta-type cases. |
| tests/parser_integration_tests.rs | Updates parser integration tests for ParsedMagic + adds name/use round-trip test. |
| tests/meta_types_integration.rs | Adds end-to-end smoke/integration tests for all meta-types + searchbug parity. |
| tests/directory_loading_tests.rs | Updates directory-loading tests to destructure ParsedMagic. |
| tests/compatibility_tests.rs | Adds a weaker “partial match” regression test around the searchbug fixture. |
| src/tests.rs | Adjusts built-in rules test to use root_rules storage. |
| src/parser/types.rs | Teaches keyword parsing/conversion about meta-types, and name/use suffix handling. |
| src/parser/name_table.rs | Adds load-time name extraction into a NameTable plus merge/sort utilities. |
| src/parser/mod.rs | Changes parse_text_magic_file to return ParsedMagic { rules, name_table }. |
| src/parser/loader.rs | Updates directory/file loaders to return ParsedMagic and merge name tables. |
| src/parser/grammar/tests/mod.rs | Adds/extends grammar tests (notably relative &N offset parsing). |
| src/parser/grammar/tests/meta_types.rs | Adds dedicated grammar tests for meta-type parsing + fixture parse check. |
| src/parser/grammar/mod.rs | Adds name/use identifier parsing, optional x stripping for meta messages, bare-string fallback, and relative offsets. |
| src/parser/codegen.rs | Serializes TypeKind::Meta(...) and adds identifier escaping regression tests. |
| src/parser/ast.rs | Introduces MetaType enum and TypeKind::Meta(MetaType) with docs + tests. |
| src/output/mod.rs | Exposes new output::format module. |
| src/lib.rs | Stores root_rules + name_table in MagicDatabase, threads RuleEnvironment, and applies printf substitution during message concatenation. |
| src/evaluator/types/mod.rs | Adjusts string-read semantics for pattern-length reads; treats meta-types as non-readable/zero-consume. |
| src/evaluator/strength.rs | Assigns strength contributions for meta-types + adds ordering tests. |
| src/evaluator/offset/mod.rs | Adds resolve_offset_with_base and tests for subroutine base-offset bias + overflow behavior. |
| src/evaluator/mod.rs | Adds RuleEnvironment and extends EvaluationContext with env/base_offset/indirect-reentry plumbing. |
| src/evaluator/engine/tests/mod.rs | Refactors test helpers into submodules and wires new meta-type test modules. |
| src/evaluator/engine/tests/meta_offset_tests.rs | Adds unit tests for MetaType::Offset dispatch + child evaluation behavior. |
| src/evaluator/engine/tests/meta_default_clear_indirect_tests.rs | Adds unit tests for default/clear/indirect semantics + recursion limit behavior. |
| src/evaluator/engine/tests/helpers/mod.rs | Adds shared helper module entry point for engine tests. |
| src/evaluator/engine/tests/helpers/meta.rs | Adds builders/helpers for meta-type rule construction and contexts with env. |
| src/error.rs | Adds reserved evaluation errors for strict-mode unknown-name/indirect-without-env cases. |
| src/build_helpers.rs | Updates build helper to use ParsedMagic (rules-only for codegen). |
| docs/src/parser.md | Documents meta-types and ParsedMagic breaking change. |
| docs/src/magic-format.md | Documents meta-types in the magic(5)-format guide. |
| docs/src/evaluator.md | Documents meta-type dispatch and updates examples for EvaluationContext usage. |
| docs/src/ast-structures.md | Documents MetaType / TypeKind::Meta AST structures and semantics. |
| docs/src/architecture.md | Updates architecture docs to include NameTable and ParsedMagic shape. |
| docs/src/api-reference.md | Updates API reference to include meta-types and printf substitution behavior. |
| docs/solutions/logic-errors/raii-scope-guards-for-evaluator-context-save-restore.md | Adds a postmortem/solution writeup for RAII scope guards and formatting pitfalls. |
| docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md | Adds architecture writeup for name-table extraction + env-threading pattern. |
| docs/MAGIC_FORMAT.md | Mirrors meta-type documentation in the top-level magic format doc. |
| docs/ARCHITECTURE.md | Updates architecture doc for new database internals (needs correction per review). |
| docs/API_REFERENCE.md | Updates API reference (needs correction per review). |
| dist-workspace.toml | Bumps actions/upload-artifact pinned version. |
| build.rs | Updates build-time parser call sites to ParsedMagic. |
| benches/evaluation_bench.rs | Adds benchmark for name/use subroutine dispatch overhead. |
| ROADMAP.md | Marks meta-types milestone complete (includes offset). |
| GOTCHAS.md | Adds/updates gotchas for meta-types, bare string fallback, and format substitution. |
| Cargo.toml | Updates crate authors list. |
| CHANGELOG.md | Documents meta-types feature and ParsedMagic breaking change. |
| AGENTS.md | Updates repo guidance with new meta-type + printf substitution support and module structure. |
| rules: Vec<MagicRule>, // Parsed magic rules (top-level, strength-sorted) | ||
| name_table: Arc<NameTable>, // `name`/`use` subroutine dispatch table (Arc for cheap clone across evaluations) | ||
| root_rules: Arc<[MagicRule]>, // Shared immutable slice of top-level rules for `indirect` re-entry |
|
|
||
| | Field (Internal) | Type | Description | | ||
| | ---------------- | ------------------ | ----------------------------------------------------------------- | | ||
| | `rules` | `Vec<MagicRule>` | Top-level magic rules | |
| let (input, _) = space1(input)?; | ||
| let (after_id, id) = | ||
| take_while(|c: char| c.is_alphanumeric() || c == '_' || c == '-').parse(input)?; | ||
| if id.is_empty() { |
| - `numbers.rs`: Numeric type parsing (decimal/hex, signed/unsigned) | ||
| - `value.rs`: Value literal parsing (strings, floats, hex bytes) | ||
| - `mod.rs`: Parser interface, format detection, and hierarchical rule building (✅ Complete) | ||
| - `name_table.rs`: Load-time extraction of `name <id>` subroutine blocks into a `HashMap<String, Vec<MagicRule>>` (the `NameTable` type). |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib.rs (1)
248-256:⚠️ Potential issue | 🟠 MajorHoist builtin
nameblocks before storing the database.
load_from_file_with_config()now builds bothroot_rulesandname_table, but the builtin path still hardcodesNameTable::empty(). If the generated builtin corpus contains anyname/usedirectives, builtin mode will silently degrade:MetaType::Namestays inroot_rulesas a no-op and everyuselookup misses.Suggested fix
- let mut rules = crate::builtin_rules::get_builtin_rules(); + let rules = crate::builtin_rules::get_builtin_rules(); + let (mut rules, mut name_table) = crate::parser::name_table::extract_name_table(rules); crate::evaluator::strength::sort_rules_by_strength_recursive(&mut rules); + name_table.sort_subroutines(|rules| { + crate::evaluator::strength::sort_rules_by_strength_recursive(rules); + }); let root_rules: std::sync::Arc<[MagicRule]> = std::sync::Arc::from(rules.into_boxed_slice()); Ok(Self { - name_table: std::sync::Arc::new(crate::parser::name_table::NameTable::empty()), + name_table: std::sync::Arc::new(name_table), root_rules, config, source_path: None, mime_mapper: mime::MimeMapper::new(), })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib.rs` around lines 248 - 256, The builtin constructor currently sets name_table to NameTable::empty(), so any builtin `name`/`use` directives are ignored; update with_builtin_rules_and_config to reuse the same hoisting/name-table construction used in load_from_file_with_config: after computing root_rules (the MagicRule array) run the hoist/name-table extraction logic (the code path that builds the parser::name_table and hoists `name` blocks) and set self.name_table to that resulting NameTable Arc instead of NameTable::empty(), ensuring root_rules reflect any hoisted MetaType::Name changes and lookups via use succeed.
♻️ Duplicate comments (1)
src/output/format.rs (1)
267-270:⚠️ Potential issue | 🟠 MajorApply the non-numeric padding path to
%sas well.
%cnow honors width/alignment viapad_non_numeric, but%sstill returns the raw rendered string, so%5s/%-5signore field width entirely. That leaves the formatter only half-fixed for text conversions.Suggested fix
- Conv::Str => Some(render_string(value)), + Conv::Str => Some(pad_non_numeric(&render_string(value), spec)),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/output/format.rs` around lines 267 - 270, The Conv::Str branch in render currently returns the raw rendered string (render_string(value)) so width/alignment is ignored; update the Conv::Str match arm to pass the rendered string through pad_non_numeric (the same way Conv::Char/%c does) using the provided type_kind so formats like %5s and %-5s honor padding and alignment; locate the render function and replace the Conv::Str return with a call that applies pad_non_numeric(render_string(value), type_kind) (ensuring correct ownership/borrowing to produce an Option<String>).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dist-workspace.toml`:
- Line 56: The workflow pins "actions/upload-artifact" to the mutable tag
"v7.0.1", which is insecure; locate the entry with the key
"actions/upload-artifact" and replace the tag value "v7.0.1" with the
corresponding immutable commit SHA (the full 40-char commit hash resolved from
the GitHub Actions marketplace or the action's repo) so the TOML contains the
exact commit SHA instead of the tag for reproducible CI.
In `@docs/src/evaluator.md`:
- Around line 374-393: The sequence diagram is incorrect: meta directives "Use"
and "Indirect" are handled inline in evaluate_rules, not routed through
evaluate_single_rule_with_anchor; update the diagram so that the arrows for Use
and Indirect go from ER (evaluate_rules) directly to the relevant components
(e.g., NameTable lookup for Use and cloning/using root_rules for Indirect) and
back to ER, remove or change any arrows that currently route through ESR
(evaluate_single_rule_with_anchor), and keep participants names (evaluate_rules,
evaluate_single_rule_with_anchor, NameTable, root_rules) so readers can trace
the correct inlined control flow.
- Line 520: The docs import is wrong: update the use statements so
evaluate_rules is imported from the evaluator module (e.g. use
libmagic_rs::evaluator::{evaluate_rules, EvaluationConfig, EvaluationContext};)
instead of from the crate root; change every occurrence that currently uses
libmagic_rs::{evaluate_rules, ...} (including the spots mentioned for lines 520,
548, 569, 590) to reference libmagic_rs::evaluator::evaluate_rules (and bring
EvaluationConfig and EvaluationContext into scope from the same evaluator
module) so the examples compile.
In `@src/evaluator/engine/mod.rs`:
- Around line 280-281: Update the helper contract comment that currently reads
"Returns `Ok((Some(absolute_offset), matches))` ..." to accurately state that
the function returns the subroutine's terminal anchor (e.g., "Returns
`Ok((Some(terminal_anchor), matches))`") rather than the resolved use-site
offset; mention that the returned terminal anchor is what callers use for
subsequent `&N` chaining and keep the `Ok((None, vec![]))` case unchanged,
ensuring the docstring for the helper in mod.rs reflects this behavior and
follows the project's documentation guidance for complex algorithm comments.
In `@src/evaluator/engine/tests/helpers/meta.rs`:
- Around line 69-138: Several tests repeat the same MagicRule struct literal;
extract the shared fields into a single constructor helper (e.g., a private fn
like build_magic_rule(offset: OffsetSpec, typ: TypeKind, op: Operator, value:
Value, message: String, children: Vec<MagicRule>) -> MagicRule) and have
default_rule, clear_rule, byte_eq_rule, indirect_rule, and offset_rule call that
helper with only the differing arguments (for byte_eq_rule provide
signed/type/value variations, for others pass Meta types and op). Update
references to MagicRule, OffsetSpec, TypeKind, Operator, Value, message and
children in the helper so future changes to the common fields need only one
update.
In `@src/parser/grammar/mod.rs`:
- Around line 198-208: The relative-offset branch currently accepts "&+-N"
because after stripping the '+' it still calls parse_number on a '-' prefix;
modify the '+' branch in the parser so that when rest.strip_prefix('+') returns
after_plus you first check that after_plus does not start with '-' and if it
does return a parse error (rather than calling parse_number), otherwise proceed
to call parse_number(after_plus); this change touches the branch that uses
rest.strip_prefix('+'), parse_number, multispace0 and returns
OffsetSpec::Relative.
- Around line 843-863: The branch for string-family types currently calls
parse_value first so bare tokens like "123" or "0x7f" get parsed as Uint/Bytes
instead of literal strings; change the logic in the is_string_family_type branch
to prefer parse_bare_string_value for unquoted single-token inputs (i.e., try
parse_bare_string_value(input) first and return that when it succeeds), falling
back to parse_value only if the bare-string parse fails, while keeping the
Operator::AnyValue path unchanged; update tests for
parse_value/parse_bare_string_value to include numeric-looking and hex-looking
bare tokens for TypeKind::String/PString/Regex/Search to ensure the rule matches
the literal text.
---
Outside diff comments:
In `@src/lib.rs`:
- Around line 248-256: The builtin constructor currently sets name_table to
NameTable::empty(), so any builtin `name`/`use` directives are ignored; update
with_builtin_rules_and_config to reuse the same hoisting/name-table construction
used in load_from_file_with_config: after computing root_rules (the MagicRule
array) run the hoist/name-table extraction logic (the code path that builds the
parser::name_table and hoists `name` blocks) and set self.name_table to that
resulting NameTable Arc instead of NameTable::empty(), ensuring root_rules
reflect any hoisted MetaType::Name changes and lookups via use succeed.
---
Duplicate comments:
In `@src/output/format.rs`:
- Around line 267-270: The Conv::Str branch in render currently returns the raw
rendered string (render_string(value)) so width/alignment is ignored; update the
Conv::Str match arm to pass the rendered string through pad_non_numeric (the
same way Conv::Char/%c does) using the provided type_kind so formats like %5s
and %-5s honor padding and alignment; locate the render function and replace the
Conv::Str return with a call that applies pad_non_numeric(render_string(value),
type_kind) (ensuring correct ownership/borrowing to produce an Option<String>).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 24bb0714-2ff5-4601-862c-6abf0150e97e
⛔ Files ignored due to path filters (1)
tests/property_tests.rsis excluded by none and included by none
📒 Files selected for processing (16)
dist-workspace.tomldocs/src/architecture.mddocs/src/evaluator.mdsrc/evaluator/engine/mod.rssrc/evaluator/engine/tests/helpers/meta.rssrc/evaluator/engine/tests/helpers/mod.rssrc/evaluator/engine/tests/meta_default_clear_indirect_tests.rssrc/evaluator/engine/tests/mod.rssrc/evaluator/offset/mod.rssrc/lib.rssrc/output/format.rssrc/parser/grammar/mod.rssrc/parser/grammar/tests/meta_types.rssrc/parser/grammar/tests/mod.rssrc/parser/name_table.rssrc/tests.rs
| "actions/download-artifact" = "3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c" | ||
| "actions/attest-build-provenance" = "a2bbfa25375fe432b6a289bc6b6cd05ecd0c4c32" | ||
| "actions/upload-artifact" = "v7.0.0" | ||
| "actions/upload-artifact" = "v7.0.1" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
repo="actions/upload-artifact"
tag="v7.0.1"
ref_json="$(curl -fsSL "https://api.github.com/repos/${repo}/git/ref/tags/${tag}")"
obj_type="$(jq -r '.object.type' <<<"$ref_json")"
obj_sha="$(jq -r '.object.sha' <<<"$ref_json")"
if [[ "$obj_type" == "tag" ]]; then
commit_sha="$(curl -fsSL "https://api.github.com/repos/${repo}/git/tags/${obj_sha}" | jq -r '.object.sha')"
else
commit_sha="$obj_sha"
fi
echo "Resolved commit SHA for ${repo}@${tag}: ${commit_sha}"Repository: EvilBit-Labs/libmagic-rs
Length of output: 164
Pin actions/upload-artifact to an immutable commit SHA, not a tag.
Using v7.0.1 leaves CI exposed to tag retargeting. Pin to the resolved commit SHA for reproducible and tamper-resistant builds.
Suggested change
-"actions/upload-artifact" = "v7.0.1"
+"actions/upload-artifact" = "043fb46d1a93c77aae656e7c1c64a875d1fc6a0a"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dist-workspace.toml` at line 56, The workflow pins "actions/upload-artifact"
to the mutable tag "v7.0.1", which is insecure; locate the entry with the key
"actions/upload-artifact" and replace the tag value "v7.0.1" with the
corresponding immutable commit SHA (the full 40-char commit hash resolved from
the GitHub Actions marketplace or the action's repo) so the TOML contains the
exact commit SHA instead of the tag for reproducible CI.
| ```mermaid | ||
| sequenceDiagram | ||
| participant ER as evaluate_rules | ||
| participant ESR as evaluate_single_rule_with_anchor | ||
| participant NT as NameTable | ||
| participant RR as root_rules | ||
|
|
||
| ER->>ESR: rule (Use "part2") | ||
| ESR->>NT: lookup("part2") | ||
| NT-->>ESR: Vec<MagicRule> (subroutine) | ||
| ESR->>ER: evaluate_rules(subroutine, buffer, ctx) | ||
| ER-->>ESR: subroutine matches | ||
| ESR-->>ER: Ok(Some(offset, value)) + merged matches | ||
|
|
||
| ER->>ESR: rule (Indirect) | ||
| ESR->>RR: clone root_rules | ||
| ESR->>ER: evaluate_rules(root_rules, sub_buffer, ctx) | ||
| ER-->>ESR: inner matches | ||
| ESR-->>ER: Ok(Some(offset, value)) + merged matches | ||
| ``` |
There was a problem hiding this comment.
Sequence diagram still shows the old dispatcher.
The prose above correctly says meta directives are handled inline in evaluate_rules, but this diagram routes Use and Indirect through evaluate_single_rule_with_anchor. That will send readers to the wrong place when tracing the control flow.
As per coding guidelines: "docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/evaluator.md` around lines 374 - 393, The sequence diagram is
incorrect: meta directives "Use" and "Indirect" are handled inline in
evaluate_rules, not routed through evaluate_single_rule_with_anchor; update the
diagram so that the arrows for Use and Indirect go from ER (evaluate_rules)
directly to the relevant components (e.g., NameTable lookup for Use and
cloning/using root_rules for Indirect) and back to ER, remove or change any
arrows that currently route through ESR (evaluate_single_rule_with_anchor), and
keep participants names (evaluate_rules, evaluate_single_rule_with_anchor,
NameTable, root_rules) so readers can trace the correct inlined control flow.
|
|
||
| ```rust | ||
| use libmagic_rs::{evaluate_rules, EvaluationConfig}; | ||
| use libmagic_rs::{evaluate_rules, EvaluationConfig, EvaluationContext}; |
There was a problem hiding this comment.
Import evaluate_rules from libmagic_rs::evaluator.
These snippets currently use use libmagic_rs::{evaluate_rules, ...};, but the crate root does not re-export evaluate_rules. As written, the examples will not compile.
Suggested fix pattern
-use libmagic_rs::{evaluate_rules, EvaluationConfig, EvaluationContext};
+use libmagic_rs::{EvaluationConfig, EvaluationContext};
+use libmagic_rs::evaluator::evaluate_rules;As per coding guidelines: "docs/**: Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides."
Also applies to: 548-548, 569-569, 590-590
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/evaluator.md` at line 520, The docs import is wrong: update the use
statements so evaluate_rules is imported from the evaluator module (e.g. use
libmagic_rs::evaluator::{evaluate_rules, EvaluationConfig, EvaluationContext};)
instead of from the crate root; change every occurrence that currently uses
libmagic_rs::{evaluate_rules, ...} (including the spots mentioned for lines 520,
548, 569, 590) to reference libmagic_rs::evaluator::evaluate_rules (and bring
EvaluationConfig and EvaluationContext into scope from the same evaluator
module) so the examples compile.
| /// Returns `Ok((Some(absolute_offset), matches))` on a successful resolution | ||
| /// (even if the subroutine produced no matches), or `Ok((None, vec![]))` |
There was a problem hiding this comment.
Update the helper contract to say it returns the terminal anchor.
The implementation below no longer returns the resolved use-site offset; it returns the subroutine's terminal anchor, and the caller relies on that for subsequent &N chaining. The stale Some(absolute_offset) wording is now misleading.
As per coding guidelines: "src/**/*.rs: Document magic file syntax support, libmagic compatibility, design decisions, and trade-offs in code comments for complex algorithms."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/evaluator/engine/mod.rs` around lines 280 - 281, Update the helper
contract comment that currently reads "Returns `Ok((Some(absolute_offset),
matches))` ..." to accurately state that the function returns the subroutine's
terminal anchor (e.g., "Returns `Ok((Some(terminal_anchor), matches))`") rather
than the resolved use-site offset; mention that the returned terminal anchor is
what callers use for subsequent `&N` chaining and keep the `Ok((None, vec![]))`
case unchanged, ensuring the docstring for the helper in mod.rs reflects this
behavior and follows the project's documentation guidance for complex algorithm
comments.
| pub fn default_rule(message: &str, children: Vec<MagicRule>) -> MagicRule { | ||
| MagicRule { | ||
| offset: OffsetSpec::Absolute(0), | ||
| typ: TypeKind::Meta(MetaType::Default), | ||
| op: Operator::Equal, | ||
| value: Value::Uint(0), | ||
| message: message.to_string(), | ||
| children, | ||
| level: 0, | ||
| strength_modifier: None, | ||
| } | ||
| } | ||
|
|
||
| /// Build a `Clear` rule. Carries no message in the magic file syntax, but the | ||
| /// AST requires a message field. | ||
| pub fn clear_rule() -> MagicRule { | ||
| MagicRule { | ||
| offset: OffsetSpec::Absolute(0), | ||
| typ: TypeKind::Meta(MetaType::Clear), | ||
| op: Operator::Equal, | ||
| value: Value::Uint(0), | ||
| message: String::new(), | ||
| children: vec![], | ||
| level: 0, | ||
| strength_modifier: None, | ||
| } | ||
| } | ||
|
|
||
| /// Build a single byte-equality rule at `offset` for `value`. | ||
| pub fn byte_eq_rule(offset: i64, value: u64, message: &str) -> MagicRule { | ||
| MagicRule { | ||
| offset: OffsetSpec::Absolute(offset), | ||
| typ: TypeKind::Byte { signed: false }, | ||
| op: Operator::Equal, | ||
| value: Value::Uint(value), | ||
| message: message.to_string(), | ||
| children: vec![], | ||
| level: 0, | ||
| strength_modifier: None, | ||
| } | ||
| } | ||
|
|
||
| /// Build an `Indirect` rule at `offset` with optional children. | ||
| pub fn indirect_rule(offset: i64, message: &str, children: Vec<MagicRule>) -> MagicRule { | ||
| MagicRule { | ||
| offset: OffsetSpec::Absolute(offset), | ||
| typ: TypeKind::Meta(MetaType::Indirect), | ||
| op: Operator::Equal, | ||
| value: Value::Uint(0), | ||
| message: message.to_string(), | ||
| children, | ||
| level: 0, | ||
| strength_modifier: None, | ||
| } | ||
| } | ||
|
|
||
| /// Build an `Offset` rule at `offset` with an `x` (`AnyValue`) operator and | ||
| /// the given message. Mirrors `default_rule`/`indirect_rule` helpers. | ||
| pub fn offset_rule(offset: i64, message: &str, children: Vec<MagicRule>) -> MagicRule { | ||
| MagicRule { | ||
| offset: OffsetSpec::Absolute(offset), | ||
| typ: TypeKind::Meta(MetaType::Offset), | ||
| op: Operator::AnyValue, | ||
| value: Value::Uint(0), | ||
| message: message.to_string(), | ||
| children, | ||
| level: 0, | ||
| strength_modifier: None, | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider collapsing repeated MagicRule builder boilerplate.
The repeated literals are consistent today, but centralizing shared fields would reduce maintenance risk if MagicRule changes again.
Refactor sketch
+fn build_meta_rule(
+ offset: i64,
+ meta: MetaType,
+ op: Operator,
+ message: &str,
+ children: Vec<MagicRule>,
+) -> MagicRule {
+ MagicRule {
+ offset: OffsetSpec::Absolute(offset),
+ typ: TypeKind::Meta(meta),
+ op,
+ value: Value::Uint(0),
+ message: message.to_string(),
+ children,
+ level: 0,
+ strength_modifier: None,
+ }
+}
+
pub fn default_rule(message: &str, children: Vec<MagicRule>) -> MagicRule {
- MagicRule {
- offset: OffsetSpec::Absolute(0),
- typ: TypeKind::Meta(MetaType::Default),
- op: Operator::Equal,
- value: Value::Uint(0),
- message: message.to_string(),
- children,
- level: 0,
- strength_modifier: None,
- }
+ build_meta_rule(0, MetaType::Default, Operator::Equal, message, children)
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/evaluator/engine/tests/helpers/meta.rs` around lines 69 - 138, Several
tests repeat the same MagicRule struct literal; extract the shared fields into a
single constructor helper (e.g., a private fn like build_magic_rule(offset:
OffsetSpec, typ: TypeKind, op: Operator, value: Value, message: String,
children: Vec<MagicRule>) -> MagicRule) and have default_rule, clear_rule,
byte_eq_rule, indirect_rule, and offset_rule call that helper with only the
differing arguments (for byte_eq_rule provide signed/type/value variations, for
others pass Meta types and op). Update references to MagicRule, OffsetSpec,
TypeKind, Operator, Value, message and children in the helper so future changes
to the common fields need only one update.
| } else if let Some(rest) = input.strip_prefix('&') { | ||
| // Relative offset: `&N`, `&+N`, or `&-N`. `parse_number` handles the | ||
| // bare and `-`-prefixed cases natively; `+` is consumed manually | ||
| // (see the indirect-offset adjustment parser for the same pattern). | ||
| let (rest, value) = if let Some(after_plus) = rest.strip_prefix('+') { | ||
| parse_number(after_plus)? | ||
| } else { | ||
| parse_number(rest)? | ||
| }; | ||
| let (rest, _) = multispace0(rest)?; | ||
| Ok((rest, OffsetSpec::Relative(value))) |
There was a problem hiding this comment.
Reject &+-N in the relative-offset parser.
After the explicit + is stripped, this branch still hands the remainder to parse_number, so &+-4 currently parses as OffsetSpec::Relative(-4) instead of failing. That lets malformed magic syntax through silently.
Proposed fix
} else if let Some(rest) = input.strip_prefix('&') {
// Relative offset: `&N`, `&+N`, or `&-N`. `parse_number` handles the
// bare and `-`-prefixed cases natively; `+` is consumed manually
// (see the indirect-offset adjustment parser for the same pattern).
let (rest, value) = if let Some(after_plus) = rest.strip_prefix('+') {
+ if after_plus.starts_with(['+', '-']) {
+ return Err(nom::Err::Error(NomError::new(
+ after_plus,
+ nom::error::ErrorKind::Verify,
+ )));
+ }
parse_number(after_plus)?
} else {
parse_number(rest)?
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/parser/grammar/mod.rs` around lines 198 - 208, The relative-offset branch
currently accepts "&+-N" because after stripping the '+' it still calls
parse_number on a '-' prefix; modify the '+' branch in the parser so that when
rest.strip_prefix('+') returns after_plus you first check that after_plus does
not start with '-' and if it does return a parse error (rather than calling
parse_number), otherwise proceed to call parse_number(after_plus); this change
touches the branch that uses rest.strip_prefix('+'), parse_number, multispace0
and returns OffsetSpec::Relative.
| // For AnyValue (`x`), no operand is needed -- treat remaining text as message. | ||
| // For string-family types, fall back to a bare (unquoted) single-token | ||
| // literal if the strict `parse_value` alternatives all fail. magic(5) | ||
| // syntax permits writing `string TEST` or `search/12 ABC` without | ||
| // surrounding quotes, and this fallback supports that form without | ||
| // relaxing value parsing for non-string types (where `xyz` must | ||
| // still be rejected -- see `test_parse_value_invalid_input`). | ||
| let is_string_family_type = matches!( | ||
| typ, | ||
| TypeKind::String { .. } | ||
| | TypeKind::PString { .. } | ||
| | TypeKind::Regex { .. } | ||
| | TypeKind::Search { .. } | ||
| ); | ||
| let (input, value) = if op == Operator::AnyValue { | ||
| (input, Value::Uint(0)) | ||
| } else if is_string_family_type { | ||
| match parse_value(input) { | ||
| Ok(ok) => ok, | ||
| Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?, | ||
| } |
There was a problem hiding this comment.
String-family bare operands still lose to numeric/hex parsing.
This only falls back to parse_bare_string_value when parse_value fails, so bare operands like string 123, string 7f, or search/16 0x10 are still parsed as Uint/Bytes instead of literal text. Those are valid unquoted string/pattern operands, so the rule will never match as intended.
Proposed fix
- } else if is_string_family_type {
- match parse_value(input) {
- Ok(ok) => ok,
- Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?,
- }
+ } else if is_string_family_type {
+ let trimmed = input.trim_start();
+ if trimmed.starts_with('"') || trimmed.starts_with('\\') {
+ parse_value(input)?
+ } else {
+ parse_bare_string_value(input)?
+ }
} else {
parse_value(input)?
};Please add regression cases for numeric-looking and hex-looking bare tokens once this branch is tightened. As per coding guidelines: “src/parser/**: Focus on magic file DSL parsing ... and robust parsing logic.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // For AnyValue (`x`), no operand is needed -- treat remaining text as message. | |
| // For string-family types, fall back to a bare (unquoted) single-token | |
| // literal if the strict `parse_value` alternatives all fail. magic(5) | |
| // syntax permits writing `string TEST` or `search/12 ABC` without | |
| // surrounding quotes, and this fallback supports that form without | |
| // relaxing value parsing for non-string types (where `xyz` must | |
| // still be rejected -- see `test_parse_value_invalid_input`). | |
| let is_string_family_type = matches!( | |
| typ, | |
| TypeKind::String { .. } | |
| | TypeKind::PString { .. } | |
| | TypeKind::Regex { .. } | |
| | TypeKind::Search { .. } | |
| ); | |
| let (input, value) = if op == Operator::AnyValue { | |
| (input, Value::Uint(0)) | |
| } else if is_string_family_type { | |
| match parse_value(input) { | |
| Ok(ok) => ok, | |
| Err(orig_err) => parse_bare_string_value(input).map_err(|_| orig_err)?, | |
| } | |
| // For AnyValue (`x`), no operand is needed -- treat remaining text as message. | |
| // For string-family types, fall back to a bare (unquoted) single-token | |
| // literal if the strict `parse_value` alternatives all fail. magic(5) | |
| // syntax permits writing `string TEST` or `search/12 ABC` without | |
| // surrounding quotes, and this fallback supports that form without | |
| // relaxing value parsing for non-string types (where `xyz` must | |
| // still be rejected -- see `test_parse_value_invalid_input`). | |
| let is_string_family_type = matches!( | |
| typ, | |
| TypeKind::String { .. } | |
| | TypeKind::PString { .. } | |
| | TypeKind::Regex { .. } | |
| | TypeKind::Search { .. } | |
| ); | |
| let (input, value) = if op == Operator::AnyValue { | |
| (input, Value::Uint(0)) | |
| } else if is_string_family_type { | |
| let trimmed = input.trim_start(); | |
| if trimmed.starts_with('"') || trimmed.starts_with('\\') { | |
| parse_value(input)? | |
| } else { | |
| parse_bare_string_value(input)? | |
| } | |
| } else { | |
| parse_value(input)? | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/parser/grammar/mod.rs` around lines 843 - 863, The branch for
string-family types currently calls parse_value first so bare tokens like "123"
or "0x7f" get parsed as Uint/Bytes instead of literal strings; change the logic
in the is_string_family_type branch to prefer parse_bare_string_value for
unquoted single-token inputs (i.e., try parse_bare_string_value(input) first and
return that when it succeeds), falling back to parse_value only if the
bare-string parse fails, while keeping the Operator::AnyValue path unchanged;
update tests for parse_value/parse_bare_string_value to include numeric-looking
and hex-looking bare tokens for TypeKind::String/PString/Regex/Search to ensure
the rule matches the literal text.
## 🤖 New release
* `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes)
### ⚠ `libmagic-rs` breaking changes
```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---
Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron
Failed in:
field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
--- failure copy_impl_added: type now implements Copy ---
Description:
A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it.
ref: rust-lang/rust#100905
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron
Failed in:
libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98
--- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] ---
Description:
A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron
Failed in:
enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26
enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56
enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
--- failure enum_struct_variant_field_added: pub enum struct variant field added ---
Description:
An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant.
ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron
Failed in:
field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
--- failure function_missing: pub fn removed or renamed ---
Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron
Failed in:
function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025
function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846
function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683
function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004
function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810
function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633
function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060
function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946
function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779
function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042
function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902
function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749
function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227
--- failure function_parameter_count_changed: pub fn parameter count changed ---
Description:
A publicly-visible function now takes a different number of parameters.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron
Failed in:
libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196
--- failure inherent_method_missing: pub method removed or renamed ---
Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron
Failed in:
FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326
EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
--- failure module_missing: pub module removed or renamed ---
Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron
Failed in:
mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4
--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---
Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron
Failed in:
struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42
```
<details><summary><i><b>Changelog</b></i></summary><p>
<blockquote>
## [0.6.0] - 2026-04-25
### Features
- **parser**: Add Date and QDate types with serialization support
([#165](#165))
- **parser**: Implement pstring (Pascal string) type
([#170](#170))
- **parser**: Implement pstring multi-byte length prefix variants (/B,
/H, /h, /L, /l, /J)
([#183](#183))
- **evaluator**: Add debug-level tracing for skipped rules
([#184](#184))
- **evaluator**: Implement indirect offset resolution
([#37](#37))
([#199](#199))
- **evaluator**: Implement relative offset resolution
([#38](#38))
([#211](#211))
- **deps**: Add new skills to actionbook/rust-skills and
trailofbits/skills
- **evaluator**: Regex and search types (closes #39)
([#214](#214))
- Implement libmagic meta-type directives and format substitution
([#42](#42))
([#230](#230))
### Bug Fixes
- **regex**: PR #214 follow-up review findings
([#215](#215))
- Load and correctly evaluate /usr/share/file/magic/filesystems and
adjacent magic files
([#233](#233))
### Documentation
- **gotchas**: Clarify requirements for adding TypeKind variants
### Miscellaneous Tasks
- Rename .coderabbitai.yaml to .coderabbit.yaml
- **Mergify**: Configuration update
([#173](#173))
- Update .gitignore to exclude local AI assistant files
- **mergify**: Upgrade configuration to current format
([#205](#205))
- Resolve all pending TODO items
([#212](#212))
- **mergify**: Upgrade configuration to current format
([#231](#231))
<!-- generated by git-cliff -->
### Security
- **io**: Close TOCTOU race in `FileBuffer::new` metadata validation
(CWE-367). `validate_file_metadata` now uses `File::metadata()` on the
open descriptor instead of re-canonicalizing the path, so an attacker
cannot swap the path between `open_file` and validation. Error paths now
report the caller-supplied path rather than the canonicalized variant.
- **cli**: Remove relative-path fallbacks from `default_magic_file_path`
(CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the
`CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the
process cwd. CI pipelines must pass `--magic <path>` explicitly.
- **evaluator**: `build_regex` now bounds `size_limit` and
`dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject
compile-time DoS patterns (CWE-1333) from adversarial magic files.
### Features
- **parser**: Implement meta-type directives: `name`/`use` subroutines,
`default`/`clear` per-level fallback, and `indirect` re-evaluation.
`parse_text_magic_file` now returns `ParsedMagic { rules, name_table }`
(breaking change from `Vec<MagicRule>`). Named subroutines are hoisted
into `NameTable` at load time and dispatched via `RuleEnvironment` in
the evaluator. Recursion is bounded by
`EvaluationConfig::max_recursion_depth`. Resolves
[#42](#42).
- **evaluator**: Thread-local regex compile cache eliminates the
double-compile paid by every successful regex match.
`regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex`
instead of recompiling the pattern to derive the anchor advance. The
cache is reset at the start of every `evaluate_rules_with_config` call,
bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new
builder-style setters (`with_max_recursion_depth`,
`with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`,
`with_timeout_ms`) let external crates construct configurations without
struct literals.
- **parser**: `MagicRule::new()` smart constructor with
`::with_children()`, `::with_strength_modifier()`, `::with_level()`
builder methods and a `::validate()` method enforcing structural
invariants (non-empty message, `level <= MAX_LEVEL`, children nested
strictly deeper than parent). New `MagicRuleValidationError` error type.
- **parser**: `RegexFlags::with_case_insensitive()` and
`::with_start_offset()` builder methods.
### Refactor
- **engine**: Extract `evaluate_pattern_rule()` and
`evaluate_value_rule()` helpers from
`evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a
two-arm type-category split; each helper has focused rustdoc on
semantics and invariants.
- **types**: Replace the `_ =>` catch-all in
`bytes_consumed_with_pattern` with an explicit listing of the
fixed-width `TypeKind` variants. Adding a new variable-width variant
without updating this match is now a compile error instead of a silent
relative-offset anchor corruption in release builds.
- **parser**: Split the 185-line `type_keyword_to_kind` match into
per-family helpers (`byte_family`, `short_family`, `long_family`,
`quad_family`, `float_family`, `double_family`, `date_family`,
`qdate_family`, `string_family`). Drops the
`#[allow(clippy::too_many_lines)]` attribute.
- **main**: `main()` returns `std::process::ExitCode` instead of calling
`process::exit`, so destructors run on the happy path. Ctrl-C
`AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`.
- **grammar**: `parse_strength_directive` uses nom 8's `preceded` +
`Parser::map` instead of the legacy `map(pair(char(...), parse_number),
|(_, n)| ...)` pattern.
- **output**: Add `#[serde(skip_serializing_if = "Option::is_none",
default)]` to public `Option<T>` fields so JSON output no longer emits
`"field": null` for unset optional values.
### Documentation
- **lib**: Add `# Security` sections to
`MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`,
`::load_from_file`, and `::load_from_file_with_config` warning about the
unbounded default timeout and recommending
`EvaluationConfig::performance()` for untrusted input.
- **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning.
- **README**: Update `TypeKind` enum example to match the current AST,
add `regex` and `search/N` to the supported types table, add pre-1.0 API
stability warning, correct the roadmap to mark v0.2-v0.4 as shipped.
- **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current
Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases
section to reflect actual shipped scope.
### Testing
- Security regression tests for S-H1 (planted-magic-file in cwd), S-H2
(TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime),
S-L2 (codegen message escape round-trip), and GOTCHAS S13.1
(`EvaluationConfig::default()` unbounded timeout invariant).
- Backspace message concatenation regression tests for first-match,
consecutive, and empty-rest edge cases.
- `MagicRule::validate()` tests covering empty message, child level
invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.
### Breaking Changes
- **parser**: `parse_text_magic_file` return type changed from
`Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic,
ParseError>`. Callers must destructure `ParsedMagic { rules, name_table
}`. Low-level callers that only need the rule list can use
`parsed.rules`. `load_magic_file` and `load_magic_directory` return the
same new type.
</blockquote>
</p></details>
---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Closes #42.
Summary
MetaTypevariants (Default,Clear,Name,Use,Indirect,Offset) are fully evaluated with GNUfile-compatible semantics:defaultfires only when no sibling at the same level matched,clearresets the flag,name/useenable subroutines with endian-flip viaRuleEnvironment,indirectre-applies the root rule set at the resolved offset bounded bymax_recursion_depth, andoffsetreports the resolved file position asValue::Uint(pos).src/output/format.rs::format_magic_message) supports%d,%i,%u,%x,%X,%o,%s,%c,%%with width/padding/length modifiers. Hex specifiers mask toTypeKind::bit_width()so signed-byte-1renders asff, not sign-extendedffffffffffffffff. Alt-form follows C printf:%#06x+0xab→0x00ab.base_offsetclose the semantic gaps that blocked byte-for-byte parity with GNUfile. Top-level siblings still chain per GOTCHAS S3.8 (regression-guarded).What changed
Phase 1 (AST + parser +
name/use+default/clear/indirectdispatch) and Phase 2 (this PR's close-out:offsetevaluator, printf substitution, parserx-strip fix, sibling-anchor semantic fix, subroutine offset bias) are bundled because they're a single cohesive issue-#42 unit.src/parser/ast.rsTypeKind::Meta(MetaType)enum with six variants;Offsetdoc comment updated to reflect live semanticssrc/parser/grammar/mod.rs,src/parser/name_table.rs,src/parser/loader.rs,src/parser/mod.rs,src/parser/types.rs,src/parser/codegen.rsstrip_optional_x_operatorhelper stopsxleaking into messages;extract_name_tablehoistsMeta::Namesubtrees at load time;ParsedMagic { rules, name_table }replaces the oldVec<MagicRule>parser returnsrc/evaluator/engine/mod.rs,src/evaluator/mod.rs,src/evaluator/offset/mod.rs,src/evaluator/strength.rs,src/evaluator/types/mod.rsis_child_sibling_listgate for continuation-sibling anchor reset,base_offsetonEvaluationContext, newresolve_offset_with_basesrc/output/format.rs(new),src/output/mod.rs,src/lib.rsformat_magic_messagepure function;MagicDatabase::concatenate_messagesruns messages through it before the\bcheck (GOTCHAS S14.1)tests/meta_types_integration.rs(new),src/evaluator/engine/tests.rs,tests/*test_searchbug_matches_full_result_string; 18 format unit tests; 7 Offset dispatch tests; syntheticdefault/clear/indirectscenariosAGENTS.md,GOTCHAS.md,docs/solutions/integration-issues/meta-type-subroutine-dispatch-architecture.md,docs/src/*.md,ROADMAP.md,CHANGELOG.md.github/workflows/release.yml,build.rs,src/build_helpers.rs,src/error.rsdist generaterefresh; Phase 1 plumbing forNameTable/RuleEnvironmentNet: 41 files changed, +4661 / -205 lines.
Test Plan
mise exec -- cargo nextest run— 1348/1348 pass, 5 pre-existing skippedmise exec -- just ci-check— green (fmt, clippy-D warnings, cargo test, nextest, audit, deny, machete, mdformat)test_searchbug_matches_full_result_stringproduces byte-for-byte parity withthird_party/tests/searchbug.result:Testfmt (0) found_ABC followed_by 0x31 at_offset 11 (64) found_ABC followed_by 0x32 at_offset 75unsafe,unwrap(), orexpect()in library coderelative_anchor_can_decrease_when_later_sibling_matches_at_lower_positionstill passes)MagicRuleserde back-compat: old serialized rules deserialize unchangedResidual items (separate follow-up PRs)
Documented in
.context/compound-engineering/ce-review/20260422-101802-31ab7e83/synthesis.md(local-only). Non-blocking:AnchorScopeinIndirectarm and manual save/restore inevaluate_use_ruleleave anchor state mutated onRecursionLimitExceedederror paths. Low-probability with defaultmax_recursion_depth=20; worth RAII refactor in its own PR.evaluate_children_gracefullyhelper is worthwhile but risks subtle behavior drift.max_indirect_depthcap (separate frommax_recursion_depth) make sense for 1.0? Needs a pathological-fixture smoke test first.Compatibility impact
Breaking behavior change (library consumers): rule messages containing
%that were previously passed through verbatim will now be interpreted as format specifiers. Literal%in user-facing data must be escaped as%%. Documented in GOTCHAS S14.2 rewrite. Recommend calling out in the changelog for the next release.