Skip to content

feat(evaluator): implement relative offset resolution (#38)#211

Merged
unclesp1d3r merged 13 commits into
mainfrom
38-evaluator-implement-relative-offset-resolution
Apr 8, 2026
Merged

feat(evaluator): implement relative offset resolution (#38)#211
unclesp1d3r merged 13 commits into
mainfrom
38-evaluator-implement-relative-offset-resolution

Conversation

@unclesp1d3r
Copy link
Copy Markdown
Member

@unclesp1d3r unclesp1d3r commented Apr 8, 2026

Summary

Implements OffsetSpec::Relative evaluator support per #38 using GNU file/libmagic semantics. Threads a "previous match end" anchor through EvaluationContext, advances it after each successful match by the bytes consumed (variable-width types include c-string NUL terminators and pstring length prefixes), and resolves Relative(delta) as last_match_end + delta with bounds and overflow checks.

Impact: 21 files changed (1984 insertions, 213 deletions). 945 lib tests + 14 relative-offset integration tests + 11 other integration suites + 163 doctests, all green. Clippy clean with -D warnings. CI: 22 pass, 0 failures, mergeable.

Review rounds completed: 4 internal passes (ce:review autofix, pr-review-toolkit ×2, Copilot PR reviewer). 0 critical/high findings outstanding.

Key design decisions

  • Anchor is global-shared across child recursion, not save/restored. Siblings see the anchor wherever the deepest descendant of the previous sibling left it — matches GNU file semantics. The anchor value may increase or decrease as successive rules match at different positions; it is not a high-watermark. Documented in GOTCHAS.md §3.8.
  • Top-level relative offsets resolve from anchor 0. Non-negative deltas behave like Absolute(N); negative deltas underflow to InvalidOffset (NOT interpreted like Absolute(-N)'s from-end semantics). Called out explicitly in the rustdoc for both resolve_offset and evaluate_single_rule.
  • Pascal-string consumption is clamped against the remaining buffer. Prevents attacker-controlled length prefixes (e.g., \xFF\xFF\xFF\xFF) from poisoning the anchor to usize::MAX and silently suppressing all subsequent relative-offset rules. Found by adversarial review (SEC-001 / ADV-001), regression-tested with both BE and LE oversized-prefix cases. Documented as a reusable learning in docs/solutions/security-issues/pstring-anchor-poisoning.md.
  • Fixed-width bytes_consumed is also bounds-checked. Per Copilot reviewer feedback, added offset.checked_add(width) <= buffer.len() guard so the helper is infallible regardless of caller discipline — no relying on "the engine only calls this after a successful read" as the only line of defense.
  • Engine-internal items are pub(crate): bytes_consumed, last_match_end() getter, set_last_match_end() setter, and resolve_offset_with_context are deliberately not stable API surface. The anchor-threading contract is internal and shouldn't lock in semver guarantees.

Behavior changes

OffsetSpec::Relative previously returned EvaluationError::UnsupportedType. It now resolves:

Entry point Relative(N) for N ≥ 0 Relative(-N) for N > 0
evaluate_rules + EvaluationContext Resolves against actual last_match_end + N last_match_end - N if in bounds, else InvalidOffset
evaluate_single_rule (no context) Resolves to absolute N InvalidOffset (anchor 0 underflow)
resolve_offset (no context) Same as evaluate_single_rule Same as evaluate_single_rule

Callers with existing error-handling code that pattern-matched UnsupportedType for relative offsets must remove that arm. Documented in the rustdoc for all three entry points.

Implementation

  • EvaluationContext::last_match_end field with pub(crate) getter/setter, cleared by reset() alongside current_offset and recursion_depth
  • evaluator::types::bytes_consumed (pub(crate)) with buffer clamping for both fixed-width and variable-width (c-string, pstring) branches; string_bytes_consumed and pstring_bytes_consumed re-derive consumption directly from the buffer (not from Value::String, which can drift via from_utf8_lossy)
  • evaluator::offset::relative::resolve_relative_offset with isize::try_from + checked_add_signed bounds, mapping over/underflow to InvalidOffset and out-of-buffer to BufferOverrun
  • evaluator::offset::resolve_offset_with_context (pub(crate)) dispatcher; existing resolve_offset (pub) delegates with last_match_end = 0 so existing call sites are unchanged
  • evaluator::engine::evaluate_single_rule_with_anchor private helper; evaluate_rules updates the anchor before recursing into children, never saves/restores (GNU file global semantics)
  • evaluate_rules graceful-skip arm for child-evaluation errors: debug!warn! with a tightened comment noting the asymmetry (parent emitted, child silently dropped if the defensive arm fires)

Test coverage

Unit tests (30+ new):

  • 4 tests for EvaluationContext::last_match_end (field / clone / reset / set-get)
  • 18 tests for bytes_consumed (fixed-width all types, c-string with/without NUL, max_length truncation, pstring 1/2/4-byte BE/LE, /J flag well-formed, /J underflow with multi-byte width, oversized prefix clamping BE+LE, boundary cases, usize::MAX overflow)
  • 10 tests for resolve_relative_offset (positive/negative/zero deltas, top-level anchor=0, last-valid-index, underflow, overflow at usize::MAX, target past end, target == buffer.len())
  • 5 dispatcher tests in offset/mod.rs (relative via context, top-level default, passthrough for Absolute/FromEnd/Indirect)
  • 3 tests in engine/tests.rs pinning evaluate_single_rule contracts (positive delta → absolute N, zero delta → start, negative delta → Err(InvalidOffset))
  • 1 test injecting a near-saturation anchor (usize::MAX) via the pub(crate) setter to verify all three Relative variants skip gracefully without panic

Integration tests (14 in tests/relative_offset_evaluation.rs):

  • Numeric chain (parent + relative child)
  • Positive/negative delta children
  • Three-level chain marching forward
  • String parent + relative child (NUL terminator accounting)
  • PString parent + relative child (prefix accounting)
  • Top-level zero anchor
  • Sibling propagation at top level
  • Non-matching intermediate sibling preserves anchor
  • Anchor decreases when later sibling matches at lower position (non-monotonic semantics pinned)
  • Context reset between buffers
  • Out-of-bounds skip
  • Underflow skip

Documentation

  • AGENTS.md — "Currently Implemented" / Module Organization / Quick Reference sections updated; docs/solutions/ pointer added for knowledge-store discoverability
  • GOTCHAS.md — new §3.8 documents the no-save-restore anchor semantics; §2.1 checklist updated to include bytes_consumed exhaustive-match requirement for new TypeKind variants
  • docs/src/evaluator.md — Offset Resolution section rewritten for the two-case Relative behavior; EvaluationContext "Key Methods" split into Public vs Internal (pub(crate)) sections; Implementation Status checklist updated for Evaluator: implement indirect offset resolution #37 and Evaluator: implement relative offset resolution #38
  • docs/solutions/security-issues/pstring-anchor-poisoning.md — NEW bug-track solution doc capturing the SEC-001 finding, failed first attempts, generalizable rule ("when advancing internal state by attacker-controlled byte counts, clamp against actual buffer reality, not raw input"), and prevention strategies
  • docs/solutions/developer-experience/rust-test-visibility-boundary.md — NEW knowledge-track doc capturing the lesson that tests needing pub(crate) items must live in src/.../tests.rs, not tests/

Scope notes

A handful of files in the diff are incidental and not part of the #38 implementation:

  • .bun-version + mise.lock + mise.toml — tool version pins that came in via merge-from-main
  • tessl.json — unrelated neonwatty/logo-designer-skill dependency added in commit b1f62fa
  • docs/src/architecture.md + docs/src/compatibility.md — Dosu auto-updates from commit 3cf5d5a

Reviewers can skim these; they don't need deep review as part of the #38 work.

Post-Deploy Monitoring & Validation

No additional operational monitoring required. This is a library change to evaluator semantics:

  • Runtime impact: magic rules with OffsetSpec::Relative now evaluate correctly instead of erroring with UnsupportedType
  • API impact: the public resolve_offset and evaluate_single_rule signatures are unchanged; internal items are pub(crate) and don't lock semver
  • Dependency impact: no new Rust crates; the unrelated tessl.json change is documented above
  • Performance: bytes_consumed adds one bounds check per successful match (negligible); memchr in string_bytes_consumed is O(n) over a region the existing read_string already scans, so worst case is a bounded 2x constant factor on string matches

Behavior change is documented in the rustdoc and AGENTS.md so downstream callers see the migration note inline.

Review history

  1. Initial implementation (c9a89f1) — feature + tests per the plan
  2. ce:review autofix pass (b7e5938) — 12 safe_auto fixes including the SEC-001 pstring anchor poisoning fix, visibility narrowing to pub(crate), doc gaps on context reuse
  3. Third-pass polish (4ebd9b7) — pr-review-toolkit feedback: "monotonic" wording fix, recursion_depth in context-reuse warning, 2 new tests
  4. Copilot PR review (cd3a5e0) — bounds guard on fixed-width bytes_consumed, Relative negative-delta clarification, debug_assert! wording fix, docs/src/evaluator.md Public/Internal method split
  5. Compound docs (41c75c3) — pstring-anchor-poisoning learning + docs/solutions/ discoverability pointer in AGENTS.md
  6. Fourth-pass pr-review-toolkit (58210ab) — 6 findings: stale resolve_offset equivalence claim in docs/src/evaluator.md, log::warn! style consistency, evaluate_single_rule(Relative(-1)) test coverage, stale comments, implementation status checklist
  7. Test visibility boundary learning (03f389b) — Rust developer-experience solution doc capturing the tests/ vs src/.../tests.rs gotcha hit during implementation

Test Plan

  • All plan units complete (6 of 6)
  • All plan requirements (R1-R9) verified met
  • just ci-check passes locally
  • cargo test passes (945 lib + 14 integration + 11 other suites + 163 doctest)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean
  • 4 internal review rounds addressed
  • CI green on push (22 pass, 7 release skipping, 0 failures)
  • Maintainer review

Closes #38

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Implement OffsetSpec::Relative evaluator support using GNU file/libmagic
semantics. The AST variant existed but the evaluator stub returned
UnsupportedType. This change threads a previous-match anchor through
EvaluationContext, advances it after each successful match by the bytes
consumed (variable-width types include c-string NUL terminators and
pstring length prefixes), and resolves Relative(delta) as
last_match_end + delta with bounds and overflow checks.

Top-level relative offsets resolve from anchor 0, matching libmagic.
The anchor is global-monotonic across child recursion (no save/restore)
so siblings see the deepest descendant's anchor -- documented in
GOTCHAS S3.8.

Pascal-string consumption is clamped against the remaining buffer to
prevent attacker-controlled length prefixes (e.g., 0xFFFFFFFF) from
poisoning the anchor to usize::MAX and silently suppressing all
subsequent relative-offset rules.

Engine-internal items (bytes_consumed, set_last_match_end,
resolve_offset_with_context) are pub(crate) -- the anchor-threading
contract is not stable API surface.

- Add EvaluationContext::last_match_end (pub(crate) getter/setter)
- Add evaluator::types::bytes_consumed (pub(crate)) with attacker-input
  clamping for pstring
- Implement evaluator::offset::relative::resolve_relative_offset with
  isize::try_from + checked_add_signed bounds
- Add resolve_offset_with_context (pub(crate)) dispatcher; resolve_offset
  delegates with anchor=0 for backwards-compatible callers
- Add evaluate_single_rule_with_anchor (private engine helper);
  evaluate_rules updates anchor before recursing into children
- Document the Relative=anchor 0 behavior on evaluate_single_rule and
  resolve_offset, and the context-reuse warning on evaluate_rules
- 30 new unit tests + 12 integration tests in
  tests/relative_offset_evaluation.rs covering positive/negative deltas,
  string/pstring parents, sibling propagation, top-level zero anchor,
  graceful skip on overrun/underflow, /J underflow with multi-byte width,
  oversized pstring prefix clamping, non-matching intermediate sibling,
  and context reset between buffers
- Update AGENTS.md and GOTCHAS.md S3.8

Closes #38

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 8, 2026 01:01
@unclesp1d3r unclesp1d3r linked an issue Apr 8, 2026 that may be closed by this pull request
6 tasks
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 8, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one) (3)
  • rust
  • testing
  • compatibility

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dab101e1-ba4b-46e4-851a-84827e313058

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 38-evaluator-implement-relative-offset-resolution

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 8, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for bots.

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

  • #commits-behind <= 10

@dosubot dosubot Bot added the evaluator Rule evaluation engine and logic label Apr 8, 2026
@dosubot
Copy link
Copy Markdown
Contributor

dosubot Bot commented Apr 8, 2026

Related Documentation

3 document(s) may need updating based on files changed in this PR:

libMagic-rs

architecture /libmagic-rs/blob/main/docs/src/architecture.md — ⏳ Awaiting Merge
compatibility /libmagic-rs/blob/main/docs/src/compatibility.md — ⏳ Awaiting Merge
evaluator /libmagic-rs/blob/main/docs/src/evaluator.md — ⏳ Awaiting Merge

Note: You must be authenticated to accept/decline updates.

How did I do? Any feedback?

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements evaluator support for OffsetSpec::Relative (issue #38) by threading a “previous match end” anchor through EvaluationContext and using it during offset resolution, aligning behavior with GNU file/libmagic relative-offset semantics.

Changes:

  • Add relative-offset resolution (last_match_end + delta) with overflow/underflow + bounds checks and a context-aware offset dispatcher.
  • Track and advance EvaluationContext::last_match_end after each successful match using new types::bytes_consumed (including C-string NUL and pstring prefix semantics).
  • Add extensive unit + integration tests and update project documentation/tooling pins.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/evaluator/offset/relative.rs Implements relative offset resolution with error mapping and unit tests.
src/evaluator/offset/mod.rs Adds resolve_offset_with_context and makes resolve_offset delegate with anchor=0; updates tests/docs.
src/evaluator/mod.rs Extends EvaluationContext with last_match_end getter/setter and reset behavior.
src/evaluator/engine/mod.rs Threads anchor into rule evaluation and advances it via bytes_consumed; documents behavior.
src/evaluator/types/mod.rs Adds bytes_consumed + helpers for string/pstring consumption (anchor advancement).
src/evaluator/types/tests.rs Adds unit tests for bytes_consumed.
src/evaluator/tests.rs Adds tests for EvaluationContext::last_match_end and reset/clone behavior.
tests/relative_offset_evaluation.rs Integration tests covering relative chains, nesting, bounds failures, and context reset.
GOTCHAS.md Documents relative-offset anchor semantics (no save/restore across recursion).
AGENTS.md Updates evaluator/offset module docs and “currently implemented” feature list.
mise.toml Pins several dev tool versions (was latest for some).
mise.lock Updates lockfile to match tool pin changes.
.bun-version Bumps Bun version to 1.3.11.

Comment thread src/evaluator/offset/mod.rs Outdated
Comment thread src/evaluator/engine/mod.rs
Comment thread src/evaluator/types/mod.rs Outdated
Comment thread src/evaluator/types/tests.rs Outdated
Comment thread src/evaluator/mod.rs Outdated
Comment thread src/evaluator/engine/mod.rs Outdated
Comment thread GOTCHAS.md Outdated
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 23 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/evaluator/offset/relative.rs 89.13% 10 Missing ⚠️
src/evaluator/types/mod.rs 86.11% 10 Missing ⚠️
src/evaluator/engine/mod.rs 85.71% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Apply 8 fixes from the pr-review-toolkit second-pass review (PR #211):

- Replace "monotonic" anchor language in EvaluationContext field doc and
  the engine inline comment with accurate "may increase or decrease"
  wording. The anchor reflects the end of the most recent match, not a
  high-watermark, and can move backwards when a sibling matches at a
  lower position.
- Replace bytes_consumed _ => 0 catch-all with debug_assert! that fires
  on unhandled variable-width TypeKind variants. Update GOTCHAS S2.1 to
  list bytes_consumed in the new-TypeKind-variant exhaustive-match
  checklist; previously a future variable-width variant could silently
  corrupt the relative-offset anchor with no signal.
- Fix AGENTS.md Module Organization comment that still labeled
  src/evaluator/offset/indirect.rs as "stub (issue #37)" -- #37 shipped
  in commit 6426d18.
- Add 2 unit tests in src/evaluator/engine/tests.rs covering
  evaluate_single_rule with OffsetSpec::Relative (delta != 0 and
  delta == 0). The public API's anchor=0 contract had no test coverage
  at that layer; integration tests all went through evaluate_rules.
- Tighten bytes_consumed rustdoc to scope the "returns 0 for unexpected
  inputs" claim to variable-width types only -- the fixed-width branch
  ignores offset validity (correct under the engine invariant, but the
  doc was imprecise).
- Extend evaluate_rules rustdoc context-reuse warning to cover the
  Err return path (Timeout / RecursionLimitExceeded leave the anchor
  partially advanced; callers retrying must call reset()).
- Split test_bytes_consumed_offset_at_buffer_end_returns_zero into two
  tests with accurate names: test_bytes_consumed_string_at_past_end_
  returns_zero and test_bytes_consumed_fixed_width_ignores_offset_
  validity. The original test name claimed "returns_zero" but the
  fixed-width assertion returned 1 (the bit_width fallback).
- Document EvaluationError::InternalError in resolve_relative_offset's
  # Errors section, noting it propagates instead of being caught by the
  engine's graceful-skip arm.

All 943 lib tests + 12 relative-offset integration tests + 13 other
suites + 163 doctests passing. Clippy and fmt clean.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 8, 2026 01:33
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 2 comments.

Comment thread docs/src/evaluator.md
Comment thread src/evaluator/types/mod.rs Outdated
Apply 9 follow-up items from comment-analyzer, silent-failure-hunter,
and code-reviewer that fell below the first synthesis pass:

- Upgrade defensive child-error arm in evaluate_rules from debug! to
  warn!. If this arm fires, the parent match is still emitted but the
  entire child subtree is silently dropped -- previously logged at debug
  with no signal that a partial classification was returned.
- Tighten bytes_consumed rustdoc NUL semantics: distinguish nul_index+1
  (byte after the NUL) from window-size return when no NUL is found,
  and explicitly state that the NUL inclusion is intentional GNU `file`
  semantics for chained record parsing.
- Document Relative(0)-after-string semantics on string_bytes_consumed
  (anchor lands one byte past the NUL terminator).
- Add recursion_depth to the evaluate_rules context-reuse warning -- it
  also leaks across calls if the caller forgets reset().
- Clarify set_last_match_end doc that reset() clears the anchor,
  current offset, and recursion depth together (not just the anchor).
- Reword "prior to v0.5" to "before the relative-offset feature landed
  in v0.5" in resolve_offset and evaluate_single_rule rustdocs (the
  v0.5 file referencing "prior to v0.5" reads oddly).
- Narrow GOTCHAS S3.8 push-ordering claim to "incidental for anchor
  correctness" so a future caller iterating matches alongside the
  context anchor is not misled.
- Add integration test that pins the documented "anchor may decrease"
  behavior: rule A matches at offset 8, rule B matches at offset 2,
  rule C uses Relative(0) and reads at offset 3 (not 12).
- Add unit test that injects a near-saturation anchor (usize::MAX) via
  the pub(crate) setter and verifies subsequent Relative(0)/(+1)/(-1)
  rules skip gracefully without panic.

944 lib tests + 13 relative-offset integration tests + 13 other suites
+ 163 doctests passing. Clippy and fmt clean.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 8, 2026
Apply 4 review threads from PR #211:

- evaluate_single_rule rustdoc no longer claims Relative is "equivalent
  to absolute of the same delta". Negative deltas underflow the anchor
  and return InvalidOffset, while Absolute(-N) means from-end -- the two
  are NOT equivalent. Same fix applied to resolve_offset rustdoc which
  had the same misleading wording.
- bytes_consumed: bounds-check the fixed-width branch instead of just
  documenting the gap. The function now returns 0 if offset + width
  exceeds buffer.len() (mirroring the variable-width path). The engine
  guarantees in-bounds invocation in practice, but the guard makes the
  contract self-consistent for any future caller. Test updated from
  pinning the old "ignores offset" behavior to pinning the new
  bounds-checked behavior at the boundary.
- bytes_consumed catch-all comment no longer says debug_assert "logs"
  the case -- debug_assert! panics in test/dev builds and compiles out
  in release. Reworded to reflect the actual behavior.
- docs/src/evaluator.md: split EvaluationContext methods into "Public
  Methods" and "Internal (pub(crate))" sections so external users see
  that last_match_end / set_last_match_end are not callable from
  outside the crate. Direct external users to evaluate_rules +
  EvaluationContext for relative-offset behavior.

944 lib tests + 13 relative-offset integration tests + 13 other suites
+ 163 doctests passing. Clippy and fmt clean.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 8, 2026 01:51
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 8, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 4 comments.

Comment thread src/evaluator/types/mod.rs Outdated
Comment thread src/evaluator/types/mod.rs Outdated
Comment thread mise.toml
Comment thread .bun-version
- Add docs/solutions/security-issues/pstring-anchor-poisoning.md capturing
  the SEC-001 finding from PR #211 review: attacker-controlled 4-byte
  pstring length prefix poisoning the GNU `file` previous-match anchor.
  Includes the failed first attempt (documenting the gap), the working
  fix (clamping payload against remaining buffer), why-this-works
  rationale, and prevention rules generalizing to any future variable-
  width parser advancing internal state from attacker input.
- Cross-references the two existing indirect-offset learnings as the
  sibling work that established the offset-resolver patterns this PR
  followed.
- Add docs/solutions/ pointer to AGENTS.md Quick Reference so future
  contributors and agents (without the compound-engineering plugin) can
  discover the searchable knowledge store of past learnings.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Apply 6 findings from the fourth internal review pass (code-reviewer,
pr-test-analyzer, comment-analyzer) on PR #211:

- docs/src/evaluator.md: the Offset Resolution section still contained
  the old "Relative(N) resolves as if it were Absolute(N)" claim that
  directly contradicts the rustdoc fix already applied in cd3a5e0.
  Rewrote the paragraph with the two-case split (non-negative delta vs
  negative delta underflow -> InvalidOffset).
- docs/src/evaluator.md: Implementation Status checklist showed
  "[ ] Indirect offset support" but #37 shipped in commit 6426d18.
  Ticked it off and added the new "[x] Relative offset support".
- src/evaluator/engine/mod.rs: style consistency fix. The new defensive
  warn! call used `log::warn!(...)` path-qualified while the file
  imports `use log::debug;` and uses `debug!` bare. Changed import to
  `use log::{debug, warn};` and call site to bare `warn!(...)` per the
  project-wide convention.
- src/evaluator/engine/tests.rs: add
  test_evaluate_single_rule_relative_negative_with_zero_anchor_errors
  pinning the documented contract that evaluate_single_rule with
  OffsetSpec::Relative(-1) returns Err(InvalidOffset { offset: -1 }),
  not Ok(None). Previously the contract had zero test coverage at the
  public API boundary.
- src/evaluator/types/tests.rs: stale comment on
  test_bytes_consumed_fixed_width_types said "Buffer is irrelevant for
  fixed-width types; bytes_consumed reads bit_width". The bounds guard
  added in cd3a5e0 makes the buffer relevant. Updated the comment to
  reflect that the 16-byte buffer is sized to fit all tested widths and
  points to the companion test that exercises the guard's 0-return path.
- tests/relative_offset_evaluation.rs: remove the abandoned drafting
  comment "offset 0: 0x42 (matched by rule_b at offset 2 via
  Absolute(2)... wait, we need rule_a to match HIGHER...)" from
  relative_anchor_can_decrease_when_later_sibling_matches_at_lower_position.
  The final layout description that follows is accurate; the mid-
  sentence course-correction was a draft artifact.

945 lib tests + 13 relative-offset integration tests + 13 other suites
+ 163 doctests passing. Clippy and fmt clean.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 8, 2026 03:05
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 4 comments.

Comment thread docs/src/compatibility.md
Comment on lines 72 to 78
| Feature | GNU file | rmagic | Status | Notes |
| ------------------ | -------- | ------ | ----------- | ---------------------------- |
| Basic patterns | ✅ | ✅ | Complete | String, numeric matching |
| Hierarchical rules | ✅ | 🔄 | In Progress | Parent-child relationships |
| Indirect offsets | ✅ | 📋 | Planned | Pointer dereferencing |
| Relative offsets | ✅ | 📋 | Planned | Position-relative addressing |
| Relative offsets | ✅ | | Complete | Position-relative addressing (PR #211) |
| Search patterns | ✅ | 📋 | Planned | Pattern searching in ranges |
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Magic File Format Compatibility matrix, both offset-related rows look inconsistent with current implementation details:

  • Indirect offsets is marked as “📋 Planned”, but resolve_indirect_offset is implemented and the parser grammar already supports (base.type) syntax.
  • Relative offsets is marked “✅ Complete”, but magic-file &+N/&-N parsing is still TODO (relative offsets are only reachable programmatically via the AST).
    Consider adjusting the table to reflect this split (e.g., indirect ✅, relative 🔄/partial with a note that evaluation is implemented but parser wiring is pending).

Copilot uses AI. Check for mistakes.
Comment thread docs/src/architecture.md Outdated
Comment thread docs/src/evaluator.md
Comment on lines 562 to 576
## Implementation Status

- [x] Basic evaluation engine structure
- [x] Offset resolution (absolute, relative, from-end)
- [x] Relative offset support with previous-match anchor tracking (PR #211, issue #38)
- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String, PString with 1/2/4-byte prefixes)
- [x] Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
- [x] Hierarchical rule processing with child evaluation
- [x] Error handling with graceful degradation
- [x] Timeout protection
- [x] Recursion depth limiting
- [x] Comprehensive test coverage (150+ tests)
- [ ] Indirect offset support (pointer dereferencing)
- [x] Indirect offset support (pointer dereferencing, issue #37)
- [x] Relative offset support (GNU `file` anchor semantics, issue #38)
- [ ] Regex type support
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Implementation Status” checklist is redundant/inconsistent:

  • It already has “Offset resolution (absolute, relative, from-end)”, which includes relative offsets.
  • It then lists “Relative offset support …” twice (lines 566 and 575).
    Consider collapsing these into a single relative-offset bullet (or keeping only the more specific anchor-tracking line) to avoid confusion about what’s actually newly implemented.

Copilot uses AI. Check for mistakes.
Comment thread GOTCHAS.md Outdated
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@unclesp1d3r unclesp1d3r self-assigned this Apr 8, 2026
Add docs/solutions/developer-experience/rust-test-visibility-boundary.md
capturing the lesson from PR #211: tests in tests/ compile as external
crates and cannot use pub(crate) items; tests that need internal-state
injection must live in src/.../tests.rs instead.

The test_evaluate_rules_anchor_near_saturation_skips_relative_child_
gracefully test that regression-guards the pstring anchor poisoning
fix is the concrete case that prompted this — the initial attempt to
put it in tests/relative_offset_evaluation.rs failed because
EvaluationContext::set_last_match_end is pub(crate).

Documents the three-way split between:
  - src/.../tests.rs      -> same crate, full pub(crate) access
  - tests/foo.rs          -> external crate, public API only
  - doctests in rustdoc   -> external crate, same as tests/

Also update AGENTS.md Quick Reference to list the new
developer-experience/ category alongside the existing ones.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 8, 2026 03:33
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 4 comments.

Comment thread GOTCHAS.md Outdated
Comment thread src/evaluator/types/mod.rs Outdated
Comment thread tessl.json
Comment thread docs/src/evaluator.md
Comment on lines 49 to 55
```rust
pub struct EvaluationContext {
/// Current offset position for relative calculations
current_offset: usize,
/// End offset of the most recent successful match (GNU file anchor)
last_match_end: usize,
/// Current recursion depth for safety limits
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this EvaluationContext snippet, current_offset is described as being used for "relative calculations", but relative offsets now resolve via the GNU file anchor (last_match_end). If current_offset is no longer used for relative offset resolution (and isn’t used by evaluate_rules), consider rewording this comment to avoid suggesting it drives OffsetSpec::Relative semantics (or documenting what current_offset is used for today).

Copilot uses AI. Check for mistakes.
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 8, 2026
- bytes_consumed rustdoc header: drop the misleading "mirrors the
  consumption logic of read_typed_value" framing. The helper
  intentionally includes framing bytes (NUL terminator, pstring length
  prefix) that the read functions do not return, so describing it as
  "anchor-advance distance" is more accurate. Callers should not equate
  this with the byte count read_typed_value returns.
- string_bytes_consumed rustdoc: replace "Mirrors read_string" with an
  explicit statement that the function uses the same scan logic but
  intentionally counts the NUL terminator as consumed. Adds a do-not-fix
  note so a future maintainer doesn't try to align the two behaviors.
- GOTCHAS.md S3.8 title: "Anchor is Global-Monotonic" -> "Global Anchor,
  No Save/Restore". The anchor numeric value is not non-decreasing --
  a Relative(-N) match (or any later rule at a lower absolute position)
  can move it earlier. The original title contradicted the documented
  and tested behavior.
- docs/src/architecture.md module list: offset/indirect.rs was still
  labeled as "stub (issue #37)" even though #37 shipped in commit
  6426d18. Same fix previously applied to docs/src/evaluator.md in a
  prior review round.

945 lib tests + 14 relative-offset integration + 13 other suites + 163
doctests still passing. Clippy and fmt clean.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@unclesp1d3r unclesp1d3r merged commit 85c0860 into main Apr 8, 2026
29 checks passed
@unclesp1d3r unclesp1d3r deleted the 38-evaluator-implement-relative-offset-resolution branch April 8, 2026 05:21
@github-actions github-actions Bot mentioned this pull request Apr 25, 2026
unclesp1d3r pushed a commit that referenced this pull request Apr 25, 2026
## 🤖 New release

* `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes)

### ⚠ `libmagic-rs` breaking changes

```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189

--- failure copy_impl_added: type now implements Copy ---

Description:
A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it.
        ref: rust-lang/rust#100905
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron

Failed in:
  libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98

--- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] ---

Description:
A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron

Failed in:
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148

--- failure enum_struct_variant_field_added: pub enum struct variant field added ---

Description:
An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant.
        ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron

Failed in:
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron

Failed in:
  function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025
  function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846
  function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683
  function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004
  function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810
  function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633
  function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060
  function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946
  function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779
  function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042
  function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902
  function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749
  function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron

Failed in:
  libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron

Failed in:
  FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130

--- failure module_missing: pub module removed or renamed ---

Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron

Failed in:
  mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4

--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---

Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron

Failed in:
  struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42
```

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.6.0] - 2026-04-25

### Features

- **parser**: Add Date and QDate types with serialization support
([#165](#165))
- **parser**: Implement pstring (Pascal string) type
([#170](#170))
- **parser**: Implement pstring multi-byte length prefix variants (/B,
/H, /h, /L, /l, /J)
([#183](#183))
- **evaluator**: Add debug-level tracing for skipped rules
([#184](#184))
- **evaluator**: Implement indirect offset resolution
([#37](#37))
([#199](#199))
- **evaluator**: Implement relative offset resolution
([#38](#38))
([#211](#211))
- **deps**: Add new skills to actionbook/rust-skills and
trailofbits/skills
- **evaluator**: Regex and search types (closes #39)
([#214](#214))
- Implement libmagic meta-type directives and format substitution
([#42](#42))
([#230](#230))

### Bug Fixes

- **regex**: PR #214 follow-up review findings
([#215](#215))
- Load and correctly evaluate /usr/share/file/magic/filesystems and
adjacent magic files
([#233](#233))

### Documentation

- **gotchas**: Clarify requirements for adding TypeKind variants

### Miscellaneous Tasks

- Rename .coderabbitai.yaml to .coderabbit.yaml
- **Mergify**: Configuration update
([#173](#173))
- Update .gitignore to exclude local AI assistant files
- **mergify**: Upgrade configuration to current format
([#205](#205))
- Resolve all pending TODO items
([#212](#212))
- **mergify**: Upgrade configuration to current format
([#231](#231))
<!-- generated by git-cliff -->

### Security

- **io**: Close TOCTOU race in `FileBuffer::new` metadata validation
(CWE-367). `validate_file_metadata` now uses `File::metadata()` on the
open descriptor instead of re-canonicalizing the path, so an attacker
cannot swap the path between `open_file` and validation. Error paths now
report the caller-supplied path rather than the canonicalized variant.
- **cli**: Remove relative-path fallbacks from `default_magic_file_path`
(CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the
`CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the
process cwd. CI pipelines must pass `--magic <path>` explicitly.
- **evaluator**: `build_regex` now bounds `size_limit` and
`dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject
compile-time DoS patterns (CWE-1333) from adversarial magic files.

### Features

- **parser**: Implement meta-type directives: `name`/`use` subroutines,
`default`/`clear` per-level fallback, and `indirect` re-evaluation.
`parse_text_magic_file` now returns `ParsedMagic { rules, name_table }`
(breaking change from `Vec<MagicRule>`). Named subroutines are hoisted
into `NameTable` at load time and dispatched via `RuleEnvironment` in
the evaluator. Recursion is bounded by
`EvaluationConfig::max_recursion_depth`. Resolves
[#42](#42).
- **evaluator**: Thread-local regex compile cache eliminates the
double-compile paid by every successful regex match.
`regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex`
instead of recompiling the pattern to derive the anchor advance. The
cache is reset at the start of every `evaluate_rules_with_config` call,
bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new
builder-style setters (`with_max_recursion_depth`,
`with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`,
`with_timeout_ms`) let external crates construct configurations without
struct literals.
- **parser**: `MagicRule::new()` smart constructor with
`::with_children()`, `::with_strength_modifier()`, `::with_level()`
builder methods and a `::validate()` method enforcing structural
invariants (non-empty message, `level <= MAX_LEVEL`, children nested
strictly deeper than parent). New `MagicRuleValidationError` error type.
- **parser**: `RegexFlags::with_case_insensitive()` and
`::with_start_offset()` builder methods.

### Refactor

- **engine**: Extract `evaluate_pattern_rule()` and
`evaluate_value_rule()` helpers from
`evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a
two-arm type-category split; each helper has focused rustdoc on
semantics and invariants.
- **types**: Replace the `_ =>` catch-all in
`bytes_consumed_with_pattern` with an explicit listing of the
fixed-width `TypeKind` variants. Adding a new variable-width variant
without updating this match is now a compile error instead of a silent
relative-offset anchor corruption in release builds.
- **parser**: Split the 185-line `type_keyword_to_kind` match into
per-family helpers (`byte_family`, `short_family`, `long_family`,
`quad_family`, `float_family`, `double_family`, `date_family`,
`qdate_family`, `string_family`). Drops the
`#[allow(clippy::too_many_lines)]` attribute.
- **main**: `main()` returns `std::process::ExitCode` instead of calling
`process::exit`, so destructors run on the happy path. Ctrl-C
`AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`.
- **grammar**: `parse_strength_directive` uses nom 8's `preceded` +
`Parser::map` instead of the legacy `map(pair(char(...), parse_number),
|(_, n)| ...)` pattern.
- **output**: Add `#[serde(skip_serializing_if = "Option::is_none",
default)]` to public `Option<T>` fields so JSON output no longer emits
`"field": null` for unset optional values.

### Documentation

- **lib**: Add `# Security` sections to
`MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`,
`::load_from_file`, and `::load_from_file_with_config` warning about the
unbounded default timeout and recommending
`EvaluationConfig::performance()` for untrusted input.
- **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning.
- **README**: Update `TypeKind` enum example to match the current AST,
add `regex` and `search/N` to the supported types table, add pre-1.0 API
stability warning, correct the roadmap to mark v0.2-v0.4 as shipped.
- **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current
Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases
section to reflect actual shipped scope.

### Testing

- Security regression tests for S-H1 (planted-magic-file in cwd), S-H2
(TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime),
S-L2 (codegen message escape round-trip), and GOTCHAS S13.1
(`EvaluationConfig::default()` unbounded timeout invariant).
- Backspace message concatenation regression tests for first-match,
consecutive, and empty-rest edge cases.
- `MagicRule::validate()` tests covering empty message, child level
invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.

### Breaking Changes

- **parser**: `parse_text_magic_file` return type changed from
`Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic,
ParseError>`. Callers must destructure `ParsedMagic { rules, name_table
}`. Low-level callers that only need the rule list can use
`parsed.rules`. `load_magic_file` and `load_magic_directory` return the
same new type.
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

evaluator Rule evaluation engine and logic size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluator: implement relative offset resolution

2 participants