Merged
16 commits
0c3b398 chore(lib): remove duplicate author entry in Cargo.toml (unclesp1d3r, Apr 22, 2026)
2e5a801 chore(lib): add provenance information for actionlint tools (unclesp1d3r, Apr 22, 2026)
d155a3b refactor(parser): reorganize grammar and types modules into submodules (unclesp1d3r, Apr 22, 2026)
6f8ae2c docs(agents): update magic file compatibility and limitations for v0.5.x (unclesp1d3r, Apr 22, 2026)
33b6fc8 docs(gotchas): document requirements for adding MetaType variants (unclesp1d3r, Apr 22, 2026)
6e4ba1a feat: Implement libmagic meta-type directives and format substitution… (unclesp1d3r, Apr 22, 2026)
f2647f0 docs: Dosu updates for PR #230 (dosubot[bot], Apr 22, 2026)
d510188 fix(meta-types): address PR-review findings for subroutine scope and … (unclesp1d3r, Apr 23, 2026)
7a5add9 docs(solutions): add RAII scope guard learning and refresh meta-type … (unclesp1d3r, Apr 23, 2026)
840ac7d fix: Address PR #230 review feedback (34 threads) (unclesp1d3r, Apr 23, 2026)
d2b58e1 fix(meta-types): Address second-round PR review findings (unclesp1d3r, Apr 23, 2026)
dbcbdb7 fix: Round-3 PR #230 review -- holistic docs/comment sweep + 3 bugs (unclesp1d3r, Apr 23, 2026)
b0c1a8a fix(evaluator): Round-3 semantic fixes -- use terminal anchor + indir… (unclesp1d3r, Apr 23, 2026)
5efb0de docs: Fix ghost BaseOffsetScope reference in EvaluationContext::base_… (unclesp1d3r, Apr 23, 2026)
e9d20a0 refactor(tests): Split engine/tests.rs into focused submodules (#230 … (unclesp1d3r, Apr 23, 2026)
4c576c7 fix: address PR #230 review feedback (round 4) (unclesp1d3r, Apr 24, 2026)
43 changes: 33 additions & 10 deletions AGENTS.md
@@ -82,9 +82,12 @@ pub enum Operator {
parser/
├── mod.rs // Public parser interface
├── ast.rs // AST node definitions
-├── grammar/ // Magic file DSL parsing (nom)
-│ ├── mod.rs // Grammar parsing logic
-│ └── tests.rs // Grammar parser tests
+├── grammar/ // Magic file DSL parsing (nom) -- split into focused submodules
+│ ├── mod.rs // Top-level parse_magic_rule_line, dispatch
+│ ├── numbers.rs // parse_number, parse_unsigned_number
+│ ├── value.rs // parse_value (quoted strings, numeric literals)
+│ ├── type_suffix.rs // pstring /B/H/L, regex /c/s, search /N suffixes
+│ └── tests/ // Grammar test modules
├── types.rs // Type keyword parsing and TypeKind conversion
└── codegen.rs // Serialization for code generation (shared with build.rs)

@@ -95,7 +98,13 @@ evaluator/
├── engine/ // Core evaluation engine submodule
│ ├── mod.rs // evaluate_single_rule, evaluate_rules, evaluate_rules_with_config
│ └── tests.rs // Engine unit tests
-├── types.rs // Type interpretation with endianness
+├── types/ // Type interpretation with endianness (directory module, issue #63)
+│ ├── mod.rs // read_typed_value, read_pattern_match, bytes_consumed_with_pattern
+│ ├── numeric.rs // byte/short/long/quad readers
+│ ├── string.rs // string/pstring readers
+│ ├── float.rs // float/double readers
+│ ├── date.rs // date/qdate readers and timestamp formatting
+│ └── regex.rs // regex/search readers, REGEX_MAX_BYTES cap, thread-local cache
├── strength.rs // Strength modifier application
├── offset/ // Offset resolution submodule
│ ├── mod.rs // Dispatcher (resolve_offset) and re-exports
@@ -202,7 +211,7 @@ cargo test --doc # Test documentation examples

## Magic File Compatibility

-### Currently Implemented (v0.5.0)
+### Currently Implemented (v0.5.x, unreleased)

- **Offsets**: Absolute, from-end, indirect, and relative specifications (relative offsets `&+N`/`&-N` are evaluated using GNU `file` semantics -- the previous-match anchor)
- **Types**: `byte`, `short`, `long`, `quad`, `float`, `double`, `string`, `pstring` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; float/double endian variants `befloat`/`lefloat`, `bedouble`/`ledouble`; 32-bit date/timestamp types `date`/`ldate`/`bedate`/`beldate`/`ledate`/`leldate`; 64-bit date/timestamp types `qdate`/`qldate`/`beqdate`/`beqldate`/`leqdate`/`leqldate`; `pstring` is a Pascal string (length-prefixed) with support for 1/2/4-byte length prefixes via `/B`, `/H` (2-byte BE), `/h` (2-byte LE), `/L` (4-byte BE), `/l` (4-byte LE) suffixes, and the `/J` flag (stored length includes prefix width, JPEG convention) which is combinable with width suffixes (e.g., `pstring/HJ`); date values formatted as "Www Mmm DD HH:MM:SS YYYY" matching GNU `file` output; types are signed by default (libmagic-compatible)
@@ -211,20 +220,23 @@ cargo test --doc # Test documentation examples
- **String Matching**: Exact string matching with null-termination and Pascal string (length-prefixed) support
- **Regex type**: Binary-safe regex matching via `regex::bytes::Regex`. Full flag support: `/c` (case-insensitive), `/s` (anchor advances to match-start instead of match-end), `/l` (scan window is measured in lines instead of bytes). Flags combine in any order (`regex/cs`, `regex/csl`, `regex/lc`). Numeric counts are honored: `regex/100` scans at most 100 bytes; `regex/1l` scans at most 1 line. Multi-line regex matching is always on (matching libmagic's unconditional `REG_NEWLINE`), so `^` and `$` match at line boundaries regardless of `/l`. Every scan window is capped at 8192 bytes (`FILE_REGEX_MAX`) regardless of the user's count.
- **Search type**: Bounded literal pattern scan via `memchr::memmem::find`; `search/N` caps the scan window to `N` bytes from the offset. The range is **mandatory** and stored as `NonZeroUsize`, so bare `search` and `search/0` are parse errors (matching GNU `file` magic(5)). Anchor advance follows GNU `file` semantics (match-end, not window-end) so relative-offset children resolve to the byte immediately after the matched pattern.
+- **Meta-type directives**: `default`, `clear`, `name <id>`, `use <id>`, `indirect`, and `offset` are fully implemented. `name` blocks are hoisted into a `NameTable` at load time (`parser::name_table::extract_name_table`). `use` invokes subroutines at the resolved offset via `RuleEnvironment` threaded through `EvaluationContext::rule_env`; subroutine-local absolute offsets resolve relative to the use-site base (tracked via `EvaluationContext::base_offset`). `default` fires only when no sibling at the same level has matched; `clear` resets the per-level sibling-matched flag so a later `default` can fire. `indirect` re-applies the root rule set at the resolved offset, bounded by `EvaluationConfig::max_recursion_depth`. `offset` reports the resolved file offset as `Value::Uint(pos)` for format-string rendering. Continuation siblings (`recursion_depth > 0`) see the parent-level anchor on each iteration rather than chaining -- matching libmagic's `ms->c.li[cont_level]` model. Top-level siblings still chain (documented in GOTCHAS S3.8).
+- **Printf-style format substitution**: Rule messages support `%d`, `%i`, `%u`, `%x`, `%X`, `%o`, `%s`, `%c`, and `%%`, along with width/padding modifiers (`%05d`, `%-5d`) and length modifiers (`l`, `ll`, `h`, etc. -- parsed and ignored). Hex specifiers respect the rule's `TypeKind::bit_width()` to mask sign-extended signed reads (so a signed byte carrying `-1` renders as `ff`, not `ffffffffffffffff`). Implemented in `src/output/format.rs::format_magic_message` and wired into `MagicDatabase::build_result`. Unrecognized specifiers pass through literally with a `debug!` log.
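
The hex masking described in the printf bullet can be illustrated with a minimal sketch. The helper names `mask_to_width` and `render_hex` are hypothetical, and the `Option<u32>` parameter only stands in for whatever `TypeKind::bit_width()` returns; this is not the code in `src/output/format.rs`.

```rust
/// Mask a sign-extended value down to the rule's bit width.
/// `bit_width` is a stand-in (assumption) for the result of `TypeKind::bit_width()`.
fn mask_to_width(value: i64, bit_width: Option<u32>) -> u64 {
    // Reinterpret the sign-extended value as raw bits.
    let raw = value as u64;
    match bit_width {
        // Keep only the low `bits` bits for types narrower than 64 bits.
        Some(bits) if bits < 64 => raw & ((1u64 << bits) - 1),
        // 64-bit types (or types without a width) are left untouched.
        _ => raw,
    }
}

/// Render a value for a `%x` specifier (hypothetical helper).
fn render_hex(value: i64, bit_width: Option<u32>) -> String {
    format!("{:x}", mask_to_width(value, bit_width))
}
```

With this sketch, `render_hex(-1, Some(8))` yields `ff`, while the same value with a 64-bit width yields `ffffffffffffffff`.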

-See **Development Phases** below for the planned roadmap of features not yet implemented (Aho-Corasick multi-pattern optimization, compiled-regex caching, `!:mime`/`!:ext`/`!:apple` directive evaluation, and `use`/`name` named test directives).
+See **Development Phases** below for the planned roadmap of features not yet implemented (Aho-Corasick multi-pattern optimization and `!:mime`/`!:ext`/`!:apple` directive evaluation).
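
As a rough illustration of the **Search type** bullet above, the sketch below shows a mandatory non-zero range and a match-end anchor using only the standard library. The function names are hypothetical, the real implementation uses `memchr::memmem::find` rather than `windows`, and the sketch assumes the whole match must lie inside the `N`-byte window, which the text above does not specify.

```rust
use std::num::NonZeroUsize;

/// Parse the range from a `search/N` keyword (hypothetical helper).
/// Bare `search` and `search/0` yield `None`, mirroring the parse errors described above.
fn parse_search_range(keyword: &str) -> Option<NonZeroUsize> {
    let digits = keyword.strip_prefix("search/")?;
    // Accept plain decimal digits only.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse::<usize>().ok().and_then(NonZeroUsize::new)
}

/// Scan at most `range` bytes starting at `offset` for `pattern`.
/// Returns `(match_start, match_end)`; `match_end` is the anchor that
/// relative-offset children would resolve against.
fn bounded_search(
    buf: &[u8],
    offset: usize,
    range: NonZeroUsize,
    pattern: &[u8],
) -> Option<(usize, usize)> {
    if pattern.is_empty() {
        return None; // `windows(0)` would panic, so reject empty patterns
    }
    let start = offset.min(buf.len());
    let end = start.saturating_add(range.get()).min(buf.len());
    let window = &buf[start..end];
    let pos = window.windows(pattern.len()).position(|w| w == pattern)?;
    Some((start + pos, start + pos + pattern.len()))
}
```

A child rule at `&+0` would then read from the returned `match_end`, not from the end of the scan window.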

-## Current Limitations (v0.5.0)
+## Current Limitations (v0.5.x, unreleased)

### Type System

- 64-bit integer types: `quad`/`uquad`, `bequad`/`ubequad`, `lequad`/`ulequad` are implemented; `qquad` (128-bit) is not yet supported
-- String evaluation reads until first NUL or end-of-buffer by default; `pstring` reads a length-prefixed Pascal string; `max_length: Some(_)` is supported internally but no dedicated fixed-length string parser syntax exists yet
+- `string` evaluation reads until first NUL or end-of-buffer; `max_length: Some(_)` is supported programmatically (via the AST) but libmagic itself has no corresponding surface syntax, so this is not a parity gap
+- `string` type modifier flags are not supported: `/B` (compact whitespace), `/b` (compact blanks), `/c`/`/C` (case-insensitive), `/t`/`/T` (force text/binary), `/w`/`/W` (whitespace optional). Only `pstring` has suffix parsing today.
- `pstring` supports 1-byte (`/B`), 2-byte big-endian (`/H`), 2-byte little-endian (`/h`), 4-byte big-endian (`/L`), and 4-byte little-endian (`/l`) length prefixes, plus the `/J` flag (stored length includes prefix width). All flags are combinable (e.g., `pstring/HJ`) and fully implemented.
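
To make the `pstring` prefix widths and the `/J` flag concrete, here is a small standard-library sketch. `PrefixWidth` and `read_pstring` are hypothetical names, not the crate's API, and returning `None` for a truncated payload is an assumption made for the sketch.

```rust
/// Length-prefix layouts named by the `pstring` suffixes above (hypothetical enum).
#[derive(Debug, Clone, Copy)]
enum PrefixWidth {
    OneByte,     // /B
    TwoBytesBe,  // /H
    TwoBytesLe,  // /h
    FourBytesBe, // /L
    FourBytesLe, // /l
}

/// Read a Pascal string from the start of `buf`.
/// `length_includes_prefix` models the `/J` flag: the stored length
/// counts the prefix bytes themselves.
fn read_pstring(buf: &[u8], width: PrefixWidth, length_includes_prefix: bool) -> Option<&[u8]> {
    let (prefix_len, stored_len) = match width {
        PrefixWidth::OneByte => (1usize, usize::from(*buf.first()?)),
        PrefixWidth::TwoBytesBe => {
            let b = buf.get(..2)?;
            (2, usize::from(u16::from_be_bytes([b[0], b[1]])))
        }
        PrefixWidth::TwoBytesLe => {
            let b = buf.get(..2)?;
            (2, usize::from(u16::from_le_bytes([b[0], b[1]])))
        }
        PrefixWidth::FourBytesBe => {
            let b = buf.get(..4)?;
            (4, u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize)
        }
        PrefixWidth::FourBytesLe => {
            let b = buf.get(..4)?;
            (4, u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
        }
    };
    // With /J the stored length includes the prefix width, so subtract it.
    let payload_len = if length_includes_prefix {
        stored_len.checked_sub(prefix_len)?
    } else {
        stored_len
    };
    // Assumption: a payload that runs past the buffer is treated as no match.
    buf.get(prefix_len..prefix_len.checked_add(payload_len)?)
}
```

Under these assumptions, `pstring/HJ` over the bytes `00 05 61 62 63` reads a stored length of 5, subtracts the 2-byte prefix, and yields `abc`.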

### Operators

-- BitwiseAnd supports mask values but not all libmagic mask syntax
+- Parser handles `&`, `&<decimal>`, and `&0x<hex>` masks across the full `u64` range; compound forms like arithmetic expressions in mask position (`&(N+M)`) or post-mask modifiers are not parsed
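
A minimal sketch of the three accepted mask forms follows. The enum and function are hypothetical stand-ins (the real parser is built on nom), and the variant names only echo the `BitwiseAnd` / `BitwiseAndMask` operators mentioned elsewhere in AGENTS.md.

```rust
/// Stand-in for the two `&` operator shapes (assumption, not the crate's `Operator`).
#[derive(Debug, PartialEq)]
enum AndOperator {
    BitwiseAnd,          // bare `&`
    BitwiseAndMask(u64), // `&<decimal>` or `&0x<hex>`
}

/// Parse `&`, `&<decimal>`, or `&0x<hex>`; anything else is rejected.
fn parse_and_operator(input: &str) -> Option<AndOperator> {
    let rest = input.strip_prefix('&')?;
    if rest.is_empty() {
        return Some(AndOperator::BitwiseAnd);
    }
    let mask = if let Some(hex) = rest.strip_prefix("0x") {
        // Require at least one hex digit and nothing else.
        if hex.is_empty() || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return None;
        }
        u64::from_str_radix(hex, 16).ok()?
    } else {
        // Plain decimal digits only; compound forms such as `&(1+2)` fail here.
        if !rest.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        rest.parse::<u64>().ok()?
    };
    Some(AndOperator::BitwiseAndMask(mask))
}
```

Values that overflow `u64` are rejected by the standard-library parsers, so the sketch covers the full `u64` range and nothing beyond it.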

### Offset Specifications

@@ -235,7 +247,7 @@ See **Development Phases** below for the planned roadmap of features not yet imp

- Limited support for special directives (only `!:strength` is parsed)
- No support for `!:mime`, `!:ext`, `!:apple` directives in evaluation
-- No support for named tests or use/name directives
+- Meta-type directives (`default`, `clear`, `name`, `use`, `indirect`, `offset`) are all fully implemented with evaluator dispatch, including printf-style format substitution in message rendering (see "Currently Implemented" above for details).
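
The `default`/`clear` interaction can be sketched with a per-level flag. Everything here is a simplified stand-in: the `Sibling` enum and `fired` function are hypothetical, and the sketch assumes that a firing `default` also sets the sibling-matched flag, which the text above does not state.

```rust
/// Simplified sibling rules at one nesting level (hypothetical model).
#[derive(Debug, Clone, Copy)]
enum Sibling {
    Test(bool), // an ordinary rule; the bool says whether it matched
    Default,    // fires only if no earlier sibling matched
    Clear,      // resets the sibling-matched flag
}

/// Return the indices of the siblings that produce a match.
fn fired(siblings: &[Sibling]) -> Vec<usize> {
    let mut sibling_matched = false;
    let mut out = Vec::new();
    for (i, sibling) in siblings.iter().enumerate() {
        match sibling {
            Sibling::Test(true) => {
                sibling_matched = true;
                out.push(i);
            }
            Sibling::Test(false) => {}
            Sibling::Default => {
                if !sibling_matched {
                    // Assumption: a fired `default` counts as a match for later siblings.
                    sibling_matched = true;
                    out.push(i);
                }
            }
            // `clear` lets a later `default` fire again.
            Sibling::Clear => sibling_matched = false,
        }
    }
    out
}
```

In this model a `default` after a matching sibling is skipped, while the same `default` placed after a `clear` fires.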

See issue #52 for the planned enhancement roadmap.

@@ -310,6 +322,17 @@ sample.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV)
6. Add tests for the new type
7. Update documentation

+### Adding a new meta-type
+
+Meta-types sit inside `TypeKind::Meta(MetaType)` and do not read bytes. Adding a new variant requires:
+
+1. Add the variant to `MetaType` in `src/parser/ast.rs`. Update the three test fixtures that iterate `MetaType` variants: `test_meta_type_variants_debug_clone_eq`, `test_meta_type_serde_roundtrip`, `test_type_kind_meta_bit_width_is_none` (see GOTCHAS S2.11).
+2. Add the keyword tag in `parse_type_keyword` and the arm in `type_keyword_to_kind` in `src/parser/types.rs`, plus the `test_roundtrip_all_keywords` array.
+3. Update `serialize_type_kind` (the inner `TypeKind::Meta(meta)` arm) in `src/parser/codegen.rs`.
+4. Update `arb_type_kind` in `tests/property_tests.rs` (`prop_oneof` branch for `MetaType`).
+5. Decide semantics: does the new variant need inline loop-level dispatch in `evaluate_rules` (like `Use`, `Default`, `Clear`, `Indirect` -- each of which mutates the match vector or `sibling_matched` flag) or is it a silent no-op via the `Meta(_)` wildcard arm in `evaluate_single_rule_with_anchor`? Add the arm accordingly in `src/evaluator/engine/mod.rs`.
+6. Add unit tests covering parse round-trip, the evaluator arm, and any new `RuleEnvironment` lookups.
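
The shape of the checklist above can be sketched with simplified stand-in types. These enums are assumptions for illustration: the real `MetaType` variants for `name` and `use` presumably carry an identifier, and the real keyword functions live in `src/parser/types.rs` with different signatures. The point is that an exhaustive `match` plus an `ALL` array makes a forgotten variant show up as a compile error or a failing round-trip test.

```rust
/// Simplified stand-in for the AST meta-type enum (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetaType {
    Default,
    Clear,
    Name,
    Use,
    Indirect,
    Offset,
}

/// Simplified stand-in for `TypeKind` with two byte-reading types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TypeKind {
    Byte,
    Long,
    Meta(MetaType),
}

impl MetaType {
    /// Fixture array that tests iterate; a new variant must be added here too.
    const ALL: [MetaType; 6] = [
        MetaType::Default,
        MetaType::Clear,
        MetaType::Name,
        MetaType::Use,
        MetaType::Indirect,
        MetaType::Offset,
    ];

    /// Exhaustive match: adding a variant without a keyword fails to compile.
    fn keyword(self) -> &'static str {
        match self {
            MetaType::Default => "default",
            MetaType::Clear => "clear",
            MetaType::Name => "name",
            MetaType::Use => "use",
            MetaType::Indirect => "indirect",
            MetaType::Offset => "offset",
        }
    }

    fn from_keyword(keyword: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|m| m.keyword() == keyword)
    }
}

impl TypeKind {
    /// Meta-types read no bytes, so they have no bit width.
    fn bit_width(self) -> Option<u32> {
        match self {
            TypeKind::Byte => Some(8),
            TypeKind::Long => Some(32),
            TypeKind::Meta(_) => None,
        }
    }
}
```

A round-trip test over `MetaType::ALL` then plays the role of the fixtures named in step 1.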

### Adding New Operators

> **Note:** Currently implemented operators are `Equal`, `NotEqual`, `LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual`, `BitwiseAnd` (with `BitwiseAndMask`), `BitwiseXor`, `BitwiseNot`, and `AnyValue`.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ All notable changes to this project will be documented in this file.

### Features

+- **parser**: Implement meta-type directives: `name`/`use` subroutines, `default`/`clear` per-level fallback, and `indirect` re-evaluation. `parse_text_magic_file` now returns `ParsedMagic { rules, name_table }` (breaking change from `Vec<MagicRule>`). Named subroutines are hoisted into `NameTable` at load time and dispatched via `RuleEnvironment` in the evaluator. Recursion is bounded by `EvaluationConfig::max_recursion_depth`. Resolves [#42](https://github.com/EvilBit-Labs/libmagic-rs/issues/42).
- **evaluator**: Thread-local regex compile cache eliminates the double-compile paid by every successful regex match. `regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex` instead of recompiling the pattern to derive the anchor advance. The cache is reset at the start of every `evaluate_rules_with_config` call, bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new builder-style setters (`with_max_recursion_depth`, `with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`, `with_timeout_ms`) let external crates construct configurations without struct literals.
- **parser**: `MagicRule::new()` smart constructor with `::with_children()`, `::with_strength_modifier()`, `::with_level()` builder methods and a `::validate()` method enforcing structural invariants (non-empty message, `level <= MAX_LEVEL`, children nested strictly deeper than parent). New `MagicRuleValidationError` error type.
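
The builder-style setters on a `#[non_exhaustive]` config can be sketched as follows. Only the setter names come from the changelog entry above; the field names, field types, parameter types, and default values are placeholders chosen for the sketch and may differ from the crate.

```rust
/// Stand-in config struct. `#[non_exhaustive]` prevents other crates from
/// building it with a struct literal, so setters are the public construction path.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct EvaluationConfig {
    pub max_recursion_depth: u32, // assumed type
    pub max_string_length: usize, // assumed type
    pub stop_at_first_match: bool,
    pub enable_mime_types: bool,  // assumed field name
    pub timeout_ms: Option<u64>,  // assumed type
}

impl Default for EvaluationConfig {
    fn default() -> Self {
        // Placeholder defaults, not the crate's real values.
        Self {
            max_recursion_depth: 20,
            max_string_length: 8192,
            stop_at_first_match: true,
            enable_mime_types: false,
            timeout_ms: None,
        }
    }
}

impl EvaluationConfig {
    pub fn with_max_recursion_depth(mut self, depth: u32) -> Self {
        self.max_recursion_depth = depth;
        self
    }

    pub fn with_max_string_length(mut self, len: usize) -> Self {
        self.max_string_length = len;
        self
    }

    pub fn with_stop_at_first_match(mut self, stop: bool) -> Self {
        self.stop_at_first_match = stop;
        self
    }

    pub fn with_mime_types(mut self, enabled: bool) -> Self {
        self.enable_mime_types = enabled;
        self
    }

    pub fn with_timeout_ms(mut self, timeout: Option<u64>) -> Self {
        self.timeout_ms = timeout;
        self
    }
}
```

An external caller would write something like `EvaluationConfig::default().with_max_recursion_depth(5).with_timeout_ms(Some(250))`, and new fields can later be added without breaking that call.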
@@ -41,6 +42,10 @@ All notable changes to this project will be documented in this file.
- `MagicRule::validate()` tests covering empty message, child level invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.
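
The population/clear/reuse behaviour that the `RegexCache` tests exercise can be sketched with a thread-local map. This is a standard-library-only illustration: `Compiled` is a stand-in for a compiled `regex::bytes::Regex`, and the function names and compile counter are hypothetical, not the crate's API.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for a compiled regex (the real cache would hold `regex::bytes::Regex`).
#[derive(Debug)]
struct Compiled {
    pattern: String,
}

thread_local! {
    /// Per-thread cache keyed by pattern source.
    static CACHE: RefCell<HashMap<String, Rc<Compiled>>> = RefCell::new(HashMap::new());
    /// Counts how often the (pretend) compile step actually runs.
    static COMPILES: Cell<usize> = Cell::new(0);
}

/// Return the cached entry for `pattern`, compiling it only on a miss.
fn get_or_compile(pattern: &str) -> Rc<Compiled> {
    CACHE.with(|cache| {
        let mut cache = cache.borrow_mut();
        if let Some(hit) = cache.get(pattern) {
            return Rc::clone(hit);
        }
        COMPILES.with(|c| c.set(c.get() + 1));
        let compiled = Rc::new(Compiled { pattern: pattern.to_owned() });
        cache.insert(pattern.to_owned(), Rc::clone(&compiled));
        compiled
    })
}

/// Drop all cached entries, as would happen at the start of an evaluation.
fn clear_cache() {
    CACHE.with(|cache| cache.borrow_mut().clear());
}

/// Number of compile steps performed on this thread so far.
fn compile_count() -> usize {
    COMPILES.with(|c| c.get())
}
```

Two lookups of the same pattern between clears share one compiled entry, which is the double-compile saving described in the changelog; clearing at the start of each evaluation keeps the map from growing across runs.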

+### Breaking Changes
+
+- **parser**: `parse_text_magic_file` return type changed from `Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic, ParseError>`. Callers must destructure `ParsedMagic { rules, name_table }`. Low-level callers that only need the rule list can use `parsed.rules`. `load_magic_file` and `load_magic_directory` return the same new type.
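
A caller-side migration might look like the sketch below. All types here are toy stand-ins so the example compiles on its own: the real `MagicRule`, `NameTable`, and `ParseError` are richer, and the parser body is a placeholder that only hoists `name` lines, not the crate's grammar.

```rust
use std::collections::HashMap;

/// Toy stand-ins for the crate's types (assumptions for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct MagicRule {
    message: String,
}

#[derive(Debug, Default)]
struct NameTable {
    subroutines: HashMap<String, Vec<MagicRule>>,
}

#[derive(Debug)]
struct ParsedMagic {
    rules: Vec<MagicRule>,
    name_table: NameTable,
}

#[derive(Debug)]
struct ParseError(String);

/// Placeholder parser: `<offset> name <id>` lines are hoisted into the
/// name table; every other well-formed line becomes a top-level rule.
fn parse_text_magic_file(input: &str) -> Result<ParsedMagic, ParseError> {
    let mut rules = Vec::new();
    let mut name_table = NameTable::default();
    let lines = input
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'));
    for line in lines {
        let fields: Vec<&str> = line.split_whitespace().collect();
        match fields.as_slice() {
            [_offset, "name", id] => {
                name_table.subroutines.entry((*id).to_owned()).or_default();
            }
            [_offset, _ty, ..] => rules.push(MagicRule { message: line.to_owned() }),
            _ => return Err(ParseError(format!("malformed line: {}", line))),
        }
    }
    Ok(ParsedMagic { rules, name_table })
}

/// Old callers that only wanted the rule list can destructure and ignore the table.
fn rule_count(input: &str) -> Result<usize, ParseError> {
    let ParsedMagic { rules, name_table: _ } = parse_text_magic_file(input)?;
    Ok(rules.len())
}
```

Code that previously bound the result directly to a `Vec<MagicRule>` would either destructure as in `rule_count` or read the `rules` field of the returned value.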

## [0.5.0] - 2026-03-07

### Features
5 changes: 1 addition & 4 deletions Cargo.toml
@@ -2,10 +2,7 @@
name = "libmagic-rs"
version = "0.5.0"
edition = "2024"
-authors = [
-"UncleSp1d3r <unclespid3r@evilbitlabs.io>",
-"KryptoKat <kryptokat@evilbitlabs.io>",
-]
+authors = ["UncleSp1d3r <unclespid3r@evilbitlabs.io>"]
description = "A pure-Rust implementation of libmagic for file type identification"
license = "Apache-2.0"
repository = "https://github.com/EvilBit-Labs/libmagic-rs"