feat(parser): implement pstring (Pascal string) type#170
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ests Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…upport Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
Disabled knowledge base sources:
Summary by CodeRabbitRelease Notes
WalkthroughAdds Pascal-style length-prefixed string support ("pstring"): new AST TypeKind variant, parser recognition, read_pstring evaluator, dispatch and strength handling, extensive tests, and documentation updates describing the new type. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 CI must passWonderful, this rule succeeded.All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.
🟢 Do not merge outdated PRsWonderful, this rule succeeded.Make sure PRs are within 10 commits of the base branch before merging
|
🧪 CI InsightsHere's what we observed from your CI run for d021555. 🟢 All jobs passed!But CI Insights is watching 👀 |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Related Documentation 8 document(s) may need updating based on files changed in this PR: libMagic-rs api-reference
|
There was a problem hiding this comment.
Pull request overview
Adds Pascal-style length-prefixed string support (pstring) across the parser and evaluator so magic rules can match length-prefixed strings in binary formats.
Changes:
- Extend
TypeKindwithPStringand wire it through parsing (pstringkeyword) and codegen serialization. - Implement evaluator reading for
PStringand include it in typed-value dispatch + strength calculation. - Add unit/integration/property tests plus documentation updates for the new type.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/parser/ast.rs |
Adds TypeKind::PString and updates bit_width() handling. |
src/parser/types.rs |
Parses pstring keyword and maps it to TypeKind::PString. |
src/parser/codegen.rs |
Adds PString support to serialize_type_kind(). |
src/parser/grammar/mod.rs |
Updates type docs to mention pstring. |
src/parser/grammar/tests.rs |
Adds parser tests for pstring parsing and rule parsing. |
src/evaluator/types/string.rs |
Implements read_pstring() and adds unit tests. |
src/evaluator/types/mod.rs |
Dispatches TypeKind::PString via read_typed_value() and re-exports read_pstring. |
src/evaluator/strength.rs |
Includes PString in default strength calculation alongside String. |
src/evaluator/engine/tests.rs |
Adds engine tests verifying PString rule matching behavior. |
tests/evaluator_tests.rs |
Adds integration-style tests for PString evaluation. |
tests/property_tests.rs |
Includes PString in TypeKind generation strategy. |
src/build_helpers.rs |
Adds serialization tests for TypeKind::PString. |
AGENTS.md |
Documents pstring as implemented type support. |
.github/copilot-instructions.md |
Updates project status/docs to reflect current capabilities and pstring support. |
mise.lock |
Regenerated lockfile content (tooling metadata). |
- Add usize::MAX offset overflow test for read_pstring (checked_add regression guard) - Add test for max_length capping when buffer has less data than length byte claims - Assert specific TypeReadError variants instead of just is_err() - Document pstring/B, /H, /L suffix limitation in AGENTS.md and parser code - Add TODO for debug-level trace on skipped rules in evaluate_rules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Clarify that read_pstring validates bounds against the capped length (after max_length), not the raw length byte. This matches GNU file behavior where max_length handles truncated data scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
| let (remaining, (typ, op)) = parse_type_and_operator("pstring& ").unwrap(); | ||
| assert_eq!(remaining, ""); | ||
| assert_eq!(typ, TypeKind::PString { max_length: None }); | ||
| assert_eq!(op, Some(Operator::BitwiseAnd)); |
There was a problem hiding this comment.
parse_type_and_operator("pstring& ") currently parses an attached & as Operator::BitwiseAnd for pstring. Since the evaluator’s bitwise operators only apply to integer Values (and return false for strings), this allows nonsensical rules to parse but never match, which is hard to diagnose. Consider rejecting attached & / &mask operators for string/pstring types in parse_type_and_operator/parse_magic_rule, and update this test accordingly (expect an error instead of Some(BitwiseAnd)).
| let (remaining, (typ, op)) = parse_type_and_operator("pstring& ").unwrap(); | |
| assert_eq!(remaining, ""); | |
| assert_eq!(typ, TypeKind::PString { max_length: None }); | |
| assert_eq!(op, Some(Operator::BitwiseAnd)); | |
| let result = parse_type_and_operator("pstring& "); | |
| assert!( | |
| result.is_err(), | |
| "attached '&' for pstring should be rejected as an invalid operator" | |
| ); |
| - **Types**: `byte`, `short`, `long`, `quad`, `float`, `double`, `string`, `pstring` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; float/double endian variants `befloat`/`lefloat`, `bedouble`/`ledouble`; 32-bit date/timestamp types `date`/`ldate`/`bedate`/`beldate`/`ledate`/`leldate`; 64-bit date/timestamp types `qdate`/`qldate`/`beqdate`/`beqldate`/`leqdate`/`leqldate` | ||
| - **Operators**: `=` (equal), `!=` (not equal), `<` (less than), `>` (greater than), `<=` (less equal), `>=` (greater equal), `&` (bitwise AND with optional mask), `^` (bitwise XOR), `~` (bitwise NOT), `x` (any value) | ||
| - **Nesting**: Hierarchical rules with proper indentation handling | ||
| - **String Matching**: Exact string matching with null-termination |
There was a problem hiding this comment.
The “Supported Syntax” section lists pstring as a supported type, but the “String Matching” bullet still only mentions null-terminated strings. Updating that bullet to also mention Pascal (length-prefixed) strings would keep the documentation consistent with the implemented feature set.
| - **String Matching**: Exact string matching with null-termination | |
| - **String Matching**: Exact string matching for null-terminated (`string`) and Pascal/length-prefixed (`pstring`) strings |
| }; | ||
|
|
||
| // Check if we have enough bytes for the (possibly capped) string data | ||
| let string_start = offset.checked_add(1).ok_or(TypeReadError::BufferOverrun { | ||
| offset, | ||
| buffer_len: buffer.len(), | ||
| })?; |
There was a problem hiding this comment.
read_pstring reports TypeReadError::BufferOverrun { offset: usize::MAX, ... } when string_start.checked_add(actual_length) overflows. Using usize::MAX makes the error location misleading and breaks consistency with other buffer-overrun errors. Consider returning the original offset/string_start, or introduce a dedicated overflow error variant so callers can distinguish arithmetic overflow from an actual buffer overrun.
| /// When `max_length` is set, bounds are validated against the capped length, not the | ||
| /// raw length byte. This matches GNU `file` behavior: `max_length` is intended to | ||
| /// handle cases where the length byte may reference more data than actually exists. |
There was a problem hiding this comment.
The security docs for read_pstring say “The length byte is validated against the remaining buffer size before reading”, but the implementation validates the capped length (min(length_byte, max_length)) against the buffer, and can succeed even if the original length byte exceeds remaining data (when max_length is smaller). Either adjust the docs to match the behavior, or change the code to treat an out-of-bounds length byte as an error regardless of max_length.
| /// When `max_length` is set, bounds are validated against the capped length, not the | |
| /// raw length byte. This matches GNU `file` behavior: `max_length` is intended to | |
| /// handle cases where the length byte may reference more data than actually exists. | |
| /// The raw length byte is validated against the remaining buffer size before any | |
| /// string data is read. When `max_length` is set, the number of bytes actually read | |
| /// is capped to `min(length_byte, max_length)` after this bounds check. |
|
@Mergifyio queue |
Merge Queue Status
Required conditions to enter a queue
|
Merge Queue StatusRule:
This pull request spent 7 seconds in the queue, with no time running CI. Required conditions to merge
|
## 🤖 New release
* `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes)
### ⚠ `libmagic-rs` breaking changes
```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---
Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron
Failed in:
field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
--- failure copy_impl_added: type now implements Copy ---
Description:
A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it.
ref: rust-lang/rust#100905
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron
Failed in:
libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98
--- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] ---
Description:
A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron
Failed in:
enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26
enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56
enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
--- failure enum_struct_variant_field_added: pub enum struct variant field added ---
Description:
An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant.
ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron
Failed in:
field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
--- failure function_missing: pub fn removed or renamed ---
Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron
Failed in:
function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025
function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846
function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683
function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004
function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810
function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633
function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060
function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946
function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779
function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042
function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902
function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749
function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227
--- failure function_parameter_count_changed: pub fn parameter count changed ---
Description:
A publicly-visible function now takes a different number of parameters.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron
Failed in:
libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196
--- failure inherent_method_missing: pub method removed or renamed ---
Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron
Failed in:
FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326
EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
--- failure module_missing: pub module removed or renamed ---
Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron
Failed in:
mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4
--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---
Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron
Failed in:
struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42
```
<details><summary><i><b>Changelog</b></i></summary><p>
<blockquote>
## [0.6.0] - 2026-04-25
### Features
- **parser**: Add Date and QDate types with serialization support
([#165](#165))
- **parser**: Implement pstring (Pascal string) type
([#170](#170))
- **parser**: Implement pstring multi-byte length prefix variants (/B,
/H, /h, /L, /l, /J)
([#183](#183))
- **evaluator**: Add debug-level tracing for skipped rules
([#184](#184))
- **evaluator**: Implement indirect offset resolution
([#37](#37))
([#199](#199))
- **evaluator**: Implement relative offset resolution
([#38](#38))
([#211](#211))
- **deps**: Add new skills to actionbook/rust-skills and
trailofbits/skills
- **evaluator**: Regex and search types (closes #39)
([#214](#214))
- Implement libmagic meta-type directives and format substitution
([#42](#42))
([#230](#230))
### Bug Fixes
- **regex**: PR #214 follow-up review findings
([#215](#215))
- Load and correctly evaluate /usr/share/file/magic/filesystems and
adjacent magic files
([#233](#233))
### Documentation
- **gotchas**: Clarify requirements for adding TypeKind variants
### Miscellaneous Tasks
- Rename .coderabbitai.yaml to .coderabbit.yaml
- **Mergify**: Configuration update
([#173](#173))
- Update .gitignore to exclude local AI assistant files
- **mergify**: Upgrade configuration to current format
([#205](#205))
- Resolve all pending TODO items
([#212](#212))
- **mergify**: Upgrade configuration to current format
([#231](#231))
<!-- generated by git-cliff -->
### Security
- **io**: Close TOCTOU race in `FileBuffer::new` metadata validation
(CWE-367). `validate_file_metadata` now uses `File::metadata()` on the
open descriptor instead of re-canonicalizing the path, so an attacker
cannot swap the path between `open_file` and validation. Error paths now
report the caller-supplied path rather than the canonicalized variant.
- **cli**: Remove relative-path fallbacks from `default_magic_file_path`
(CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the
`CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the
process cwd. CI pipelines must pass `--magic <path>` explicitly.
- **evaluator**: `build_regex` now bounds `size_limit` and
`dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject
compile-time DoS patterns (CWE-1333) from adversarial magic files.
### Features
- **parser**: Implement meta-type directives: `name`/`use` subroutines,
`default`/`clear` per-level fallback, and `indirect` re-evaluation.
`parse_text_magic_file` now returns `ParsedMagic { rules, name_table }`
(breaking change from `Vec<MagicRule>`). Named subroutines are hoisted
into `NameTable` at load time and dispatched via `RuleEnvironment` in
the evaluator. Recursion is bounded by
`EvaluationConfig::max_recursion_depth`. Resolves
[#42](#42).
- **evaluator**: Thread-local regex compile cache eliminates the
double-compile paid by every successful regex match.
`regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex`
instead of recompiling the pattern to derive the anchor advance. The
cache is reset at the start of every `evaluate_rules_with_config` call,
bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new
builder-style setters (`with_max_recursion_depth`,
`with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`,
`with_timeout_ms`) let external crates construct configurations without
struct literals.
- **parser**: `MagicRule::new()` smart constructor with
`::with_children()`, `::with_strength_modifier()`, `::with_level()`
builder methods and a `::validate()` method enforcing structural
invariants (non-empty message, `level <= MAX_LEVEL`, children nested
strictly deeper than parent). New `MagicRuleValidationError` error type.
- **parser**: `RegexFlags::with_case_insensitive()` and
`::with_start_offset()` builder methods.
### Refactor
- **engine**: Extract `evaluate_pattern_rule()` and
`evaluate_value_rule()` helpers from
`evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a
two-arm type-category split; each helper has focused rustdoc on
semantics and invariants.
- **types**: Replace the `_ =>` catch-all in
`bytes_consumed_with_pattern` with an explicit listing of the
fixed-width `TypeKind` variants. Adding a new variable-width variant
without updating this match is now a compile error instead of a silent
relative-offset anchor corruption in release builds.
- **parser**: Split the 185-line `type_keyword_to_kind` match into
per-family helpers (`byte_family`, `short_family`, `long_family`,
`quad_family`, `float_family`, `double_family`, `date_family`,
`qdate_family`, `string_family`). Drops the
`#[allow(clippy::too_many_lines)]` attribute.
- **main**: `main()` returns `std::process::ExitCode` instead of calling
`process::exit`, so destructors run on the happy path. Ctrl-C
`AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`.
- **grammar**: `parse_strength_directive` uses nom 8's `preceded` +
`Parser::map` instead of the legacy `map(pair(char(...), parse_number),
|(_, n)| ...)` pattern.
- **output**: Add `#[serde(skip_serializing_if = "Option::is_none",
default)]` to public `Option<T>` fields so JSON output no longer emits
`"field": null` for unset optional values.
### Documentation
- **lib**: Add `# Security` sections to
`MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`,
`::load_from_file`, and `::load_from_file_with_config` warning about the
unbounded default timeout and recommending
`EvaluationConfig::performance()` for untrusted input.
- **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning.
- **README**: Update `TypeKind` enum example to match the current AST,
add `regex` and `search/N` to the supported types table, add pre-1.0 API
stability warning, correct the roadmap to mark v0.2-v0.4 as shipped.
- **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current
Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases
section to reflect actual shipped scope.
### Testing
- Security regression tests for S-H1 (planted-magic-file in cwd), S-H2
(TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime),
S-L2 (codegen message escape round-trip), and GOTCHAS S13.1
(`EvaluationConfig::default()` unbounded timeout invariant).
- Backspace message concatenation regression tests for first-match,
consecutive, and empty-rest edge cases.
- `MagicRule::validate()` tests covering empty message, child level
invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.
### Breaking Changes
- **parser**: `parse_text_magic_file` return type changed from
`Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic,
ParseError>`. Callers must destructure `ParsedMagic { rules, name_table
}`. Low-level callers that only need the rule list can use
`parsed.rules`. `load_magic_file` and `load_magic_directory` return the
same new type.
</blockquote>
</p></details>
---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
PStringvariant toTypeKindfor Pascal-style length-prefixed stringspstringkeyword and maps toTypeKind::PStringValue::String)Changes
src/parser/ast.rs-- newPStringvariant inTypeKindwith rustdocsrc/parser/types.rs--pstringkeyword parsing inparse_type_keywordandtype_keyword_to_kindsrc/parser/codegen.rs-- serialization support forPStringsrc/evaluator/types/string.rs--read_pstring()implementation with bounds checkingsrc/evaluator/types/mod.rs-- dispatchPStringthroughread_typed_valuesrc/evaluator/strength.rs-- strength calculation forPStringsrc/evaluator/engine/tests.rs-- engine-level pstring rule matching teststests/evaluator_tests.rs-- integration tests for pstring evaluationtests/property_tests.rs-- proptest coverage forPStringsrc/parser/grammar/tests.rs-- parser round-trip testsAGENTS.md,docs/-- documentation updatesTest plan
read_pstring(empty, normal, truncated, buffer overrun, offset overflow)pstringkeyword recognition and serializationPStringvariantjust ci-checkpasses locallyCloses #43
🤖 Generated with Claude Code