feat(parser): implement pstring multi-byte length prefix variants (/B, /H, /h, /L, /l, /J)#183

Merged
unclesp1d3r merged 28 commits into main from 171-parser-implement-pstring-multi-byte-length-prefix-variants-b-h-l on Mar 25, 2026

@unclesp1d3r
Member

Summary

  • Implement full GNU libmagic pstring length-prefix support with correct endianness: /H (2-byte BE), /h (2-byte LE), /L (4-byte BE), /l (4-byte LE), /B (1-byte, default)
  • Add /J flag support (JPEG-style self-inclusive length) with combinable syntax (/HJ, /lJ, etc.)
  • Replace unwrap() in library code with proper error propagation
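
The unwrap() replacement can be pictured with a minimal sketch. The `ReadError` enum and both helper functions below are hypothetical names chosen for illustration; the crate's own error type and function signatures may differ.

```rust
use std::fmt;

/// Hypothetical error type for this sketch (not the crate's real enum).
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The requested range falls outside the buffer.
    OutOfBounds { offset: usize, needed: usize, available: usize },
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ReadError::OutOfBounds { offset, needed, available } => write!(
                f,
                "need {} bytes at offset {}, buffer has {}",
                needed, offset, available
            ),
        }
    }
}

impl std::error::Error for ReadError {}

/// Reads a 2-byte big-endian value at `offset` and never panics:
/// offset overflow and short buffers both become `Err`.
pub fn read_u16_be(buffer: &[u8], offset: usize) -> Result<u16, ReadError> {
    let end = offset.checked_add(2);
    match end.and_then(|end| buffer.get(offset..end)) {
        Some(&[hi, lo]) => Ok(u16::from_be_bytes([hi, lo])),
        _ => Err(ReadError::OutOfBounds {
            offset,
            needed: 2,
            available: buffer.len(),
        }),
    }
}

/// Propagates the error with `?` instead of calling `unwrap()`.
pub fn prefix_as_len(buffer: &[u8], offset: usize) -> Result<usize, ReadError> {
    let len = read_u16_be(buffer, offset)?;
    Ok(usize::from(len))
}
```

Callers receive a `Result` they can match on or forward, so a truncated file yields an error value rather than a panic.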

Changes

  • PStringLengthWidth enum expanded to 5 variants: OneByte, TwoByteBE, TwoByteLE, FourByteBE, FourByteLE
  • Added length_includes_itself: bool field to TypeKind::PString for /J flag
  • read_pstring uses from_be_bytes/from_le_bytes based on variant; subtracts prefix width when /J is set
  • Parser handles /J standalone and in combinations via parse_pstring_suffix helper
  • Updated codegen, strength, and all test files (13 files, +445/-100 lines)
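
As a rough illustration of the endianness handling listed above, the sketch below decodes a length prefix for each variant. The variant names match this PR; the `decode` method is a hypothetical helper for this example, not the signature of read_pstring.

```rust
/// The five width variants named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PStringLengthWidth {
    OneByte,
    TwoByteBE,
    TwoByteLE,
    FourByteBE,
    FourByteLE,
}

impl PStringLengthWidth {
    /// Number of bytes occupied by the length prefix.
    pub fn byte_count(self) -> usize {
        match self {
            PStringLengthWidth::OneByte => 1,
            PStringLengthWidth::TwoByteBE | PStringLengthWidth::TwoByteLE => 2,
            PStringLengthWidth::FourByteBE | PStringLengthWidth::FourByteLE => 4,
        }
    }

    /// Hypothetical helper: decodes the prefix at the start of `buffer`.
    /// Returns `None` when the buffer is shorter than the prefix.
    pub fn decode(self, buffer: &[u8]) -> Option<u32> {
        let prefix = buffer.get(..self.byte_count())?;
        let value = match (self, prefix) {
            (PStringLengthWidth::OneByte, &[b]) => u32::from(b),
            (PStringLengthWidth::TwoByteBE, &[a, b]) => u32::from(u16::from_be_bytes([a, b])),
            (PStringLengthWidth::TwoByteLE, &[a, b]) => u32::from(u16::from_le_bytes([a, b])),
            (PStringLengthWidth::FourByteBE, &[a, b, c, d]) => u32::from_be_bytes([a, b, c, d]),
            (PStringLengthWidth::FourByteLE, &[a, b, c, d]) => u32::from_le_bytes([a, b, c, d]),
            // Unreachable in practice: `prefix` always has `byte_count()` bytes.
            _ => return None,
        };
        Some(value)
    }
}
```

The same two bytes `00 05` decode to 5 under /H and to 1280 under /h, which is why the width and byte order have to travel together in one enum.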

Test Plan

  • All 1,238 tests pass (877 unit + 44 integration + 34 property + 157 doc + others)
  • cargo clippy -- -D warnings clean
  • just ci-check passes
  • New table-driven tests for BE vs LE byte order
  • New tests for /J flag with all width variants
  • Edge cases: /J where length equals prefix width (empty string), length < prefix width (error)
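
The /J edge cases in the test plan come down to one checked subtraction. The sketch below is a hypothetical stand-alone version of that arithmetic with a small case table; `effective_length` and `CASES` are names invented here and are not taken from the test suite.

```rust
/// Returns the content length for a stored length value.
/// With `/J` set, the prefix width is subtracted; `None` signals that the
/// stored length is smaller than the prefix itself (the error case).
pub fn effective_length(stored: u32, prefix_width: u32, includes_itself: bool) -> Option<u32> {
    if includes_itself {
        stored.checked_sub(prefix_width)
    } else {
        Some(stored)
    }
}

/// Rows are (stored length, prefix width, /J set, expected content length).
pub const CASES: &[(u32, u32, bool, Option<u32>)] = &[
    (5, 1, false, Some(5)), // plain pstring: length used as-is
    (7, 2, true, Some(5)),  // /HJ: 7 - 2 = 5 bytes of content
    (2, 2, true, Some(0)),  // length equals prefix width: empty string
    (1, 2, true, None),     // length below prefix width: error
    (3, 4, true, None),     // same for a 4-byte prefix
];
```

Using `checked_sub` rather than plain subtraction is what keeps the "length < prefix width" case from wrapping around to a huge length.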

Closes #171

🤖 Generated with Claude Code

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ation

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…version notes

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…l quirks

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@unclesp1d3r unclesp1d3r linked an issue Mar 24, 2026 that may be closed by this pull request
7 tasks
@dosubot dosubot Bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on Mar 24, 2026
@coderabbitai

coderabbitai Bot commented Mar 24, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one) (3)
  • rust
  • testing
  • compatibility

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 14d79d44-cb0d-48c7-afbf-30af8627ee8d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Add multi-width Pascal-string (pstring) support: new PStringLengthWidth enum, extended TypeKind::PString with length_width and length_includes_itself, parser suffix parsing, evaluator read_pstring to handle 1/2/4‑byte prefixes with endianness and /J semantics, plus tests and docs updates.
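
To make the suffix handling concrete, here is a minimal sketch of how a suffix such as `/HJ` could map to the two new fields. It is plain string matching written for this example; the real parse_pstring_suffix lives in the grammar module and may be structured quite differently, and rejecting repeated flags is an assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PStringLengthWidth {
    OneByte,
    TwoByteBE,
    TwoByteLE,
    FourByteBE,
    FourByteLE,
}

/// Hypothetical helper: parses the text that follows `pstring`,
/// e.g. "", "/H", "/J", or "/lJ", into (length_width, length_includes_itself).
/// Returns `None` for unknown flags, a second width flag, or a repeated `J`.
pub fn parse_suffix(suffix: &str) -> Option<(PStringLengthWidth, bool)> {
    if suffix.is_empty() {
        // No suffix: 1-byte prefix, length does not include itself.
        return Some((PStringLengthWidth::OneByte, false));
    }
    let flags = suffix.strip_prefix('/')?;
    if flags.is_empty() {
        return None;
    }
    let mut width = None;
    let mut includes_itself = false;
    for flag in flags.chars() {
        let parsed = match flag {
            'B' => PStringLengthWidth::OneByte,
            'H' => PStringLengthWidth::TwoByteBE,
            'h' => PStringLengthWidth::TwoByteLE,
            'L' => PStringLengthWidth::FourByteBE,
            'l' => PStringLengthWidth::FourByteLE,
            'J' if !includes_itself => {
                includes_itself = true;
                continue;
            }
            _ => return None,
        };
        // `replace` returns the previous value; a second width flag is rejected.
        if width.replace(parsed).is_some() {
            return None;
        }
    }
    Some((width.unwrap_or(PStringLengthWidth::OneByte), includes_itself))
}
```

Note that `/J` on its own falls back to the 1-byte width, matching the default described in the docs updates.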

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Parser AST: src/parser/ast.rs | Add PStringLengthWidth enum and extend TypeKind::PString with length_width and length_includes_itself. |
| Parser Grammar & Types: src/parser/grammar/mod.rs, src/parser/types.rs, src/parser/grammar/tests.rs | Add parse_pstring_suffix to parse /B, /H, /h, /L, /l and optional /J; update parsing to set new fields; update/add unit tests for suffixes. |
| Parser Codegen: src/parser/codegen.rs | Emit new pstring fields in generated code; add helper to serialize PStringLengthWidth. |
| Evaluator: string reading: src/evaluator/types/string.rs, src/evaluator/types/mod.rs | Change read_pstring signature to accept length_width and length_includes_itself; implement variable-width prefix read (BE/LE), /J adjustment, bounds and underflow checks; forward fields from dispatcher. |
| Evaluator: other: src/evaluator/strength.rs, src/evaluator/engine/tests.rs | Adjust strength match to ignore added PString fields; update tests to instantiate new PString shape. |
| Tests / Build helpers: src/build_helpers.rs, src/evaluator/engine/tests.rs | Update unit tests and serialized expectations to include new pstring fields. |
| Documentation & Guides: docs/src/*, AGENTS.md, CONTRIBUTING.md, GOTCHAS.md | Document new pstring width/endian flags and /J; add GOTCHAS and cross-references; update examples and architecture/docs. |
| Repo config / manifests: .coderabbit.yaml, .mcp.json, tessl.json | Update CodeRabbit review config paths and auto-review settings; add MCP server config and a vendored manifest. |

Sequence Diagram

sequenceDiagram
    participant Client as CLI/Parser
    participant Grammar as Grammar<br/>(parse_pstring_suffix)
    participant AST as AST<br/>(TypeKind::PString)
    participant Evaluator as Evaluator
    participant Reader as StringReader<br/>(read_pstring)
    participant Buffer as Buffer

    Client->>Grammar: parse "pstring/HJ" token
    Grammar->>Grammar: detect "pstring" and call parse_pstring_suffix
    Grammar->>AST: construct TypeKind::PString { length_width, length_includes_itself }
    AST-->>Client: return TypeKind

    Client->>Evaluator: evaluate rule with TypeKind::PString
    Evaluator->>Reader: read_pstring(buffer, offset, max_length, length_width, length_includes_itself)
    Reader->>Buffer: read N prefix bytes (N = length_width.byte_count())
    Buffer-->>Reader: prefix bytes
    Reader->>Reader: parse length (BE/LE), apply /J adjust
    Reader->>Buffer: slice string bytes according to computed length
    Buffer-->>Reader: string bytes
    Reader-->>Evaluator: return Value::String or TypeReadError

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

Suggested labels

parser

Poem

🧵 Pascal strings learn to stretch and bend,
One, two, four bytes signal where contents end.
/H and /L now speak in ordered bytes,
/J counts the prefix in its little rites—
Safe slices, clearer rules, the parser’s friend. ✨

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title follows Conventional Commits spec with type(scope): description format, correctly using 'feat' type and 'parser' scope, and clearly summarizes the main feature implementation of pstring multi-byte variants. |
| Description check | ✅ Passed | The PR description is comprehensive and directly related to the changeset, detailing implementation of pstring multi-byte prefixes, test coverage, and linking to issue #171. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue #171 are met: PStringLengthWidth enum added, TypeKind::PString extended with length fields, parser recognizes /B/H/h/L/l/J suffixes, evaluator handles multi-byte prefixes with correct endianness, bounds checking implemented, unit tests added, and documentation updated. |
| Out of Scope Changes check | ✅ Passed | All changes are tightly scoped to pstring multi-byte prefix support. Configuration/documentation updates to .coderabbit.yaml, AGENTS.md, CONTRIBUTING.md, GOTCHAS.md, and tessl.json/.mcp.json are supportive infrastructure changes directly related to the feature, not out-of-scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 85.00%. |
| Memory-Safety-Check | ✅ Passed | No unsafe blocks, raw pointers, transmute calls, or get_unchecked patterns detected across all modified files. |
| Libmagic-Compatibility-Check | ✅ Passed | The implementation correctly maps endianness conventions (uppercase=big-endian, lowercase=little-endian), uses proper byte order functions (from_be_bytes/from_le_bytes), implements all width variants with comprehensive test coverage, and maintains GNU libmagic compatibility throughout. |
| Performance-Regression-Check | ✅ Passed | The pstring multi-byte length prefix implementation introduces no performance regressions, using zero-allocation string operations and efficient stack-based conversions while maintaining memory-mapped I/O compatibility. |
| Test-Coverage-Check | ✅ Passed | Pull request demonstrates robust test coverage for pstring multi-byte length prefix feature with 68 dedicated tests across five files, comprehensive table-driven tests covering all width variants and suffix combinations, explicit edge-case testing, error handling verification with checked arithmetic, and boundary condition testing. |
| Error-Handling-Check | ✅ Passed | Library code implements proper error handling with Result-based APIs throughout pstring implementation, no panics or unwrap calls in src/ directory. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify
Contributor

mergify Bot commented Mar 24, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

  • #commits-behind <= 3

@unclesp1d3r unclesp1d3r self-assigned this Mar 24, 2026
@dosubot dosubot Bot added the enhancement (New feature or request) and parser (Magic file parsing components and grammar) labels on Mar 24, 2026
@dosubot dosubot Bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) and removed the size:XL label (This PR changes 500-999 lines, ignoring generated files.) on Mar 25, 2026
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@dosubot
Contributor

dosubot Bot commented Mar 25, 2026

Related Documentation

6 document(s) may need updating based on files changed in this PR:

libMagic-rs

architecture /libmagic-rs/blob/main/docs/src/architecture.md
View Suggested Changes
@@ -65,6 +65,7 @@
 - ✅ **Offset parsing**: Absolute offsets with comprehensive validation
 - ✅ **Operator parsing**: Equality (`=`, `==`), inequality (`!=`, `<>`), comparison (`<`, `>`, `<=`, `>=`), bitwise (`&`, `^`, `~`), and any-value (`x`) operators
 - ✅ **Value parsing**: Strings, numbers, and hex byte sequences with escape sequences
+- ✅ **PString suffixes**: `/B`, `/H`, `/h`, `/L`, `/l` (length prefix width), `/J` (self-inclusive length)
 - ✅ **Error handling**: Comprehensive nom error handling with meaningful messages
 - ✅ **Rule parsing**: Complete rule parsing via `parse_magic_rule()`
 - ✅ **File parsing**: Complete magic file parsing with `parse_text_magic_file()`
@@ -95,7 +96,11 @@
     Long { endian: Endianness, signed: bool },
     Quad { endian: Endianness, signed: bool },
     String { max_length: Option<usize> },
-    PString { max_length: Option<usize> }, // Pascal string (length-prefixed)
+    PString {
+        max_length: Option<usize>,
+        length_width: PStringLengthWidth,
+        length_includes_itself: bool,
+    }, // Pascal string (length-prefixed)
 }
 
 pub enum Operator {
@@ -111,6 +116,14 @@
     BitwiseNot,                   // ~
     AnyValue,                     // x (always matches)
 }
+
+pub enum PStringLengthWidth {
+    OneByte,                      // 1-byte prefix (default, /B)
+    TwoByteBE,                    // 2-byte big-endian prefix (/H)
+    TwoByteLE,                    // 2-byte little-endian prefix (/h)
+    FourByteBE,                   // 4-byte big-endian prefix (/L)
+    FourByteLE,                   // 4-byte little-endian prefix (/l)
+}
 ```
 
 **Design Principles:**
@@ -120,6 +133,18 @@
 - **Self-contained**: No external dependencies in AST nodes
 - **Type-safe**: Rust's type system prevents invalid rule combinations
 - **Explicit signedness**: `TypeKind::Byte` and integer types (Short, Long, Quad) distinguish signed from unsigned interpretations
+
+**PString Length Prefix Support:**
+
+The `PString` type supports multiple length prefix formats through the `length_width` field:
+
+- **OneByte** (`/B`): Default 1-byte length prefix (0-255 range)
+- **TwoByteBE** (`/H`): 2-byte big-endian prefix
+- **TwoByteLE** (`/h`): 2-byte little-endian prefix
+- **FourByteBE** (`/L`): 4-byte big-endian prefix
+- **FourByteLE** (`/l`): 4-byte little-endian prefix
+
+The `length_includes_itself` field (controlled by the `/J` suffix) indicates JPEG-style self-inclusive length, where the stored length value includes the length field itself. This can be combined with any width variant (e.g., `/HJ` for 2-byte big-endian with self-inclusive length).
 
 ### 3. Evaluator Module (`src/evaluator/`)
 
@@ -134,7 +159,7 @@
 - `types/`: Type interpretation submodule
   - `mod.rs`: Public API surface with `read_typed_value`, `coerce_value_to_type`, and type re-exports
   - `numeric.rs`: Numeric type handling (`read_byte`, `read_short`, `read_long`, `read_quad`) with endianness and signedness support
-  - `string.rs`: String type handling (`read_string`) with null-termination and UTF-8 conversion
+  - `string.rs`: String type handling (`read_string`, `read_pstring`) with null-termination, UTF-8 conversion, and multi-byte length prefix support
   - `tests.rs`: Module tests
 - `offset/`: Offset resolution submodule
   - `mod.rs`: Dispatcher (`resolve_offset`) and re-exports

✅ Accepted

ast-structures /libmagic-rs/blob/main/docs/src/ast-structures.md
View Suggested Changes
@@ -186,7 +186,11 @@
     String { max_length: Option<usize> },
 
     /// Pascal string (length-prefixed)
-    PString { max_length: Option<usize> },
+    PString {
+        max_length: Option<usize>,
+        length_width: PStringLengthWidth,
+        length_includes_itself: bool,
+    },
 }
 ```
 
@@ -231,34 +235,91 @@
 
 ### PString (Pascal String)
 
-Pascal-style length-prefixed strings where the first byte contains the string length.
+Pascal-style length-prefixed strings where the length prefix can be 1, 2, or 4 bytes depending on the `length_width` field.
 
 **Structure:**
-- Length byte: 1 byte indicating string length (0-255)
-- String data: The number of bytes specified by the length byte
+- Length prefix: 1, 2, or 4 bytes indicating string length, with configurable endianness
+- String data: The number of bytes specified by the length prefix
 
 **Example:**
 ```
 0    pstring    JPEG
-```
-Reads one byte as length, then reads that many bytes as a string.
+0    pstring/H  JPEG
+```
+The first line reads a 1-byte length prefix (default), then reads that many bytes as a string. The second line reads a 2-byte big-endian length prefix.
 
 **Behavior:**
 - Returns `Value::String` containing the string data (without the length prefix)
-- Performs bounds checking on both the length byte and the string data
+- Performs bounds checking on both the length prefix and the string data
 - Supports all string comparison operators
-
-**Usage:**
-
-```rust
-// Pascal string with no length limit
-let pstring_type = TypeKind::PString {
-    max_length: None
-};
-
-// Pascal string with maximum 64-byte limit
+- Length prefix width controlled by `PStringLengthWidth` enum
+- Optional `/J` flag indicates JPEG-style self-inclusive length (stored length includes the prefix itself)
+
+### PStringLengthWidth Enum
+
+The `PStringLengthWidth` enum specifies the width and endianness of the length prefix:
+
+```rust
+pub enum PStringLengthWidth {
+    /// 1-byte length prefix (default, `/B` suffix)
+    OneByte,
+    /// 2-byte big-endian length prefix (`/H` suffix)
+    TwoByteBE,
+    /// 2-byte little-endian length prefix (`/h` suffix)
+    TwoByteLE,
+    /// 4-byte big-endian length prefix (`/L` suffix)
+    FourByteBE,
+    /// 4-byte little-endian length prefix (`/l` suffix)
+    FourByteLE,
+}
+```
+
+**Suffix conventions:**
+- `/B` - 1-byte length prefix (default if no suffix specified)
+- `/H` - 2-byte big-endian length prefix
+- `/h` - 2-byte little-endian length prefix
+- `/L` - 4-byte big-endian length prefix
+- `/l` - 4-byte little-endian length prefix
+- `/J` - Length includes the prefix width itself (combinable: `/HJ`, `/lJ`, etc.)
+
+**Examples:**
+
+```rust
+use libmagic_rs::parser::ast::{TypeKind, PStringLengthWidth};
+
+// 1-byte length prefix (default)
+let pstring_default = TypeKind::PString {
+    max_length: None,
+    length_width: PStringLengthWidth::OneByte,
+    length_includes_itself: false,
+};
+
+// 2-byte big-endian length prefix
+let pstring_be = TypeKind::PString {
+    max_length: None,
+    length_width: PStringLengthWidth::TwoByteBE,
+    length_includes_itself: false,
+};
+
+// 4-byte little-endian length prefix
+let pstring_le = TypeKind::PString {
+    max_length: None,
+    length_width: PStringLengthWidth::FourByteLE,
+    length_includes_itself: false,
+};
+
+// 2-byte big-endian with /J flag (JPEG-style self-inclusive length)
+let pstring_jpeg = TypeKind::PString {
+    max_length: None,
+    length_width: PStringLengthWidth::TwoByteBE,
+    length_includes_itself: true,
+};
+
+// Maximum 64-byte limit with 1-byte prefix
 let limited_pstring = TypeKind::PString {
-    max_length: Some(64)
+    max_length: Some(64),
+    length_width: PStringLengthWidth::OneByte,
+    length_includes_itself: false,
 };
 ```
 

✅ Accepted

evaluator /libmagic-rs/blob/main/docs/src/evaluator.md
View Suggested Changes
@@ -34,7 +34,7 @@
   - **`types/numeric.rs`** - Numeric type handling: `read_byte`, `read_short`, `read_long`, `read_quad` with endianness and signedness support
   - **`types/float.rs`** - Floating-point type handling: `read_float` (32-bit IEEE 754), `read_double` (64-bit IEEE 754) with endianness support
   - **`types/date.rs`** - Date and timestamp type handling: `read_date` (32-bit Unix timestamps), `read_qdate` (64-bit Unix timestamps) with endianness and UTC/local time support
-  - **`types/string.rs`** - String type handling: `read_string` with null-termination and UTF-8 conversion
+  - **`types/string.rs`** - String type handling: `read_string` with null-termination and UTF-8 conversion, `read_pstring` with configurable length-prefix widths (1, 2, or 4 bytes)
   - **`types/tests.rs`** - Module tests
 - **`evaluator/strength.rs`** - Rule strength calculation
 
@@ -118,6 +118,7 @@
 - **Date**: 32-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
 - **QDate**: 64-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
 - **String**: Byte sequences with length limits
+- **PString**: Pascal-style length-prefixed strings with 1-byte (`/B`), 2-byte (`/H` or `/h`), or 4-byte (`/L` or `/l`) length prefixes, supporting big-endian and little-endian byte order
 - **Bounds checking**: Prevents buffer overruns
 
 ```rust
@@ -174,6 +175,34 @@
 - Both support UTC or local time formatting
 - The evaluator reads raw integer timestamps from the buffer and converts them to formatted date strings for comparison
 - Example: A 32-bit value `1234567890` at offset 0 with type `ldate` would be evaluated as `"Fri Feb 13 23:31:30 2009"`
+
+**Pascal String Type Reading (`evaluator/types/string.rs`):**
+
+```rust
+pub fn read_pstring(
+    buffer: &[u8],
+    offset: usize,
+    max_length: Option<usize>,
+    length_width: PStringLengthWidth,
+    length_includes_itself: bool,
+) -> Result<Value, TypeReadError>
+```
+
+- `read_pstring()` reads a length-prefixed Pascal string with configurable prefix width
+- **Length prefix width** (`length_width`):
+  - `PStringLengthWidth::OneByte` - 1-byte length prefix (`/B` suffix, default)
+  - `PStringLengthWidth::TwoByteBE` - 2-byte big-endian length prefix (`/H` suffix)
+  - `PStringLengthWidth::TwoByteLE` - 2-byte little-endian length prefix (`/h` suffix)
+  - `PStringLengthWidth::FourByteBE` - 4-byte big-endian length prefix (`/L` suffix)
+  - `PStringLengthWidth::FourByteLE` - 4-byte little-endian length prefix (`/l` suffix)
+- **Length interpretation**: 
+  - Reads 1, 2, or 4 bytes from buffer using `from_be_bytes` or `from_le_bytes` depending on variant
+  - The length value specifies how many bytes of string data follow the prefix
+- **`/J` flag** (`length_includes_itself`):
+  - When `true`, the stored length value includes the prefix width itself (JPEG-style)
+  - The evaluator subtracts the prefix width (1, 2, or 4 bytes) from the length to get effective content length
+  - Example: A 2-byte big-endian prefix with value `7` and `/J` flag yields `7 - 2 = 5` bytes of string content
+- Returns `Value::String` with UTF-8 conversion (using lossy conversion for invalid UTF-8)
 
 ### Operator Application (`evaluator/operators.rs`)
 
@@ -486,11 +515,42 @@
 assert_eq!(matches[0].message, "Pi constant detected");
 ```
 
+**Example with pstring types:**
+
+```rust
+use libmagic_rs::{evaluate_rules, EvaluationConfig};
+use libmagic_rs::parser::parse_text_magic_file;
+
+// Parse magic rules with pstring variants
+let magic_content = r#"
+0 pstring/B MAGIC Pascal string (1-byte prefix)
+0 pstring/H =MAGIC Pascal string (2-byte BE prefix)
+0 pstring/h =MAGIC Pascal string (2-byte LE prefix)
+0 pstring/L =MAGIC Pascal string (4-byte BE prefix)
+0 pstring/l =MAGIC Pascal string (4-byte LE prefix)
+"#;
+let rules = parse_text_magic_file(magic_content)?;
+
+// 1-byte prefix: length=5, then "MAGIC"
+let buffer = b"\x05MAGIC";
+let matches = evaluate_rules(&rules, &buffer)?;
+assert_eq!(matches[0].message, "Pascal string (1-byte prefix)");
+
+// 2-byte big-endian prefix with /J flag: stored length 7 (includes 2-byte prefix), effective content 5 bytes
+let magic_content_j = r#"
+0 pstring/HJ =MAGIC JPEG-style pstring with self-inclusive length
+"#;
+let rules_j = parse_text_magic_file(magic_content_j)?;
+let buffer_j = b"\x00\x07MAGIC"; // 2-byte BE prefix: value 7, minus 2 = 5 bytes of content
+let matches_j = evaluate_rules(&rules_j, &buffer_j)?;
+assert_eq!(matches_j[0].message, "JPEG-style pstring with self-inclusive length");
+```
+
 ## Implementation Status
 
 - [x] Basic evaluation engine structure
 - [x] Offset resolution (absolute, relative, from-end)
-- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String)
+- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String, PString with 1/2/4-byte prefixes)
 - [x] Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
 - [x] Hierarchical rule processing with child evaluation
 - [x] Error handling with graceful degradation

✅ Accepted
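
The lossy UTF-8 conversion mentioned in the evaluator notes above relies on standard-library behavior that is easy to check in isolation. The `content_to_string` wrapper below is a hypothetical name for this example; only `String::from_utf8_lossy` is the real API.

```rust
/// Hypothetical wrapper: converts raw pstring content (prefix already
/// stripped) to an owned string, replacing each invalid UTF-8 sequence
/// with U+FFFD REPLACEMENT CHARACTER.
pub fn content_to_string(content: &[u8]) -> String {
    String::from_utf8_lossy(content).into_owned()
}
```

Valid input passes through unchanged, so a buffer holding `MAGIC` compares equal to the rule value, while a stray `0xFF` byte shows up as a single replacement character instead of causing an error.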

magic-format /libmagic-rs/blob/main/docs/src/magic-format.md
View Suggested Changes
@@ -227,13 +227,57 @@
 
 ### Pascal String Type
 
-Pascal string (pstring) is a length-prefixed string type. The first byte contains the string length (0-255), followed by that many bytes of string data. Unlike C strings, Pascal strings are not null-terminated.
+Pascal string (pstring) is a length-prefixed string type. The length prefix can be 1, 2, or 4 bytes depending on the suffix flag. Unlike C strings, Pascal strings are not null-terminated.
+
+#### Length Prefix Width
+
+The default pstring type uses a 1-byte length prefix (0-255 range). Use suffix flags to specify different prefix widths:
+
+| Suffix | Width   | Endianness    | Range           |
+| ------ | ------- | ------------- | --------------- |
+| `/B`   | 1 byte  | N/A           | 0-255 (default) |
+| `/H`   | 2 bytes | big-endian    | 0-65535         |
+| `/h`   | 2 bytes | little-endian | 0-65535         |
+| `/L`   | 4 bytes | big-endian    | 0-4294967295    |
+| `/l`   | 4 bytes | little-endian | 0-4294967295    |
+
+#### Self-Inclusive Length (`/J` Flag)
+
+The `/J` flag indicates that the stored length value includes the size of the length prefix itself (JPEG-style). This flag can be combined with any width variant.
+
+#### Examples
+
+Basic pstring with default 1-byte prefix:
 
 ```text
 0       pstring   =JPEG     JPEG image (Pascal string)
 ```
 
-The evaluator reads the length byte, then reads that many bytes as string data. The optional max_length parameter caps the length byte value:
+2-byte big-endian length prefix:
+
+```text
+0       pstring/H =JPEG     JPEG image (2-byte BE prefix)
+```
+
+4-byte little-endian length prefix:
+
+```text
+0       pstring/l x         \b, name: %s
+```
+
+Self-inclusive length with 2-byte big-endian prefix:
+
+```text
+0       pstring/HJ x        \b, JPEG-style length
+```
+
+Self-inclusive length with default 1-byte prefix:
+
+```text
+0       pstring/J  x        \b, self-inclusive length
+```
+
+The optional max_length parameter caps the length value:
 
 ```text
 0       pstring   x         \b, name: %s

✅ Accepted

MAGIC_FORMAT /libmagic-rs/blob/main/docs/MAGIC_FORMAT.md
View Suggested Changes
@@ -188,15 +188,37 @@
 
 **Pascal String (pstring)**
 
-Length-prefixed string type where the first byte contains the string length (0-255), followed by that many bytes of string data. Unlike C strings, Pascal strings are not null-terminated.
-
-```text
-0       pstring   =JPEG     JPEG image (Pascal string)
-```
-
-The length byte value determines how many bytes to read for the string data. If `max_length` is specified in the magic file (not shown in the basic syntax), it caps the length byte value to prevent reading excessive data.
+Length-prefixed string type where a length prefix (1, 2, or 4 bytes) specifies the number of bytes of string data that follow. Unlike C strings, Pascal strings are not null-terminated.
+
+The length prefix width is controlled by suffix flags:
+
+| Suffix | Length Prefix Width | Byte Order    |
+| ------ | ------------------- | ------------- |
+| `/B`   | 1 byte (default)    | N/A           |
+| `/H`   | 2 bytes             | big-endian    |
+| `/h`   | 2 bytes             | little-endian |
+| `/L`   | 4 bytes             | big-endian    |
+| `/l`   | 4 bytes             | little-endian |
+
+The `/J` flag indicates JPEG-style self-inclusive length where the stored length value includes the size of the length prefix itself. This flag can be combined with any width suffix (`/HJ`, `/lJ`, etc.) or used alone (`/J` defaults to 1-byte width).
+
+Examples:
+
+```text
+0       pstring   =JPEG         JPEG image (1-byte prefix, default)
+0       pstring/B =JPEG         JPEG image (1-byte prefix, explicit)
+0       pstring/H =JPEG         JPEG image (2-byte big-endian prefix)
+0       pstring/h =JPEG         JPEG image (2-byte little-endian prefix)
+0       pstring/L =JPEG         JPEG image (4-byte big-endian prefix)
+0       pstring/l =JPEG         JPEG image (4-byte little-endian prefix)
+0       pstring/HJ =JPEG        JPEG image (2-byte BE, self-inclusive length)
+```
+
+If `max_length` is specified in the magic file (not shown in the basic syntax), it caps the length value to prevent reading excessive data.
 
 ### String Flags
+
+Flags for `string` type:
 
 | Flag | Description            |
 | ---- | ---------------------- |
@@ -209,6 +231,8 @@
 ```text
 0       string/c  <!doctype  HTML document
 ```
+
+Flags for `pstring` type are documented in the Pascal String section above.
 
 ### Date/Timestamp Types
 

✅ Accepted

parser /libmagic-rs/blob/main/docs/src/parser.md
View Suggested Changes
@@ -186,40 +186,77 @@
 
 ### Pascal String (pstring) Type
 
-The parser supports Pascal-style length-prefixed strings through the `pstring` keyword:
+The parser supports Pascal-style length-prefixed strings through the `pstring` keyword with multiple length prefix width variants:
 
 **Type Keyword:**
 
-- `pstring` - Length-prefixed string (1-byte length + string data) → `TypeKind::PString { max_length: None }`
+- `pstring` - Length-prefixed string → `TypeKind::PString { max_length: None, length_width: PStringLengthWidth::OneByte, length_includes_itself: false }`
+
+**Length Prefix Width Variants:**
+
+Pascal strings support multiple length prefix widths via suffix modifiers:
+
+- `/B` - 1-byte length prefix (default) → `PStringLengthWidth::OneByte`
+- `/H` - 2-byte big-endian length prefix → `PStringLengthWidth::TwoByteBE`
+- `/h` - 2-byte little-endian length prefix → `PStringLengthWidth::TwoByteLE`
+- `/L` - 4-byte big-endian length prefix → `PStringLengthWidth::FourByteBE`
+- `/l` - 4-byte little-endian length prefix → `PStringLengthWidth::FourByteLE`
+
+**Self-Inclusive Length Flag (`/J`):**
+
+The `/J` flag indicates JPEG-style self-inclusive length, where the stored length value includes the length prefix bytes themselves. The evaluator subtracts the prefix width from the stored length to determine the actual string data length.
+
+The `/J` flag can be combined with any width variant:
+
+- `/J` - 1-byte self-inclusive (default width)
+- `/BJ` - 1-byte self-inclusive (explicit)
+- `/HJ` - 2-byte big-endian self-inclusive
+- `/hJ` - 2-byte little-endian self-inclusive
+- `/LJ` - 4-byte big-endian self-inclusive
+- `/lJ` - 4-byte little-endian self-inclusive
 
 **Format:**
 
-Pascal strings store the length as the first byte (0-255), followed by that many bytes of string data. Unlike C strings, they are not null-terminated.
+Pascal strings store the length as a prefix (1, 2, or 4 bytes depending on the variant), followed by that many bytes of string data. Unlike C strings, they are not null-terminated. When the `/J` flag is used, the length value includes the prefix size itself.
 
 **Parser Implementation:**
 
 - Recognized by `parse_type_keyword()` in `src/parser/types.rs`
-- Maps to `TypeKind::PString` in the AST
-- Evaluator reads length prefix byte then that many bytes as string data
+- Suffix parsing handled by `parse_pstring_suffix()` in `src/parser/grammar/mod.rs`
+- Maps to `TypeKind::PString` in the AST with `length_width` and `length_includes_itself` fields
+- Evaluator reads length prefix using appropriate byte order (`from_be_bytes` or `from_le_bytes`)
 - Stored as `Value::String` for comparison with string operators
-- Supports optional `max_length` field to cap the length byte value
+- Supports optional `max_length` field to cap the length value
 
 **Usage in Magic Rules:**
 
 ```rust
-// Basic pstring matching
+// Basic pstring matching (1-byte length prefix)
 0 pstring =Hello     // Match if pstring equals "Hello"
 0 pstring x          // Match any pstring value
 
-// With max_length constraint (parsed separately)
-0 pstring/64 x       // Limit string read to 64 bytes
-```
-
-**Features:**
-
-- ✅ Single type keyword `pstring`
-- ✅ Length-prefixed format (1 byte length, 0-255 bytes data)
-- ✅ Bounds checking for both length byte and string data
+// Multi-byte length prefix variants
+0 pstring/H =Test    // 2-byte big-endian length prefix
+0 pstring/h =Test    // 2-byte little-endian length prefix
+0 pstring/L =Test    // 4-byte big-endian length prefix
+0 pstring/l =Test    // 4-byte little-endian length prefix
+
+// JPEG-style self-inclusive length
+0 pstring/J x        // 1-byte self-inclusive length
+0 pstring/HJ =Data   // 2-byte big-endian self-inclusive length
+0 pstring/lJ =Data   // 4-byte little-endian self-inclusive length
+
+// With max_length constraint
+0 pstring/H/64 x     // 2-byte prefix, limit read to 64 bytes
+```
+
+**Features:**
+
+- ✅ Five length prefix width variants (1-byte, 2-byte BE/LE, 4-byte BE/LE)
+- ✅ Self-inclusive length flag (`/J`) for JPEG-style length encoding
+- ✅ Combinable suffix syntax (`/HJ`, `/lJ`, etc.)
+- ✅ Bounds checking for both length prefix and string data
+- ✅ Proper endianness handling via `from_be_bytes` / `from_le_bytes`
 - ✅ UTF-8 validation with replacement character for invalid sequences
 - ✅ Optional `max_length` parameter to limit string reads
 - ✅ String comparison operators work with pstring values
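
The length-prefix handling described in this diff can be sketched as follows. This is a minimal illustration, not the crate's `read_pstring`: the function name `read_pstring_sketch`, the `ReadError` type, and the `byte_count` helper are hypothetical, and only the `PStringLengthWidth` variant names are taken from the PR description.

```rust
// Hypothetical stand-ins: variant names follow the PR description,
// but the real definitions in libmagic-rs may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PStringLengthWidth {
    OneByte,
    TwoByteBE,
    TwoByteLE,
    FourByteBE,
    FourByteLE,
}

impl PStringLengthWidth {
    // Number of bytes occupied by the length prefix.
    fn byte_count(self) -> usize {
        match self {
            Self::OneByte => 1,
            Self::TwoByteBE | Self::TwoByteLE => 2,
            Self::FourByteBE | Self::FourByteLE => 4,
        }
    }
}

// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum ReadError {
    BufferOverrun,
    InvalidLength,
}

fn read_pstring_sketch(
    buffer: &[u8],
    offset: usize,
    width: PStringLengthWidth,
    length_includes_itself: bool,
) -> Result<String, ReadError> {
    let n = width.byte_count();
    let prefix_end = offset.checked_add(n).ok_or(ReadError::BufferOverrun)?;
    let prefix = buffer.get(offset..prefix_end).ok_or(ReadError::BufferOverrun)?;

    // Decode the stored length using the byte order selected by the variant.
    let stored = match (width, prefix) {
        (PStringLengthWidth::OneByte, &[a]) => usize::from(a),
        (PStringLengthWidth::TwoByteBE, &[a, b]) => usize::from(u16::from_be_bytes([a, b])),
        (PStringLengthWidth::TwoByteLE, &[a, b]) => usize::from(u16::from_le_bytes([a, b])),
        (PStringLengthWidth::FourByteBE, &[a, b, c, d]) => {
            usize::try_from(u32::from_be_bytes([a, b, c, d])).map_err(|_| ReadError::BufferOverrun)?
        }
        (PStringLengthWidth::FourByteLE, &[a, b, c, d]) => {
            usize::try_from(u32::from_le_bytes([a, b, c, d])).map_err(|_| ReadError::BufferOverrun)?
        }
        _ => return Err(ReadError::BufferOverrun),
    };

    // With /J the stored length counts the prefix bytes too, so subtract them.
    let data_len = if length_includes_itself {
        stored.checked_sub(n).ok_or(ReadError::InvalidLength)?
    } else {
        stored
    };

    let data_end = prefix_end.checked_add(data_len).ok_or(ReadError::BufferOverrun)?;
    let data = buffer.get(prefix_end..data_end).ok_or(ReadError::BufferOverrun)?;
    // Invalid UTF-8 is replaced rather than rejected, matching the feature list above.
    Ok(String::from_utf8_lossy(data).into_owned())
}
```

Under these assumptions, `read_pstring_sketch(b"\x00\x06Data", 0, PStringLengthWidth::TwoByteBE, true)` yields `"Data"` (stored length 6 minus the 2-byte prefix), while `b"\x01\x00xx"` read as `TwoByteLE` with `/J` fails because the stored length 1 is smaller than the prefix width.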

✅ Accepted


@coderabbitai coderabbitai Bot removed the parser Magic file parsing components and grammar label Mar 25, 2026
@coderabbitai coderabbitai Bot added evaluator Rule evaluation engine and logic testing Test infrastructure and coverage labels Mar 25, 2026
@codecov

codecov Bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 95.27778% with 17 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/evaluator/types/string.rs | 94.34% | 13 Missing ⚠️ |
| src/parser/codegen.rs | 82.35% | 3 Missing ⚠️ |
| src/parser/types.rs | 96.55% | 1 Missing ⚠️ |


…r and evaluator

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

@coderabbitai coderabbitai Bot left a comment


Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
docs/src/ast-structures.md (1)

172-190: ⚠️ Potential issue | 🟠 Major

AST docs still describe the old TypeKind::PString shape.

TypeKind::PString is documented with only max_length, but the PR adds width and /J-related semantics. Please update the enum snippet and PString section to include new fields and add a short migration note for downstream code that constructs or pattern-matches this variant.

As per coding guidelines, "Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides. Focus on user experience and developer onboarding."

Also applies to: 232-267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/ast-structures.md` around lines 172 - 190, Update the TypeKind enum
snippet and documentation to reflect the new PString shape: modify
TypeKind::PString to include the added fields (the new width field and the
/J-related flag/semantics introduced in the PR) and update the descriptive
paragraph for "Pascal string (length-prefixed)" to explain how width and the /J
semantics affect encoding/decoding; add a short migration note explaining how to
construct the new PString variant and how to update pattern matches (matching on
fields or using wildcard for added fields) so downstream code compiles, and
apply the same changes to the other occurrence mentioned (the block around the
later lines referenced).
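
A migration note of the kind requested here could include a sketch like the one below. The types are hypothetical, trimmed-down mirrors of the AST: the field names `length_width` and `length_includes_itself` come from this PR, while the `Option<usize>` type of `max_length`, the `Byte` variant, and the helper functions are assumptions made for illustration.

```rust
// Hypothetical mirror of the AST types; not the crate's real definitions.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PStringLengthWidth {
    OneByte,
    TwoByteBE,
    TwoByteLE,
    FourByteBE,
    FourByteLE,
}

#[derive(Debug, PartialEq)]
enum TypeKind {
    Byte,
    PString {
        max_length: Option<usize>, // assumed type
        length_width: PStringLengthWidth,
        length_includes_itself: bool,
    },
}

// Constructing the variant now requires the two new fields. These values
// reproduce the old 1-byte, non-self-inclusive behavior.
fn classic_pstring() -> TypeKind {
    TypeKind::PString {
        max_length: None,
        length_width: PStringLengthWidth::OneByte,
        length_includes_itself: false,
    }
}

// Pattern matches can name only the fields they need and use `..` for the rest,
// which keeps them compiling if further fields are added later.
fn describe(kind: &TypeKind) -> &'static str {
    match kind {
        TypeKind::PString { length_includes_itself: true, .. } => "pstring (self-inclusive length)",
        TypeKind::PString { .. } => "pstring",
        TypeKind::Byte => "byte",
    }
}
```

Matching with `..` is a design choice for downstream code: it trades exhaustive field checking for resilience against future field additions.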
docs/src/magic-format.md (1)

228-240: ⚠️ Potential issue | 🟠 Major

Magic-format reference is missing new pstring suffix grammar.

The section should document supported pstring suffixes (/B, /H, /h, /L, /l, /J, and combined forms like /HJ) with concrete examples and behavior notes. Right now readers are told only 1-byte length-prefix behavior, which is no longer accurate.

As per coding guidelines, "Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides. Focus on user experience and developer onboarding."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/magic-format.md` around lines 228 - 240, Update the "Pascal String
Type" pstring docs to reflect the new suffix grammar: list supported suffixes
(/B, /H, /h, /L, /l, /J) and allowed combined forms (e.g., /HJ), explain how
each width suffix changes the length-prefix size and endianness (/B = 1-byte,
/H = 2-byte big-endian, /h = 2-byte little-endian, /L = 4-byte big-endian,
/l = 4-byte little-endian), note that /J is not a width but a flag marking the
stored length as self-inclusive, document how optional max_length still caps
the decoded length, and add concrete examples showing the magic lines and
resulting behavior (e.g., "0 pstring/B =JPEG" vs "0 pstring/HJ =Data") so
readers can see exact parsing semantics and migration implications for code
expecting 1-byte-only pstrings.
docs/src/parser.md (1)

187-226: ⚠️ Potential issue | 🟠 Major

Update pstring parser docs to match new suffix support.

This section is now stale: it documents only 1-byte pstring behavior, but parser/evaluator now support /B, /H, /h, /L, /l, and /J combinations. Please add syntax, endianness semantics, /J self-inclusive length behavior, and at least one example per width family so readers can author rules correctly.

As per coding guidelines, "Review documentation quality, completeness, and accuracy. Ensure comprehensive API documentation, usage examples, and migration guides. Focus on user experience and developer onboarding."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/parser.md` around lines 187 - 226, Update the "Pascal String
(pstring) Type" docs to document the new suffixes and their semantics: list the
allowed suffixes (/B, /H, /h, /L, /l, /J), explain width mapping (B=1, H=2, L=4)
and that uppercase means big-endian while lowercase means little-endian for
multi-byte length fields, and describe /J as the self-inclusive length variant
(length includes the size of the length field itself). Show the exact parsing
behavior used by parse_type_keyword() and the evaluator: how the length field is
read (with chosen width and endianness), how max_length still caps the decoded
length, how /J adjusts the resulting byte count, UTF‑8 validation/replacement,
and that values are produced as Value::String for comparisons mapped to
TypeKind::PString. Add one short usage example per width family (B, H/h, L/l)
including a /J example and one with a max_length (e.g., pstring/64), and ensure
the examples mirror the syntax the parser recognizes so readers can author rules
correctly.
src/evaluator/types/string.rs (1)

86-139: 🛠️ Refactor suggestion | 🟠 Major

Document the two new read_pstring knobs in rustdoc.

The public signature now exposes length_width and length_includes_itself, but the # Arguments section still documents only buffer, offset, and max_length, and the examples never show /J. Please add those parameters plus the underflow error case so the new behavior is discoverable in rustdoc. As per coding guidelines, "All public APIs require rustdoc with examples; include error conditions and recovery strategies; provide usage examples for common patterns."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/evaluator/types/string.rs` around lines 86 - 139, Update the rustdoc for
the public function read_pstring to document the two new parameters: add entries
in the # Arguments section for length_width (type PStringLengthWidth) and
length_includes_itself (bool) explaining their meanings and how they affect the
parsed length, and mention the underflow/error case where a length prefix that
implies fewer bytes than the header or causes negative effective payload should
return/raise TypeReadError::BufferOverrun; update the # Examples to include
calls that demonstrate different PStringLengthWidth variants (OneByte,
TwoByteBE, FourByteLE) and both values of length_includes_itself (true/false)
including an example that triggers the underflow/BufferOverrun case so the new
behavior and recovery are visible in rustdoc.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.mcp.json:
- Around line 1-12: This PR should not include the MCP server configuration;
either remove the .mcp.json entry entirely from this PR or extract it into its
own PR that adds the file plus clear documentation; specifically delete or
relocate the "mcpServers" object containing the "tessl" entry (type "stdio",
command "tessl", args ["mcp","start"]) and if moved, add a README or PR
description explaining the tessl dependency, its purpose, how it integrates with
the project, and why the MCP server is needed.

In `@AGENTS.md`:
- Around line 208-209: The pstring documentation omits the implemented /J
(self-inclusive length) suffix and its combinability with size suffixes (e.g.,
/HJ, /lJ); update the pstring sentence in AGENTS.md to state that pstring
supports 1/2/4-byte length prefixes via /B, /H (or /h), and /L (or /l) and also
supports the /J modifier which makes the length count include the length
byte(s), and note that /J can be combined with any size suffix (examples: /HJ,
/lJ) so the docs match actual behavior of pstring parsing.

In `@GOTCHAS.md`:
- Around line 94-97: Update the GOTCHAS.md section to correctly describe pstring
endianness: state that /H denotes a 2-byte big-endian length prefix and /L
denotes a 4-byte big-endian length prefix (while /h and /l remain the
little-endian counterparts), and adjust the example bytes and wording
accordingly so examples show big-endian ordering for /H and /L and little-endian
for /h and /l.

In `@src/evaluator/types/string.rs`:
- Around line 829-833: The test
test_read_pstring_j_flag_length_less_than_prefix_width is using the wrong byte
order for a TwoByteLE prefix: change the buffer used in the test (referencing
read_pstring and PStringLengthWidth::TwoByteLE) so the two-byte prefix encodes 1
in little-endian (swap the bytes to b"\x01\x00xx") to trigger the checked_sub
underflow branch; alternatively, if you prefer not to change bytes, switch the
length width to TwoByteBE so the current bytes represent 1.
- Around line 149-156: The PStringLengthWidth::OneByte branch in read_pstring
currently uses direct indexing (len_bytes[0]); change it to use safe accessor
semantics like len_bytes.get(0) and propagate the same
TypeReadError::BufferOverrun (or appropriate error) if missing, then convert the
retrieved byte to usize (e.g., usize::from(*byte)) so this branch conforms to
the repo invariant of always using .get() for buffer access.

In `@src/parser/ast.rs`:
- Around line 239-240: The PString doc comment incorrectly states the length
prefix is little-endian; update the comment for the PString type (and any
related functions/constructors named PString, parse_pstring, or similar in
ast.rs) to say the prefix endianness is configurable and can be either
little-endian or big-endian (supported by format specifiers /H, /h, /L, /l), and
describe that the length is stored in the specified endianness followed by that
many bytes of string data (not null-terminated); keep the note about 1, 2, or 4
byte lengths and mention the specifiers that control byte order so future
readers can find the supported options.
- Around line 949-958: The round-trip serialization test only covers PString
with PStringLengthWidth::OneByte and length_includes_itself: false; add
additional TypeKind::PString cases to that test exercising 2-byte and 4-byte
width variants for both endianness and a case where length_includes_itself is
true (the “/J” case). Concretely, in the existing round-trip test that builds
TypeKind::PString values, add at least: a 2-byte big-endian variant, a 2-byte
little-endian variant, a 4-byte big-endian variant, a 4-byte little-endian
variant (using the appropriate PStringLengthWidth enum variants for 2/4 bytes
and BE/LE), and one PString with length_includes_itself: true; ensure each new
instance is included in the same serialize->deserialize assertions to catch
serde shape regressions.

In `@src/parser/grammar/mod.rs`:
- Around line 727-737: The code currently consumes the '/' before validating the
pstring suffix which lets invalid suffixes like '/x' be silently accepted;
change the logic in the pstring handling block so you first attempt to
parse/validate the suffix (e.g., call or adapt parse_pstring_suffix to return a
Result or an Option without mutating input) and only update input,
pstring_length_width, and pstring_length_includes_itself when the suffix parse
succeeds; on failure, emit/collect a syntax error (with location) and leave the
input unchanged so parsing can continue. Ensure references to
parse_pstring_suffix, input, pstring_length_width, and
pstring_length_includes_itself are used to locate and implement the change.

In `@tessl.json`:
- Around line 5-7: The dependency entry for "actionbook/rust-skills" is pinned
to a raw commit hash in the "version" field which hinders maintainability;
replace the commit-hash value with a stable tag or semantic release (e.g.,
"v1.2.3") when available, and if a tag is not available document an update
policy for this vendored dependency and add an automated periodic check (CI job
or script) to verify the commit remains accessible and to alert when newer
tags/commits appear; update the "version" field accordingly and add a short
comment or README note describing the chosen update cadence and the existence of
the periodic check.
- Around line 1-40: tessl.json appears to be a development-only AI/tooling
config (contains "name": "stringy" and a pinned dependency
"actionbook/rust-skills") and should not be included in the published crate;
either move this file into a non-distributed location (e.g., .github/ or a
tools/ directory) or add tessl.json to .gitignore, correct the "name" field to
match the repository ("libmagic-rs") if you keep it, and add a short README
entry documenting why the actionbook/rust-skills dependency is needed and
whether pinning to the specific commit is intentional or should be
relaxed/removed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d45f7fed-b735-4e94-9b40-7b94b86db8cb

📥 Commits

Reviewing files that changed from the base of the PR and between 2c35b16 and 1821cc9.

⛔ Files ignored due to path filters (7)
  • .codex/config.toml is excluded by none and included by none
  • .gemini/settings.json is excluded by none and included by none
  • .gitignore is excluded by none and included by none
  • .tessl/.gitignore is excluded by none and included by none
  • docs/API_REFERENCE.md is excluded by none and included by none
  • tests/evaluator_tests.rs is excluded by none and included by none
  • tests/property_tests.rs is excluded by none and included by none
📒 Files selected for processing (19)
  • .coderabbit.yaml
  • .mcp.json
  • AGENTS.md
  • CONTRIBUTING.md
  • GOTCHAS.md
  • docs/src/ast-structures.md
  • docs/src/magic-format.md
  • docs/src/parser.md
  • src/build_helpers.rs
  • src/evaluator/engine/tests.rs
  • src/evaluator/strength.rs
  • src/evaluator/types/mod.rs
  • src/evaluator/types/string.rs
  • src/parser/ast.rs
  • src/parser/codegen.rs
  • src/parser/grammar/mod.rs
  • src/parser/grammar/tests.rs
  • src/parser/types.rs
  • tessl.json
💤 Files with no reviewable changes (1)
  • .coderabbit.yaml

Comment thread .mcp.json Outdated
Comment thread AGENTS.md Outdated
Comment thread GOTCHAS.md
Comment thread src/evaluator/types/string.rs Outdated
Comment thread src/evaluator/types/string.rs Outdated
Comment thread src/parser/ast.rs Outdated
Comment thread src/parser/ast.rs
Comment thread src/parser/grammar/mod.rs
Comment thread tessl.json Outdated
Comment thread tessl.json Outdated
@coderabbitai coderabbitai Bot added parser Magic file parsing components and grammar and removed evaluator Rule evaluation engine and logic testing Test infrastructure and coverage labels Mar 25, 2026

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/parser.md (1)

187-263: ⚠️ Potential issue | 🔴 Critical

Remove unsupported max_length example from documentation.

The pstring parser does not support the /64 suffix syntax shown in the line 250 example. The test cases in src/parser/grammar/tests.rs:2138-2180 confirm the parser only handles width variants (/B, /H, /h, /L, /l) and the self-inclusive flag (/J), with max_length always initialized to None. The max_length field exists in the TypeKind::PString AST struct for programmatic use, but cannot be specified through magic rule syntax.

Remove the example 0 pstring/H/64 x from line 250 or clarify that max_length is only available when constructing rules programmatically via the Rust API.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/parser.md` around lines 187 - 263, Update the pstring docs to remove
the unsupported suffix example (the `0 pstring/H/64 x` line) or explicitly state
that max_length cannot be specified in magic syntax; instead note that
max_length exists only in the AST/ Rust API. Reference parse_type_keyword() and
parse_pstring_suffix() as the parser entrypoints and TypeKind::PString as the
AST variant so readers understand that only /B,/H,/h,/L,/l and /J suffixes are
parsed and that max_length is set programmatically, not via rule suffixes.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46e46d0e-2016-4321-97fd-f5a090d065ee

📥 Commits

Reviewing files that changed from the base of the PR and between 1821cc9 and 0284b50.

⛔ Files ignored due to path filters (1)
  • docs/MAGIC_FORMAT.md is excluded by none and included by none
📒 Files selected for processing (6)
  • .coderabbit.yaml
  • docs/src/architecture.md
  • docs/src/ast-structures.md
  • docs/src/evaluator.md
  • docs/src/magic-format.md
  • docs/src/parser.md

unclesp1d3r and others added 2 commits March 24, 2026 20:34
- Reject unrecognized pstring suffix characters (e.g., /Z) with parse error
  instead of silently defaulting to OneByte
- Add # Examples rustdoc to all PStringLengthWidth enum variants
- Fix doc comment claiming "little-endian" when both endiannesses are supported
- Re-export PStringLengthWidth from lib.rs alongside other parser::ast types
- Add InvalidPStringLength error variant for /J underflow (was misleadingly
  reported as BufferOverrun)
- Remove unused is_big_endian() helper method
- Add integration test for 2-byte BE prefix with /J flag
- Add test for /J + max_length interaction
- Add test for /J zero-length edge case across all widths
- Add test for invalid suffix rejection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix GOTCHAS.md: document correct endianness mapping (uppercase=BE, lowercase=LE)
  and /J flag semantics
- Document /J flag support in AGENTS.md feature list and limitations section
- Use safe .first() accessor instead of len_bytes[0] indexing for OneByte variant
- Add round-trip serialization test coverage for TwoByteBE and FourByteLE variants
  with length_includes_itself variations
- Remove unrelated config files (.mcp.json, tessl.json, .codex/, .gemini/) from PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
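
The suffix rejection described in the first of these commits can be sketched as below. This is an illustration under assumptions, not the crate's `parse_pstring_suffix`: the function name, its signature (taking the text after the `/`), the `String` error type, and the choice to reject two width letters in one suffix are all hypothetical.

```rust
// Hypothetical stand-in; variant names follow the PR description.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PStringLengthWidth {
    OneByte,
    TwoByteBE,
    TwoByteLE,
    FourByteBE,
    FourByteLE,
}

// Parses the characters after the '/' (e.g. "HJ") into a width and a /J flag.
// Returns an error instead of silently defaulting when a character is unknown.
fn parse_pstring_suffix_sketch(suffix: &str) -> Result<(PStringLengthWidth, bool), String> {
    let mut width: Option<PStringLengthWidth> = None;
    let mut length_includes_itself = false;

    for c in suffix.chars() {
        let parsed = match c {
            'B' => PStringLengthWidth::OneByte,
            'H' => PStringLengthWidth::TwoByteBE,
            'h' => PStringLengthWidth::TwoByteLE,
            'L' => PStringLengthWidth::FourByteBE,
            'l' => PStringLengthWidth::FourByteLE,
            'J' => {
                length_includes_itself = true;
                continue;
            }
            other => return Err(format!("invalid pstring suffix character '{other}'")),
        };
        // Assumption of this sketch: a second width letter is treated as an error.
        if width.replace(parsed).is_some() {
            return Err(format!("conflicting pstring width suffix '{c}'"));
        }
    }

    // A suffix without a width letter (e.g. "J") falls back to the 1-byte default.
    Ok((width.unwrap_or(PStringLengthWidth::OneByte), length_includes_itself))
}
```

Because the function borrows its input and returns a `Result`, a caller can validate the suffix first and advance the parser input only on success, which is the behavior the review comment on `src/parser/grammar/mod.rs` asked for.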
@unclesp1d3r unclesp1d3r merged commit 88a6a58 into main Mar 25, 2026
28 checks passed
@unclesp1d3r unclesp1d3r deleted the 171-parser-implement-pstring-multi-byte-length-prefix-variants-b-h-l branch March 25, 2026 00:46
@coderabbitai coderabbitai Bot added evaluator Rule evaluation engine and logic testing Test infrastructure and coverage and removed parser Magic file parsing components and grammar labels Mar 25, 2026
@github-actions github-actions Bot mentioned this pull request Apr 25, 2026
unclesp1d3r pushed a commit that referenced this pull request Apr 25, 2026
## 🤖 New release

* `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes)

### ⚠ `libmagic-rs` breaking changes

```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189

--- failure copy_impl_added: type now implements Copy ---

Description:
A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it.
        ref: rust-lang/rust#100905
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron

Failed in:
  libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98

--- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] ---

Description:
A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron

Failed in:
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148

--- failure enum_struct_variant_field_added: pub enum struct variant field added ---

Description:
An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant.
        ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron

Failed in:
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron

Failed in:
  function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025
  function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846
  function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683
  function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004
  function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810
  function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633
  function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060
  function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946
  function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779
  function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042
  function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902
  function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749
  function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron

Failed in:
  libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron

Failed in:
  FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130

--- failure module_missing: pub module removed or renamed ---

Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron

Failed in:
  mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4

--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---

Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron

Failed in:
  struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42
```

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.6.0] - 2026-04-25

### Features

- **parser**: Add Date and QDate types with serialization support
([#165](#165))
- **parser**: Implement pstring (Pascal string) type
([#170](#170))
- **parser**: Implement pstring multi-byte length prefix variants (/B,
/H, /h, /L, /l, /J)
([#183](#183))
- **evaluator**: Add debug-level tracing for skipped rules
([#184](#184))
- **evaluator**: Implement indirect offset resolution
([#37](#37))
([#199](#199))
- **evaluator**: Implement relative offset resolution
([#38](#38))
([#211](#211))
- **deps**: Add new skills to actionbook/rust-skills and
trailofbits/skills
- **evaluator**: Regex and search types (closes #39)
([#214](#214))
- Implement libmagic meta-type directives and format substitution
([#42](#42))
([#230](#230))
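
The pstring length-prefix entry above can be illustrated with a small reader. This is a minimal sketch, not the crate's actual `read_pstring`: the variant names come from the PR summary, while `prefix_len`, `read_pstring_sketch`, and the use of `Option` in place of the crate's error type are assumptions made for illustration.

```rust
/// Width and byte order of a pstring length prefix.
/// Variant names follow the PR summary; the helper method is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PStringLengthWidth {
    OneByte,    // /B (default)
    TwoByteBE,  // /H
    TwoByteLE,  // /h
    FourByteBE, // /L
    FourByteLE, // /l
}

impl PStringLengthWidth {
    /// Number of bytes occupied by the length prefix.
    pub fn prefix_len(self) -> usize {
        match self {
            Self::OneByte => 1,
            Self::TwoByteBE | Self::TwoByteLE => 2,
            Self::FourByteBE | Self::FourByteLE => 4,
        }
    }
}

/// Illustrative reader (assumption: `None` stands in for the real error type).
/// Returns the string payload that follows the length prefix.
pub fn read_pstring_sketch(
    buf: &[u8],
    width: PStringLengthWidth,
    length_includes_itself: bool, // the /J flag
) -> Option<&[u8]> {
    let n = width.prefix_len();
    let p = buf.get(..n)?; // truncated prefix -> None
    let raw = match width {
        PStringLengthWidth::OneByte => usize::from(p[0]),
        PStringLengthWidth::TwoByteBE => usize::from(u16::from_be_bytes([p[0], p[1]])),
        PStringLengthWidth::TwoByteLE => usize::from(u16::from_le_bytes([p[0], p[1]])),
        // `as usize` is lossless on 32-bit and 64-bit targets.
        PStringLengthWidth::FourByteBE => u32::from_be_bytes([p[0], p[1], p[2], p[3]]) as usize,
        PStringLengthWidth::FourByteLE => u32::from_le_bytes([p[0], p[1], p[2], p[3]]) as usize,
    };
    // With /J the stored length counts the prefix itself, so subtract it;
    // a stored length smaller than the prefix width is rejected.
    let len = if length_includes_itself { raw.checked_sub(n)? } else { raw };
    buf.get(n..n.checked_add(len)?) // truncated payload -> None
}
```

With `/HJ`, for example, a stored length of 5 describes a 3-byte payload, and a stored length of 2 describes an empty string.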

### Bug Fixes

- **regex**: PR #214 follow-up review findings
([#215](#215))
- Load and correctly evaluate /usr/share/file/magic/filesystems and
adjacent magic files
([#233](#233))

### Documentation

- **gotchas**: Clarify requirements for adding TypeKind variants

### Miscellaneous Tasks

- Rename .coderabbitai.yaml to .coderabbit.yaml
- **Mergify**: Configuration update
([#173](#173))
- Update .gitignore to exclude local AI assistant files
- **mergify**: Upgrade configuration to current format
([#205](#205))
- Resolve all pending TODO items
([#212](#212))
- **mergify**: Upgrade configuration to current format
([#231](#231))
<!-- generated by git-cliff -->

### Security

- **io**: Close TOCTOU race in `FileBuffer::new` metadata validation
(CWE-367). `validate_file_metadata` now uses `File::metadata()` on the
open descriptor instead of re-canonicalizing the path, so an attacker
cannot swap the path between `open_file` and validation. Error paths now
report the caller-supplied path rather than the canonicalized variant.
- **cli**: Remove relative-path fallbacks from `default_magic_file_path`
(CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the
`CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the
process cwd. CI pipelines must pass `--magic <path>` explicitly.
- **evaluator**: `build_regex` now bounds `size_limit` and
`dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject
compile-time DoS patterns (CWE-1333) from adversarial magic files.
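
The TOCTOU entry above amounts to validating the handle that was opened rather than looking the path up a second time. A minimal sketch of that pattern using only the standard library follows; `open_regular_file` is a hypothetical helper, not the crate's `FileBuffer::new` or `validate_file_metadata`.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical helper: open first, then validate the descriptor we hold.
/// Returns the open file together with its size in bytes.
pub fn open_regular_file(path: &Path) -> io::Result<(File, u64)> {
    let file = File::open(path)?;
    // `File::metadata` queries the already-open handle, so replacing the
    // path after `open` cannot change what is being validated.
    let meta = file.metadata()?;
    if !meta.is_file() {
        // Report the caller-supplied path, not a canonicalized variant.
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} is not a regular file", path.display()),
        ));
    }
    Ok((file, meta.len()))
}
```

Calling `std::fs::metadata(path)` after opening would resolve the path again, which is the window the changelog entry describes closing.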

### Features

- **parser**: Implement meta-type directives: `name`/`use` subroutines,
`default`/`clear` per-level fallback, and `indirect` re-evaluation.
`parse_text_magic_file` now returns `ParsedMagic { rules, name_table }`
(breaking change from `Vec<MagicRule>`). Named subroutines are hoisted
into `NameTable` at load time and dispatched via `RuleEnvironment` in
the evaluator. Recursion is bounded by
`EvaluationConfig::max_recursion_depth`. Resolves
[#42](#42).
- **evaluator**: Thread-local regex compile cache eliminates the
double-compile paid by every successful regex match.
`regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex`
instead of recompiling the pattern to derive the anchor advance. The
cache is reset at the start of every `evaluate_rules_with_config` call,
bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new
builder-style setters (`with_max_recursion_depth`,
`with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`,
`with_timeout_ms`) let external crates construct configurations without
struct literals.
- **parser**: `MagicRule::new()` smart constructor with
`::with_children()`, `::with_strength_modifier()`, `::with_level()`
builder methods and a `::validate()` method enforcing structural
invariants (non-empty message, `level <= MAX_LEVEL`, children nested
strictly deeper than parent). New `MagicRuleValidationError` error type.
- **parser**: `RegexFlags::with_case_insensitive()` and
`::with_start_offset()` builder methods.
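
The `#[non_exhaustive]` plus builder-setter combination described above can be sketched as follows. The struct here is a hypothetical stand-in: the field names, field types, default values, and the `Option<u64>` parameter of `with_timeout_ms` are assumptions inferred from the setter names, not the crate's actual definition.

```rust
/// Hypothetical stand-in for the crate's `EvaluationConfig`.
/// `#[non_exhaustive]` only restricts code outside the defining crate:
/// there, `EvaluationConfig { .. }` struct literals no longer compile.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EvaluationConfig {
    pub max_recursion_depth: u32,
    pub stop_at_first_match: bool,
    pub timeout_ms: Option<u64>,
}

impl Default for EvaluationConfig {
    fn default() -> Self {
        // Assumed defaults for illustration only.
        Self {
            max_recursion_depth: 20,
            stop_at_first_match: true,
            timeout_ms: None,
        }
    }
}

impl EvaluationConfig {
    /// Builder-style setter: consumes `self` and returns the updated value.
    #[must_use]
    pub fn with_max_recursion_depth(mut self, depth: u32) -> Self {
        self.max_recursion_depth = depth;
        self
    }

    #[must_use]
    pub fn with_stop_at_first_match(mut self, stop: bool) -> Self {
        self.stop_at_first_match = stop;
        self
    }

    #[must_use]
    pub fn with_timeout_ms(mut self, timeout_ms: Option<u64>) -> Self {
        self.timeout_ms = timeout_ms;
        self
    }
}
```

A downstream crate would then write something like `EvaluationConfig::default().with_max_recursion_depth(8).with_timeout_ms(Some(500))`, which keeps compiling when new fields are added later.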

### Refactor

- **engine**: Extract `evaluate_pattern_rule()` and
`evaluate_value_rule()` helpers from
`evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a
two-arm type-category split; each helper has focused rustdoc on
semantics and invariants.
- **types**: Replace the `_ =>` catch-all in
`bytes_consumed_with_pattern` with an explicit listing of the
fixed-width `TypeKind` variants. Adding a new variable-width variant
without updating this match is now a compile error instead of a silent
relative-offset anchor corruption in release builds.
- **parser**: Split the 185-line `type_keyword_to_kind` match into
per-family helpers (`byte_family`, `short_family`, `long_family`,
`quad_family`, `float_family`, `double_family`, `date_family`,
`qdate_family`, `string_family`). Drops the
`#[allow(clippy::too_many_lines)]` attribute.
- **main**: `main()` returns `std::process::ExitCode` instead of calling
`process::exit`, so destructors run on the happy path. Ctrl-C
`AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`.
- **grammar**: `parse_strength_directive` uses nom 8's `preceded` +
`Parser::map` instead of the legacy `map(pair(char(...), parse_number),
|(_, n)| ...)` pattern.
- **output**: Add `#[serde(skip_serializing_if = "Option::is_none",
default)]` to public `Option<T>` fields so JSON output no longer emits
`"field": null` for unset optional values.
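
The **types** entry above relies on Rust's exhaustiveness checking. A toy illustration, assuming a much smaller enum than the real `TypeKind` (`Kind` and `fixed_width` are hypothetical names):

```rust
/// Toy type enum; the real `TypeKind` has more variants and carries payloads.
#[derive(Debug, Clone, Copy)]
pub enum Kind {
    Byte,
    Short,
    Long,
    Quad,
    PString,
}

/// Bytes consumed by fixed-width kinds; `None` for variable-width kinds,
/// whose size depends on the buffer contents.
pub fn fixed_width(kind: Kind) -> Option<usize> {
    match kind {
        Kind::Byte => Some(1),
        Kind::Short => Some(2),
        Kind::Long => Some(4),
        Kind::Quad => Some(8),
        Kind::PString => None,
        // No `_ =>` arm: adding a variant to `Kind` without extending this
        // match is a compile error rather than a silently wrong width.
    }
}
```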

### Documentation

- **lib**: Add `# Security` sections to
`MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`,
`::load_from_file`, and `::load_from_file_with_config` warning about the
unbounded default timeout and recommending
`EvaluationConfig::performance()` for untrusted input.
- **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning.
- **README**: Update `TypeKind` enum example to match the current AST,
add `regex` and `search/N` to the supported types table, add pre-1.0 API
stability warning, correct the roadmap to mark v0.2-v0.4 as shipped.
- **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current
Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases
section to reflect actual shipped scope.

### Testing

- Security regression tests for S-H1 (planted-magic-file in cwd), S-H2
(TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime),
S-L2 (codegen message escape round-trip), and GOTCHAS S13.1
(`EvaluationConfig::default()` unbounded timeout invariant).
- Backspace message concatenation regression tests for first-match,
consecutive, and empty-rest edge cases.
- `MagicRule::validate()` tests covering empty message, child level
invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.

### Breaking Changes

- **parser**: `parse_text_magic_file` return type changed from
`Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic,
ParseError>`. Callers must destructure `ParsedMagic { rules, name_table
}`. Low-level callers that only need the rule list can use
`parsed.rules`. `load_magic_file` and `load_magic_directory` return the
same new type.
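
A migration for callers that only need the rule list might look like the sketch below. Every type here is a hypothetical stand-in (the real `MagicRule`, `NameTable`, and `ParseError` are richer), and `parse_text_magic_file_stub` is a placeholder that only mimics the new return shape; it is not the crate's parser.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the crate's types.
#[derive(Debug, Clone, PartialEq)]
pub struct MagicRule {
    pub message: String,
}

pub type NameTable = HashMap<String, Vec<MagicRule>>;

#[derive(Debug)]
pub struct ParsedMagic {
    pub rules: Vec<MagicRule>,
    pub name_table: NameTable,
}

#[derive(Debug)]
pub struct ParseError;

/// Placeholder parser: one rule per non-empty, non-comment line.
pub fn parse_text_magic_file_stub(input: &str) -> Result<ParsedMagic, ParseError> {
    let rules = input
        .lines()
        .filter(|l| !l.trim().is_empty() && !l.starts_with('#'))
        .map(|l| MagicRule { message: l.to_string() })
        .collect();
    Ok(ParsedMagic { rules, name_table: NameTable::new() })
}

/// Before the change a caller would bind a `Vec<MagicRule>` directly;
/// now it destructures `ParsedMagic` and may ignore the name table.
pub fn rule_messages(input: &str) -> Result<Vec<String>, ParseError> {
    let ParsedMagic { rules, name_table: _ } = parse_text_magic_file_stub(input)?;
    Ok(rules.into_iter().map(|r| r.message).collect())
}
```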
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Labels

- enhancement: New feature or request
- evaluator: Rule evaluation engine and logic
- size:XXL: This PR changes 1000+ lines, ignoring generated files.
- testing: Test infrastructure and coverage

Development

Successfully merging this pull request may close these issues.

Parser: implement pstring multi-byte length prefix variants (/B, /H, /L)

2 participants