
feat(parser): implement pstring (Pascal string) type #170

Merged: mergify[bot] merged 18 commits into main from 43-parser-implement-pstring-pascal-string-type on Mar 9, 2026

Conversation

@unclesp1d3r
Member

Summary

  • Add PString variant to TypeKind for Pascal-style length-prefixed strings
  • Parser recognizes pstring keyword and maps to TypeKind::PString
  • Evaluator reads length prefix byte then that many bytes as string data
  • Bounds checking for both the length byte and the string data it references
  • String comparison operators work with pstring values (stored as Value::String)
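
As a rough illustration of the read path summarized above, the following is a minimal sketch and not the PR's actual `read_pstring()`. The function name `read_pstring_sketch` and the error type `PStringError` are hypothetical stand-ins (the real `TypeReadError` variants are not shown in this conversation), and `max_length` is left out for brevity.

```rust
/// Hypothetical error type for this sketch only; the PR's real
/// `TypeReadError` variants are not shown in the conversation.
#[derive(Debug, PartialEq, Eq)]
pub enum PStringError {
    /// The offset of the length byte lies outside the buffer.
    OffsetOutOfBounds,
    /// The length byte claims more data than the buffer holds.
    DataTruncated,
}

/// Reads a Pascal string: one length byte, then that many data bytes.
pub fn read_pstring_sketch(buffer: &[u8], offset: usize) -> Result<String, PStringError> {
    // Bounds check 1: the length byte itself must be readable.
    let len = usize::from(*buffer.get(offset).ok_or(PStringError::OffsetOutOfBounds)?);

    // checked_add avoids wrap-around when computing the data range.
    let start = offset.checked_add(1).ok_or(PStringError::OffsetOutOfBounds)?;
    let end = start.checked_add(len).ok_or(PStringError::DataTruncated)?;

    // Bounds check 2: the string data must lie entirely inside the buffer.
    let data = buffer.get(start..end).ok_or(PStringError::DataTruncated)?;

    // Invalid UTF-8 is replaced with U+FFFD instead of failing the read.
    Ok(String::from_utf8_lossy(data).into_owned())
}
```

Under these assumptions, `read_pstring_sketch(b"\x04JPEG", 0)` yields `Ok("JPEG")` without the length prefix, and a length byte that points past the end of the buffer is reported as `DataTruncated` rather than panicking.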

Changes

  • src/parser/ast.rs -- new PString variant in TypeKind with rustdoc
  • src/parser/types.rs -- pstring keyword parsing in parse_type_keyword and type_keyword_to_kind
  • src/parser/codegen.rs -- serialization support for PString
  • src/evaluator/types/string.rs -- read_pstring() implementation with bounds checking
  • src/evaluator/types/mod.rs -- dispatch PString through read_typed_value
  • src/evaluator/strength.rs -- strength calculation for PString
  • src/evaluator/engine/tests.rs -- engine-level pstring rule matching tests
  • tests/evaluator_tests.rs -- integration tests for pstring evaluation
  • tests/property_tests.rs -- proptest coverage for PString
  • src/parser/grammar/tests.rs -- parser round-trip tests
  • AGENTS.md, docs/ -- documentation updates

Test plan

  • Unit tests for read_pstring (empty, normal, truncated, buffer overrun, offset overflow)
  • Parser tests for pstring keyword recognition and serialization
  • Engine tests for pstring rule matching with Equal operator
  • Integration tests for pstring evaluation through public API
  • Property tests include PString variant
  • just ci-check passes locally

Closes #43

🤖 Generated with Claude Code

unclesp1d3r and others added 8 commits March 8, 2026 21:30
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ests

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…upport

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings March 9, 2026 06:40
@unclesp1d3r unclesp1d3r linked an issue Mar 9, 2026 that may be closed by this pull request
6 tasks
@dosubot dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Mar 9, 2026
@coderabbitai

coderabbitai Bot commented Mar 9, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 070db191-747f-454f-bcd0-61ab976a9846

📥 Commits

Reviewing files that changed from the base of the PR and between b1a22db and d021555.

📒 Files selected for processing (10)
  • AGENTS.md
  • docs/MAGIC_FORMAT.md
  • docs/src/api-reference.md
  • docs/src/architecture.md
  • docs/src/ast-structures.md
  • docs/src/magic-format.md
  • docs/src/parser.md
  • src/evaluator/engine/mod.rs
  • src/evaluator/types/string.rs
  • src/parser/types.rs

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.


Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Pascal strings (length-prefixed strings), a new data type complementing existing string types with automatic length-prefix handling.
  • Documentation

    • Expanded documentation covering the new Pascal string type, including usage examples and feature descriptions.
    • Updated development status and architectural breakdown with current implementation progress.

Walkthrough

Adds Pascal-style length-prefixed string support ("pstring"): new AST TypeKind variant, parser recognition, read_pstring evaluator, dispatch and strength handling, extensive tests, and documentation updates describing the new type.

Changes

| Cohort | File(s) | Summary |
|--------|---------|---------|
| Parser: AST & Keywords | src/parser/ast.rs, src/parser/types.rs, src/parser/grammar/mod.rs, src/parser/grammar/tests.rs | Introduce TypeKind::PString { max_length }, treat PString like String in bit_width(), parse pstring keyword, and add parsing tests. |
| Parser: Serialization / Codegen | src/parser/codegen.rs | Serialize TypeKind::PString (Some/None) mirroring existing String serialization. |
| Evaluator: Type readers & dispatch | src/evaluator/types/mod.rs, src/evaluator/types/string.rs | Add pub fn read_pstring(...) with bounds/UTF‑8 handling; re-export read_pstring; route TypeKind::PString to read_pstring in typed reader. |
| Evaluator: Strength & Engine | src/evaluator/strength.rs, src/evaluator/engine/mod.rs | Treat PString like String in default strength logic; minor comment/TODO additions in engine error path. |
| Tests: unit/integration/property | src/build_helpers.rs, src/evaluator/engine/tests.rs, tests/evaluator_tests.rs, tests/property_tests.rs | Add unit and integration tests for pstring reading/matching, serialization tests, property-test inclusion. Attention: src/evaluator/engine/tests.rs contains a duplicated PString test block. |
| Docs & Guides | docs/*, docs/src/*, AGENTS.md, .github/copilot-instructions.md, docs/MAGIC_FORMAT.md, docs/src/api-reference.md, docs/src/architecture.md, docs/src/ast-structures.md, docs/src/magic-format.md, docs/src/parser.md | Extensive documentation updates adding pstring description, examples, and references across API, format, architecture and guide docs (some duplicated blocks present). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 I count the bytes, a one-byte hop,

length then letters, sweet pstring crop.
I nibble tests and document the trail,
A prefix hop, the data's tale.
Hooray — I bound, I parse, I cheer, carrot in paw near! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Out of Scope Changes check | ❓ Inconclusive | The changeset contains a notable duplicate: identical PString test blocks in src/evaluator/engine/tests.rs appear twice, suggesting accidental duplication rather than intentional scope expansion. | Remove the duplicate PString test block from src/evaluator/engine/tests.rs to ensure only intentional changes are included. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title 'feat(parser): implement pstring (Pascal string) type' accurately describes the main change: adding parser support for a new pstring type. |
| Description check | ✅ Passed | The description provides relevant details about the pstring implementation across multiple components (parser, evaluator, tests, docs) and directly relates to the changeset. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #43 are met: TypeKind extended with PString, parser recognizes pstring, evaluator reads length prefix and bytes, string comparison works, bounds checking implemented, and comprehensive tests added. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@mergify
Contributor

mergify Bot commented Mar 9, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.

  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 10 commits of the base branch before merging

  • #commits-behind <= 3

@dosubot dosubot Bot added evaluator Rule evaluation engine and logic parser Magic file parsing components and grammar labels Mar 9, 2026
@mergify
Contributor

mergify Bot commented Mar 9, 2026

🧪 CI Insights

Here's what we observed from your CI run for d021555.

🟢 All jobs passed!

But CI Insights is watching 👀

@coderabbitai coderabbitai Bot added documentation Improvements or additions to documentation enhancement New feature or request testing Test infrastructure and coverage size:XXL This PR changes 1000+ lines, ignoring generated files. labels Mar 9, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 9, 2026
@codecov

codecov Bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 98.86364% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/evaluator/types/string.rs | 99.30% | 1 Missing ⚠️ |
| src/parser/ast.rs | 80.00% | 1 Missing ⚠️ |


@dosubot
Contributor

dosubot Bot commented Mar 9, 2026

Related Documentation

8 document(s) may need updating based on files changed in this PR:

libMagic-rs

api-reference /libmagic-rs/blob/main/docs/src/api-reference.md
View Suggested Changes
@@ -229,6 +229,7 @@
 | `Float { endian }`         | 32-bit IEEE 754 floating-point (added in v0.5.0)                                            |
 | `Double { endian }`        | 64-bit IEEE 754 double-precision floating-point (added in v0.5.0)                           |
 | `String { max_length }`    | String data (discriminant changed from 4 to 6 in v0.5.0)                                    |
+| `PString { max_length }`   | Pascal string - length-prefixed byte followed by string data (returns `Value::String`)      |
 
 ### Operator
 

✅ Accepted

architecture /libmagic-rs/blob/main/docs/src/architecture.md
View Suggested Changes
@@ -95,6 +95,7 @@
     Long { endian: Endianness, signed: bool },
     Quad { endian: Endianness, signed: bool },
     String { max_length: Option<usize> },
+    PString { max_length: Option<usize> }, // Pascal string (length-prefixed)
 }
 
 pub enum Operator {

✅ Accepted

ast-structures /libmagic-rs/blob/main/docs/src/ast-structures.md
View Suggested Changes
@@ -184,6 +184,9 @@
 
     /// String data
     String { max_length: Option<usize> },
+
+    /// Pascal string (length-prefixed)
+    PString { max_length: Option<usize> },
 }
 ```
 
@@ -223,6 +226,39 @@
 // Null-terminated string, max 256 bytes
 let string_type = TypeKind::String {
     max_length: Some(256)
+};
+```
+
+### PString (Pascal String)
+
+Pascal-style length-prefixed strings where the first byte contains the string length.
+
+**Structure:**
+- Length byte: 1 byte indicating string length (0-255)
+- String data: The number of bytes specified by the length byte
+
+**Example:**
+```
+0    pstring    JPEG
+```
+Reads one byte as length, then reads that many bytes as a string.
+
+**Behavior:**
+- Returns `Value::String` containing the string data (without the length prefix)
+- Performs bounds checking on both the length byte and the string data
+- Supports all string comparison operators
+
+**Usage:**
+
+```rust
+// Pascal string with no length limit
+let pstring_type = TypeKind::PString {
+    max_length: None
+};
+
+// Pascal string with maximum 64-byte limit
+let limited_pstring = TypeKind::PString {
+    max_length: Some(64)
 };
 ```
 
@@ -419,7 +455,8 @@
 1. **Use `Byte { signed }`** for single-byte values and flags, specifying signedness
 2. **Use `Short/Long/Quad`** with explicit endianness and signedness for multi-byte integers
 3. **Use `String`** with length limits for text patterns
-4. **Use `Bytes`** for exact binary sequences
+4. **Use `PString`** for Pascal-style length-prefixed strings
+5. **Use `Bytes`** for exact binary sequences
 
 ### Performance Considerations
 

✅ Accepted

Magic File Compatibility Status
View Suggested Changes
@@ -13,7 +13,7 @@
 The v0.4.x releases provide foundational file identification capabilities across data types, operators, offsets, nested rules, and string matching. While limited in scope compared to GNU file, these features establish the architectural patterns for future expansion.
 
 #### Data Types
-The evaluator currently supports seven basic data types with endianness variants:
+The evaluator currently supports eight basic data types with endianness variants:
 
 - **[Byte](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs#L77-L85)**: Single byte values (8-bit signed or unsigned)
 - **[Short](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs#L122-L147)**: 16-bit integers with native, little-endian, big-endian support (signed and unsigned)
@@ -22,6 +22,7 @@
 - **Date**: 32-bit Unix timestamps with endianness variants and UTC/local time formatting. Implementation in `src/evaluator/types/date.rs` uses `chrono` crate for timestamp formatting with format string `"%a %b %e %H:%M:%S %Y"` matching GNU file output.
 - **QDate**: 64-bit Unix timestamps with endianness variants and UTC/local time formatting. Shares formatting implementation with Date type for consistent output.
 - **[String](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs#L275-L308)**: Null-terminated or length-limited strings with UTF-8 conversion using [SIMD-accelerated null scanning](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs#L291-L298)
+- **PString**: Pascal-style length-prefixed strings where the first byte stores the string length (0-255), followed by that many bytes of string data. Implementation in `src/evaluator/types/string.rs` provides bounds checking for both the length byte and string data.
 
 #### Operators
 Eighteen operators are [fully implemented in evaluation](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/operators.rs):
@@ -107,13 +108,13 @@
 - [Bitwise XOR (^), NOT (~), and any-value (x) operators implemented in PR #145](https://github.com/EvilBit-Labs/libmagic-rs/pull/145) - Resolves Issue #35. Required for advanced binary pattern matching and unconditional match patterns. Released in v0.4.0.
 
 ### Epic #54: Type System Expansion (v0.2.0 + v0.3.0)
-[Status: 7 of 33+ types implemented](https://github.com/EvilBit-Labs/libmagic-rs/issues/54). The type system expansion is split across two releases to manage code complexity.
+[Status: 8 of 33+ types implemented](https://github.com/EvilBit-Labs/libmagic-rs/issues/54). The type system expansion is split across two releases to manage code complexity.
 
 **Version 0.2.0 targets:**
 - [Quad (64-bit integer) with endian variants](https://github.com/EvilBit-Labs/libmagic-rs/issues/36) - **Implemented**. Supports signed and unsigned 64-bit integers with full endianness support for modern binary formats
 - [Date and timestamp types tracked in Issue #41](https://github.com/EvilBit-Labs/libmagic-rs/issues/54) - **Implemented**. Supports 32-bit (Date) and 64-bit (QDate) Unix timestamps with endianness variants and UTC/local time formatting
 - [Float and double with endian variants tracked in Issue #40](https://github.com/EvilBit-Labs/libmagic-rs/issues/54) - Required for scientific data and image metadata
-- [Pstring (Pascal string) tracked in Issue #43](https://github.com/EvilBit-Labs/libmagic-rs/issues/54) - Rare but needed for full compatibility
+- [Pstring (Pascal string) tracked in Issue #43](https://github.com/EvilBit-Labs/libmagic-rs/issues/54) - **Implemented**. Length-prefixed strings where the first byte stores the string length (0-255)
 
 **Version 0.3.0 targets:**
 - [Regex and search types tracked in Issue #39](https://github.com/EvilBit-Labs/libmagic-rs/issues/54) - **HIGH priority** for text file detection (JSON, scripts, XML)
@@ -191,7 +192,6 @@
 - [Relative offset resolution via Issue #38](https://github.com/EvilBit-Labs/libmagic-rs/issues/38) - Enables sequential field parsing in complex formats
 - [Regex and search type matching via Issue #39](https://github.com/EvilBit-Labs/libmagic-rs/issues/39) - Critical for detecting text files (JSON, XML, scripts)
 - Float and double numeric types with endianness variants
-- Pascal string type (pstring)
 - ZIP content inspection enhancements
 
 **Mandatory prerequisite refactoring:**

✅ Accepted

magic-format /libmagic-rs/blob/main/docs/src/magic-format.md
View Suggested Changes
@@ -209,7 +209,7 @@
 16      leqldate  x         \b, timestamp %s
 ```
 
-### String Type
+### String Types
 
 Match literal string data:
 
@@ -224,6 +224,20 @@
 - `\n` - newline
 - `\t` - tab
 - `\\` - backslash
+
+### Pascal String Type
+
+Pascal string (pstring) is a length-prefixed string type. The first byte contains the string length (0-255), followed by that many bytes of string data. Unlike C strings, Pascal strings are not null-terminated.
+
+```text
+0       pstring   =JPEG     JPEG image (Pascal string)
+```
+
+The evaluator reads the length byte, then reads that many bytes as string data. The optional max_length parameter caps the length byte value:
+
+```text
+0       pstring   x         \b, name: %s
+```
 
 ### String Flags (Not Yet Implemented)
 
@@ -535,7 +549,7 @@
 - Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
 - Float and double types (32-bit and 64-bit IEEE 754 floating-point)
 - Date and qdate types (32-bit and 64-bit Unix timestamps)
-- String type
+- String and pstring types (null-terminated and length-prefixed strings)
 - Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
 - Bitwise AND operator
 - Nested rules

✅ Accepted

MAGIC_FORMAT /libmagic-rs/blob/main/docs/MAGIC_FORMAT.md
View Suggested Changes
@@ -170,7 +170,7 @@
 8       uquad     >0x8000000000000000 (unsigned 64-bit check)
 ```
 
-### String Type
+### String Types
 
 Match literal string data:
 
@@ -185,6 +185,16 @@
 - `\n` - newline
 - `\t` - tab
 - `\\` - backslash
+
+**Pascal String (pstring)**
+
+Length-prefixed string type where the first byte contains the string length (0-255), followed by that many bytes of string data. Unlike C strings, Pascal strings are not null-terminated.
+
+```text
+0       pstring   =JPEG     JPEG image (Pascal string)
+```
+
+The length byte value determines how many bytes to read for the string data. If `max_length` is specified in the magic file (not shown in the basic syntax), it caps the length byte value to prevent reading excessive data.
 
 ### String Flags
 
@@ -525,7 +535,7 @@
 - Relative offsets
 - Indirect offsets (basic)
 - Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
-- String type
+- String types (`string`, `pstring`)
 - Date and timestamp types (32-bit and 64-bit Unix timestamps)
 - Comparison operators (`=`, `!`, `<`, `>`, `<=`, `>=`)
 - Bitwise AND operator
@@ -542,6 +552,7 @@
 
 ### Recently Added
 
+- **Pascal string type**: `pstring` for length-prefixed strings
 - **Date/timestamp types**: `date` (32-bit) and `qdate` (64-bit) Unix timestamp types
 - **Comparison operators**: Full support for `<`, `>`, `<=`, `>=` operators
 - **Strength modifiers**: The `!:strength` directive for adjusting rule priority

✅ Accepted

parser /libmagic-rs/blob/main/docs/src/parser.md
View Suggested Changes
@@ -183,6 +183,46 @@
 - ✅ Comprehensive test coverage for all endianness variants and literal formats
 
 **Note:** Float and double types do **not** have signed/unsigned variants. IEEE 754 handles sign internally via the sign bit, so all float types use a single `TypeKind` variant with only an `endian` field (no `signed: bool` field).
+
+### Pascal String (pstring) Type
+
+The parser supports Pascal-style length-prefixed strings through the `pstring` keyword:
+
+**Type Keyword:**
+
+- `pstring` - Length-prefixed string (1-byte length + string data) → `TypeKind::PString { max_length: None }`
+
+**Format:**
+
+Pascal strings store the length as the first byte (0-255), followed by that many bytes of string data. Unlike C strings, they are not null-terminated.
+
+**Parser Implementation:**
+
+- Recognized by `parse_type_keyword()` in `src/parser/types.rs`
+- Maps to `TypeKind::PString` in the AST
+- Evaluator reads length prefix byte then that many bytes as string data
+- Stored as `Value::String` for comparison with string operators
+- Supports optional `max_length` field to cap the length byte value
+
+**Usage in Magic Rules:**
+
+```rust
+// Basic pstring matching
+0 pstring =Hello     // Match if pstring equals "Hello"
+0 pstring x          // Match any pstring value
+
+// With max_length constraint (parsed separately)
+0 pstring/64 x       // Limit string read to 64 bytes
+```
+
+**Features:**
+
+- ✅ Single type keyword `pstring`
+- ✅ Length-prefixed format (1 byte length, 0-255 bytes data)
+- ✅ Bounds checking for both length byte and string data
+- ✅ UTF-8 validation with replacement character for invalid sequences
+- ✅ Optional `max_length` parameter to limit string reads
+- ✅ String comparison operators work with pstring values
 
 ### Date and Timestamp Types
 

✅ Accepted

Type System And Operator Coverage
View Suggested Changes
@@ -85,6 +85,17 @@
 - **UTF-8 Handling**: Invalid UTF-8 sequences replaced with replacement character (U+FFFD)
 - **Return Type**: `Value::String(String)`
 - **Implementation**: [read_string function](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs#L275-L308)
+
+#### PString Type
+- **Description**: Pascal-style length-prefixed strings where the first byte indicates the string length, followed by that many bytes of string data
+- **Size**: Variable (1 byte for length prefix + actual string bytes)
+- **Parameters**: `max_length: Option<usize>` caps the length byte value
+- **Behavior**: Reads length byte (0-255), then reads that many bytes as string data; not null-terminated
+- **Bounds Checking**: Validates both the length byte is readable and that the full string data is within bounds
+- **UTF-8 Handling**: Invalid UTF-8 sequences replaced with replacement character (U+FFFD)
+- **Return Type**: `Value::String(String)`
+- **Implementation**: [read_pstring function](https://github.com/EvilBit-Labs/libmagic-rs/blob/main/src/evaluator/types/string.rs)
+- **Comparison Operators**: Supports all string comparison operators (=, !=, <, >, <=, >=)
 
 ### Endianness Support
 
@@ -244,7 +255,7 @@
 
 **Status**: Released and [published on crates.io](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/README.md#L24)
 
-**Type Coverage**: 19 of ~33 types (58%)
+**Type Coverage**: 20 of ~33 types (61%)
 - ✅ Byte
 - ✅ Short (signed/unsigned, all endianness variants)
 - ✅ Long (signed/unsigned, all endianness variants)
@@ -254,6 +265,7 @@
 - ✅ Date (32-bit Unix timestamp, all endianness variants, UTC/local time)
 - ✅ QDate (64-bit Unix timestamp, all endianness variants, UTC/local time)
 - ✅ String (with optional max_length)
+- ✅ PString (Pascal string with optional max_length)
 
 **Operator Coverage**: 4 of ~13 operators (31%)
 - ✅ Equal (=, ==)
@@ -324,11 +336,10 @@
 
 **Planned New Types**:
 - **Regex and Search**: Pattern matching for text detection ([Issue #39](https://github.com/EvilBit-Labs/libmagic-rs/issues/39))
-- **Pascal String**: Length-prefixed string type ([Issue #43](https://github.com/EvilBit-Labs/libmagic-rs/issues/43))
 - **Meta-Types**: `default`, `clear`, `name`, `use`, `indirect` ([Issue #42](https://github.com/EvilBit-Labs/libmagic-rs/issues/42))
 
 **Coverage**:
-- Types: 19 of ~33 (58%)
+- Types: 20 of ~33 (61%)
 - Operators: 11 of ~13 (85%)
 - Offsets: 5 of 5 (100%)
 
@@ -419,7 +430,7 @@
 | | `ledouble` | 8 bytes | Little | ✅ Implemented | v0.1.0 |
 | | `bedouble` | 8 bytes | Big | ✅ Implemented | v0.1.0 |
 | **String** | `string` | Variable | N/A | ✅ Implemented | v0.1.x |
-| | `pstring` | Variable | N/A | 📋 Planned | v0.4.0 |
+| | `pstring` | Variable | N/A | ✅ Implemented | v0.5.0+ |
 | **Date (32-bit)** | `date` | 4 bytes | Native UTC | ✅ Implemented | v0.5.0+ |
 | | `ldate` | 4 bytes | Native Local | ✅ Implemented | v0.5.0+ |
 | | `bedate` | 4 bytes | Big UTC | ✅ Implemented | v0.5.0+ |
@@ -442,10 +453,10 @@
 | | `use` | N/A | N/A | 📋 Planned | v0.4.0 |
 | | `indirect` | N/A | N/A | 📋 Planned | v0.4.0 |
 
-**Current Coverage**: 19 of 33 types (58%)
+**Current Coverage**: 20 of 33 types (61%)
 **v0.3.0 Status**: 7 of 33 types (21%)
 **v0.4.0 Status**: 7 of 33 types (21%)
-**v0.5.0+ Status**: 19 of 33 types (58%)
+**v0.5.0+ Status**: 20 of 33 types (61%)
 **v1.0.0 Target**: ~31 of 33 types (94%)
 
 ### Complete Operator Inventory
@@ -498,7 +509,8 @@
 | File Path | Description | Key Components |
 |-----------|-------------|----------------|
 | [src/parser/ast.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/parser/ast.rs) | AST data structures | TypeKind enum, Operator enum, OffsetSpec enum, Endianness enum |
-| [src/evaluator/types.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs) | Type reading implementation | read_byte, read_short, read_long, read_quad, read_float, read_double, read_date, read_qdate, read_string, read_typed_value |
+| [src/evaluator/types.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs) | Type reading implementation | read_byte, read_short, read_long, read_quad, read_float, read_double, read_date, read_qdate, read_string, read_pstring, read_typed_value |
+| [src/evaluator/types/string.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/main/src/evaluator/types/string.rs) | String type implementation | read_string, read_pstring |
 | [src/evaluator/types/date.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/main/src/evaluator/types/date.rs) | Date/timestamp implementation | read_date, read_qdate, format_unix_timestamp_64, local_utc_offset_secs |
 | [src/evaluator/operators.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/operators.rs) | Operator evaluation | apply_equal, apply_not_equal, apply_bitwise_and, apply_operator |
 | [src/evaluator/offset.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/offset.rs) | Offset resolution | resolve_offset, resolve_absolute_offset, OffsetError |

✅ Accepted



Copilot AI left a comment


Pull request overview

Adds Pascal-style length-prefixed string support (pstring) across the parser and evaluator so magic rules can match length-prefixed strings in binary formats.

Changes:

  • Extend TypeKind with PString and wire it through parsing (pstring keyword) and codegen serialization.
  • Implement evaluator reading for PString and include it in typed-value dispatch + strength calculation.
  • Add unit/integration/property tests plus documentation updates for the new type.
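
To make the parser side of this overview concrete, here is a minimal sketch of a keyword-to-type mapping and a serializer for it. It assumes a cut-down `TypeKind` with only the two string variants; `keyword_to_kind_sketch` and `serialize_kind_sketch` are hypothetical names, and the real output format of `serialize_type_kind()` is not shown in this conversation. Suffixed forms such as `pstring/B` are deliberately not recognized, in line with the suffix limitation this PR documents.

```rust
/// Cut-down stand-in for the crate's `TypeKind`; only the string variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeKind {
    String { max_length: Option<usize> },
    PString { max_length: Option<usize> },
}

/// Maps a bare type keyword to a `TypeKind` (hypothetical helper).
/// Suffixed forms like `pstring/B` fall through to `None`.
pub fn keyword_to_kind_sketch(keyword: &str) -> Option<TypeKind> {
    match keyword {
        "string" => Some(TypeKind::String { max_length: None }),
        "pstring" => Some(TypeKind::PString { max_length: None }),
        _ => None,
    }
}

/// Renders a `TypeKind` as Rust source text, treating PString exactly
/// like String (assumed format, for illustration only).
pub fn serialize_kind_sketch(kind: &TypeKind) -> String {
    match kind {
        TypeKind::String { max_length } => {
            format!("TypeKind::String {{ max_length: {max_length:?} }}")
        }
        TypeKind::PString { max_length } => {
            format!("TypeKind::PString {{ max_length: {max_length:?} }}")
        }
    }
}
```

Keeping the `PString` arm next to the `String` arm in both functions mirrors the "treat PString like String" approach described in the review summaries.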

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| src/parser/ast.rs | Adds TypeKind::PString and updates bit_width() handling. |
| src/parser/types.rs | Parses pstring keyword and maps it to TypeKind::PString. |
| src/parser/codegen.rs | Adds PString support to serialize_type_kind(). |
| src/parser/grammar/mod.rs | Updates type docs to mention pstring. |
| src/parser/grammar/tests.rs | Adds parser tests for pstring parsing and rule parsing. |
| src/evaluator/types/string.rs | Implements read_pstring() and adds unit tests. |
| src/evaluator/types/mod.rs | Dispatches TypeKind::PString via read_typed_value() and re-exports read_pstring. |
| src/evaluator/strength.rs | Includes PString in default strength calculation alongside String. |
| src/evaluator/engine/tests.rs | Adds engine tests verifying PString rule matching behavior. |
| tests/evaluator_tests.rs | Adds integration-style tests for PString evaluation. |
| tests/property_tests.rs | Includes PString in TypeKind generation strategy. |
| src/build_helpers.rs | Adds serialization tests for TypeKind::PString. |
| AGENTS.md | Documents pstring as implemented type support. |
| .github/copilot-instructions.md | Updates project status/docs to reflect current capabilities and pstring support. |
| mise.lock | Regenerated lockfile content (tooling metadata). |

Comment thread src/evaluator/types/string.rs Outdated
Comment thread src/parser/types.rs Outdated
- Add usize::MAX offset overflow test for read_pstring (checked_add
  regression guard)
- Add test for max_length capping when buffer has less data than
  length byte claims
- Assert specific TypeReadError variants instead of just is_err()
- Document pstring/B, /H, /L suffix limitation in AGENTS.md and
  parser code
- Add TODO for debug-level trace on skipped rules in evaluate_rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. size:XXL This PR changes 1000+ lines, ignoring generated files. labels Mar 9, 2026
unclesp1d3r and others added 2 commits March 9, 2026 02:49
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings March 9, 2026 06:51
unclesp1d3r and others added 7 commits March 9, 2026 02:52
Clarify that read_pstring validates bounds against the capped length
(after max_length), not the raw length byte. This matches GNU file
behavior where max_length handles truncated data scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
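
The capping behavior this commit message describes can be sketched as follows. This is an illustrative reading of the message, not the merged code: `read_pstring_capped` is a hypothetical name, and errors are collapsed into `Option` to keep the example short.

```rust
/// Sketch: apply `max_length` to the length byte first, then bounds-check
/// against the capped length rather than the raw length byte.
pub fn read_pstring_capped(
    buffer: &[u8],
    offset: usize,
    max_length: Option<usize>,
) -> Option<String> {
    // Raw length as stored in the prefix byte (0-255).
    let raw_len = usize::from(*buffer.get(offset)?);

    // Cap the length before any bounds check on the string data.
    let len = max_length.map_or(raw_len, |max| raw_len.min(max));

    // The data range is validated against the capped length.
    let start = offset.checked_add(1)?;
    let end = start.checked_add(len)?;
    let data = buffer.get(start..end)?;

    Some(String::from_utf8_lossy(data).into_owned())
}
```

With this ordering, a buffer whose length byte claims 10 bytes but holds only 3 still reads successfully under `max_length: Some(3)`, while the same buffer with no cap fails the bounds check.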

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Comment on lines +2037 to +2040
let (remaining, (typ, op)) = parse_type_and_operator("pstring& ").unwrap();
assert_eq!(remaining, "");
assert_eq!(typ, TypeKind::PString { max_length: None });
assert_eq!(op, Some(Operator::BitwiseAnd));

Copilot AI Mar 9, 2026


parse_type_and_operator("pstring& ") currently parses an attached & as Operator::BitwiseAnd for pstring. Since the evaluator’s bitwise operators only apply to integer Values (and return false for strings), this allows nonsensical rules to parse but never match, which is hard to diagnose. Consider rejecting attached & / &mask operators for string/pstring types in parse_type_and_operator/parse_magic_rule, and update this test accordingly (expect an error instead of Some(BitwiseAnd)).

Suggested change
let (remaining, (typ, op)) = parse_type_and_operator("pstring& ").unwrap();
assert_eq!(remaining, "");
assert_eq!(typ, TypeKind::PString { max_length: None });
assert_eq!(op, Some(Operator::BitwiseAnd));
let result = parse_type_and_operator("pstring& ");
assert!(
result.is_err(),
"attached '&' for pstring should be rejected as an invalid operator"
);

Copilot uses AI. Check for mistakes.
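
The rejection proposed in this comment could look roughly like the following Rust sketch. It is a simplified stand-in, not the project's nom-based `parse_type_and_operator`: `SketchType`, `SketchOp`, and `split_type_and_operator` are hypothetical names, and only three type keywords are modelled.

```rust
/// Simplified stand-ins for the parser's type and operator enums (hypothetical).
#[derive(Debug, PartialEq)]
enum SketchType {
    Long,
    String,
    PString,
}

#[derive(Debug, PartialEq)]
enum SketchOp {
    BitwiseAnd,
}

/// Splits a token such as "long&" into a type and an optional attached operator,
/// rejecting an attached '&' on string-like types, where a bitwise test can never match.
fn split_type_and_operator(input: &str) -> Result<(SketchType, Option<SketchOp>), String> {
    let token = input.trim_end();
    // An attached '&' directly after the keyword is treated as bitwise AND.
    let (keyword, op) = match token.strip_suffix('&') {
        Some(rest) => (rest, Some(SketchOp::BitwiseAnd)),
        None => (token, None),
    };
    let kind = match keyword {
        "long" => SketchType::Long,
        "string" => SketchType::String,
        "pstring" => SketchType::PString,
        other => return Err(format!("unknown type keyword: {other}")),
    };
    if op.is_some() && matches!(kind, SketchType::String | SketchType::PString) {
        return Err(format!("bitwise operator is not valid for {keyword}"));
    }
    Ok((kind, op))
}
```

Failing at parse time surfaces the mistake where the rule is written, instead of producing a rule that loads cleanly and silently never matches.
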
- **Types**: `byte`, `short`, `long`, `quad`, `float`, `double`, `string`, `pstring` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; float/double endian variants `befloat`/`lefloat`, `bedouble`/`ledouble`; 32-bit date/timestamp types `date`/`ldate`/`bedate`/`beldate`/`ledate`/`leldate`; 64-bit date/timestamp types `qdate`/`qldate`/`beqdate`/`beqldate`/`leqdate`/`leqldate`
- **Operators**: `=` (equal), `!=` (not equal), `<` (less than), `>` (greater than), `<=` (less equal), `>=` (greater equal), `&` (bitwise AND with optional mask), `^` (bitwise XOR), `~` (bitwise NOT), `x` (any value)
- **Nesting**: Hierarchical rules with proper indentation handling
- **String Matching**: Exact string matching with null-termination

Copilot AI Mar 9, 2026

The “Supported Syntax” section lists pstring as a supported type, but the “String Matching” bullet still only mentions null-terminated strings. Updating that bullet to also mention Pascal (length-prefixed) strings would keep the documentation consistent with the implemented feature set.

Suggested change
- **String Matching**: Exact string matching with null-termination
- **String Matching**: Exact string matching for null-terminated (`string`) and Pascal/length-prefixed (`pstring`) strings

Comment on lines +152 to +158
};

// Check if we have enough bytes for the (possibly capped) string data
let string_start = offset.checked_add(1).ok_or(TypeReadError::BufferOverrun {
offset,
buffer_len: buffer.len(),
})?;

Copilot AI Mar 9, 2026

read_pstring reports TypeReadError::BufferOverrun { offset: usize::MAX, ... } when string_start.checked_add(actual_length) overflows. Using usize::MAX makes the error location misleading and breaks consistency with other buffer-overrun errors. Consider returning the original offset/string_start, or introduce a dedicated overflow error variant so callers can distinguish arithmetic overflow from an actual buffer overrun.
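
A minimal Rust sketch of the second option (a dedicated overflow variant) follows. `ReadError`, `OffsetOverflow`, and `data_range` are hypothetical names chosen for illustration; the real `TypeReadError` may be shaped differently. The helper only computes the byte range of the string data, so the arithmetic can be checked without a buffer.

```rust
use std::ops::Range;

/// Hypothetical error type; only `BufferOverrun` mirrors a variant named in the review.
#[derive(Debug, PartialEq)]
enum ReadError {
    /// The requested range extends past the end of the buffer.
    BufferOverrun { offset: usize, buffer_len: usize },
    /// Computing the range overflowed `usize`; reports the caller's original offset.
    OffsetOverflow { offset: usize, length: usize },
}

/// Computes the range of string data that follows a one-byte length prefix at `offset`.
fn data_range(offset: usize, length: usize, buffer_len: usize) -> Result<Range<usize>, ReadError> {
    // Skip the length byte itself.
    let start = offset
        .checked_add(1)
        .ok_or(ReadError::OffsetOverflow { offset, length })?;
    // End of the string data (exclusive).
    let end = start
        .checked_add(length)
        .ok_or(ReadError::OffsetOverflow { offset, length })?;
    if end > buffer_len {
        return Err(ReadError::BufferOverrun { offset: start, buffer_len });
    }
    Ok(start..end)
}
```

With this split, a caller can tell "the offset arithmetic wrapped" apart from "the buffer is too short", and neither error carries a placeholder such as `usize::MAX` as its location.
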

Comment on lines +105 to +107
/// When `max_length` is set, bounds are validated against the capped length, not the
/// raw length byte. This matches GNU `file` behavior: `max_length` is intended to
/// handle cases where the length byte may reference more data than actually exists.

Copilot AI Mar 9, 2026

The security docs for read_pstring say “The length byte is validated against the remaining buffer size before reading”, but the implementation validates the capped length (min(length_byte, max_length)) against the buffer, and can succeed even if the original length byte exceeds remaining data (when max_length is smaller). Either adjust the docs to match the behavior, or change the code to treat an out-of-bounds length byte as an error regardless of max_length.

Suggested change
/// When `max_length` is set, bounds are validated against the capped length, not the
/// raw length byte. This matches GNU `file` behavior: `max_length` is intended to
/// handle cases where the length byte may reference more data than actually exists.
/// The raw length byte is validated against the remaining buffer size before any
/// string data is read. When `max_length` is set, the number of bytes actually read
/// is capped to `min(length_byte, max_length)` after this bounds check.
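
The difference between the two policies discussed here can be made concrete with a small Rust sketch. Both function names are hypothetical, and `remaining` is assumed to mean the number of bytes available after the length byte.

```rust
/// Policy described in the commit message: validate the capped length,
/// `min(length_byte, max_length)`, against the remaining buffer.
fn capped_len_in_bounds(length_byte: u8, max_length: Option<usize>, remaining: usize) -> bool {
    let raw = usize::from(length_byte);
    let capped = max_length.map_or(raw, |cap| raw.min(cap));
    capped <= remaining
}

/// Policy described in the suggested doc text: validate the raw length byte,
/// regardless of any cap applied afterwards.
fn raw_len_in_bounds(length_byte: u8, remaining: usize) -> bool {
    usize::from(length_byte) <= remaining
}
```

For a length byte of 200, a `max_length` of 4, and 10 bytes remaining, the capped check passes while the raw check fails, which is exactly the case where the documentation and the implementation disagree.
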

@unclesp1d3r
Member Author

@Mergifyio queue

@mergify
Contributor

mergify Bot commented Mar 9, 2026

Merge Queue Status


  • 🟠 Waiting for queue conditions
  • ⬜ Enter queue
  • ⬜ Run checks
  • ⬜ Merge
Required conditions to enter a queue
  • -closed [📌 queue requirement]
  • -conflict [📌 queue requirement]
  • -draft [📌 queue requirement]
  • any of [📌 queue -> configuration change requirements]:
    • -mergify-configuration-changed
    • check-success = Configuration changed
  • any of [📌 queue requirement]:
    • check-success = Mergify Merge Protections
    • check-neutral = Mergify Merge Protections
    • check-skipped = Mergify Merge Protections
  • any of [🔀 queue conditions]:
    • all of [📌 queue conditions of queue default]:
      • all of [🛡 Merge Protections rule CI must pass]:
        • check-success = coverage
        • check-success = quality
        • check-success = test
        • check-success = test-cross-platform (macos-latest, macOS)
        • check-success = test-cross-platform (ubuntu-22.04, Linux)
        • check-success = test-cross-platform (ubuntu-latest, Linux)
        • check-success = test-cross-platform (windows-latest, Windows)
      • all of [🛡 Merge Protections rule Do not merge outdated PRs]:
        • #commits-behind <= 3
      • any of [🛡 GitHub repository ruleset rule main]:
        • check-success = DCO
        • check-neutral = DCO
        • check-skipped = DCO
      • any of [🛡 GitHub repository ruleset rule main]:
        • check-success = Mergify Merge Protections
        • check-neutral = Mergify Merge Protections
        • check-skipped = Mergify Merge Protections

@mergify mergify Bot added the queued label Mar 9, 2026
@mergify mergify Bot merged commit 7564bed into main Mar 9, 2026
28 of 29 checks passed
@mergify mergify Bot deleted the 43-parser-implement-pstring-pascal-string-type branch March 9, 2026 07:02
@mergify
Contributor

mergify Bot commented Mar 9, 2026

Merge Queue Status

Rule: default


  • Entered queue · 2026-03-09 07:02 UTC
  • Checks passed · in-place
  • Merged · 2026-03-09 07:02 UTC · at d0215558009854608dc6907a36d6df77c22e263c

This pull request spent 7 seconds in the queue, with no time running CI.

Required conditions to merge
  • all of [🛡 Merge Protections rule CI must pass]:
    • check-success = coverage
    • check-success = quality
    • check-success = test
    • check-success = test-cross-platform (macos-latest, macOS)
    • check-success = test-cross-platform (ubuntu-22.04, Linux)
    • check-success = test-cross-platform (ubuntu-latest, Linux)
    • check-success = test-cross-platform (windows-latest, Windows)
  • all of [🛡 Merge Protections rule Do not merge outdated PRs]:
  • any of [🛡 GitHub repository ruleset rule main]:
    • check-success = DCO
    • check-neutral = DCO
    • check-skipped = DCO
  • any of [🛡 GitHub repository ruleset rule main]:
    • check-success = Mergify Merge Protections
    • check-neutral = Mergify Merge Protections
    • check-skipped = Mergify Merge Protections

@mergify mergify Bot removed the queued label Mar 9, 2026
@github-actions github-actions Bot mentioned this pull request Apr 25, 2026
unclesp1d3r pushed a commit that referenced this pull request Apr 25, 2026
## 🤖 New release

* `libmagic-rs`: 0.5.0 -> 0.6.0 (⚠ API breaking changes)

### ⚠ `libmagic-rs` breaking changes

```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189
  field MagicRule.value_transform in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:1189

--- failure copy_impl_added: type now implements Copy ---

Description:
A public type now implements Copy, causing non-move closures to capture it by reference instead of moving it.
        ref: rust-lang/rust#100905
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/copy_impl_added.ron

Failed in:
  libmagic_rs::mime::MimeMapper in /tmp/.tmpwFvgw1/libmagic-rs/src/mime.rs:98

--- failure enum_marked_non_exhaustive: enum marked #[non_exhaustive] ---

Description:
A public enum has been marked #[non_exhaustive]. Pattern-matching on it outside of its crate must now include a wildcard pattern like `_`, or it will fail to compile.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_marked_non_exhaustive.ron

Failed in:
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum OffsetSpec in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:198
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum LibmagicError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:15
  enum IoError in /tmp/.tmpwFvgw1/libmagic-rs/src/io/mod.rs:26
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum Operator in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:838
  enum TypeReadError in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/types/mod.rs:56
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum ParseError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:74
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum Value in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:965
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum TypeKind in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:398
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148
  enum EvaluationError in /tmp/.tmpwFvgw1/libmagic-rs/src/error.rs:148

--- failure enum_struct_variant_field_added: pub enum struct variant field added ---

Description:
An enum's exhaustive struct variant has a new field, which has to be included when constructing or matching on this variant.
        ref: https://doc.rust-lang.org/reference/attributes/type_system.html#the-non_exhaustive-attribute
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_struct_variant_field_added.ron

Failed in:
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272
  field base_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:251
  field adjustment_op of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:266
  field result_relative of variant OffsetSpec::Indirect in /tmp/.tmpwFvgw1/libmagic-rs/src/parser/ast.rs:272

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_missing.ron

Failed in:
  function libmagic_rs::parser::grammar::is_empty_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1025
  function libmagic_rs::parser::grammar::parse_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:846
  function libmagic_rs::parser::grammar::parse_type_and_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:683
  function libmagic_rs::parser::grammar::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::parse_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:179
  function libmagic_rs::parser::grammar::parse_comment, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1004
  function libmagic_rs::parser::grammar::parse_message, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:810
  function libmagic_rs::parser::grammar::parse_value, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:633
  function libmagic_rs::parser::grammar::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::parse_number, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:133
  function libmagic_rs::parser::grammar::has_continuation, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1060
  function libmagic_rs::parser::grammar::parse_magic_rule, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:946
  function libmagic_rs::parser::grammar::parse_rule_offset, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:779
  function libmagic_rs::parser::grammar::is_comment_line, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:1042
  function libmagic_rs::parser::grammar::is_strength_directive, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:902
  function libmagic_rs::parser::grammar::parse_type, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:749
  function libmagic_rs::parser::grammar::parse_operator, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:227

--- failure function_parameter_count_changed: pub fn parameter count changed ---

Description:
A publicly-visible function now takes a different number of parameters.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron

Failed in:
  libmagic_rs::evaluator::evaluate_single_rule now takes 3 parameters instead of 2, in /tmp/.tmpwFvgw1/libmagic-rs/src/evaluator/engine/mod.rs:196

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/inherent_method_missing.ron

Failed in:
  FileBuffer::create_symlink, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/io/mod.rs:326
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130
  EvaluationContext::increment_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:114
  EvaluationContext::decrement_recursion_depth, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/evaluator/mod.rs:130

--- failure module_missing: pub module removed or renamed ---

Description:
A publicly-visible module cannot be imported by its prior path. A `pub use` may have been removed, or the module may have been renamed, removed, or made non-public.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/module_missing.ron

Failed in:
  mod libmagic_rs::parser::grammar, previously in file /tmp/.tmphvgzOh/libmagic-rs/src/parser/grammar/mod.rs:4

--- failure struct_marked_non_exhaustive: struct marked #[non_exhaustive] ---

Description:
A public struct has been marked #[non_exhaustive], which will prevent it from being constructed using a struct literal outside of its crate. It previously had no private fields, so a struct literal could be used to construct it outside its crate.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#attr-adding-non-exhaustive
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/struct_marked_non_exhaustive.ron

Failed in:
  struct EvaluationConfig in /tmp/.tmpwFvgw1/libmagic-rs/src/config.rs:42
```
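
For downstream users, the `enum_marked_non_exhaustive` failures above mean that matches on these enums need a wildcard arm. The Rust sketch below illustrates the pattern with a hypothetical `SketchValue` enum rather than the crate's real `Value`. Within a single crate the attribute has no effect, so the wildcard here is only strictly required once the enum lives in another crate.

```rust
/// Hypothetical stand-in for a public enum that a library marks `#[non_exhaustive]`.
#[non_exhaustive]
#[derive(Debug)]
pub enum SketchValue {
    Uint(u64),
    Text(String),
    Bytes(Vec<u8>),
}

/// Matches the variants the caller cares about and routes everything else,
/// including variants added in later releases, through the wildcard arm.
pub fn describe(value: &SketchValue) -> String {
    match value {
        SketchValue::Uint(n) => format!("uint {n}"),
        SketchValue::Text(s) => format!("text {s:?}"),
        // Required for foreign `#[non_exhaustive]` enums.
        _ => "unhandled variant".to_string(),
    }
}
```
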

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.6.0] - 2026-04-25

### Features

- **parser**: Add Date and QDate types with serialization support
([#165](#165))
- **parser**: Implement pstring (Pascal string) type
([#170](#170))
- **parser**: Implement pstring multi-byte length prefix variants (/B,
/H, /h, /L, /l, /J)
([#183](#183))
- **evaluator**: Add debug-level tracing for skipped rules
([#184](#184))
- **evaluator**: Implement indirect offset resolution
([#37](#37))
([#199](#199))
- **evaluator**: Implement relative offset resolution
([#38](#38))
([#211](#211))
- **deps**: Add new skills to actionbook/rust-skills and
trailofbits/skills
- **evaluator**: Regex and search types (closes #39)
([#214](#214))
- Implement libmagic meta-type directives and format substitution
([#42](#42))
([#230](#230))

### Bug Fixes

- **regex**: PR #214 follow-up review findings
([#215](#215))
- Load and correctly evaluate /usr/share/file/magic/filesystems and
adjacent magic files
([#233](#233))

### Documentation

- **gotchas**: Clarify requirements for adding TypeKind variants

### Miscellaneous Tasks

- Rename .coderabbitai.yaml to .coderabbit.yaml
- **Mergify**: Configuration update
([#173](#173))
- Update .gitignore to exclude local AI assistant files
- **mergify**: Upgrade configuration to current format
([#205](#205))
- Resolve all pending TODO items
([#212](#212))
- **mergify**: Upgrade configuration to current format
([#231](#231))
<!-- generated by git-cliff -->

### Security

- **io**: Close TOCTOU race in `FileBuffer::new` metadata validation
(CWE-367). `validate_file_metadata` now uses `File::metadata()` on the
open descriptor instead of re-canonicalizing the path, so an attacker
cannot swap the path between `open_file` and validation. Error paths now
report the caller-supplied path rather than the canonicalized variant.
- **cli**: Remove relative-path fallbacks from `default_magic_file_path`
(CWE-426). `./missing.magic`, `./third_party/magic.mgc`, and the
`CI`/`GITHUB_ACTIONS` env-var branch no longer resolve against the
process cwd. CI pipelines must pass `--magic <path>` explicitly.
- **evaluator**: `build_regex` now bounds `size_limit` and
`dfa_size_limit` to 1 MiB (`REGEX_COMPILE_SIZE_LIMIT`) to reject
compile-time DoS patterns (CWE-1333) from adversarial magic files.

### Features

- **parser**: Implement meta-type directives: `name`/`use` subroutines,
`default`/`clear` per-level fallback, and `indirect` re-evaluation.
`parse_text_magic_file` now returns `ParsedMagic { rules, name_table }`
(breaking change from `Vec<MagicRule>`). Named subroutines are hoisted
into `NameTable` at load time and dispatched via `RuleEnvironment` in
the evaluator. Recursion is bounded by
`EvaluationConfig::max_recursion_depth`. Resolves
[#42](#42).
- **evaluator**: Thread-local regex compile cache eliminates the
double-compile paid by every successful regex match.
`regex_bytes_consumed` now reuses the compiled `Regex` from `read_regex`
instead of recompiling the pattern to derive the anchor advance. The
cache is reset at the start of every `evaluate_rules_with_config` call,
bounding memory to one evaluation.
- **config**: `EvaluationConfig` is now `#[non_exhaustive]`; new
builder-style setters (`with_max_recursion_depth`,
`with_max_string_length`, `with_stop_at_first_match`, `with_mime_types`,
`with_timeout_ms`) let external crates construct configurations without
struct literals.
- **parser**: `MagicRule::new()` smart constructor with
`::with_children()`, `::with_strength_modifier()`, `::with_level()`
builder methods and a `::validate()` method enforcing structural
invariants (non-empty message, `level <= MAX_LEVEL`, children nested
strictly deeper than parent). New `MagicRuleValidationError` error type.
- **parser**: `RegexFlags::with_case_insensitive()` and
`::with_start_offset()` builder methods.

### Refactor

- **engine**: Extract `evaluate_pattern_rule()` and
`evaluate_value_rule()` helpers from
`evaluate_single_rule_with_anchor`'s 90-line body. Dispatch is now a
two-arm type-category split; each helper has focused rustdoc on
semantics and invariants.
- **types**: Replace the `_ =>` catch-all in
`bytes_consumed_with_pattern` with an explicit listing of the
fixed-width `TypeKind` variants. Adding a new variable-width variant
without updating this match is now a compile error instead of a silent
relative-offset anchor corruption in release builds.
- **parser**: Split the 185-line `type_keyword_to_kind` match into
per-family helpers (`byte_family`, `short_family`, `long_family`,
`quad_family`, `float_family`, `double_family`, `date_family`,
`qdate_family`, `string_family`). Drops the
`#[allow(clippy::too_many_lines)]` attribute.
- **main**: `main()` returns `std::process::ExitCode` instead of calling
`process::exit`, so destructors run on the happy path. Ctrl-C
`AtomicBool` flag uses `Ordering::Relaxed` instead of `SeqCst`.
- **grammar**: `parse_strength_directive` uses nom 8's `preceded` +
`Parser::map` instead of the legacy `map(pair(char(...), parse_number),
|(_, n)| ...)` pattern.
- **output**: Add `#[serde(skip_serializing_if = "Option::is_none",
default)]` to public `Option<T>` fields so JSON output no longer emits
`"field": null` for unset optional values.

### Documentation

- **lib**: Add `# Security` sections to
`MagicDatabase::with_builtin_rules`, `::with_builtin_rules_and_config`,
`::load_from_file`, and `::load_from_file_with_config` warning about the
unbounded default timeout and recommending
`EvaluationConfig::performance()` for untrusted input.
- **lib**: Document `MagicDatabase: Send + Sync` for parallel scanning.
- **README**: Update `TypeKind` enum example to match the current AST,
add `regex` and `search/N` to the supported types table, add pre-1.0 API
stability warning, correct the roadmap to mark v0.2-v0.4 as shipped.
- **AGENTS.md**: Relabel "Currently Implemented (v0.1.0)" and "Current
Limitations (v0.1.0)" to v0.5.0 and rewrite the Development Phases
section to reflect actual shipped scope.

### Testing

- Security regression tests for S-H1 (planted-magic-file in cwd), S-H2
(TOCTOU path-swap contract), S-M2 (pathological regex bounded runtime),
S-L2 (codegen message escape round-trip), and GOTCHAS S13.1
(`EvaluationConfig::default()` unbounded timeout invariant).
- Backspace message concatenation regression tests for first-match,
consecutive, and empty-rest edge cases.
- `MagicRule::validate()` tests covering empty message, child level
invariant, and max-depth rejection.
- `RegexCache` population/clear/reuse tests.

### Breaking Changes

- **parser**: `parse_text_magic_file` return type changed from
`Result<Vec<MagicRule>, ParseError>` to `Result<ParsedMagic,
ParseError>`. Callers must destructure `ParsedMagic { rules, name_table
}`. Low-level callers that only need the rule list can use
`parsed.rules`. `load_magic_file` and `load_magic_directory` return the
same new type.
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Labels

  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • evaluator: Rule evaluation engine and logic
  • parser: Magic file parsing components and grammar
  • size:XL: This PR changes 500-999 lines, ignoring generated files.
  • testing: Test infrastructure and coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parser: implement pstring (Pascal string) type

2 participants