feat(parser): implement comparison operators#104
Conversation
|
Caution Review failedFailed to post review comments Summary by CodeRabbit
WalkthroughTypeKind variants now include explicit signedness/endian fields (e.g., Changes
Sequence DiagramsequenceDiagram
participant Client as Rule Input
participant Parser as Parser (grammar.rs)
participant AST as AST (ast.rs)
participant Evaluator as Evaluator (operators.rs)
participant TypeSys as Type Reader (types.rs)
participant Result as Match Result
Client->>Parser: parse rule "0 ubyte 0x10 > 5"
Parser->>AST: emit TypeKind::Byte { signed: false } and Operator::GreaterThan
AST->>Evaluator: evaluate rule (operator + type)
Evaluator->>TypeSys: read_typed_value(buffer, TypeKind::Byte { signed: false })
TypeSys->>TypeSys: read_byte(..., signed=false) -> Value::Uint
TypeSys->>Evaluator: return Value
Evaluator->>Evaluator: compare_values(left, right) -> Some(Ordering)
Evaluator->>Result: return match / no match
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 📃 Configuration Change RequirementsWonderful, this rule succeeded.Mergify configuration change
🟢 CI must passWonderful, this rule succeeded.All CI checks must pass. This protection prevents manual merges that bypass the merge queue.
🟢 Do not merge outdated PRsWonderful, this rule succeeded.Make sure PRs are within 10 commits of the base branch before merging
|
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ypes Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
b441249 to
9c9ab52
Compare
|
Documentation Updates 6 document(s) were updated by changes in this PR: architectureView Changes@@ -63,7 +63,7 @@
- ✅ **Number parsing**: Decimal and hexadecimal with overflow protection
- ✅ **Offset parsing**: Absolute offsets with comprehensive validation
-- ✅ **Operator parsing**: Equality, inequality, and bitwise AND operators
+- ✅ **Operator parsing**: Equality (`=`, `==`), inequality (`!=`, `<>`), comparison (`<`, `>`, `<=`, `>=`), and bitwise AND (`&`) operators
- ✅ **Value parsing**: Strings, numbers, and hex byte sequences with escape sequences
- ✅ **Error handling**: Comprehensive nom error handling with meaningful messages
- ✅ **Rule parsing**: Complete rule parsing via `parse_magic_rule()`
@@ -88,6 +88,24 @@
pub children: Vec<MagicRule>, // Nested rules
pub level: u32, // Indentation level
}
+
+pub enum TypeKind {
+ Byte { signed: bool }, // Single byte with explicit signedness
+ Short { endian: Endianness, signed: bool },
+ Long { endian: Endianness, signed: bool },
+ String { max_length: Option<usize> },
+}
+
+pub enum Operator {
+ Equal, // = or ==
+ NotEqual, // != or <>
+ LessThan, // <
+ GreaterThan, // >
+ LessEqual, // <=
+ GreaterEqual, // >=
+ BitwiseAnd, // &
+ BitwiseAndMask(u64), // & with mask
+}
```
**Design Principles:**
@@ -96,6 +114,7 @@
- **Serializable**: Full serde support for caching
- **Self-contained**: No external dependencies in AST nodes
- **Type-safe**: Rust's type system prevents invalid rule combinations
+- **Explicit signedness**: `TypeKind::Byte` and integer types distinguish signed from unsigned interpretations
### 3. Evaluator Module (`src/evaluator/`)
@@ -105,7 +124,7 @@
- `mod.rs`: Main evaluation engine with `EvaluationContext` and `MatchResult`
- `offset.rs`: Offset resolution (absolute, relative, from-end)
-- `types.rs`: Type interpretation with endianness handling
+- `types.rs`: Type interpretation with endianness handling and signedness coercion
- `operators.rs`: Comparison and bitwise operations
**Implemented Features:**
@@ -117,6 +136,8 @@
- ✅ **Graceful Degradation**: Skip problematic rules, continue evaluation
- ✅ **Timeout Protection**: Configurable time limits
- ✅ **Recursion Limiting**: Prevent stack overflow from deep nesting
+- ✅ **Signedness Coercion**: Automatic value coercion for signed type comparisons (e.g., `0xff` → `-1` for signed byte)
+- ✅ **Comparison Operators**: Full support for `<`, `>`, `<=`, `>=` with numeric and lexicographic ordering
- 📋 **Indirect Offsets**: Pointer dereferencing (planned)
### 4. I/O Module (`src/io/`)
@@ -217,8 +238,8 @@
```mermaid
flowchart TD
R[Root Rule<br/>e.g., "0 string PK"]
- R -->|match| C1[Child Rule 1<br/>e.g., ">4 byte 0x14"]
- R -->|match| C2[Child Rule 2<br/>e.g., ">4 byte 0x06"]
+ R -->|match| C1[Child Rule 1<br/>e.g., ">4 ubyte 0x14"]
+ R -->|match| C2[Child Rule 2<br/>e.g., ">4 ubyte 0x06"]
C1 -->|match| G1[Grandchild<br/>ZIP archive v2.0]
C2 -->|match| G2[Grandchild<br/>ZIP archive v1.0]
@@ -228,6 +249,20 @@
style G1 fill:#c8e6c9
style G2 fill:#c8e6c9
```
+
+**Operator Support:**
+
+The evaluator supports all comparison and bitwise operators:
+
+- **Equality**: `=` or `==` (exact match)
+- **Inequality**: `!=` or `<>` (not equal)
+- **Less-than**: `<` (numeric or lexicographic)
+- **Greater-than**: `>` (numeric or lexicographic)
+- **Less-equal**: `<=` (numeric or lexicographic)
+- **Greater-equal**: `>=` (numeric or lexicographic)
+- **Bitwise AND**: `&` (bit pattern matching)
+
+Comparison operators support both numeric comparisons (with automatic type coercion between signed and unsigned integers via `i128`) and lexicographic comparisons for strings and byte sequences.
### Memory-Safe Buffer Access
Comparison Operators And Compare_Values Helper PatternView Changes@@ -2,19 +2,19 @@
## Overview
-The Comparison Operators And Compare_Values Helper Pattern is a proposed reusable architectural pattern for implementing comparison operators in [libmagic-rs](https://github.com/EvilBit-Labs/libmagic-rs). The pattern uses a shared `compare_values` helper function that handles all `Value` type pairs with cross-type coercion, consolidating duplicated type-matching logic. This architectural approach avoids code duplication across the four comparison operator implementations (`<`, `>`, `<=`, `>=`) by delegating to a single source of truth for type-pair matching and ordering logic.
-
-The pattern addresses [Issue #34](https://github.com/EvilBit-Labs/libmagic-rs/issues/34), which requests comparison operators to improve magic file compatibility. According to the issue, comparison operators would unlock approximately 40% more magic file compatibility and are critical for version checks, size validation, and range matching in magic rules. Magic files frequently use comparison operators to detect file format versions, validate size constraints, and perform range-based matching.
-
-The existing codebase provides the foundation through [cross-type integer coercion using `i128` in `apply_equal`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/operators.rs#L62-L64), which safely compares `u64` and `i64` values. The proposed `compare_values` helper extends this pattern to return `Option<Ordering>` instead of `bool`, enabling all four comparison operators to share identical type-handling logic while implementing operator-specific ordering semantics.
+The Comparison Operators And Compare_Values Helper Pattern is a reusable architectural pattern for implementing comparison operators in [libmagic-rs](https://github.com/EvilBit-Labs/libmagic-rs). The pattern uses a shared `compare_values` helper function that handles all `Value` type pairs with cross-type coercion, consolidating duplicated type-matching logic. This architectural approach avoids code duplication across the four comparison operator implementations (`<`, `>`, `<=`, `>=`) by delegating to a single source of truth for type-pair matching and ordering logic.
+
+The pattern addresses [Issue #34](https://github.com/EvilBit-Labs/libmagic-rs/issues/34), which requests comparison operators to improve magic file compatibility. According to the issue, comparison operators unlock approximately 40% more magic file compatibility and are critical for version checks, size validation, and range matching in magic rules. Magic files frequently use comparison operators to detect file format versions, validate size constraints, and perform range-based matching.
+
+The implementation builds on the foundation of cross-type integer coercion using `i128`, which safely compares `u64` and `i64` values. The `compare_values` helper extends this pattern to return `Option<Ordering>` instead of `bool`, enabling all four comparison operators to share identical type-handling logic while implementing operator-specific ordering semantics.
## Implementation Status
-**This pattern is not yet implemented.** The comparison operators and `compare_values` helper function are proposed in [PR #104](https://github.com/EvilBit-Labs/libmagic-rs/pull/104), which remains open as of March 1, 2026. The current codebase only implements equality operators (`Equal`, `NotEqual`) and bitwise operators (`BitwiseAnd`, `BitwiseAndMask`). This article documents the planned implementation to serve as a reference for the architectural pattern and design decisions.
+The comparison operators and `compare_values` helper function were implemented in [PR #104](https://github.com/EvilBit-Labs/libmagic-rs/pull/104), which was merged. The implementation includes all four comparison operators (`LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual`) added to the `Operator` enum, the shared `compare_values` helper function returning `Option<Ordering>`, parser support for comparison operator syntax with proper token ordering, and comprehensive test coverage for all type pairs and edge cases. This article documents the implemented pattern to serve as a reference for the architectural decisions and design approach.
## The compare_values Helper Function
-The proposed `compare_values` function returns `Option<Ordering>` and handles all `Value` type pairs:
+The `compare_values` function returns `Option<Ordering>` and handles all `Value` type pairs:
```rust
pub fn compare_values(left: &Value, right: &Value) -> Option<Ordering> {
@@ -40,31 +40,29 @@
## Comparison Operator Functions
-The proposed implementation includes four operator functions that delegate to `compare_values`:
+The implementation includes four operator functions that delegate to `compare_values`:
```rust
pub fn apply_less_than(left: &Value, right: &Value) -> bool {
- compare_values(left, right)
- .map(|ord| ord == Ordering::Less)
- .unwrap_or(false)
+ compare_values(left, right) == Some(Ordering::Less)
}
pub fn apply_greater_than(left: &Value, right: &Value) -> bool {
- compare_values(left, right)
- .map(|ord| ord == Ordering::Greater)
- .unwrap_or(false)
+ compare_values(left, right) == Some(Ordering::Greater)
}
pub fn apply_less_equal(left: &Value, right: &Value) -> bool {
- compare_values(left, right)
- .map(|ord| !matches!(ord, Ordering::Greater))
- .unwrap_or(false)
+ matches!(
+ compare_values(left, right),
+ Some(Ordering::Less | Ordering::Equal)
+ )
}
pub fn apply_greater_equal(left: &Value, right: &Value) -> bool {
- compare_values(left, right)
- .map(|ord| !matches!(ord, Ordering::Less))
- .unwrap_or(false)
+ matches!(
+ compare_values(left, right),
+ Some(Ordering::Greater | Ordering::Equal)
+ )
}
```
@@ -99,36 +97,36 @@
## AST and Parser Integration
-The `Operator` enum in [`src/parser/ast.rs`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L106-L117) would be extended with four new variants:
+The `Operator` enum in `src/parser/ast.rs` includes four comparison operator variants:
```rust
pub enum Operator {
Equal,
NotEqual,
- LessThan, // NEW
- GreaterThan, // NEW
- LessEqual, // NEW
- GreaterEqual, // NEW
+ LessThan,
+ GreaterThan,
+ LessEqual,
+ GreaterEqual,
BitwiseAnd,
BitwiseAndMask(u64),
}
```
-The parser in `src/parser/grammar.rs` would need to recognize comparison operator syntax with proper token ordering. Longer operators (`<=`, `>=`) must be parsed before shorter prefixes (`<`, `>`) to prevent premature matching. This ordering requirement is critical: if `<` is parsed before `<=`, the input `<=` would incorrectly match as `<` followed by `=`.
+The parser in `src/parser/grammar.rs` recognizes comparison operator syntax with proper token ordering. Longer operators (`<=`, `>=`) are parsed before shorter prefixes (`<`, `>`) to prevent premature matching. This ordering requirement is critical: if `<` is parsed before `<=`, the input `<=` would incorrectly match as `<` followed by `=`.
## Operator Dispatch
-[The `apply_operator` function in `src/evaluator/operators.rs`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/operators.rs#L219-L239) currently dispatches to equality and bitwise operators. The pattern extends this function to route comparison operators:
+The `apply_operator` function in `src/evaluator/operators.rs` dispatches to all operator implementations including comparison operators:
```rust
pub fn apply_operator(operator: &Operator, left: &Value, right: &Value) -> bool {
match operator {
Operator::Equal => apply_equal(left, right),
Operator::NotEqual => apply_not_equal(left, right),
- Operator::LessThan => apply_less_than(left, right), // NEW
- Operator::GreaterThan => apply_greater_than(left, right), // NEW
- Operator::LessEqual => apply_less_equal(left, right), // NEW
- Operator::GreaterEqual => apply_greater_equal(left, right), // NEW
+ Operator::LessThan => apply_less_than(left, right),
+ Operator::GreaterThan => apply_greater_than(left, right),
+ Operator::LessEqual => apply_less_equal(left, right),
+ Operator::GreaterEqual => apply_greater_equal(left, right),
Operator::BitwiseAnd => apply_bitwise_and(left, right),
Operator::BitwiseAndMask(mask) => { /* inline logic */ },
}
@@ -235,20 +233,22 @@
### Implementation Steps
-1. **Extend `Operator` enum** in `src/parser/ast.rs` with `LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual` variants
-2. **Add parsing logic** in `src/parser/grammar.rs` with correct token ordering (`<=` before `<`, `>=` before `>`)
-3. **Implement `compare_values` helper** in `src/evaluator/operators.rs` with all type pair handling
-4. **Implement four `apply_*` functions** that delegate to `compare_values`
-5. **Extend `apply_operator` dispatch** to route new operator variants
-6. **Add operator serialization** in both `build.rs` and `src/build_helpers.rs`
-7. **Add comprehensive tests** covering all type pairs, edge cases, and integration scenarios
-8. **Update documentation** including API docs, mdbook, and examples
-
-[The Enum Extension And Exhaustive Match Synchronization pattern](https://app.dosu.dev/documents/e6db8e51-ec58-458f-893a-e0c04ea79829) requires synchronized updates across all files that match on the `Operator` enum. The Rust compiler's exhaustive match checking enforces this synchronization at compile time.
+The implementation followed these steps:
+
+1. **Extended `Operator` enum** in `src/parser/ast.rs` with `LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual` variants
+2. **Added parsing logic** in `src/parser/grammar.rs` with correct token ordering (`<=` before `<`, `>=` before `>`)
+3. **Implemented `compare_values` helper** in `src/evaluator/operators.rs` with all type pair handling
+4. **Implemented four `apply_*` functions** that delegate to `compare_values`
+5. **Extended `apply_operator` dispatch** to route new operator variants
+6. **Added operator serialization** in both `build.rs` and `src/build_helpers.rs`
+7. **Added comprehensive tests** covering all type pairs, edge cases, and integration scenarios
+8. **Updated documentation** including API docs, mdbook, and examples
+
+[The Enum Extension And Exhaustive Match Synchronization pattern](https://app.dosu.dev/documents/e6db8e51-ec58-458f-893a-e0c04ea79829) required synchronized updates across all files that match on the `Operator` enum. The Rust compiler's exhaustive match checking enforces this synchronization at compile time.
### Testing Requirements
-Comprehensive test coverage is required:
+The implementation includes comprehensive test coverage:
- **Basic numeric tests**: Same values, different values, zero, extreme values (`u64::MAX`, `i64::MIN`, `i64::MAX`)
- **Cross-type integer tests**: `Uint` vs `Int` comparisons with edge cases, especially `u64::MAX` vs `Int(-1)`
@@ -260,7 +260,7 @@
- **Dispatch tests**: Verify `apply_operator` correctly routes to comparison operator functions
- **Integration tests**: Real magic rule patterns with version detection, size validation, range checking
-[The current codebase demonstrates the expected testing standard](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/operators.rs) with over 1,400 lines of tests for equality and bitwise operators. The comparison operators should maintain this level of coverage.
+The tests maintain the high coverage standard established in the codebase with over 1,400 lines of tests for equality and bitwise operators.
## Related Patterns and Topics
@@ -270,14 +270,14 @@
- **Parser Token Ordering**: The critical requirement that longer operators must be parsed before shorter prefixes to prevent premature matching. For comparison operators, `<=` and `>=` must be parsed before `<` and `>`.
-- **Magic File Compatibility**: The broader goal of achieving compatibility with system magic databases. Comparison operators would unlock approximately 40% more magic file compatibility.
+- **Magic File Compatibility**: The broader goal of achieving compatibility with system magic databases. Comparison operators unlock approximately 40% more magic file compatibility.
- **Operator Evaluation Pipeline**: The three-stage evaluation process in libmagic-rs consisting of offset resolution, type interpretation, and operator application.
## References
- [Issue #34: Parser: implement comparison operators (<, >, <=, >=)](https://github.com/EvilBit-Labs/libmagic-rs/issues/34) - Feature request created February 14, 2026
-- [PR #104: feat(parser): implement comparison operators](https://github.com/EvilBit-Labs/libmagic-rs/pull/104) - Implementation pull request (open as of March 1, 2026)
+- [PR #104: feat(parser): implement comparison operators](https://github.com/EvilBit-Labs/libmagic-rs/pull/104) - Implementation pull request (merged)
- [PR #105: docs: updates for PR #104](https://github.com/EvilBit-Labs/libmagic-rs/pull/105) - Documentation updates (merged March 1, 2026)
- [Issue #62: refactor: pre-create evaluator submodules for v0.2.0 features](https://github.com/EvilBit-Labs/libmagic-rs/issues/62) - References `comparison.rs` module for this feature
- [Issue #53: Epic #24 for Operator Completeness](https://github.com/EvilBit-Labs/libmagic-rs/issues/53) - Tracks implementation of missing magic file operatorstestingView Changes@@ -62,16 +62,16 @@
**TypeKind Tests:**
-- `test_type_kind_byte` - Single byte type handling
+- `test_type_kind_byte` - Single byte type handling with signedness
- `test_type_kind_short` - 16-bit integer types with endianness
- `test_type_kind_long` - 32-bit integer types with endianness
- `test_type_kind_string` - String types with length limits
-- `test_type_kind_serialization` - All type serialization
+- `test_type_kind_serialization` - All type serialization including signed/unsigned variants
**Operator Tests:**
-- `test_operator_variants` - All operator types
-- `test_operator_serialization` - Operator serialization
+- `test_operator_variants` - All operator types (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
+- `test_operator_serialization` - Operator serialization including comparison operators
**MagicRule Tests:**
@@ -104,6 +104,7 @@
- `test_parse_operator_equality` - Equality operators (= and ==)
- `test_parse_operator_inequality` - Inequality operators (!= and \<>)
+- `test_parse_operator_comparison` - Comparison operators (\<, >, \<=, >=)
- `test_parse_operator_bitwise_and` - Bitwise AND operator (&)
- `test_parse_operator_with_remaining_input` - Partial parsing
- `test_parse_operator_precedence` - Operator precedence handling
@@ -300,6 +301,40 @@
}
```
+**Testing Signed vs Unsigned Byte Behavior:**
+
+```rust
+#[test]
+fn test_signed_unsigned_byte_handling() {
+ // Test signed byte interpretation
+ let signed_rule = MagicRule {
+ offset: OffsetSpec::Absolute(0),
+ typ: TypeKind::Byte { signed: true },
+ op: Operator::GreaterThan,
+ value: Value::Int(0),
+ message: "Positive signed byte".to_string(),
+ children: vec![],
+ level: 0,
+ };
+
+ // 0x7f = 127 as signed (positive)
+ // 0x80 = -128 as signed (negative)
+
+ // Test unsigned byte interpretation
+ let unsigned_rule = MagicRule {
+ offset: OffsetSpec::Absolute(0),
+ typ: TypeKind::Byte { signed: false },
+ op: Operator::GreaterThan,
+ value: Value::Uint(127),
+ message: "Large unsigned byte".to_string(),
+ children: vec![],
+ level: 0,
+ };
+
+ // Both 0x7f and 0x80 are > 127 when interpreted as unsigned
+}
+```
+
### Test Data Management
**Test Fixtures:**
@@ -313,7 +348,7 @@
fn create_test_rule() -> MagicRule {
MagicRule {
offset: OffsetSpec::Absolute(0),
- typ: TypeKind::Byte,
+ typ: TypeKind::Byte { signed: true },
op: Operator::Equal,
value: Value::Uint(0x7f),
message: "Test rule".to_string(),Testing Strategy And Coverage RequirementsView Changes@@ -23,7 +23,7 @@
**Evaluator module tests:**
- [src/evaluator/mod.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/mod.rs) - 77 unit tests
- [src/evaluator/types.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/types.rs) - 77 unit tests
-- [src/evaluator/operators.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/operators.rs) - 86 unit tests
+- [src/evaluator/operators.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/operators.rs) - 86 unit tests covering equality, inequality, comparison operators (`<`, `>`, `<=`, `>=`), and bitwise operations
- [src/evaluator/strength.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/evaluator/strength.rs) - 35 unit tests
**Parser module tests:**
@@ -155,8 +155,8 @@
The [tests/property_tests.rs file](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs) defines [6 custom strategies](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L19-L101):
1. **[`arb_offset_spec()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L19-L25)** - Generates valid `OffsetSpec` values (Absolute, Relative, FromEnd)
-2. **[`arb_type_kind()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L28-L55)** - Generates valid `TypeKind` values with different endianness
-3. **[`arb_operator()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L58-L65)** - Generates valid `Operator` values
+2. **[`arb_type_kind()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L28-L55)** - Generates valid `TypeKind` values with different endianness and signedness (including `TypeKind::Byte { signed }`)
+3. **[`arb_operator()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L58-L65)** - Generates valid `Operator` values (including comparison operators `LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual`)
4. **[`arb_value()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L68-L75)** - Generates valid `Value` variants (Uint, Int, Bytes, String)
5. **[`arb_magic_rule()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L78-L96)** - Combines all strategies to generate complete `MagicRule` instances
6. **[`arb_buffer()`](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/tests/property_tests.rs#L99-L101)** - Generates arbitrary binary buffers (0-1024 bytes)
@@ -206,7 +206,7 @@
The [build_helpers.rs module contains 35+ unit tests](https://github.com/EvilBit-Labs/libmagic-rs/blob/e925ef6b3f2208fc8805a728ba3de55956f4447a/src/build_helpers.rs) (lines 346-709) covering:
- **Error formatting tests** - Parse error message formatting for various error types
-- **Serialization tests** - Tests for `serialize_offset_spec()`, `serialize_type_kind()`, `serialize_operator()`, `serialize_value()`
+- **Serialization tests** - Tests for `serialize_offset_spec()`, `serialize_type_kind()` (including `TypeKind::Byte { signed }`), `serialize_operator()` (including comparison operators), `serialize_value()`
- **Code generation tests** - Empty rules, single rules, nested children serialization
- **End-to-end parsing tests** - Valid magic file parsing, invalid syntax handling, malformed values
@@ -340,7 +340,16 @@
#[test]
fn test_evaluate_byte_equal() {
- let rule = MagicRule { /* ... */ };
+ let rule = MagicRule {
+ offset: OffsetSpec::Absolute(0),
+ typ: TypeKind::Byte { signed: false },
+ op: Operator::Equal,
+ value: Value::Uint(0x7f),
+ message: "test".to_string(),
+ children: vec![],
+ level: 0,
+ strength_modifier: None,
+ };
let buffer = vec![0x7f, 0x45, 0x4c, 0x46];
let config = EvaluationConfig::default();
let mut context = EvaluationContext::new(config);testing-guidelinesView Changes@@ -116,7 +116,7 @@
// Arrange
let rule = MagicRule {
offset: OffsetSpec::Absolute(0),
- typ: TypeKind::Byte,
+ typ: TypeKind::Byte { signed: false },
op: Operator::Equal,
value: Value::Uint(0x7f),
message: "ELF magic".to_string(),Type Signedness Defaults And Unsigned Type VariantsView Changes@@ -2,7 +2,7 @@
The libmagic-rs type system implements explicit signedness control through [TypeKind enum variants with `signed: bool` fields](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L80-L104). This design provides libmagic-compatible type interpretation for multi-byte integers (`Short` and `Long` types), enabling correct detection of file formats whose magic numbers have high bits set. The parser recognizes endian-prefixed type names (`beshort`, `leshort`, `belong`, `lelong`) and [maps them to TypeKind variants with appropriate endianness and signedness settings](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/grammar.rs#L1460-L1488).
-Currently, [the `TypeKind::Byte` variant is a unit variant without signedness control](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L84), and [bytes are always read as unsigned `Value::Uint`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L77-L85). In contrast, `Short` and `Long` variants include explicit `signed: bool` fields that control whether values are interpreted via signed casts (`i16`/`i32`) or unsigned conversions (`u16`/`u32`). This distinction matters for formats like [JPEG (0xffd8)](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/builtin_rules.magic#L47) and [PNG (0x89504e47)](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/builtin_rules.magic#L50), where unsigned interpretation prevents misreading high-bit magic numbers as negative values.
+All `TypeKind` variants (`Byte`, `Short`, `Long`) include explicit `signed: bool` fields that control whether values are interpreted via signed casts (`i8`/`i16`/`i32`) or unsigned conversions (`u8`/`u16`/`u32`). [The `read_byte` function](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L77-L85) returns `Value::Uint(u64)` for unsigned bytes and `Value::Int(i64)` for signed bytes. This distinction matters for formats like [JPEG (0xffd8)](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/builtin_rules.magic#L47) and [PNG (0x89504e47)](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/builtin_rules.magic#L50), where unsigned interpretation prevents misreading high-bit magic numbers as negative values.
The signedness field cascades through multiple subsystems: [type reading functions in `src/evaluator/types.rs`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L122-L209), [strength scoring in `src/evaluator/strength.rs`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/strength.rs#L72-L86), [serialization in `build.rs` and `src/build_helpers.rs`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/build.rs#L270-L290), and [property test strategies in `tests/property_tests.rs`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/tests/property_tests.rs#L28-L54). Rust's exhaustive pattern matching ensures all code paths handle signedness consistently.
@@ -12,7 +12,9 @@
```rust
pub enum TypeKind {
- Byte,
+ Byte {
+ signed: bool,
+ },
Short {
endian: Endianness,
signed: bool,
@@ -29,7 +31,7 @@
### Variant Details
-**`Byte`**: [Single 8-bit byte, always interpreted as unsigned](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L84). The [`read_byte` function returns `Value::Uint(u64)`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L77-L85) without signedness control.
+**`Byte`**: [Single 8-bit byte with `signed: bool` field](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L84). When `signed: true`, [values are cast to `i8` and returned as `Value::Int(i64)`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L77-L85). When `signed: false`, values remain `u8` and return as `Value::Uint(u64)`.
**`Short`**: [16-bit integer with `endian: Endianness` and `signed: bool` fields](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L86-L91). When `signed: true`, [values are cast to `i16` and returned as `Value::Int(i64)`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L140-L146). When `signed: false`, values remain `u16` and return as `Value::Uint(u64)`.
@@ -63,16 +65,23 @@
| Type Name | TypeKind Variant | Endianness | Signedness |
|-----------|------------------|------------|------------|
-| `byte` | `TypeKind::Byte` | N/A | Always unsigned |
-| `short` | `TypeKind::Short` | Native | `signed: false` |
-| `leshort` | `TypeKind::Short` | Little | `signed: false` |
-| `beshort` | `TypeKind::Short` | Big | `signed: false` |
-| `long` | `TypeKind::Long` | Native | `signed: false` |
-| `lelong` | `TypeKind::Long` | Little | `signed: false` |
-| `belong` | `TypeKind::Long` | Big | `signed: false` |
+| `byte` | `TypeKind::Byte` | N/A | `signed: true` |
+| `ubyte` | `TypeKind::Byte` | N/A | `signed: false` |
+| `short` | `TypeKind::Short` | Native | `signed: true` |
+| `ushort` | `TypeKind::Short` | Native | `signed: false` |
+| `leshort` | `TypeKind::Short` | Little | `signed: true` |
+| `uleshort` | `TypeKind::Short` | Little | `signed: false` |
+| `beshort` | `TypeKind::Short` | Big | `signed: true` |
+| `ubeshort` | `TypeKind::Short` | Big | `signed: false` |
+| `long` | `TypeKind::Long` | Native | `signed: true` |
+| `ulong` | `TypeKind::Long` | Native | `signed: false` |
+| `lelong` | `TypeKind::Long` | Little | `signed: true` |
+| `ulelong` | `TypeKind::Long` | Little | `signed: false` |
+| `belong` | `TypeKind::Long` | Big | `signed: true` |
+| `ubelong` | `TypeKind::Long` | Big | `signed: false` |
| `string` | `TypeKind::String` | N/A | N/A |
-**Important limitation**: All currently parsed integer types (except `Byte`, which has no signedness field) map to `signed: false`. The parser does **not** support u-prefixed type names like `ubyte`, `ushort`, or `ulong` found in traditional libmagic. There is no logic to distinguish signed vs unsigned type name variants.
+Following libmagic conventions, unprefixed type names (`byte`, `short`, `long`, `beshort`, `belong`, `leshort`, `lelong`) default to signed interpretation. The u-prefixed variants (`ubyte`, `ushort`, `ulong`, `ubeshort`, `ubelong`, `uleshort`, `ulelong`) explicitly request unsigned interpretation.
## Type Reading and Signed/Unsigned Conversion
@@ -80,7 +89,7 @@
```rust
match type_kind {
- TypeKind::Byte => read_byte(buffer, offset),
+ TypeKind::Byte { signed } => read_byte(buffer, offset, *signed),
TypeKind::Short { endian, signed } => read_short(buffer, offset, *endian, *signed),
TypeKind::Long { endian, signed } => read_long(buffer, offset, *endian, *signed),
TypeKind::String { max_length } => read_string(buffer, offset, *max_length),
@@ -89,13 +98,14 @@
### Signedness Conversion Pattern
-[Both `read_short` and `read_long`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L122-L209) follow an identical pattern:
-
-1. Read raw bytes as unsigned (`u16` or `u32`) using the appropriate endianness
+All three integer reading functions (`read_byte`, `read_short`, `read_long`) follow an identical pattern:
+
+1. Read raw bytes as unsigned (`u8`, `u16`, or `u32`) using the appropriate endianness
2. If `signed == false`: convert to `u64` and return `Value::Uint(u64)`
-3. If `signed == true`: cast to signed type (`value as i16` / `value as i32`), convert to `i64`, and return `Value::Int(i64)`
+3. If `signed == true`: cast to signed type (`value as i8` / `value as i16` / `value as i32`), convert to `i64`, and return `Value::Int(i64)`
The casting approach reinterprets bit patterns as signed integers using two's complement. For example:
+- `0xFF` as `u8` = 255 → [cast to `i8` = -1](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L607-L617)
- `0xFFFF` as `u16` = 65,535 → [cast to `i16` = -1](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L607-L617)
- `0xFFFFFFFF` as `u32` = 4,294,967,295 → [cast to `i32` = -1](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L747-L757)
@@ -103,7 +113,7 @@
Functions return [variants of the `Value` enum](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L121-L130):
-- [`read_byte`](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L77-L85): Always returns `Value::Uint(u64)` (even for `0x80`, which is unsigned 128, not signed -128)
+- `read_byte`: Returns `Value::Int(i64)` if `signed: true`, otherwise `Value::Uint(u64)`
- `read_short`: Returns `Value::Int(i64)` if `signed: true`, otherwise `Value::Uint(u64)`
- `read_long`: Returns `Value::Int(i64)` if `signed: true`, otherwise `Value::Uint(u64)`
- `read_string`: Returns `Value::String(String)`
@@ -116,19 +126,19 @@
[Line 47](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/builtin_rules.magic#L47):
```
-0 beshort 0xffd8 JPEG image data
-```
-
-Uses `beshort` (big-endian short, unsigned) to match the JPEG start-of-image marker `0xFFD8`. Both bytes have high bits set (`0xFF` = 255 decimal), requiring unsigned interpretation.
+0 ubeshort 0xffd8 JPEG image data
+```
+
+Uses `ubeshort` (big-endian unsigned short) to match the JPEG start-of-image marker `0xFFD8`. Both bytes have high bits set (`0xFF` = 255 decimal), requiring unsigned interpretation.
### PNG Detection
[Line 50](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/builtin_rules.magic#L50):
```
-0 belong 0x89504e47 PNG image data
-```
-
-Uses `belong` (big-endian long, unsigned) to match the PNG signature `0x89504E47` (byte sequence `0x89 'P' 'N' 'G'`). The first byte `0x89` (137 decimal) exceeds the signed byte maximum of 127, making unsigned interpretation essential.
+0 ubelong 0x89504e47 PNG image data
+```
+
+Uses `ubelong` (big-endian unsigned long) to match the PNG signature `0x89504E47` (byte sequence `0x89 'P' 'N' 'G'`). The first byte `0x89` (137 decimal) exceeds the signed byte maximum of 127, making unsigned interpretation essential.
### GZIP Detection
@@ -143,15 +153,15 @@
## Cascading Implementation Impacts
-The `signed: bool` field in `TypeKind::Short` and `TypeKind::Long` requires exhaustive pattern match updates across multiple modules. Rust's compiler enforces this through exhaustiveness checking, preventing runtime failures.
+The `signed: bool` field in all `TypeKind` variants (`Byte`, `Short`, `Long`) requires exhaustive pattern match updates across multiple modules. Rust's compiler enforces this through exhaustiveness checking, preventing runtime failures.
### Core Subsystems
-**AST Definition**: [src/parser/ast.rs lines 80-104](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L80-L104) defines the `TypeKind` enum with `signed` fields in `Short` and `Long` variants.
+**AST Definition**: [src/parser/ast.rs lines 80-104](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L80-L104) defines the `TypeKind` enum with `signed` fields in all integer variants (`Byte`, `Short`, `Long`).
**Parser Grammar**: [src/parser/grammar.rs lines 1460-1488](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/grammar.rs#L1460-L1488) maps type name strings to `TypeKind` variants with explicit `signed` values.
-**Type Reading**: [src/evaluator/types.rs lines 122-209](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L122-L209) implements `read_short` and `read_long` with `signed` parameter handling.
+**Type Reading**: [src/evaluator/types.rs lines 122-209](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L122-L209) implements `read_byte`, `read_short`, and `read_long` with `signed` parameter handling.
**Strength Scoring**: [src/evaluator/strength.rs lines 72-86](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/strength.rs#L72-L86) matches on `TypeKind` variants (signedness doesn't affect score, but pattern must match structure).
@@ -166,13 +176,14 @@
Example serialization output:
```rust
+TypeKind::Byte { signed: false }
TypeKind::Short { endian: Endianness::Big, signed: false }
TypeKind::Long { endian: Endianness::Little, signed: true }
```
### Testing Infrastructure
-**Property Tests**: [tests/property_tests.rs lines 28-54](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/tests/property_tests.rs#L28-L54) contains the [`arb_type_kind()` strategy](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/tests/property_tests.rs#L31-L49) that generates `Short` and `Long` variants with `any::<bool>()` for both endianness and signedness, producing all four combinations.
+**Property Tests**: [tests/property_tests.rs lines 28-54](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/tests/property_tests.rs#L28-L54) contains the [`arb_type_kind()` strategy](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/tests/property_tests.rs#L31-L49) that generates all three integer type variants (`Byte`, `Short`, `Long`) with `any::<bool>()` for signedness, producing all combinations of endianness and signedness.
**Unit Tests**: [src/evaluator/types.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L607-L757) contains dedicated tests comparing signed vs unsigned interpretation (e.g., `0xFFFF` as 65,535 unsigned vs -1 signed).
@@ -192,12 +203,7 @@
### Current Implementation Status
-**Critical limitation**: While `Short` and `Long` have explicit signedness control, the current parser implementation:
-- Maps all type names to `signed: false` (unsigned)
-- Does not recognize u-prefixed type variants (`ubyte`, `ushort`, `ulong`, etc.)
-- Has no logic to distinguish signed vs unsigned type name variants
-
-This means signed integer interpretation is currently **not accessible through the parser**, despite the underlying type system supporting it.
+The parser supports both signed and unsigned type variants through u-prefixed type names (`ubyte`, `ushort`, `ubeshort`, `ulong`, `ubelong`, `ulelong`, `uleshort`). Unprefixed type names default to signed interpretation following libmagic conventions. The built-in rules use unsigned types (`ubeshort`, `ubelong`) for file formats with high-bit magic numbers like JPEG and PNG.
### Memory Safety and Type Safety
@@ -211,7 +217,7 @@
| File | Purpose | Key Content |
|------|---------|-------------|
-| [src/parser/ast.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L80-L104) | AST definitions | TypeKind enum with `signed: bool` fields in Short and Long variants |
+| [src/parser/ast.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/ast.rs#L80-L104) | AST definitions | TypeKind enum with `signed: bool` fields in all integer variants (Byte, Short, Long) |
| [src/parser/grammar.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/parser/grammar.rs#L1434-L1488) | Parser implementation | Type name parsing and mapping to TypeKind variants |
| [src/evaluator/types.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/types.rs#L77-L361) | Type reading functions | `read_byte`, `read_short`, `read_long` with signedness handling |
| [src/evaluator/operators.rs](https://github.com/EvilBit-Labs/libmagic-rs/blob/f4da22375a440bb5f28bf2bd7e33fbe4e7804f40/src/evaluator/operators.rs#L48-L69) | Operator evaluation | Cross-type integer coercion via `i128` | |
🧪 CI InsightsHere's what we observed from your CI run for aa3a5cd. 🟢 All jobs passed!But CI Insights is watching 👀 |
There was a problem hiding this comment.
Pull request overview
This PR extends libmagic-rs’s parsing and evaluation capabilities by adding signed/unsigned handling for byte reads and implementing comparison operators (<, >, <=, >=) end-to-end (parser → AST → evaluator), aligning built-in rules and docs with the new semantics.
Changes:
- Extend
TypeKind::Byteto carry signedness and add parsing support for unsigned integer type spellings (e.g.,ubyte,ubeshort,ubelong). - Add comparison operators to
Operator, implement their parsing and evaluation logic, and incorporate them into strength scoring. - Update built-in rules and test suite (unit/property/integration) to cover new operators and signedness behavior.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
tests/property_tests.rs |
Generates TypeKind::Byte { signed } and includes new operators in proptest strategies. |
tests/integration_tests.rs |
Adds end-to-end tests for <, >, <=, >= and updates PNG rule to ubelong. |
src/parser/preprocessing.rs |
Updates parser tests for the new TypeKind::Byte { .. } shape. |
src/parser/grammar.rs |
Parses comparison operators and unsigned type spellings; updates tests/doctests accordingly. |
src/parser/ast.rs |
Introduces signed Byte type form and new Operator variants (with rustdoc examples). |
src/evaluator/types.rs |
Implements signed vs unsigned byte reads and propagates through read_typed_value. |
src/evaluator/strength.rs |
Assigns strength contribution for the new comparison operators. |
src/evaluator/operators.rs |
Implements ordering + <, >, <=, >= evaluation and dispatch via apply_operator. |
src/evaluator/offset.rs |
Updates indirect pointer type usage to new TypeKind::Byte { .. } form in tests. |
src/evaluator/mod.rs |
Updates doctests and adds evaluator tests for comparison operators + signed byte behavior. |
src/builtin_rules.magic |
Switches JPEG/PNG signatures to unsigned integer types for correct matching. |
src/build_helpers.rs |
Updates codegen serialization for TypeKind::Byte { signed } and new operators. |
build.rs |
Mirrors serialization updates for build-time generation. |
AGENTS.md |
Updates developer notes/docs to reflect operator/type additions and maintenance steps. |
| Operator::LessThan => "Operator::LessThan".to_string(), | ||
| Operator::GreaterThan => "Operator::GreaterThan".to_string(), | ||
| Operator::LessEqual => "Operator::LessEqual".to_string(), | ||
| Operator::GreaterEqual => "Operator::GreaterEqual".to_string(), | ||
| Operator::BitwiseAnd => "Operator::BitwiseAnd".to_string(), |
There was a problem hiding this comment.
serialize_operator() now supports the new comparison operators, but the unit test test_serialize_operator still only asserts the old variants. Add assertions for LessThan, GreaterThan, LessEqual, and GreaterEqual so this match stays covered when refactoring.
| // Comparison operators are moderately specific | ||
| Operator::LessThan | ||
| | Operator::GreaterThan | ||
| | Operator::LessEqual | ||
| | Operator::GreaterEqual => 6, |
There was a problem hiding this comment.
calculate_default_strength() adds a new strength mapping for the comparison operators, but the test suite in this file doesn't include any cases asserting the expected strength contribution for those new operator variants. Add unit tests covering at least one of the comparison operators (or table-driven tests for all four) to prevent regressions.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…etails Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
| - name: Enforce conventional commit | ||
| description: Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/ | ||
| if: | ||
| - base = main |
There was a problem hiding this comment.
The new "Enforce conventional commit" merge protection applies to all PRs to main, but it validates the PR title. This will likely block auto-queued PRs from dependabot/release-plz (their titles typically don’t follow feat(scope): ...), meaning the bot rules above can approve+queue but merges will still be prevented. Consider excluding trusted bots/release-plz branches from this protection (or enforcing conventional commits via a CI check that validates commits instead of PR titles).
| - base = main | |
| - base = main | |
| - author != dependabot[bot] | |
| - "-head ~= ^release-plz/" |
| # Tier 1: Trusted bot PRs -- auto-approve and queue immediately | ||
| - name: Auto-approve and queue dependabot PRs | ||
| conditions: | ||
| - base = main | ||
| - "author=@maintainers" | ||
| - label = lgtm | ||
| - author = dependabot[bot] | ||
| - -draft | ||
| - label != do-not-merge | ||
| # release.yml is autogenerated by cargo-dist -- dependabot updates to | ||
| # pinned actions in it will break the release pipeline. Dependabot has no | ||
| # way to ignore specific workflow files, so we block it here instead. | ||
| - -files~=\.github/workflows/release\.yml | ||
| actions: | ||
| review: | ||
| type: APPROVE | ||
| message: Automatically approved by Mergify | ||
| queue: | ||
| name: default | ||
|
|
There was a problem hiding this comment.
This PR description focuses on parser type/operator support, but it also changes merge automation (Mergify rules/merge protections) and Dependabot configuration. These CI/ops changes are significant and should either be called out explicitly in the PR description or split into a separate PR to keep review/rollback risk isolated.
| /// | ||
| /// assert!(apply_less_than(&Value::Uint(5), &Value::Uint(10))); | ||
| /// assert!(!apply_less_than(&Value::Uint(10), &Value::Uint(10))); | ||
| /// assert!(apply_less_than(&Value::Int(-1), &Value::Uint(0))); | ||
| /// ``` |
There was a problem hiding this comment.
compare_values() introduces a second source of truth for comparison/coercion logic, but apply_equal() still has its own overlapping match/coercion implementation. To avoid future divergence, consider implementing apply_equal/apply_not_equal in terms of compare_values() (with None => not equal).
…raphic types Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Merge Queue StatusRule:
This pull request spent 4 minutes 25 seconds in the queue, including 4 minutes 11 seconds running CI. Required conditions to merge
|
Update documentation for #104 _Generated by [Dosu](https://dosu.dev)_ Co-authored-by: dosubot[bot] <131922026+dosubot[bot]@users.noreply.github.com>
## 🤖 New release
* `libmagic-rs`: 0.1.1 -> 0.2.0 (⚠ API breaking changes)
### ⚠ `libmagic-rs` breaking changes
```text
--- failure enum_unit_variant_changed_kind: An enum unit variant changed kind ---
Description:
A public enum's exhaustive unit variant has changed to a different kind of enum variant, breaking possible instantiations and patterns.
ref: https://doc.rust-lang.org/reference/items/enumerations.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_unit_variant_changed_kind.ron
Failed in:
variant TypeKind::Byte in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:84
variant TypeKind::Byte in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:84
variant TypeKind::Byte in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:84
--- failure enum_variant_added: enum variant added on exhaustive enum ---
Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_variant_added.ron
Failed in:
variant Operator:LessThan in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:144
variant Operator:GreaterThan in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:155
variant Operator:LessEqual in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:166
variant Operator:GreaterEqual in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:177
variant Operator:LessThan in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:144
variant Operator:GreaterThan in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:155
variant Operator:LessEqual in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:166
variant Operator:GreaterEqual in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:177
variant Operator:LessThan in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:144
variant Operator:GreaterThan in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:155
variant Operator:LessEqual in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:166
variant Operator:GreaterEqual in /tmp/.tmpchbzmC/libmagic-rs/src/parser/ast.rs:177
--- failure function_parameter_count_changed: pub fn parameter count changed ---
Description:
A publicly-visible function now takes a different number of parameters.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/function_parameter_count_changed.ron
Failed in:
libmagic_rs::evaluator::types::read_byte now takes 3 parameters instead of 2, in /tmp/.tmpchbzmC/libmagic-rs/src/evaluator/types.rs:79
```
<details><summary><i><b>Changelog</b></i></summary><p>
<blockquote>
## [0.2.0] - 2026-03-01
### Features
- **parser**: Implement comparison operators
([#104](#104))
### Miscellaneous Tasks
- **Mergify**: Add outdated PR protection
([#75](#75))
- Add Mergify merge queue and simplify CI
([#78](#78))
- Mergify merge queue, dependabot integration, and CI simplification
([#79](#79))
- **release**: Add regex for version bumping based on commit types
<!-- generated by git-cliff -->
</blockquote>
</p></details>
---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This pull request introduces support for signed and unsigned byte types in the magic rule system and adds comparison operators (
<,>,<=,>=) to the rule evaluation logic. It updates the documentation, test cases, and serialization logic to reflect these changes, and ensures consistency between the code and the built-in rules.Type System Enhancements:
The
TypeKind::Bytevariant is changed toTypeKind::Byte { signed: bool }, allowing explicit signedness for byte types throughout the codebase, including in serialization and test cases.The built-in magic rules now use unsigned types where appropriate (e.g.,
ubeshort,ubelong) for better compatibility with file formats.Added
coerce_value_to_type()to handle signedness coercion at evaluation time -- e.g.,0xffparsed asUint(255)is coerced toInt(-1)when matched against a signed byte rule, ensuring libmagic-compatible behavior.Operator Support Improvements:
Adds support for comparison operators (
<,>,<=,>=) to theOperatorenum and updates serialization logic in bothsrc/build_helpers.rsandbuild.rsto handle these new variants.Unified
apply_equalto delegate tocompare_values(), eliminating duplicated i128 coercion logic.Comparison operators support both numeric and lexicographic ordering (for String/Bytes values).
Updates documentation and developer notes to reflect the addition of the new operators, including instructions for updating serialization and test helpers.
Error Handling Improvements:
Replaced silent
Err(_e) => continuecatch-alls inevaluate_ruleswith explicit error variant matching -- expected errors (buffer overrun, invalid offset, type read errors) continue gracefully, while unexpected errors propagate.Changed
parse_escape_sequenceunreachable fallback from silent passthrough to explicitunreachable!().CI/CD and Merge Automation:
Aligned
.mergify.ymlwith the ruley repo configuration:-draftconditions to all PR rules!breaking change marker support)Added
commit-message: prefix: "chore(deps)"to.github/dependabot.ymlfor all package ecosystems, ensuring dependabot PRs follow conventional commit format.Documentation and Maintenance:
Updated
docs/src/evaluator.mdto fix 9+ inaccuracies: stale function signatures, missing fields, wrong error types, "Numeric" -> "numeric or lexicographic" for comparison operators.Updated
docs/src/magic-format.mdto fix operator documentation (!->!=), mark unimplemented features (XOR, string flags).Updates code comments, documentation, and example usages to use the new
TypeKind::Byte { signed: ... }format and to document the presence of comparison operators.Adds notes about the need to update duplicate
serialize_*functions in bothsrc/build_helpers.rsandbuild.rswhen adding new enum variants.These changes make the magic rule system more expressive and compatible with libmagic, and improve maintainability by clarifying update procedures for serialization and operator logic.