Merged
5 changes: 2 additions & 3 deletions AGENTS.md
@@ -208,7 +208,7 @@ cargo test --doc # Test documentation examples
### Currently Implemented (v0.1.0)

- **Offsets**: Absolute and from-end specifications (indirect and relative are parsed but not yet evaluated)
- **Types**: `byte`, `short`, `long`, `quad`, `string` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; types are signed by default (libmagic-compatible)
- **Types**: `byte`, `short`, `long`, `quad`, `float`, `double`, `string` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; float/double endian variants `befloat`/`lefloat`, `bedouble`/`ledouble`; types are signed by default (libmagic-compatible)
- **Operators**: `=` (equal), `!=` (not equal), `<` (less than), `>` (greater than), `<=` (less equal), `>=` (greater equal), `&` (bitwise AND with optional mask), `^` (bitwise XOR), `~` (bitwise NOT), `x` (any value)
- **Nested Rules**: Hierarchical rule evaluation with proper indentation
- **String Matching**: Exact string matching with null-termination
@@ -240,7 +240,6 @@ impl BinaryRegex for regex::bytes::Regex {

- No regex/search pattern matching
- 64-bit integer types: `quad`/`uquad`, `bequad`/`ubequad`, `lequad`/`ulequad` are implemented; `qquad` (128-bit) is not yet supported
- No floating-point types (float, double, befloat, lefloat)
- No date/time types (date, qdate, ldate, qldate)
- String evaluation reads until first NUL or end-of-buffer by default; `max_length: Some(_)` is supported internally but no dedicated fixed-length string parser syntax exists yet

@@ -477,7 +476,7 @@ CI must pass before merge. Mergify merge protections enforce these checks. Bot P

1. **MVP (v0.1.0)** - CURRENT: Basic parsing and evaluation with byte/short/long/quad/string types, equality and bitwise AND operators, built-in rules for 10 common formats
2. **Enhanced Features (v0.2)**: Comparison operators (`>`, `<`), indirect offset improvements, strength-based rule ordering
3. **Advanced Types (v0.3)**: Regex type, floating-point types, search patterns
3. **Advanced Types (v0.3)**: Regex type, search patterns
4. **Full Compatibility (v0.4)**: Complete libmagic syntax support, all special directives, named tests
5. **Production Ready (v1.0)**: Stable API, complete documentation, 95%+ compatibility with GNU file

43 changes: 37 additions & 6 deletions docs/API_REFERENCE.md
@@ -302,6 +302,8 @@ use libmagic_rs::TypeKind;
| `Short { endian, signed }` | 16-bit integer |
| `Long { endian, signed }` | 32-bit integer |
| `Quad { endian, signed }` | 64-bit integer |
| `Float { endian }` | 32-bit IEEE 754 floating-point |
| `Double { endian }` | 64-bit IEEE 754 floating-point |
| `String { max_length }` | String data |

##### 64-bit Integer Types
@@ -319,6 +321,26 @@ The `Quad` variant supports six endian-signedness combinations:

**Version Note:** In v0.2.0, the `Byte` variant changed from a unit variant to a struct variant with a `signed` field.

##### 32-bit Floating-Point Types

The `Float` variant supports three endian variants:

| Type Specifier | Endianness | Description |
| -------------- | ---------- | ----------------------------------- |
| `float` | Native | Native-endian 32-bit IEEE 754 float |
| `lefloat` | Little | Little-endian 32-bit IEEE 754 float |
| `befloat` | Big | Big-endian 32-bit IEEE 754 float |

##### 64-bit Floating-Point Types

The `Double` variant supports three endian variants:

| Type Specifier | Endianness | Description |
| -------------- | ---------- | ------------------------------------ |
| `double` | Native | Native-endian 64-bit IEEE 754 double |
| `ledouble` | Little | Little-endian 64-bit IEEE 754 double |
| `bedouble` | Big | Big-endian 64-bit IEEE 754 double |
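
Each specifier corresponds to one of the standard library's byte-order constructors. A minimal sketch, assuming the raw bytes have already been bounds-checked; the `Endian` enum and `decode_double` function are hypothetical names for this illustration, not part of the `libmagic_rs` API:

```rust
/// Hypothetical stand-in for the crate's endianness type (illustration only).
#[derive(Clone, Copy, Debug)]
enum Endian {
    Native,
    Little,
    Big,
}

/// Decode 8 already-bounds-checked bytes as an IEEE 754 double.
fn decode_double(bytes: [u8; 8], endian: Endian) -> f64 {
    match endian {
        Endian::Native => f64::from_ne_bytes(bytes),
        Endian::Little => f64::from_le_bytes(bytes),
        Endian::Big => f64::from_be_bytes(bytes),
    }
}

fn main() {
    // 1.0 as a big-endian double: sign 0, biased exponent 0x3FF, fraction 0.
    let bytes = [0x3F, 0xF0, 0, 0, 0, 0, 0, 0];
    assert_eq!(decode_double(bytes, Endian::Big), 1.0);

    // The same bytes read little-endian decode to a tiny subnormal, not 1.0.
    let le = decode_double(bytes, Endian::Little);
    assert!(le > 0.0 && le < f64::MIN_POSITIVE);

    // Native order agrees with whichever order the host CPU uses.
    assert_eq!(decode_double(1.0f64.to_ne_bytes(), Endian::Native), 1.0);
}
```

The same pattern applies to `Float`, using `f32` and 4-byte arrays.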

#### Operator

Comparison operators.
@@ -343,6 +365,12 @@ use libmagic_rs::Operator;

**Version Note:** The comparison operators `LessThan`, `GreaterThan`, `LessEqual`, and `GreaterEqual` were added in v0.2.0.

##### Floating-Point Comparison Semantics

Equality operators (`Equal`, `NotEqual`) use epsilon-aware comparison for `Value::Float` operands: two floats are considered equal when `|a - b| <= f64::EPSILON`. NaN is never equal to anything (including itself), and infinities are equal only to the same-signed infinity.

Ordering operators (`LessThan`, `GreaterThan`, `LessEqual`, `GreaterEqual`) use IEEE 754 `partial_cmp` semantics. All NaN comparisons return `false` (NaN is not comparable to any value).
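
A minimal sketch of the equality rule in plain Rust; `float_eq` is a hypothetical helper, not the crate's function. The documented infinity behavior requires an exact-equality check before the epsilon test, because `INFINITY - INFINITY` is NaN and `|a - b| <= f64::EPSILON` alone would therefore reject two equal infinities:

```rust
/// Epsilon-aware float equality as described above (hypothetical helper).
fn float_eq(a: f64, b: f64) -> bool {
    // Exact equality first: `INFINITY - INFINITY` is NaN, so the epsilon
    // test alone would reject two equal infinities.
    // NaN fails both tests, so it never equals anything, itself included.
    a == b || (a - b).abs() <= f64::EPSILON
}

fn main() {
    // 1.0 and 1.0 + EPSILON differ by exactly EPSILON, so they match.
    assert!(float_eq(1.0, 1.0 + f64::EPSILON));
    assert!(!float_eq(1.0, 1.0 + 4.0 * f64::EPSILON));
    assert!(!float_eq(f64::NAN, f64::NAN));
    assert!(float_eq(f64::INFINITY, f64::INFINITY));
    assert!(!float_eq(f64::INFINITY, f64::NEG_INFINITY));
}
```

An absolute tolerance of `f64::EPSILON` only has an effect for values whose magnitude is below 2.0; above that, adjacent `f64` values are already further apart than the tolerance, so the test reduces to exact equality.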

#### Value

Value types for matching.
@@ -351,12 +379,15 @@ Value types for matching.
use libmagic_rs::Value;
```

| Variant | Description |
| ---------------- | ---------------- |
| `Uint(u64)` | Unsigned integer |
| `Int(i64)` | Signed integer |
| `Bytes(Vec<u8>)` | Byte sequence |
| `String(String)` | String value |
| Variant | Description |
| ---------------- | --------------------------- |
| `Uint(u64)` | Unsigned integer |
| `Int(i64)` | Signed integer |
| `Float(f64)` | 64-bit floating-point value |
| `Bytes(Vec<u8>)` | Byte sequence |
| `String(String)` | String value |

**Note:** `Value` implements `PartialEq` but not `Eq` due to IEEE 754 NaN semantics (NaN is not equal to itself).
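
`Eq` promises reflexivity (`x == x` for every `x`), and a NaN payload breaks that promise. A minimal sketch using a hypothetical `ValueSketch` enum that mirrors the variants in the table; it is not the crate's type:

```rust
/// Hypothetical mirror of the `Value` variants listed above (illustration only).
/// Adding `Eq` to the derive list would not compile, because `f64` is not `Eq`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ValueSketch {
    Uint(u64),
    Int(i64),
    Float(f64),
    Bytes(Vec<u8>),
    String(String),
}

fn main() {
    let nan = ValueSketch::Float(f64::NAN);
    // Reflexivity, which `Eq` requires, fails for a NaN payload.
    assert_ne!(nan, nan.clone());
    // Non-NaN payloads compare equal to themselves as usual.
    assert_eq!(ValueSketch::Float(0.5), ValueSketch::Float(0.5));
    assert_eq!(ValueSketch::Bytes(vec![0x7f]), ValueSketch::Bytes(vec![0x7f]));
}
```

A practical consequence of this design: a type without `Eq` cannot serve directly as a `HashMap` key, which requires `Eq + Hash`.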

#### Endianness

104 changes: 100 additions & 4 deletions docs/src/evaluator.md
@@ -32,6 +32,7 @@ The evaluator module separates public interface from implementation:
- **`evaluator/types/`** - Type reading and coercion (organized as submodules as of v0.4.2)
  - **`types/mod.rs`** - Public API surface: `read_typed_value`, `coerce_value_to_type`, re-exports type functions
  - **`types/numeric.rs`** - Numeric type handling: `read_byte`, `read_short`, `read_long`, `read_quad` with endianness and signedness support
  - **`types/float.rs`** - Floating-point type handling: `read_float` (32-bit IEEE 754), `read_double` (64-bit IEEE 754) with endianness support
  - **`types/string.rs`** - String type handling: `read_string` with null-termination and UTF-8 conversion
  - **`types/tests.rs`** - Module tests
- **`evaluator/strength.rs`** - Rule strength calculation
@@ -85,7 +86,7 @@ pub struct RuleMatch {
}
```

The `Value` type is from `parser::ast::Value` and represents the actual matched content according to the rule's type specification.
The `Value` type is from `parser::ast::Value` and represents the actual matched content according to the rule's type specification. Note that `Value` implements only `PartialEq` (not `Eq`) due to floating-point NaN semantics.

### Offset Resolution (`evaluator/offset.rs`)

@@ -105,12 +106,14 @@ pub fn resolve_offset(

### Type Reading (`evaluator/types/`)

Interprets bytes according to type specifications. The types module is organized into submodules for numeric and string type handling (refactored from a single file in v0.4.2):
Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, and string type handling (refactored from a single file in v0.4.2):

- **Byte**: Single byte values (signed or unsigned)
- **Short**: 16-bit integers with endianness
- **Long**: 32-bit integers with endianness
- **Quad**: 64-bit integers with endianness
- **Float**: 32-bit IEEE 754 floating-point with endianness (native, big-endian `befloat`, little-endian `lefloat`)
- **Double**: 64-bit IEEE 754 floating-point with endianness (native, big-endian `bedouble`, little-endian `ledouble`)
- **String**: Byte sequences with length limits
- **Bounds checking**: Prevents buffer overruns

@@ -124,6 +127,26 @@ pub fn read_typed_value(

The `read_byte` function signature changed in v0.2.0 to accept three parameters (`buffer`, `offset`, and `signed`) instead of two, allowing explicit control over signed vs unsigned byte interpretation.
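
A minimal sketch of what the `signed` flag changes, assuming the byte is widened to a 64-bit integer in both cases. `read_byte_sketch` is hypothetical and returns `Option` for brevity; it does not reproduce the crate's return type:

```rust
/// Hypothetical illustration of signed vs unsigned byte reads.
/// Returns `None` when `offset` is past the end of the buffer.
fn read_byte_sketch(buffer: &[u8], offset: usize, signed: bool) -> Option<i64> {
    let raw = *buffer.get(offset)?;
    Some(if signed {
        // Reinterpret the bit pattern as two's complement: 0xFF -> -1.
        i64::from(raw as i8)
    } else {
        // Zero-extend: 0xFF -> 255.
        i64::from(raw)
    })
}

fn main() {
    let buf = [0x7F, 0xFF];
    assert_eq!(read_byte_sketch(&buf, 1, true), Some(-1));
    assert_eq!(read_byte_sketch(&buf, 1, false), Some(255));
    assert_eq!(read_byte_sketch(&buf, 0, true), Some(127));
    assert_eq!(read_byte_sketch(&buf, 2, false), None);
}
```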

**Floating-Point Type Reading (`evaluator/types/float.rs`):**

```rust
pub fn read_float(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError>

pub fn read_double(
    buffer: &[u8],
    offset: usize,
    endian: Endianness,
) -> Result<Value, TypeReadError>
```

- `read_float()` reads 4 bytes, interprets them as an `f32`, widens the result to `f64` (the widening is exact), and returns `Value::Float(f64)`
- `read_double()` reads 8 bytes, interprets them as an `f64`, and returns `Value::Float(f64)`
- Both honor the endianness specified in `TypeKind::Float` or `TypeKind::Double`
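
A bounds-checked read along these lines, sketched with the standard library only. `Endianness` here is a local stand-in, and `None` replaces the crate's `TypeReadError`; both are assumptions of this illustration, not the crate's implementation:

```rust
/// Hypothetical stand-in for the crate's `Endianness` (illustration only).
#[derive(Clone, Copy)]
enum Endianness {
    Little,
    Big,
    Native,
}

/// Bounds-checked 32-bit float read, widened to f64 (hypothetical sketch).
/// `None` stands in for the crate's `TypeReadError`.
fn read_float_sketch(buffer: &[u8], offset: usize, endian: Endianness) -> Option<f64> {
    // checked_add rejects offsets near usize::MAX; get() rejects short buffers.
    let end = offset.checked_add(4)?;
    let slice = buffer.get(offset..end)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(slice); // length is exactly 4 here
    let value = match endian {
        Endianness::Little => f32::from_le_bytes(bytes),
        Endianness::Big => f32::from_be_bytes(bytes),
        Endianness::Native => f32::from_ne_bytes(bytes),
    };
    // f32 -> f64 widening is exact, so the stored f32 value is preserved.
    Some(f64::from(value))
}

fn main() {
    let be = 1.5f32.to_be_bytes();
    assert_eq!(read_float_sketch(&be, 0, Endianness::Big), Some(1.5));
    assert_eq!(read_float_sketch(&1.5f32.to_le_bytes(), 0, Endianness::Little), Some(1.5));
    assert_eq!(read_float_sketch(&1.5f32.to_ne_bytes(), 0, Endianness::Native), Some(1.5));
    // Only 3 bytes remain after offset 1, so the read is rejected.
    assert_eq!(read_float_sketch(&be, 1, Endianness::Big), None);
    assert_eq!(read_float_sketch(&be, usize::MAX, Endianness::Big), None);
}
```

A `read_double` counterpart would take 8 bytes and use the `f64` constructors directly.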

### Operator Application (`evaluator/operators.rs`)

Applies comparison operations:
@@ -139,6 +162,24 @@ Applies comparison operations:

Comparison operators support numeric comparisons across different integer types using `i128` coercion for cross-type compatibility.
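
`i128` is the narrowest primitive integer that holds every `u64` and every `i64` value, so widening both sides to it compares mixed signedness without wraparound. A minimal sketch with a hypothetical `Num` enum standing in for the integer `Value` variants:

```rust
use std::cmp::Ordering;

/// Hypothetical mirror of the integer `Value` variants (illustration only).
enum Num {
    Uint(u64),
    Int(i64),
}

/// Widen either variant to i128; both u64 and i64 ranges fit losslessly.
fn widen(n: &Num) -> i128 {
    match *n {
        Num::Uint(u) => i128::from(u),
        Num::Int(i) => i128::from(i),
    }
}

fn compare(a: &Num, b: &Num) -> Ordering {
    widen(a).cmp(&widen(b))
}

fn main() {
    // A naive `as i64` cast would turn u64::MAX into -1 and report Equal.
    assert_eq!(compare(&Num::Uint(u64::MAX), &Num::Int(-1)), Ordering::Greater);
    assert_eq!(compare(&Num::Int(i64::MIN), &Num::Uint(0)), Ordering::Less);
    assert_eq!(compare(&Num::Uint(5), &Num::Int(5)), Ordering::Equal);
}
```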

**Floating-Point Operator Semantics:**

Float values (`Value::Float`) work with comparison and equality operators but have special handling:

- **Equality operators** (`==`, `!=`): Use epsilon-aware comparison with `f64::EPSILON` tolerance
  - Two floats are considered equal when `|a - b| <= f64::EPSILON`
  - Implemented in the `floats_equal()` helper function (`evaluator/operators/equality.rs`)
- **Ordering operators** (`<`, `>`, `<=`, `>=`): Use IEEE 754 `partial_cmp` semantics
  - Standard floating-point ordering: `-∞ < finite values < +∞`
  - Implemented in the `compare_values()` function (`evaluator/operators/comparison.rs`)
- **NaN handling**:
  - `NaN != NaN` returns `true` (NaN is never equal to anything, including itself)
  - All ordering comparisons involving NaN return `false` (NaN is not comparable)
- **Infinity handling**:
  - Positive and negative infinity are each equal only to infinity of the same sign
  - Infinities are ordered correctly: `NEG_INFINITY < finite < INFINITY`
- **Type mismatch**: Float values cannot be compared with `Int` or `Uint` (returns `false` or `None`)

```rust
pub fn apply_operator(
    operator: &Operator,
@@ -175,6 +216,41 @@ assert!(apply_operator(
));
```

**Example with floating-point operators:**

```rust
use libmagic_rs::parser::ast::{Operator, Value};
use libmagic_rs::evaluator::operators::apply_operator;

// Epsilon-aware equality
assert!(apply_operator(
    &Operator::Equal,
    &Value::Float(1.0),
    &Value::Float(1.0 + f64::EPSILON)
));

// Float ordering
assert!(apply_operator(
    &Operator::LessThan,
    &Value::Float(1.5),
    &Value::Float(2.0)
));

// NaN inequality
assert!(apply_operator(
    &Operator::NotEqual,
    &Value::Float(f64::NAN),
    &Value::Float(f64::NAN)
));

// Infinity comparison
assert!(apply_operator(
    &Operator::LessThan,
    &Value::Float(f64::NEG_INFINITY),
    &Value::Float(0.0)
));
```
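
The ordering and type-mismatch rules listed above can be sketched as a comparison that returns `Option<Ordering>`, where `None` marks operands that are not comparable (NaN involved, or a float against an integer) and every ordering operator then yields `false`. `V`, `compare_sketch`, and `less_than` are hypothetical names; the crate's `compare_values()` is not reproduced here and may differ in signature:

```rust
use std::cmp::Ordering;

/// Hypothetical mirror of the relevant `Value` variants (illustration only).
enum V {
    Int(i64),
    Float(f64),
}

/// Floats use `partial_cmp`; a float is never compared with an integer.
/// `None` means "not comparable".
fn compare_sketch(a: &V, b: &V) -> Option<Ordering> {
    match (a, b) {
        (V::Float(x), V::Float(y)) => x.partial_cmp(y),
        (V::Int(x), V::Int(y)) => Some(x.cmp(y)),
        // Type mismatch: no implicit int/float conversion.
        _ => None,
    }
}

/// `<` holds only when the ordering is known and is `Less`.
fn less_than(a: &V, b: &V) -> bool {
    compare_sketch(a, b) == Some(Ordering::Less)
}

fn main() {
    assert!(less_than(&V::Float(1.5), &V::Float(2.0)));
    assert!(less_than(&V::Float(f64::NEG_INFINITY), &V::Float(0.0)));
    assert!(!less_than(&V::Float(f64::NAN), &V::Float(0.0)));
    assert_eq!(compare_sketch(&V::Float(1.0), &V::Int(2)), None);
    assert!(!less_than(&V::Float(1.0), &V::Int(2)));
    assert!(less_than(&V::Int(1), &V::Int(2)));
}
```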

## Evaluation Algorithm

The evaluator uses a depth-first hierarchical algorithm:
@@ -362,17 +438,37 @@ let matches = evaluate_rules(&rules, &buffer)?;
assert_eq!(matches[0].message, "Small value detected");
```

**Example with floating-point types:**

```rust
use libmagic_rs::evaluate_rules;
use libmagic_rs::parser::parse_text_magic_file;

// Parse magic rule with float type
let magic_content = r#"
0 lefloat 3.14159 Pi constant detected
0 bedouble >100.0 Large double value
"#;
let rules = parse_text_magic_file(magic_content)?;

// IEEE 754 little-endian representation of 3.14159f32
let buffer = vec![0xd0, 0x0f, 0x49, 0x40];
let matches = evaluate_rules(&rules, &buffer)?;

assert_eq!(matches[0].message, "Pi constant detected");
```
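
A note on precision in this example: the four bytes are the `f32` nearest to 3.14159, which `read_float()` widens to `f64`, while the rule literal `3.14159` is presumably parsed as an `f64`. Under the equality rule stated earlier (`|a - b| <= f64::EPSILON`), those two values differ by far more than the tolerance, so the match appears to rely on the literal being narrowed to `f32` first, or on handling this section does not describe. The snippet below measures the gap with the standard library only; `widened_gap` is a hypothetical helper:

```rust
/// Gap between a little-endian f32 read (widened to f64) and an f64 literal.
/// Hypothetical helper for illustration.
fn widened_gap(bytes: [u8; 4], literal: f64) -> f64 {
    let widened = f64::from(f32::from_le_bytes(bytes));
    (widened - literal).abs()
}

fn main() {
    let bytes = [0xd0, 0x0f, 0x49, 0x40];
    // The bytes are exactly the f32 nearest to 3.14159.
    assert_eq!(f32::from_le_bytes(bytes), 3.14159_f32);

    // Against the f64 literal the gap is about 1.2e-7,
    // far larger than f64::EPSILON (about 2.2e-16).
    assert!(widened_gap(bytes, 3.14159) > f64::EPSILON);

    // Narrowing the literal to f32 first closes the gap completely.
    assert_eq!(widened_gap(bytes, f64::from(3.14159_f64 as f32)), 0.0);
}
```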

## Implementation Status

- [x] Basic evaluation engine structure
- [x] Offset resolution (absolute, relative, from-end)
- [x] Type reading with endianness support (Byte, Short, Long, Quad, String)
- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, String)
- [x] Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
- [x] Hierarchical rule processing with child evaluation
- [x] Error handling with graceful degradation
- [x] Timeout protection
- [x] Recursion depth limiting
- [x] Comprehensive test coverage (100+ tests)
- [x] Comprehensive test coverage (150+ tests)
- [ ] Indirect offset support (pointer dereferencing)
- [ ] Regex type support
- [ ] Performance optimizations (rule ordering, caching)
43 changes: 42 additions & 1 deletion docs/src/magic-format.md
@@ -150,6 +150,33 @@ Examples:
8 uquad >0x8000000000000000 (unsigned 64-bit check)
```

### Floating-Point Types

| Type | Size | Endianness | IEEE 754 |
| ---------- | ------- | ------------- | -------- |
| `float` | 4 bytes | native | 32-bit |
| `befloat` | 4 bytes | big-endian | 32-bit |
| `lefloat` | 4 bytes | little-endian | 32-bit |
| `double` | 8 bytes | native | 64-bit |
| `bedouble` | 8 bytes | big-endian | 64-bit |
| `ledouble` | 8 bytes | little-endian | 64-bit |

Floating-point types follow the IEEE 754 standard. Unlike integer types, they have no signed or unsigned variants: the IEEE 754 encoding carries its own sign bit.

Examples:

```text
0 lefloat =3.14159 File with float value pi
0 bedouble >1.0 Double value greater than 1.0
```
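
The reason float types need no signed/unsigned variants can be checked directly: IEEE 754 reserves the top bit of the encoding for the sign, so negating a value flips exactly that bit. A quick check with the standard library's `to_bits`; `sign_mask_f32` and `sign_mask_f64` are hypothetical helpers for this illustration:

```rust
/// XOR a value's bits with its negation's bits to isolate the sign bit.
fn sign_mask_f32(x: f32) -> u32 {
    x.to_bits() ^ (-x).to_bits()
}

fn sign_mask_f64(x: f64) -> u64 {
    x.to_bits() ^ (-x).to_bits()
}

fn main() {
    // binary32: 1 sign bit, 8 exponent bits, 23 fraction bits.
    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);
    assert_eq!(sign_mask_f32(1.0), 0x8000_0000);

    // binary64: 1 sign bit, 11 exponent bits, 52 fraction bits.
    assert_eq!(sign_mask_f64(1.0), 0x8000_0000_0000_0000);

    // Sizes match the table above.
    assert_eq!(std::mem::size_of::<f32>(), 4);
    assert_eq!(std::mem::size_of::<f64>(), 8);
}
```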

Float comparison behavior:

- **Equality**: Uses epsilon-aware comparison (`f64::EPSILON` tolerance)
- **Ordering**: Uses IEEE 754 semantics via `partial_cmp`
- **NaN**: `NaN != NaN`, comparisons with NaN always return false
- **Infinity**: Ordered as `-∞ < finite values < +∞`; each infinity is equal only to infinity of the same sign

### String Type

Match literal string data:
@@ -384,6 +411,19 @@ Output: `GIF image data, version 89a`
>24 byte 6 \b, RGBA
```

### Floating-Point Values

```text
# Check for specific float value
0 lefloat =3.14159 File with float value pi

# Float comparison
0 float >1.0 Float value greater than 1.0

# Double precision
0 bedouble =0.45455 Big-endian double equal to 0.45455
```
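
To exercise rules like these in a test, matching input bytes can be produced with the standard library's `to_le_bytes`, `to_ne_bytes`, and `to_be_bytes`. A minimal sketch; `sample_buffers` is a hypothetical helper, and each buffer holds only the value at offset 0:

```rust
/// Hypothetical helper: raw inputs for the three example rules above.
fn sample_buffers() -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    // `0 lefloat =3.14159`: 4 bytes, little-endian.
    let lefloat = 3.14159_f32.to_le_bytes().to_vec();
    // `0 float >1.0`: native byte order, any value above 1.0.
    let float = 2.5_f32.to_ne_bytes().to_vec();
    // `0 bedouble =0.45455`: 8 bytes, big-endian.
    let bedouble = 0.45455_f64.to_be_bytes().to_vec();
    (lefloat, float, bedouble)
}

fn main() {
    let (le, ne, be) = sample_buffers();
    assert_eq!(le, vec![0xd0, 0x0f, 0x49, 0x40]);
    assert_eq!(ne.len(), 4);
    assert_eq!(be.len(), 8);

    // Round-tripping through the matching decoder recovers each value exactly.
    assert_eq!(f32::from_ne_bytes([ne[0], ne[1], ne[2], ne[3]]), 2.5);
    let mut be_arr = [0u8; 8];
    be_arr.copy_from_slice(&be);
    assert_eq!(f64::from_be_bytes(be_arr), 0.45455);
}
```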

## Best Practices

### 1. Order Rules by Specificity
@@ -461,6 +501,7 @@ Consider:
- Relative offsets
- Indirect offsets (basic)
- Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
- Float and double types (32-bit and 64-bit IEEE 754 floating-point)
- String type
- Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
- Bitwise AND operator
@@ -471,7 +512,6 @@ Consider:

- Regex patterns
- Date/time types
- Float types
- 128-bit integer types
- Use/name directives
- Default rules
@@ -480,6 +520,7 @@ Consider:

- **Strength modifiers**: The `!:strength` directive for adjusting rule priority
- **64-bit integers**: `quad` type family (`quad`, `uquad`, `lequad`, `ulequad`, `bequad`, `ubequad`)
- **Floating-point types**: `float` and `double` type families (`float`, `befloat`, `lefloat`, `double`, `bedouble`, `ledouble`) with IEEE 754 semantics and epsilon-aware equality

## Troubleshooting
