34 changes: 18 additions & 16 deletions docs/API_REFERENCE.md
@@ -296,15 +296,17 @@ Data type specifications.
use libmagic_rs::TypeKind;
```

| Variant | Description |
| -------------------------- | -------------------------------------------------------- |
| `Byte { signed }` | Single byte with explicit signedness (changed in v0.2.0) |
| `Short { endian, signed }` | 16-bit integer |
| `Long { endian, signed }` | 32-bit integer |
| `Quad { endian, signed }` | 64-bit integer |
| `Float { endian }` | 32-bit IEEE 754 floating-point |
| `Double { endian }` | 64-bit IEEE 754 floating-point |
| `String { max_length }` | String data |
| Variant | Description |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Byte { signed }` | Single byte with explicit signedness (changed in v0.2.0) |
| `Short { endian, signed }` | 16-bit integer |
| `Long { endian, signed }` | 32-bit integer |
| `Quad { endian, signed }` | 64-bit integer |
| `Float { endian }` | 32-bit IEEE 754 floating-point |
| `Double { endian }` | 64-bit IEEE 754 floating-point |
| `Date { endian, utc }` | 32-bit Unix timestamp (signed seconds since epoch). The `endian` parameter specifies byte order (LittleEndian or BigEndian), and `utc` is a boolean indicating whether to format as UTC or local time. Date values are formatted as "Www Mmm DD HH:MM:SS YYYY" strings to match GNU file output. |
| `QDate { endian, utc }` | 64-bit Unix timestamp (signed seconds since epoch). The `endian` parameter specifies byte order (LittleEndian or BigEndian), and `utc` is a boolean indicating whether to format as UTC or local time. QDate values are formatted as "Www Mmm DD HH:MM:SS YYYY" strings to match GNU file output. |
| `String { max_length }` | String data |

##### 64-bit Integer Types

@@ -379,13 +381,13 @@ Value types for matching.
use libmagic_rs::Value;
```

| Variant | Description |
| ---------------- | --------------------------- |
| `Uint(u64)` | Unsigned integer |
| `Int(i64)` | Signed integer |
| `Float(f64)` | 64-bit floating-point value |
| `Bytes(Vec<u8>)` | Byte sequence |
| `String(String)` | String value |
| Variant | Description |
| ---------------- | --------------------------------------------------------------------------------- |
| `Uint(u64)` | Unsigned integer |
| `Int(i64)` | Signed integer |
| `Float(f64)` | 64-bit floating-point value |
| `Bytes(Vec<u8>)` | Byte sequence |
| `String(String)` | String value (also used for date/timestamp values formatted as human-readable strings) |

**Note:** `Value` implements `PartialEq` but not `Eq` due to IEEE 754 NaN semantics (NaN is not equal to itself).
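
The NaN note can be illustrated with a minimal sketch. The enum below is a hypothetical stand-in that only mirrors the variants listed in the table; it is not the crate's actual definition, and `is_reflexive` is a helper invented for this example.

```rust
// Hypothetical stand-in for the documented `Value` enum (assumption:
// only the variants from the table above, with derived `PartialEq`).
// Adding `Eq` to the derive list would not compile, because `f64`
// does not implement `Eq`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Uint(u64),
    Int(i64),
    Float(f64),
    Bytes(Vec<u8>),
    String(String),
}

/// Illustrative helper: returns true when a value compares equal to itself.
pub fn is_reflexive(v: &Value) -> bool {
    v == v
}
```

With this sketch, `is_reflexive(&Value::Float(f64::NAN))` is `false`, while every other variant, including a `Value::String` holding a formatted date, compares equal to itself.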

37 changes: 36 additions & 1 deletion docs/MAGIC_FORMAT.md
@@ -200,6 +200,40 @@ Example:
0 string/c <!doctype HTML document
```

### Date/Timestamp Types

Date and timestamp types read Unix timestamps (signed seconds since epoch) and format them as human-readable strings.

**32-bit timestamps (4 bytes):**

| Type | Size | Endianness | Timezone |
| --------- | ------- | ------------- | ---------- |
| `date` | 4 bytes | native | UTC |
| `ldate` | 4 bytes | native | local time |
| `bedate` | 4 bytes | big-endian | UTC |
| `beldate` | 4 bytes | big-endian | local time |
| `ledate` | 4 bytes | little-endian | UTC |
| `leldate` | 4 bytes | little-endian | local time |

**64-bit timestamps (8 bytes):**

| Type | Size | Endianness | Timezone |
| ---------- | ------- | ------------- | ---------- |
| `qdate` | 8 bytes | native | UTC |
| `qldate` | 8 bytes | native | local time |
| `beqdate` | 8 bytes | big-endian | UTC |
| `beqldate` | 8 bytes | big-endian | local time |
| `leqdate` | 8 bytes | little-endian | UTC |
| `leqldate` | 8 bytes | little-endian | local time |

All timestamp values are formatted as strings in the format `"Www Mmm DD HH:MM:SS YYYY"` to match GNU file output.

Example:

```text
0 ldate x Unix timestamp: %s
```
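
The `"Www Mmm DD HH:MM:SS YYYY"` layout can be produced from a raw timestamp with integer arithmetic alone. The following is a rough sketch for the UTC case only, not the library's implementation: `format_utc_timestamp` is a hypothetical name, and the day of month is assumed to be space-padded to two columns as in C's `asctime`, which this document does not confirm.

```rust
// Day 0 (1970-01-01) was a Thursday, so the table starts there.
const WEEKDAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Hypothetical helper: formats signed seconds since the Unix epoch as UTC.
pub fn format_utc_timestamp(secs: i64) -> String {
    // Euclidean division keeps the time of day non-negative for dates before 1970.
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (hh, mm, ss) = (rem / 3_600, (rem % 3_600) / 60, rem % 60);

    // Convert a day count to a proleptic Gregorian date using 400-year eras
    // that start on March 1, so the leap day falls at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    let weekday = WEEKDAYS[days.rem_euclid(7) as usize];
    // Assumption: day of month is space-padded (asctime style).
    format!(
        "{} {} {:2} {:02}:{:02}:{:02} {}",
        weekday,
        MONTHS[(month - 1) as usize],
        day,
        hh,
        mm,
        ss,
        year
    )
}
```

For example, `format_utc_timestamp(1234567890)` yields `"Fri Feb 13 23:31:30 2009"`. Local-time variants such as `ldate` would additionally need the system time zone offset, which this sketch leaves out.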

---

## Operators
@@ -492,6 +526,7 @@ Consider:
- Indirect offsets (basic)
- Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
- String type
- Date and timestamp types (32-bit and 64-bit Unix timestamps)
- Comparison operators (`=`, `!`, `<`, `>`, `<=`, `>=`)
- Bitwise AND operator
- Nested rules
@@ -500,14 +535,14 @@ Consider:
### Not Yet Supported

- Regex patterns
- Date/time types
- Float types
- 128-bit integer types
- Use/name directives
- Default rules

### Recently Added

- **Date/timestamp types**: `date` (32-bit) and `qdate` (64-bit) Unix timestamp types
- **Comparison operators**: Full support for `<`, `>`, `<=`, `>=` operators
- **Strength modifiers**: The `!:strength` directive for adjusting rule priority
- **64-bit integers**: `quad` type family (`quad`, `uquad`, `lequad`, `ulequad`, `bequad`, `ubequad`)
34 changes: 31 additions & 3 deletions docs/src/evaluator.md
@@ -33,11 +33,12 @@ The evaluator module separates public interface from implementation:
- **`types/mod.rs`** - Public API surface: `read_typed_value`, `coerce_value_to_type`, re-exports type functions
- **`types/numeric.rs`** - Numeric type handling: `read_byte`, `read_short`, `read_long`, `read_quad` with endianness and signedness support
- **`types/float.rs`** - Floating-point type handling: `read_float` (32-bit IEEE 754), `read_double` (64-bit IEEE 754) with endianness support
- **`types/date.rs`** - Date and timestamp type handling: `read_date` (32-bit Unix timestamps), `read_qdate` (64-bit Unix timestamps) with endianness and UTC/local time support
- **`types/string.rs`** - String type handling: `read_string` with null-termination and UTF-8 conversion
- **`types/tests.rs`** - Module tests
- **`evaluator/strength.rs`** - Rule strength calculation

The refactoring improves organization by separating concerns: `mod.rs` handles the public API surface and data types, while `engine/` contains the core evaluation logic. The types module was refactored in v0.4.2 from a single 1,836-line file into focused submodules for numeric and string handling, improving maintainability without changing the public API. From a public API perspective, all types and functions are imported from the `evaluator` module as before -- the internal organization is transparent to library users.
The refactoring improves organization by separating concerns: `mod.rs` handles the public API surface and data types, while `engine/` contains the core evaluation logic. The types module was refactored in v0.4.2 from a single 1,836-line file into focused submodules for numeric, floating-point, date/timestamp, and string handling, improving maintainability without changing the public API. From a public API perspective, all types and functions are imported from the `evaluator` module as before -- the internal organization is transparent to library users.

## Core Components

@@ -106,14 +107,16 @@ pub fn resolve_offset(

### Type Reading (`evaluator/types/`)

Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, and string type handling (refactored from a single file in v0.4.2):
Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, date/timestamp, and string type handling (refactored from a single file in v0.4.2):

- **Byte**: Single byte values (signed or unsigned)
- **Short**: 16-bit integers with endianness
- **Long**: 32-bit integers with endianness
- **Quad**: 64-bit integers with endianness
- **Float**: 32-bit IEEE 754 floating-point with endianness (native, big-endian `befloat`, little-endian `lefloat`)
- **Double**: 64-bit IEEE 754 floating-point with endianness (native, big-endian `bedouble`, little-endian `ledouble`)
- **Date**: 32-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
- **QDate**: 64-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting
- **String**: Byte sequences with length limits
- **Bounds checking**: Prevents buffer overruns

@@ -147,6 +150,31 @@ pub fn read_double(
- `read_double()` reads 8 bytes and interprets as `f64`, returning `Value::Float(f64)`
- Both respect endianness specified in `TypeKind::Float` or `TypeKind::Double`

**Date and QDate Type Reading (`evaluator/types/date.rs`):**

```rust
pub fn read_date(
buffer: &[u8],
offset: usize,
endian: Endianness,
utc: bool,
) -> Result<Value, TypeReadError>

pub fn read_qdate(
buffer: &[u8],
offset: usize,
endian: Endianness,
utc: bool,
) -> Result<Value, TypeReadError>
```

- `read_date()` reads 4 bytes as a 32-bit Unix timestamp (seconds since epoch) and returns `Value::String` formatted as `"Www Mmm DD HH:MM:SS YYYY"` to match GNU file output
- `read_qdate()` reads 8 bytes as a 64-bit Unix timestamp (seconds since epoch) and returns `Value::String` formatted as `"Www Mmm DD HH:MM:SS YYYY"` to match GNU file output
- Both support endianness (little-endian, big-endian, native)
- Both support UTC or local time formatting
- The evaluator reads raw integer timestamps from the buffer and converts them to formatted date strings for comparison
- Example: A 32-bit value `1234567890` at offset 0 with type `date` (UTC) would be evaluated as `"Fri Feb 13 23:31:30 2009"`; with `ldate`, the resulting string depends on the local time zone
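
The first step described above, reading the raw integer with bounds checking and a chosen byte order, can be sketched as follows. This is an illustration under stated assumptions, not the contents of `date.rs`: `ByteOrder`, `read_raw_date`, and `read_raw_qdate` are hypothetical names, and the sketch returns `Option` rather than the crate's `Result<Value, TypeReadError>`.

```rust
use std::convert::TryInto;

/// Hypothetical byte-order enum; the crate's real `Endianness` type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Little,
    Big,
    Native,
}

/// Reads a signed 32-bit timestamp at `offset`, or `None` if the
/// four bytes are not fully inside `buffer`.
pub fn read_raw_date(buffer: &[u8], offset: usize, order: ByteOrder) -> Option<i32> {
    // `checked_add` guards against overflow for very large offsets.
    let end = offset.checked_add(4)?;
    let bytes: [u8; 4] = buffer.get(offset..end)?.try_into().ok()?;
    Some(match order {
        ByteOrder::Little => i32::from_le_bytes(bytes),
        ByteOrder::Big => i32::from_be_bytes(bytes),
        ByteOrder::Native => i32::from_ne_bytes(bytes),
    })
}

/// Same idea for the 64-bit variant: eight bytes, signed.
pub fn read_raw_qdate(buffer: &[u8], offset: usize, order: ByteOrder) -> Option<i64> {
    let end = offset.checked_add(8)?;
    let bytes: [u8; 8] = buffer.get(offset..end)?.try_into().ok()?;
    Some(match order {
        ByteOrder::Little => i64::from_le_bytes(bytes),
        ByteOrder::Big => i64::from_be_bytes(bytes),
        ByteOrder::Native => i64::from_ne_bytes(bytes),
    })
}
```

The big-endian bytes `49 96 02 D2` decode to `1234567890`, the value used in the example above. A truncated buffer yields `None` instead of panicking, which corresponds to the bounds checking listed for this module.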

### Operator Application (`evaluator/operators.rs`)

Applies comparison operations:
@@ -462,7 +490,7 @@ assert_eq!(matches[0].message, "Pi constant detected");

- [x] Basic evaluation engine structure
- [x] Offset resolution (absolute, relative, from-end)
- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, String)
- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String)
- [x] Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask)
- [x] Hierarchical rule processing with child evaluation
- [x] Error handling with graceful degradation
34 changes: 33 additions & 1 deletion docs/src/magic-format.md
@@ -177,6 +177,38 @@ Float comparison behavior:
- **NaN**: `NaN != NaN`, comparisons with NaN always return false
- **Infinity**: Positive and negative infinity are properly ordered

### Date/Timestamp Types

| Type | Size | Endianness | UTC/Local | Description |
| ----------- | ------- | ------------- | --------- | ----------------------------------------------------------------------- |
| `date` | 4 bytes | native | UTC | 32-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| `ldate` | 4 bytes | native | Local | 32-bit Unix timestamp, formatted as local time |
| `bedate` | 4 bytes | big-endian | UTC | 32-bit Unix timestamp, big-endian byte order, UTC |
| `beldate` | 4 bytes | big-endian | Local | 32-bit Unix timestamp, big-endian byte order, local time |
| `ledate` | 4 bytes | little-endian | UTC | 32-bit Unix timestamp, little-endian byte order, UTC |
| `leldate` | 4 bytes | little-endian | Local | 32-bit Unix timestamp, little-endian byte order, local time |
| `qdate` | 8 bytes | native | UTC | 64-bit Unix timestamp (signed seconds since epoch), formatted as UTC |
| `qldate` | 8 bytes | native | Local | 64-bit Unix timestamp, formatted as local time |
| `beqdate` | 8 bytes | big-endian | UTC | 64-bit Unix timestamp, big-endian byte order, UTC |
| `beqldate` | 8 bytes | big-endian | Local | 64-bit Unix timestamp, big-endian byte order, local time |
| `leqdate` | 8 bytes | little-endian | UTC | 64-bit Unix timestamp, little-endian byte order, UTC |
| `leqldate` | 8 bytes | little-endian | Local | 64-bit Unix timestamp, little-endian byte order, local time |

Timestamp values are formatted as strings matching GNU `file` output: `"Www Mmm DD HH:MM:SS YYYY"`.

Examples:

```text
# Match a timestamp equal to the Unix epoch
0 date =0 File created at epoch

# Check timestamp in file header (big-endian)
8 bedate >946684800 File created after 2000-01-01

# 64-bit timestamp (little-endian, local time)
16 leqldate x \b, timestamp %s
```
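
Comparison thresholds such as `946684800` in the example above are simply seconds since the epoch for a UTC calendar date. As a hedged sketch (the function names are hypothetical and not part of the library), such a constant can be derived like this:

```rust
/// Hypothetical helper: days between 1970-01-01 and the given
/// proleptic Gregorian date (month 1..=12, day 1..=31).
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as the last months of the previous year,
    // so the leap day is the final day of the shifted year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = if month > 2 { month - 3 } else { month + 9 }; // 0 = March
    let doy = (153 * mp + 2) / 5 + day - 1; // day of shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Seconds since the epoch at 00:00:00 UTC on the given date.
pub fn utc_midnight_secs(year: i64, month: i64, day: i64) -> i64 {
    days_from_civil(year, month, day) * 86_400
}
```

Here `utc_midnight_secs(2000, 1, 1)` gives `946684800`, matching the threshold in the `bedate` rule.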

### String Type

Match literal string data:
@@ -502,6 +534,7 @@ Consider:
- Indirect offsets (basic)
- Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers)
- Float and double types (32-bit and 64-bit IEEE 754 floating-point)
- Date and qdate types (32-bit and 64-bit Unix timestamps)
- String type
- Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal)
- Bitwise AND operator
@@ -511,7 +544,6 @@ Consider:
### Not Yet Supported

- Regex patterns
- Date/time types
- 128-bit integer types
- Use/name directives
- Default rules
22 changes: 22 additions & 0 deletions docs/src/parser.md
@@ -184,6 +184,28 @@ Parsed literals are stored as `Value::Float(f64)` in the AST, regardless of whet

**Note:** Float and double types do **not** have signed/unsigned variants. IEEE 754 handles sign internally via the sign bit, so all float types use a single `TypeKind` variant with only an `endian` field (no `signed: bool` field).

### Date and Timestamp Types

The parser recognizes 12 type keywords for Unix timestamps (signed seconds since epoch):

**32-bit timestamps (Date):**
- `date` - Native endian, UTC
- `ldate` - Native endian, local time
- `bedate` - Big-endian, UTC
- `beldate` - Big-endian, local time
- `ledate` - Little-endian, UTC
- `leldate` - Little-endian, local time

**64-bit timestamps (QDate):**
- `qdate` - Native endian, UTC
- `qldate` - Native endian, local time
- `beqdate` - Big-endian, UTC
- `beqldate` - Big-endian, local time
- `leqdate` - Little-endian, UTC
- `leqldate` - Little-endian, local time

The parser creates `TypeKind::Date` or `TypeKind::QDate` variants with appropriate endianness and UTC flags. During evaluation, timestamps are formatted as strings in the format "Www Mmm DD HH:MM:SS YYYY" to match GNU file output.
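
The keyword scheme above is regular: an optional `be`/`le` byte-order prefix, an optional `q` for 64-bit width, and an optional `l` for local time, followed by `date`. A minimal sketch of that mapping follows; `ByteOrder`, `DateKind`, and `parse_date_keyword` are hypothetical names used for illustration, not the parser's actual types or functions.

```rust
/// Hypothetical byte-order enum; the crate's real `Endianness` type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Little,
    Big,
    Native,
}

/// Hypothetical mirror of the `Date` and `QDate` variants of `TypeKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateKind {
    Date { endian: ByteOrder, utc: bool },
    QDate { endian: ByteOrder, utc: bool },
}

/// Maps one of the 12 documented keywords to a `DateKind`, or `None`.
pub fn parse_date_keyword(keyword: &str) -> Option<DateKind> {
    // Step 1: peel off the optional byte-order prefix. `ldate` is safe here
    // because it starts with "ld", not "le".
    let (endian, rest) = if let Some(r) = keyword.strip_prefix("be") {
        (ByteOrder::Big, r)
    } else if let Some(r) = keyword.strip_prefix("le") {
        (ByteOrder::Little, r)
    } else {
        (ByteOrder::Native, keyword)
    };

    // Step 2: the remainder selects the width and the UTC/local flag.
    match rest {
        "date" => Some(DateKind::Date { endian, utc: true }),
        "ldate" => Some(DateKind::Date { endian, utc: false }),
        "qdate" => Some(DateKind::QDate { endian, utc: true }),
        "qldate" => Some(DateKind::QDate { endian, utc: false }),
        _ => None,
    }
}
```

For instance, `parse_date_keyword("leqldate")` returns a little-endian `QDate` with `utc: false`, and an unknown keyword such as `"bebedate"` returns `None`.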

## Parser Design Principles

### Error Handling