diff --git a/docs/API_REFERENCE.md b/docs/API_REFERENCE.md index a61d4453..427c549c 100644 --- a/docs/API_REFERENCE.md +++ b/docs/API_REFERENCE.md @@ -296,15 +296,17 @@ Data type specifications. use libmagic_rs::TypeKind; ``` -| Variant | Description | -| -------------------------- | -------------------------------------------------------- | -| `Byte { signed }` | Single byte with explicit signedness (changed in v0.2.0) | -| `Short { endian, signed }` | 16-bit integer | -| `Long { endian, signed }` | 32-bit integer | -| `Quad { endian, signed }` | 64-bit integer | -| `Float { endian }` | 32-bit IEEE 754 floating-point | -| `Double { endian }` | 64-bit IEEE 754 floating-point | -| `String { max_length }` | String data | +| Variant | Description | +| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `Byte { signed }` | Single byte with explicit signedness (changed in v0.2.0) | +| `Short { endian, signed }` | 16-bit integer | +| `Long { endian, signed }` | 32-bit integer | +| `Quad { endian, signed }` | 64-bit integer | +| `Float { endian }` | 32-bit IEEE 754 floating-point | +| `Double { endian }` | 64-bit IEEE 754 floating-point | +| `Date { endian, utc }` | 32-bit Unix timestamp (signed seconds since epoch). The `endian` parameter specifies byte order (LittleEndian or BigEndian), and `utc` is a boolean indicating whether to format as UTC or local time. Date values are formatted as "Www Mmm DD HH:MM:SS YYYY" strings to match GNU file output. | +| `QDate { endian, utc }` | 64-bit Unix timestamp (signed seconds since epoch). The `endian` parameter specifies byte order (LittleEndian or BigEndian), and `utc` is a boolean indicating whether to format as UTC or local time. 
QDate values are formatted as "Www Mmm DD HH:MM:SS YYYY" strings to match GNU file output. | +| `String { max_length }` | String data | ##### 64-bit Integer Types @@ -379,13 +381,13 @@ Value types for matching. use libmagic_rs::Value; ``` -| Variant | Description | -| ---------------- | --------------------------- | -| `Uint(u64)` | Unsigned integer | -| `Int(i64)` | Signed integer | -| `Float(f64)` | 64-bit floating-point value | -| `Bytes(Vec<u8>)` | Byte sequence | -| `String(String)` | String value | +| Variant | Description | +| ---------------- | --------------------------------------------------------------------------------- | +| `Uint(u64)` | Unsigned integer | +| `Int(i64)` | Signed integer | +| `Float(f64)` | 64-bit floating-point value | +| `Bytes(Vec<u8>)` | Byte sequence | +| `String(String)` | String value (also used for date/timestamp values formatted as human-readable strings) | **Note:** `Value` implements `PartialEq` but not `Eq` due to IEEE 754 NaN semantics (NaN is not equal to itself). 
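Review note (not part of the patch): the `Date` and `QDate` rows above say timestamps are rendered as "Www Mmm DD HH:MM:SS YYYY". The sketch below shows one dependency-free way to produce that string for the UTC case. The name `format_utc_timestamp` is hypothetical and is not a libmagic_rs API. Padding single-digit days with a space follows `ctime` conventions and is an assumption about the intended output. Local-time formatting is left out because it needs time zone data.

```rust
/// Hypothetical helper (not part of libmagic_rs): formats a Unix timestamp
/// (signed seconds since epoch) in UTC as "Www Mmm DD HH:MM:SS YYYY".
fn format_utc_timestamp(secs: i64) -> String {
    // 1970-01-01 was a Thursday, so index 0 is "Thu".
    const DAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];

    // Euclidean division keeps the time of day non-negative for pre-1970 values.
    let days = secs.div_euclid(86_400);
    let rem = secs.rem_euclid(86_400);
    let (hh, mm, ss) = (rem / 3_600, (rem % 3_600) / 60, rem % 60);

    // Convert a day count to a proleptic Gregorian date using 400-year eras
    // that start on March 1, so the leap day falls at the end of each year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month index, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // 1..=12
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    // Assumption: day of month is space-padded to width 2, as ctime does.
    format!(
        "{} {} {:>2} {:02}:{:02}:{:02} {}",
        DAYS[days.rem_euclid(7) as usize],
        MONTHS[(month - 1) as usize],
        day, hh, mm, ss, year
    )
}
```

For example, `format_utc_timestamp(1234567890)` yields `"Fri Feb 13 23:31:30 2009"`, the same value used in the evaluator documentation example in this patch.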
diff --git a/docs/MAGIC_FORMAT.md b/docs/MAGIC_FORMAT.md index 95b742d4..f7beddac 100644 --- a/docs/MAGIC_FORMAT.md +++ b/docs/MAGIC_FORMAT.md @@ -200,6 +200,40 @@ Example: 0 string/c `, `<=`, `>=`) - Bitwise AND operator - Nested rules @@ -500,7 +535,6 @@ Consider: ### Not Yet Supported - Regex patterns -- Date/time types - Float types - 128-bit integer types - Use/name directives @@ -508,6 +542,7 @@ Consider: ### Recently Added +- **Date/timestamp types**: `date` (32-bit) and `qdate` (64-bit) Unix timestamp types - **Comparison operators**: Full support for `<`, `>`, `<=`, `>=` operators - **Strength modifiers**: The `!:strength` directive for adjusting rule priority - **64-bit integers**: `quad` type family (`quad`, `uquad`, `lequad`, `ulequad`, `bequad`, `ubequad`) diff --git a/docs/src/evaluator.md b/docs/src/evaluator.md index b9aaace7..2437c6de 100644 --- a/docs/src/evaluator.md +++ b/docs/src/evaluator.md @@ -33,11 +33,12 @@ The evaluator module separates public interface from implementation: - **`types/mod.rs`** - Public API surface: `read_typed_value`, `coerce_value_to_type`, re-exports type functions - **`types/numeric.rs`** - Numeric type handling: `read_byte`, `read_short`, `read_long`, `read_quad` with endianness and signedness support - **`types/float.rs`** - Floating-point type handling: `read_float` (32-bit IEEE 754), `read_double` (64-bit IEEE 754) with endianness support + - **`types/date.rs`** - Date and timestamp type handling: `read_date` (32-bit Unix timestamps), `read_qdate` (64-bit Unix timestamps) with endianness and UTC/local time support - **`types/string.rs`** - String type handling: `read_string` with null-termination and UTF-8 conversion - **`types/tests.rs`** - Module tests - **`evaluator/strength.rs`** - Rule strength calculation -The refactoring improves organization by separating concerns: `mod.rs` handles the public API surface and data types, while `engine/` contains the core evaluation logic. 
The types module was refactored in v0.4.2 from a single 1,836-line file into focused submodules for numeric and string handling, improving maintainability without changing the public API. From a public API perspective, all types and functions are imported from the `evaluator` module as before -- the internal organization is transparent to library users. +The refactoring improves organization by separating concerns: `mod.rs` handles the public API surface and data types, while `engine/` contains the core evaluation logic. The types module was refactored in v0.4.2 from a single 1,836-line file into focused submodules for numeric, floating-point, date/timestamp, and string handling, improving maintainability without changing the public API. From a public API perspective, all types and functions are imported from the `evaluator` module as before -- the internal organization is transparent to library users. ## Core Components @@ -106,7 +107,7 @@ pub fn resolve_offset( ### Type Reading (`evaluator/types/`) -Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, and string type handling (refactored from a single file in v0.4.2): +Interprets bytes according to type specifications. The types module is organized into submodules for numeric, floating-point, date/timestamp, and string type handling (refactored from a single file in v0.4.2): - **Byte**: Single byte values (signed or unsigned) - **Short**: 16-bit integers with endianness @@ -114,6 +115,8 @@ Interprets bytes according to type specifications. 
The types module is organized - **Quad**: 64-bit integers with endianness - **Float**: 32-bit IEEE 754 floating-point with endianness (native, big-endian `befloat`, little-endian `lefloat`) - **Double**: 64-bit IEEE 754 floating-point with endianness (native, big-endian `bedouble`, little-endian `ledouble`) +- **Date**: 32-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting +- **QDate**: 64-bit Unix timestamps (signed seconds since epoch) with configurable endianness and UTC/local time formatting - **String**: Byte sequences with length limits - **Bounds checking**: Prevents buffer overruns @@ -147,6 +150,31 @@ pub fn read_double( - `read_double()` reads 8 bytes and interprets as `f64`, returning `Value::Float(f64)` - Both respect endianness specified in `TypeKind::Float` or `TypeKind::Double` +**Date and QDate Type Reading (`evaluator/types/date.rs`):** + +```rust +pub fn read_date( + buffer: &[u8], + offset: usize, + endian: Endianness, + utc: bool, +) -> Result + +pub fn read_qdate( + buffer: &[u8], + offset: usize, + endian: Endianness, + utc: bool, +) -> Result +``` + +- `read_date()` reads 4 bytes as a 32-bit Unix timestamp (seconds since epoch) and returns `Value::String` formatted as `"Www Mmm DD HH:MM:SS YYYY"` to match GNU file output +- `read_qdate()` reads 8 bytes as a 64-bit Unix timestamp (seconds since epoch) and returns `Value::String` formatted as `"Www Mmm DD HH:MM:SS YYYY"` to match GNU file output +- Both support endianness (little-endian, big-endian, native) +- Both support UTC or local time formatting +- The evaluator reads raw integer timestamps from the buffer and converts them to formatted date strings for comparison +- Example: A 32-bit value `1234567890` at offset 0 with type `date` (UTC) would be evaluated as `"Fri Feb 13 23:31:30 2009"` + ### Operator Application (`evaluator/operators.rs`) Applies comparison operations: @@ -462,7 +490,7 @@ assert_eq!(matches[0].message, "Pi constant 
detected"); - [x] Basic evaluation engine structure - [x] Offset resolution (absolute, relative, from-end) -- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, String) +- [x] Type reading with endianness support (Byte, Short, Long, Quad, Float, Double, Date, QDate, String) - [x] Operator application (Equal, NotEqual, LessThan, GreaterThan, LessEqual, GreaterEqual, BitwiseAnd, BitwiseAndMask) - [x] Hierarchical rule processing with child evaluation - [x] Error handling with graceful degradation diff --git a/docs/src/magic-format.md b/docs/src/magic-format.md index 6c907754..41e1f782 100644 --- a/docs/src/magic-format.md +++ b/docs/src/magic-format.md @@ -177,6 +177,38 @@ Float comparison behavior: - **NaN**: `NaN != NaN`, comparisons with NaN always return false - **Infinity**: Positive and negative infinity are properly ordered +### Date/Timestamp Types + +| Type | Size | Endianness | UTC/Local | Description | +| ----------- | ------- | ------------- | --------- | ----------------------------------------------------------------------- | +| `date` | 4 bytes | native | UTC | 32-bit Unix timestamp (signed seconds since epoch), formatted as UTC | +| `ldate` | 4 bytes | native | Local | 32-bit Unix timestamp, formatted as local time | +| `bedate` | 4 bytes | big-endian | UTC | 32-bit Unix timestamp, big-endian byte order, UTC | +| `beldate` | 4 bytes | big-endian | Local | 32-bit Unix timestamp, big-endian byte order, local time | +| `ledate` | 4 bytes | little-endian | UTC | 32-bit Unix timestamp, little-endian byte order, UTC | +| `leldate` | 4 bytes | little-endian | Local | 32-bit Unix timestamp, little-endian byte order, local time | +| `qdate` | 8 bytes | native | UTC | 64-bit Unix timestamp (signed seconds since epoch), formatted as UTC | +| `qldate` | 8 bytes | native | Local | 64-bit Unix timestamp, formatted as local time | +| `beqdate` | 8 bytes | big-endian | UTC | 64-bit Unix timestamp, big-endian byte order, UTC | +| 
`beqldate` | 8 bytes | big-endian | Local | 64-bit Unix timestamp, big-endian byte order, local time | +| `leqdate` | 8 bytes | little-endian | UTC | 64-bit Unix timestamp, little-endian byte order, UTC | +| `leqldate` | 8 bytes | little-endian | Local | 64-bit Unix timestamp, little-endian byte order, local time | + +Timestamp values are formatted as strings matching GNU file output format: "Www Mmm DD HH:MM:SS YYYY". + +Examples: + +```text +# Match file created at Unix epoch +0 date =0 File created at epoch + +# Check timestamp in file header (big-endian) +8 bedate >946684800 File created after 2000-01-01 + +# 64-bit timestamp (little-endian, local time) +16 leqldate x \b, timestamp %s +``` + ### String Type Match literal string data: @@ -502,6 +534,7 @@ Consider: - Indirect offsets (basic) - Byte, short, long, quad types (8-bit, 16-bit, 32-bit, 64-bit integers) - Float and double types (32-bit and 64-bit IEEE 754 floating-point) +- Date and qdate types (32-bit and 64-bit Unix timestamps) - String type - Comparison operators (equal, not-equal, less-than, greater-than, less-equal, greater-equal) - Bitwise AND operator @@ -511,7 +544,6 @@ Consider: ### Not Yet Supported - Regex patterns -- Date/time types - 128-bit integer types - Use/name directives - Default rules diff --git a/docs/src/parser.md b/docs/src/parser.md index 2d599141..92a8906f 100644 --- a/docs/src/parser.md +++ b/docs/src/parser.md @@ -184,6 +184,28 @@ Parsed literals are stored as `Value::Float(f64)` in the AST, regardless of whet **Note:** Float and double types do **not** have signed/unsigned variants. IEEE 754 handles sign internally via the sign bit, so all float types use a single `TypeKind` variant with only an `endian` field (no `signed: bool` field). +### Date and Timestamp Types + +The parser supports date and timestamp types for parsing Unix timestamps (signed seconds since epoch). 
There are 12 type keywords: + +**32-bit timestamps (Date):** +- `date` - Native endian, UTC +- `ldate` - Native endian, local time +- `bedate` - Big-endian, UTC +- `beldate` - Big-endian, local time +- `ledate` - Little-endian, UTC +- `leldate` - Little-endian, local time + +**64-bit timestamps (QDate):** +- `qdate` - Native endian, UTC +- `qldate` - Native endian, local time +- `beqdate` - Big-endian, UTC +- `beqldate` - Big-endian, local time +- `leqdate` - Little-endian, UTC +- `leqldate` - Little-endian, local time + +The parser creates `TypeKind::Date` or `TypeKind::QDate` variants with appropriate endianness and UTC flags. During evaluation, timestamps are formatted as strings in the format "Www Mmm DD HH:MM:SS YYYY" to match GNU file output. + ## Parser Design Principles ### Error Handling