feat(parser): implement float and double types with endian variants#162
Conversation
…pe system - Added support for `float` and `double` types, including their endian variants `befloat`/`lefloat` and `bedouble`/`ledouble`. - Removed the note about the absence of floating-point types in the type system section. - Updated the development phases to reflect the removal of floating-point types from the upcoming advanced types feature. This enhances the documentation to accurately represent the current capabilities of the magic file compatibility and type system. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…on and equality handling - Implemented reading and handling of `float` and `double` types, including endian support. - Enhanced comparison functions to support `Value::Float`, including partial comparisons and epsilon-aware equality checks. - Added comprehensive tests for float and double handling, ensuring correct behavior for edge cases like NaN and infinity. - Updated the evaluator's type reading API to include float and double types. This update improves the evaluator's capability to handle floating-point numbers, enhancing its overall functionality. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…on and parsing - Introduced `TypeKind::Float` and `TypeKind::Double` enums to represent 32-bit and 64-bit floating-point types, respectively, with endian support. - Implemented serialization for `Value::Float` and `Value::Double`, ensuring correct formatting in output. - Enhanced parsing capabilities to recognize floating-point literals, including support for scientific notation. - Added comprehensive unit tests for serialization, parsing, and value comparison of float and double types. This update expands the type system to include floating-point numbers, improving the library's functionality and usability. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Introduced new tests for evaluating float and double rules, covering equality, inequality, less than, and no match scenarios. - Enhanced the test suite for the evaluator to ensure correct handling of floating-point comparisons and edge cases. - Updated property tests to include float and double types in the generation strategies. This update strengthens the testing framework for floating-point types, ensuring robust evaluation behavior. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ith-endian-variants
|
Caution Review failedFailed to post review comments Summary by CodeRabbit
WalkthroughAdds IEEE‑754 32-bit float and 64-bit double support with endianness variants across parser, AST, evaluator (readers, coercion, comparison, equality), output formatting, RuleMatch.type_kind, and tests. Changes
Sequence DiagramsequenceDiagram
participant Client as Rule/Tester
participant Parser as Parser
participant Evaluator as Evaluator
participant Reader as FloatReader
participant Operators as Operators
participant Output as Formatter
Client->>Parser: define rule with TypeKind::Float/Double
Parser->>Evaluator: emit MagicRule(TypeKind::Float/Double)
Evaluator->>Reader: read_typed_value(buffer, offset, Float/Double)
Reader->>Reader: validate window and deserialize (f32/f64, endian)
Reader->>Evaluator: return Value::Float(f64)
Evaluator->>Operators: compare_values(Value::Float, other, Operator)
Operators->>Operators: equality -> floats_equal (epsilon, NaN/infty rules)
Operators-->>Evaluator: boolean result
Evaluator->>Output: produce MatchResult(Value::Float, type_kind)
Output->>Client: formatted/serialized output (hex/JSON)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 CI must passWonderful, this rule succeeded.All CI checks must pass. Release-plz PRs are exempt because they only bump versions and changelogs (code was already tested on main), and GITHUB_TOKEN-triggered force-pushes suppress CI.
🟢 Do not merge outdated PRsWonderful, this rule succeeded.Make sure PRs are within 10 commits of the base branch before merging
|
🧪 CI InsightsHere's what we observed from your CI run for 36218d8. 🟢 All jobs passed!But CI Insights is watching 👀 |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
Implements IEEE-754 float/double support end-to-end (AST → parser → evaluator → output), including endian variants, to close issue #40 and expand magic-file compatibility.
Changes:
- Extended AST/type system with
TypeKind::Float/TypeKind::DoubleandValue::Float(f64). - Added parsing for float/double type keywords and float literals (with exponent support).
- Implemented float/double reads in the evaluator and updated operators/output/strength plus test coverage.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/property_tests.rs | Expands proptest generators to include Float/Double and float literal values. |
| tests/evaluator_tests.rs | Adds integration tests exercising float/double evaluation. |
| src/parser/types.rs | Recognizes float/double type keywords and maps them to TypeKind. |
| src/parser/grammar/tests.rs | Adds grammar tests for float literals and whitespace behavior. |
| src/parser/grammar/mod.rs | Implements float literal parsing and integrates it into parse_value. |
| src/parser/codegen.rs | Updates codegen serialization for Float/Double and Value::Float. |
| src/parser/ast.rs | Adds new TypeKind variants and Value::Float, adjusts derives/docs/tests. |
| src/output/mod.rs | Extends match-length logic to account for Value::Float. |
| src/output/json.rs | Adds hex formatting support for Value::Float. |
| src/evaluator/types/tests.rs | Adds dispatch/overrun/coercion tests for float/double type reads. |
| src/evaluator/types/mod.rs | Wires new float submodule into typed-value dispatch. |
| src/evaluator/types/float.rs | New module implementing endian-aware float/double reads with tests. |
| src/evaluator/strength.rs | Incorporates Float/Double into default strength heuristics. |
| src/evaluator/operators/equality.rs | Adds epsilon-aware float equality and NaN/inf handling plus tests. |
| src/evaluator/operators/comparison.rs | Enables float ordering via partial_cmp with NaN semantics plus tests. |
| src/build_helpers.rs | Adds tests for codegen serialization of float/double types/values. |
| AGENTS.md | Updates project documentation to reflect new supported types. |
| /// Machine-epsilon threshold used when comparing `Value::Float` operands for | ||
| /// equality. Two floats are considered equal when `|a - b| <= FLOAT_EPSILON`. | ||
| /// Special values (NaN, infinity) are handled explicitly before the epsilon | ||
| /// check. | ||
| const FLOAT_EPSILON: f64 = f64::EPSILON; | ||
|
|
||
| /// Return `true` when two `f64` values are considered equal under | ||
| /// epsilon-aware semantics. | ||
| /// | ||
| /// * **NaN**: NaN is never equal to anything (including itself). | ||
| /// * **Infinity**: positive/negative infinity are only equal to the same sign | ||
| /// of infinity. | ||
| /// * **Finite**: `|a - b| <= FLOAT_EPSILON`. | ||
| fn floats_equal(a: f64, b: f64) -> bool { | ||
| if a.is_nan() || b.is_nan() { | ||
| return false; | ||
| } | ||
| // Infinities: equal only when same sign (inf - inf = NaN, so must check first). | ||
| // Exact comparison is correct here -- infinities have precise IEEE 754 bit patterns. | ||
| if a.is_infinite() || b.is_infinite() { | ||
| #[allow(clippy::float_cmp)] | ||
| return a == b; | ||
| } | ||
| (a - b).abs() <= FLOAT_EPSILON | ||
| } |
There was a problem hiding this comment.
Float equality uses an absolute tolerance of f64::EPSILON, but TypeKind::Float values are read as f32 then widened to f64. For most decimal literals (e.g. 0.1), the widened f32 value will differ from the parsed f64 literal by far more than f64::EPSILON, so = comparisons will almost never match except for exactly representable values. Consider either rounding/coercing expected Value::Float to f32 when the rule type is Float (then compare exactly), or switching to a relative/ULP-based tolerance appropriate for f32-derived values.
| TypeKind::Float { endian } => read_float(buffer, offset, *endian), | ||
| TypeKind::Double { endian } => read_double(buffer, offset, *endian), |
There was a problem hiding this comment.
coerce_value_to_type is called before comparisons, but it currently doesn’t adjust Value::Float based on TypeKind::Float vs TypeKind::Double. Since read_float widens an f32 to f64, expected literals parsed as f64 often won’t match without type-aware rounding. Consider extending coercion so TypeKind::Float rounds/clamps the expected value to f32 (and back to f64) to align comparisons with the on-disk representation.
| let value: f64 = float_str | ||
| .parse() | ||
| .map_err(|_| nom::Err::Error(NomError::new(input, nom::error::ErrorKind::MapRes)))?; | ||
|
|
There was a problem hiding this comment.
parse_float_value accepts any f64 that str::parse() returns, including overflow to inf for inputs like 1e309. Non-finite floats can later break JSON serialization (serde_json rejects NaN/±inf) and also make rule codegen produce invalid Rust literals. Consider rejecting non-finite results here (!value.is_finite()) and returning a parse error instead.
| // Reject non-finite floats (NaN, +inf, -inf) to keep AST, JSON, and codegen valid | |
| if !value.is_finite() { | |
| return Err(nom::Err::Error(NomError::new( | |
| input, | |
| nom::error::ErrorKind::Float, | |
| ))); | |
| } |
| pub fn serialize_value(value: &Value) -> String { | ||
| match value { | ||
| Value::Uint(number) => format!("Value::Uint({})", format_number(*number)), | ||
| Value::Int(number) => format!("Value::Int({})", format_signed_number(*number)), | ||
| Value::Float(f) => format!("Value::Float({f:?})"), | ||
| Value::Bytes(bytes) => format!("Value::Bytes({})", format_byte_vec(bytes)), |
There was a problem hiding this comment.
serialize_value formats floats with {f:?}. For non-finite values this yields strings like inf/NaN, which are not valid Rust expressions, and will make generated code fail to compile if such a value is ever produced (e.g. exponent overflow during parsing). Consider serializing floats with a helper that emits f64::NAN / f64::INFINITY / f64::NEG_INFINITY (or f64::from_bits(..)), and uses a Rust-compatible literal format for finite values.
| //! Evaluator integration tests | ||
| //! | ||
| //! Tests for confidence calculation, rule ordering, and evaluation behavior | ||
| //! through the public `MagicDatabase` API. | ||
|
|
||
| use libmagic_rs::{EvaluationConfig, MagicDatabase}; | ||
| use libmagic_rs::evaluator::evaluate_rules; | ||
| use libmagic_rs::{ | ||
| Endianness, EvaluationConfig, EvaluationContext, MagicDatabase, MagicRule, OffsetSpec, | ||
| Operator, TypeKind, Value, | ||
| }; |
There was a problem hiding this comment.
File header says these tests run “through the public MagicDatabase API”, but the new float/double tests import and call evaluator::evaluate_rules directly. Either update the module-level doc comment to match the current approach, or refactor the new tests to exercise the public MagicDatabase API for consistency with the rest of the file.
|
Related Documentation 10 document(s) may need updating based on files changed in this PR: libMagic-rs API_REFERENCE
|
- Add `type_kind` field to `RuleMatch` so output formatting can report accurate on-disk byte lengths (fixes hardcoded 8 for all floats) - Add f32 coercion in `coerce_value_to_type` for `TypeKind::Float` so parsed f64 literals round to f32 precision before comparison - Reject non-finite floats in `parse_float_value` to prevent invalid JSON serialization and codegen output - Handle NaN/infinity in `serialize_value` with valid Rust expressions (`f64::NAN`, `f64::INFINITY`, `f64::NEG_INFINITY`) - Update evaluator test module doc comment to reflect mixed API usage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
| let value: f64 = float_str | ||
| .parse() | ||
| .map_err(|_| nom::Err::Error(NomError::new(input, nom::error::ErrorKind::MapRes)))?; | ||
|
|
||
| // Reject non-finite floats (NaN, +inf, -inf) to keep AST, JSON, and codegen valid | ||
| if !value.is_finite() { | ||
| return Err(nom::Err::Error(NomError::new( | ||
| input, |
There was a problem hiding this comment.
parse_float_value rejects non-finite floats by returning nom::Err::Error, which allows parse_value's alt(...) to backtrack and then parse the same input as an integer prefix (e.g. "1.0e309" becomes Value::Uint(1) with the remainder treated as message). If non-finite float literals are meant to be rejected, return a non-backtracking error (nom::Err::Failure / cut) once the float pattern is recognized, or tighten the numeric parser to require a valid token boundary so it can’t accept a float prefix.
| let value: f64 = float_str | |
| .parse() | |
| .map_err(|_| nom::Err::Error(NomError::new(input, nom::error::ErrorKind::MapRes)))?; | |
| // Reject non-finite floats (NaN, +inf, -inf) to keep AST, JSON, and codegen valid | |
| if !value.is_finite() { | |
| return Err(nom::Err::Error(NomError::new( | |
| input, | |
| // At this point, we've definitively recognized a float token; any failure to | |
| // convert it to a finite f64 should be a non-backtracking error so that | |
| // higher-level parsers cannot reinterpret this input as another numeric type. | |
| let value: f64 = match float_str.parse() { | |
| Ok(v) => v, | |
| Err(_) => { | |
| return Err(nom::Err::Failure(NomError::new( | |
| float_str, | |
| nom::error::ErrorKind::Float, | |
| ))) | |
| } | |
| }; | |
| // Reject non-finite floats (NaN, +inf, -inf) to keep AST, JSON, and codegen valid | |
| if !value.is_finite() { | |
| return Err(nom::Err::Failure(NomError::new( | |
| float_str, |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Replace 3.14 literals with 3.125 (exactly representable in binary, not a PI approximation) and add #[allow(clippy::float_cmp)] for intentional exact float comparisons in native endian test match guards. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
| #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] | ||
| pub enum Value { | ||
| /// Unsigned integer value | ||
| Uint(u64), | ||
| /// Signed integer value | ||
| Int(i64), | ||
| /// Floating-point value (used for `float` and `double` types) | ||
| /// | ||
| /// # Examples | ||
| /// | ||
| /// ``` | ||
| /// use libmagic_rs::parser::ast::Value; | ||
| /// | ||
| /// let val = Value::Float(3.14); | ||
| /// assert_eq!(val, Value::Float(3.14)); | ||
| /// ``` | ||
| Float(f64), |
There was a problem hiding this comment.
Now that Value can contain Float(f64), evaluation can produce NaN/±inf values from file bytes. Since Value (and MatchResult/EvaluationResult) derives Serialize, attempting to serde_json::to_string these results will error because JSON does not support non-finite floats. Consider custom-serializing Value::Float (e.g., as a tagged string/hex bytes when non-finite) or preventing non-finite floats from reaching serializable output types.
| Value::Float(f) => { | ||
| // Convert to little-endian bytes for consistency | ||
| let bytes = f.to_le_bytes(); | ||
| let mut result = String::with_capacity(16); // 8 bytes * 2 chars per byte | ||
| for &b in &bytes { | ||
| write!(&mut result, "{b:02x}").expect("Writing to String should never fail"); | ||
| } | ||
| result |
There was a problem hiding this comment.
format_value_as_hex encodes Value::Float using f64::to_le_bytes() (always 8 bytes). More generally, numeric variants (Uint/Int/Float) are serialized without considering the on-disk width of the matched type, so JSON value can disagree with MatchResult.length (now derived from type_kind.bit_width()). Consider passing MatchResult.length/TypeKind into this helper so it emits 1/2/4/8-byte little-endian hex consistently (Float=4 bytes, Double=8 bytes).
| Value::Float(f) => { | ||
| // Convert to little-endian bytes for consistency | ||
| let bytes = f.to_le_bytes(); | ||
| let mut result = String::with_capacity(16); // 8 bytes * 2 chars per byte | ||
| for &b in &bytes { | ||
| write!(&mut result, "{b:02x}").expect("Writing to String should never fail"); | ||
| } | ||
| result |
There was a problem hiding this comment.
There are unit tests for format_value_as_hex covering Bytes/String/Uint/Int, but none covering the new Value::Float branch. Adding tests for both 32-bit float (4-byte) and 64-bit double (8-byte) representations (and optionally NaN/±inf) would prevent regressions here.
| // Round f64 expected value to f32 precision for TypeKind::Float so that | ||
| // parsed f64 literals compare correctly against f32-widened file values. | ||
| #[allow(clippy::cast_possible_truncation)] | ||
| (Value::Float(v), TypeKind::Float { .. }) => Value::Float(f64::from(*v as f32)), |
There was a problem hiding this comment.
coerce_value_to_type converts Value::Float to f32 precision via *v as f32. For large finite literals (e.g. 1e40) this cast overflows to inf, introducing a non-finite expected value even though the parser rejects non-finite float literals. Consider detecting overflow (e.g., cast then check is_finite()) and returning an evaluation error or otherwise handling out-of-range float literals explicitly.
| (Value::Float(v), TypeKind::Float { .. }) => Value::Float(f64::from(*v as f32)), | |
| (Value::Float(v), TypeKind::Float { .. }) => { | |
| let narrowed = *v as f32; | |
| if !narrowed.is_finite() { | |
| // If narrowing overflows to a non-finite f32 (e.g., +/-inf, NaN), | |
| // avoid introducing a non-finite expected value and keep the | |
| // original finite literal instead. | |
| Value::Float(*v) | |
| } else { | |
| Value::Float(f64::from(narrowed)) | |
| } | |
| } |
## 🤖 New release
* `libmagic-rs`: 0.4.3 -> 0.5.0 (⚠ API breaking changes)
### ⚠ `libmagic-rs` breaking changes
```text
--- failure constructible_struct_adds_field: externally-constructible struct adds field ---
Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/constructible_struct_adds_field.ron
Failed in:
field RuleMatch.type_kind in /tmp/.tmpUBXECi/libmagic-rs/src/evaluator/mod.rs:220
field RuleMatch.type_kind in /tmp/.tmpUBXECi/libmagic-rs/src/evaluator/mod.rs:220
--- failure derive_trait_impl_removed: built-in derived trait no longer implemented ---
Description:
A public type has stopped deriving one or more traits. This can break downstream code that depends on those types implementing those traits.
ref: https://doc.rust-lang.org/reference/attributes/derive.html#derive
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/derive_trait_impl_removed.ron
Failed in:
type Value no longer derives Eq, in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:309
type Value no longer derives Eq, in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:309
type Value no longer derives Eq, in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:309
--- failure enum_no_repr_variant_discriminant_changed: enum variant had its discriminant change value ---
Description:
The enum's variant had its discriminant value change. This breaks downstream code that used its value via a numeric cast like `as isize`.
ref: https://doc.rust-lang.org/reference/items/enumerations.html#assigning-discriminant-values
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_no_repr_variant_discriminant_changed.ron
Failed in:
variant TypeKind::String 4 -> 6 in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:147
variant TypeKind::String 4 -> 6 in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:147
variant TypeKind::String 4 -> 6 in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:147
--- failure enum_variant_added: enum variant added on exhaustive enum ---
Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.46.0/src/lints/enum_variant_added.ron
Failed in:
variant TypeKind:Float in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:128
variant TypeKind:Double in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:142
variant TypeKind:Float in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:128
variant TypeKind:Double in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:142
variant TypeKind:Float in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:128
variant TypeKind:Double in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:142
variant Value:Float in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:324
variant Value:Float in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:324
variant Value:Float in /tmp/.tmpUBXECi/libmagic-rs/src/parser/ast.rs:324
```
<details><summary><i><b>Changelog</b></i></summary><p>
<blockquote>
## [0.5.0] - 2026-03-07
### Features
- **parser**: Implement float and double types with endian variants
([#162](#162))
<!-- generated by git-cliff -->
</blockquote>
</p></details>
---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Implements issue #40: floating-point type specifiers (
float,double,befloat,bedouble,lefloat,ledouble) across the full parser-evaluator pipeline.TypeKind::FloatandTypeKind::Doublevariants withendian: Endianness(nosignedfield -- IEEE 754 handles sign internally). AddedValue::Float(f64)variant;Valueno longer derivesEq.parse_type_keyword/type_keyword_to_kind. Newparse_float_valuegrammar function with mandatory decimal point to distinguish from integers.read_float(4 bytes, widened to f64) andread_double(8 bytes) in newevaluator/types/float.rssubmodule. Epsilon-aware equality (|a - b| <= f64::EPSILON) with explicit NaN/infinity handling. Ordering operators usepartial_cmpdirectly.Float/Double/Value::Float.Test plan
cargo test), zero clippy warningsfloat.rscovers all endianness variants, buffer overrun, offset overflowequality.rs)None(comparison.rs)evaluate_rules(evaluator_tests.rs)grammar/tests.rs)arb_type_kindincludes Float/Double variantsCloses #40
Generated with Claude Code