Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .bun-version
Original file line number Diff line number Diff line change
@@ -1 +1 @@
1.3.10
1.3.11
Comment thread
unclesp1d3r marked this conversation as resolved.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,7 @@ docs/plans/
.agents/
.augment/
.claude/
.context/
.cursor/
.roo/
.full-review/
Expand Down
9 changes: 5 additions & 4 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -100,8 +100,8 @@ evaluator/
├── offset/ // Offset resolution submodule
│ ├── mod.rs // Dispatcher (resolve_offset) and re-exports
│ ├── absolute.rs // OffsetError, resolve_absolute_offset
│ ├── indirect.rs // resolve_indirect_offset stub (issue #37)
│ └── relative.rs // resolve_relative_offset stub (issue #38)
│ ├── indirect.rs // resolve_indirect_offset (fully implemented, issue #37)
│ └── relative.rs // resolve_relative_offset (GNU `file` anchor semantics)
└── operators/ // Operator application submodule
├── mod.rs // Dispatcher (apply_operator, apply_any_value) and re-exports
├── equality.rs // apply_equal, apply_not_equal
Expand Down Expand Up @@ -204,7 +204,7 @@ cargo test --doc # Test documentation examples

### Currently Implemented (v0.1.0)

- **Offsets**: Absolute, from-end, and indirect specifications (relative offsets are parsed but not yet evaluated)
- **Offsets**: Absolute, from-end, indirect, and relative specifications (relative offsets `&+N`/`&-N` are evaluated using GNU `file` semantics -- the previous-match anchor)
- **Types**: `byte`, `short`, `long`, `quad`, `float`, `double`, `string`, `pstring` with endianness support; unsigned variants `ubyte`, `ushort`/`ubeshort`/`uleshort`, `ulong`/`ubelong`/`ulelong`, `uquad`/`ubequad`/`ulequad`; float/double endian variants `befloat`/`lefloat`, `bedouble`/`ledouble`; 32-bit date/timestamp types `date`/`ldate`/`bedate`/`beldate`/`ledate`/`leldate`; 64-bit date/timestamp types `qdate`/`qldate`/`beqdate`/`beqldate`/`leqdate`/`leqldate`; `pstring` is a Pascal string (length-prefixed) with support for 1/2/4-byte length prefixes via `/B`, `/H` (2-byte BE), `/h` (2-byte LE), `/L` (4-byte BE), `/l` (4-byte LE) suffixes, and the `/J` flag (stored length includes prefix width, JPEG convention) which is combinable with width suffixes (e.g., `pstring/HJ`); date values formatted as "Www Mmm DD HH:MM:SS YYYY" matching GNU `file` output; types are signed by default (libmagic-compatible)
- **Operators**: `=` (equal), `!=` (not equal), `<` (less than), `>` (greater than), `<=` (less equal), `>=` (greater equal), `&` (bitwise AND with optional mask), `^` (bitwise XOR), `~` (bitwise NOT), `x` (any value)
- **Nested Rules**: Hierarchical rule evaluation with proper indentation
Expand Down Expand Up @@ -246,7 +246,7 @@ impl BinaryRegex for regex::bytes::Regex {
### Offset Specifications

- Indirect offsets are fully implemented (parsing + evaluation) with specifiers: `.b/.B` (byte), `.s/.S` (short), `.l/.L` (long), `.q/.Q` (quad); lowercase = little-endian, uppercase = big-endian (GNU `file` semantics); pointer types signed by default; adjustment after closing paren: `(base.type)+adj`
- Relative offsets are parsed into the AST but evaluation is not yet implemented (#38)
- Relative offsets are fully evaluated against the GNU `file` previous-match anchor: the engine tracks `EvaluationContext::last_match_end()`, advancing it after each successful match by the bytes consumed (variable-width types include c-string NUL terminators and pstring length prefixes). Top-level relative offsets resolve from anchor 0. Magic-file `&+N`/`&-N` *parsing* is still TODO -- relative offsets are exercised programmatically through the AST.

### Magic File Syntax

Expand Down Expand Up @@ -535,6 +535,7 @@ This guide ensures consistent, high-quality development practices for the libmag
- In justfile recipes, never wrap `just` in `{{ mise_exec }}` -- it's redundant
- Changelog: `just changelog`, `just changelog-version <tag>`, `just changelog-unreleased`
- Security contact: <support@evilbitlabs.io> (matches PGP key in SECURITY.md)
- `docs/solutions/` — documented solutions to past problems, organized by category (logic-errors/, integration-issues/, security-issues/, developer-experience/) with YAML frontmatter (`tags`, `severity`, `components`). Relevant when implementing or debugging in documented areas.

## Open Source Quality Standards (OSSF Best Practices)

Expand Down
8 changes: 7 additions & 1 deletion GOTCHAS.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ Serialization functions live in `src/parser/codegen.rs`, shared by both `build.r

### 2.1 `TypeKind` Exhaustive Matches

Adding a variant to `TypeKind` requires updating exhaustive matches in 10+ files: `ast`, `grammar`, `types`, `codegen`, `strength`, `property_tests`, `evaluator/types/mod.rs` (`read_typed_value`, `coerce_value_to_type`), `output/mod.rs` (2 length matches), `output/json.rs` (`format_value_as_hex`), and `grammar/tests.rs` (stale assertions). Note: `coerce_value_to_type` and output matches use catch-all `_ =>` so they compile without changes but may need semantic updates.
Adding a variant to `TypeKind` requires updating exhaustive matches in 10+ files: `ast`, `grammar`, `types`, `codegen`, `strength`, `property_tests`, `evaluator/types/mod.rs` (`read_typed_value`, `coerce_value_to_type`, **`bytes_consumed`** -- variable-width variants must be matched explicitly or relative-offset anchors will silently corrupt), `output/mod.rs` (2 length matches), `output/json.rs` (`format_value_as_hex`), and `grammar/tests.rs` (stale assertions). Note: `coerce_value_to_type`, output matches, and `bytes_consumed` use catch-all `_ =>` so they compile without changes but may need semantic updates -- `bytes_consumed` will fire a `debug_assert` in test/dev builds for unhandled variable-width variants.

### 2.2 `Operator` Exhaustive Matches

Expand Down Expand Up @@ -71,6 +71,12 @@ The nom `tuple` combinator is deprecated. Use bare tuple syntax `(a, b, c)` dire

Lowercase pointer specifiers (`.s`, `.l`, `.q`) map to **little-endian**, not native endian. Uppercase (`.S`, `.L`, `.Q`) map to big-endian. All numeric pointer types are **signed by default** (per S6.3). The adjustment is parsed **after** the closing paren: `(base.type)+adj`, not `(base.type+adj)`.

### 3.8 Relative Offsets: Global Anchor, No Save/Restore

`OffsetSpec::Relative(N)` resolves against `EvaluationContext::last_match_end()`, which is updated after every successful match in `evaluate_rules` and is **never saved/restored across child recursion**. This is intentional and matches GNU `file`: a sibling rule sees the anchor wherever the deepest descendant of the previous sibling left it. The anchor is global/shared rather than stack-scoped, but its numeric value is not guaranteed to be non-decreasing -- a successful `Relative(-N)` rule (or any later rule that matches at a lower absolute position) can move it earlier. Do not wrap recursion in a save/restore pair "for safety" -- it would silently break sibling-after-nested chains. The recursion-depth pattern in the same loop *is* save/restore, and the asymmetry is correct.

The load-bearing invariant is that the anchor is updated *before recursing into children* (so children and their followers see the new anchor). The current code also happens to set the anchor before `matches.push(...)`, but the push-ordering relative to `set_last_match_end` is incidental for anchor correctness -- only the ordering before the `evaluate_rules` recursion call matters. (Future code that reads the anchor while iterating `matches` would make this ordering load-bearing, so do not "optimize" the order without checking call sites first.) `bytes_consumed()` (in `evaluator/types/mod.rs`) is the source of truth for advance distance; for variable-width types it re-derives consumption from the buffer rather than trusting `Value::String.len()` (which can drift from the original byte length via `from_utf8_lossy`). Pascal-string consumption is also clamped against the remaining buffer to prevent attacker-controlled length prefixes from poisoning the anchor to `usize::MAX`.

## 4. Module Visibility & Re-exports

### 4.1 Private Engine Module
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
---
title: Rust Test Visibility Boundary — tests/ vs src/.../tests.rs
category: developer-experience
date: 2026-04-07
tags: [rust, testing, pub-crate, integration-tests, unit-tests, visibility]
issue: '#38'
pr: '#211'
severity: low
components: [testing]
---

# Rust Test Visibility Boundary — `tests/` vs `src/.../tests.rs`

## Context

When adding tests that need to exercise `pub(crate)` items directly — for example, injecting internal state via a crate-private setter to verify a graceful-skip contract — the test file location matters. The two test locations in a Rust crate have different visibility semantics, and this is not always obvious until you hit a compile error.

Encountered during PR #211 (relative offset evaluation) when adding a test that needed `EvaluationContext::set_last_match_end()` — a `pub(crate)` setter — to inject a near-saturation anchor value (`usize::MAX`) and verify that subsequent `OffsetSpec::Relative` rules skip gracefully without panicking. The initial instinct was to put the test in the existing integration test file at `tests/relative_offset_evaluation.rs`, but that failed to compile because `tests/` compiles as an external crate.

## Guidance

**Tests that need `pub(crate)` items must live in `src/.../tests.rs`, not in `tests/`.**

The two test locations in a Rust crate are:

| Location | Compiles as | Can access `pub(crate)` items? | Can access `pub` items? |
| -------------------------------------------------------------------------- | -------------- | ------------------------------------ | ----------------------- |
| `tests/foo.rs` (integration tests) | External crate | **No** — fails with E0603/E0624 | Yes |
| `src/foo/tests.rs` (module-adjacent unit tests, gated with `#[cfg(test)]`) | Same crate | **Yes** | Yes |
| Doctests in rustdoc | External crate | **No** — same constraint as `tests/` | Yes |

This matches the project convention already used in `libmagic-rs`:

- `src/evaluator/tests.rs`, `src/evaluator/engine/tests.rs`, `src/evaluator/types/tests.rs` — unit tests in `#[cfg(test)] mod tests` peer files, full access to crate-private APIs
- `tests/evaluator_tests.rs`, `tests/relative_offset_evaluation.rs` — integration tests that only exercise the public API through `libmagic_rs::...` paths

## Why This Matters

Mixing the two up produces compile errors rather than runtime failures, so you find out quickly — but the error message (`error[E0603]: function 'set_last_match_end' is private`) doesn't immediately suggest moving the test to a different file. The instinct is often to widen visibility (change `pub(crate)` to `pub`), which defeats the point of the crate boundary and bloats the public API surface that gets locked in at semver time.

Picking the right location is also the right answer for **doctests**: rustdoc examples compile as external crates, so `/// use crate::...` never works in doctests on published items. Use `/// use libmagic_rs::...` (or the actual crate name) in doctests.

## When to Apply

- Writing a unit test that needs to inject internal state, call a `pub(crate)` constructor, or inspect a private field → put it in `src/.../tests.rs`
- Writing an integration test that exercises the crate's public API through the same entry points an external consumer would use → put it in `tests/`
- Writing a rustdoc example for a `pub` function → use the full external path (e.g., `libmagic_rs::evaluator::evaluate_rules`), not `crate::`
- Tempted to widen `pub(crate)` to `pub` just to write a test → **stop**, move the test to `src/.../tests.rs` instead

## Examples

**Wrong location — compile error:**

```rust
// tests/my_integration_test.rs
use libmagic_rs::evaluator::EvaluationContext;
use libmagic_rs::EvaluationConfig;

#[test]
fn test_anchor_injection() {
let mut ctx = EvaluationContext::new(EvaluationConfig::default());
ctx.set_last_match_end(usize::MAX); // error[E0624]: method is private
// ...
}
```

**Right location — compiles fine:**

```rust
// src/evaluator/engine/tests.rs
use super::*;
use crate::evaluator::EvaluationContext;
use crate::EvaluationConfig;

#[test]
fn test_evaluate_rules_anchor_near_saturation_skips_relative_child_gracefully() {
let mut ctx = EvaluationContext::new(EvaluationConfig::default());
ctx.set_last_match_end(usize::MAX); // Fine — same crate
// ... assert evaluate_rules skips Relative rules gracefully
}
```

The second form compiled on the first try and now pins a safety contract that would be impossible to test from the integration layer.

## Related

- PR #211 — `test_evaluate_rules_anchor_near_saturation_skips_relative_child_gracefully` is the concrete case that prompted this learning
- `docs/solutions/security-issues/pstring-anchor-poisoning.md` — the security fix this test regression-guards
- Project GOTCHAS.md §7.1 — "Doctest Import Paths" (use `libmagic_rs::` not `crate::`) — same root cause, different surface
- AGENTS.md §4.1 — `evaluator::engine` is private; integration tests must import the re-exported `libmagic_rs::evaluator::evaluate_rules`, not `evaluator::engine::evaluate_rules` (related but distinct: module visibility, not test file visibility)
Loading
Loading