feat: strength calculation & documentation improvements (#21) by unclesp1d3r · Pull Request #30 · EvilBit-Labs/libmagic-rs

unclesp1d3r · 2026-02-06T01:41:51Z

Summary

Implement strength calculation system for magic rules based on libmagic's apprentice_magic_strength algorithm
Add comprehensive documentation with mdbook integration
Update AGENTS.md with PR review learnings

Strength Calculation (Issue #21)

Adds a complete strength calculation system to prioritize more specific magic rules during evaluation:

StrengthModifier enum: Add, Subtract, Multiply, Divide, Set operations
!:strength directive parsing: Parse strength modifiers from magic files
Default strength calculation: Based on type specificity, operator type, offset reliability, and value length
Rule sorting: sort_rules_by_strength() for evaluation ordering
Overflow protection: Safe arithmetic with clamping to [0, 255]
35 unit tests covering all strength calculation scenarios
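
The modifier arithmetic and clamping described above can be sketched as follows. This is a minimal illustration, not the code from `src/evaluator/strength.rs`: the `i32` payloads, the function name `apply_modifier`, and the use of saturating arithmetic are assumptions. Only the variant names, the [0, 255] clamp, and the "division by zero keeps the base strength" behavior come from this PR.

```rust
/// Hypothetical mirror of the `StrengthModifier` enum named in this PR.
/// The `i32` payload type is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrengthModifier {
    Add(i32),
    Subtract(i32),
    Multiply(i32),
    Divide(i32),
    Set(i32),
}

/// Apply an optional modifier to a base strength.
/// Intermediate results saturate instead of overflowing, and the final
/// value is clamped to the [0, 255] range mentioned in the PR summary.
pub fn apply_modifier(base: i32, modifier: Option<StrengthModifier>) -> i32 {
    let raw = match modifier {
        None => base,
        Some(StrengthModifier::Add(n)) => base.saturating_add(n),
        Some(StrengthModifier::Subtract(n)) => base.saturating_sub(n),
        Some(StrengthModifier::Multiply(n)) => base.saturating_mul(n),
        // `checked_div` returns None for a zero divisor (and for
        // i32::MIN / -1), in which case the base strength is kept.
        Some(StrengthModifier::Divide(n)) => base.checked_div(n).unwrap_or(base),
        Some(StrengthModifier::Set(n)) => n,
    };
    raw.clamp(0, 255)
}
```

For example, `apply_modifier(100, Some(StrengthModifier::Add(200)))` saturates at the upper bound and returns 255, while a `Subtract` larger than the base bottoms out at 0.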

Documentation Improvements

Complete API reference (MagicDatabase, EvaluationConfig, AST types, error handling)
Full CLI documentation (options, exit codes, magic file discovery)
Comprehensive magic file format guide (offsets, types, operators, nested rules)
Mermaid architecture diagrams
Standalone quick reference documents

Files Changed

src/evaluator/strength.rs (new) - Strength calculation module
src/parser/grammar.rs - !:strength directive parsing
src/parser/ast.rs - StrengthModifier enum
build.rs, src/build_helpers.rs - Build script support
docs/ - Comprehensive documentation updates

Test plan

🤖 Generated with Claude Code

Populate .serena/project.yml with project-specific metadata: set project_name to "libmagic-rs" and add placeholders for included_optional_tools, base_modes, default_modes, and fixed_tools (all initialized empty). Comments explain semantics (base_modes/default_modes override global settings, fixed_tools replaces Serena's default toolset and cannot be combined with excluded_tools/included_optional_tools). No functional defaults are enabled — this adds configuration hooks for future project-specific mode/tool customization.

- Add safe string operations guidance (use strip_prefix over slicing)
- Add doc test verification reminder
- Add case-insensitive matching pattern section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Update api-reference.md with complete API docs (MagicDatabase, EvaluationConfig, EvaluationResult, AST types, error types)
- Update cli-reference.md with full CLI documentation (options, exit codes, magic file discovery, troubleshooting)
- Update magic-format.md with comprehensive magic file format guide (offsets, types, operators, nested rules, examples)
- Add standalone quick reference documents in docs/ root
- Add Mermaid architecture diagrams (architecture, evaluation-flow, error-handling, module-structure)
- Update docs/README.md to reference mdbook as primary documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add strength calculation system based on libmagic's apprentice_magic_strength algorithm. Strength values are used to prioritize more specific rules during evaluation.

Key additions:

- StrengthModifier enum (Add, Subtract, Multiply, Divide, Set) in ast.rs
- Parse `!:strength` directives in grammar.rs
- New evaluator/strength.rs module with:
  - calculate_default_strength() based on type, operator, offset, value length
  - apply_strength_modifier() with overflow protection
  - sort_rules_by_strength() for rule ordering
- Build script support for serializing strength modifiers
- 35 unit tests for strength calculation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2026-02-06T01:42:10Z

Caution

Review failed

Failed to post review comments

Summary by CodeRabbit

New Features
- Strength modifier support to influence rule weighting during evaluations.
- Extended per-project configuration: project_name, included_optional_tools, base_modes, default_modes, fixed_tools.
Documentation
- Major docs added and reorganized: comprehensive API reference, architecture guide, CLI reference, getting started, magic format, README, and multiple diagrams.

Walkthrough

Adds StrengthModifier (AST enum) and a new MagicRule field; parser support for !:strength directives; evaluator strength calculation and rule-sorting; build-time codegen/serialization for the new field; public re-exports; test/CI synchronization; new project config fields; extensive documentation additions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Project: `.serena/project.yml` | Added public project config fields: `project_name`, `included_optional_tools`, `base_modes`, `default_modes`, `fixed_tools`. |
| AST Types & Re-exports: `src/parser/ast.rs`, `src/lib.rs`, `src/parser/mod.rs` | Add `StrengthModifier` enum and `strength_modifier: Option<StrengthModifier>` to `MagicRule`; re-export `StrengthModifier` via `parser` and crate root. |
| Parser Grammar & Preprocessing: `src/parser/grammar.rs`, `src/parser/mod.rs` | Add `parse_strength_directive()` and `is_strength_directive()`; preprocessor recognizes `!:strength` directives, attaches modifier to `LineInfo`, and applies it to the next rule (modifier consumed; not inherited by children). |
| Evaluator: Strength Feature: `src/evaluator/mod.rs`, `src/evaluator/strength.rs` | New `strength` module with constants, default-strength calculation, `apply_strength_modifier`, `calculate_rule_strength`, sorting helpers, and unit tests; evaluator integrates strength computation and sorts rules. |
| Build / Codegen: `build.rs`, `src/build_helpers.rs` | Codegen imports/exports `StrengthModifier`; serialize `strength_modifier` via new helper; emit `allow(unused_imports)` for optional usage; build script rerun directive added; generated `BUILTIN_RULES` include `strength_modifier`. |
| Parser Tests & Initialization: `src/parser/`, `src/evaluator/` | Updated tests and fixtures to initialize `strength_modifier` (commonly `None`); added parsing and behavior tests for strength directives and all modifier variants. |
| Docs / CLI / Guides: `docs/`, `docs/src/`, `docs/diagrams/*`, `AGENTS.md` | Large documentation additions and rewrites (API reference, CLI reference, Getting Started, Architecture, Magic Format, diagrams); `AGENTS.md` adds Rust string handling and doc-test guidance. |
| CI & Test Helpers: `.github/workflows/ci.yml`, `src/main.rs` | CI coverage step adds `--test-threads=1`; process-wide `FD_MUTEX` added to serialize FD operations in test helpers and skip certain stdin tests under LLVM profiling. |
| Build Artifacts / Generated Rules: `BUILTIN_RULES` (generated code), `build_helpers` outputs | Generated MagicRule instances now include `strength_modifier` initialization and serialization in built-in rules output. |
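
The preprocessing behavior in the parser row (a `!:strength` directive is held as pending, attached to the next rule, then consumed) can be sketched like this. Everything here is hypothetical: `parse_strength_line`, `attach_modifiers`, and the simplified `Rule` struct are illustration-only names, and mapping `=` or a bare number to `Set` is an assumption, since the walkthrough only shows the form `!:strength <op><n>`.

```rust
/// Hypothetical mirror of the `StrengthModifier` enum (payload type assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrengthModifier {
    Add(i32),
    Subtract(i32),
    Multiply(i32),
    Divide(i32),
    Set(i32),
}

/// Simplified stand-in for a parsed rule; the real `MagicRule` has more fields.
pub struct Rule {
    pub text: String,
    pub strength_modifier: Option<StrengthModifier>,
}

/// Parse a line such as `!:strength +10`. Returns None for anything else.
pub fn parse_strength_line(line: &str) -> Option<StrengthModifier> {
    // `strip_prefix` avoids manual slicing, as AGENTS.md recommends.
    let rest = line.trim().strip_prefix("!:strength")?.trim();
    let mut chars = rest.chars();
    let first = chars.next()?;
    // Assumption: `=` or a bare number means "set the strength outright".
    let (op, digits) = if matches!(first, '+' | '-' | '*' | '/' | '=') {
        (first, chars.as_str().trim())
    } else {
        ('=', rest)
    };
    let n: i32 = digits.parse().ok()?;
    Some(match op {
        '+' => StrengthModifier::Add(n),
        '-' => StrengthModifier::Subtract(n),
        '*' => StrengthModifier::Multiply(n),
        '/' => StrengthModifier::Divide(n),
        _ => StrengthModifier::Set(n),
    })
}

/// Attach each pending modifier to the next rule line only.
pub fn attach_modifiers(lines: &[&str]) -> Vec<Rule> {
    let mut pending: Option<StrengthModifier> = None;
    let mut rules = Vec::new();
    for line in lines {
        if line.trim_start().starts_with("!:strength") {
            // A malformed directive simply leaves nothing pending.
            pending = parse_strength_line(line);
            continue;
        }
        rules.push(Rule {
            text: (*line).to_string(),
            // `take()` consumes the modifier so later rules do not inherit it.
            strength_modifier: pending.take(),
        });
    }
    rules
}
```

Using `Option::take` makes the "consumed; not inherited by children" behavior fall out naturally: the rule immediately after the directive gets `Some(..)`, and every following rule gets `None`.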

Sequence Diagram(s)

sequenceDiagram
    participant MF as Magic File
    participant P as Parser
    participant AST as AST
    participant E as Evaluator
    participant S as Strength Module

    MF->>P: read line "!:strength <op><n>"
    P->>P: is_strength_directive? → parse_strength_directive
    P-->>P: store pending_strength in LineInfo
    MF->>P: read next rule line
    P->>AST: construct MagicRule(strength_modifier=Some(...))
    AST-->>E: emit MagicRule(s)
    E->>S: calculate_rule_strength(rule)
    S->>S: compute default strength → apply modifier
    S-->>E: return numeric strength
    E->>S: sort_rules_by_strength(rules)
    S-->>E: rules sorted by computed strength
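
The final step of the diagram, ordering rules by their computed strength, might look like the sketch below. It assumes that stronger rules should be evaluated first (descending order) and that ties keep their original file order. `ScoredRule` and `sort_by_strength` are hypothetical names for illustration, not the signatures of `sort_rules_by_strength()` in this PR.

```rust
use std::cmp::Reverse;

/// Hypothetical pairing of a rule label with its already-computed strength.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScoredRule {
    pub name: &'static str,
    pub strength: i32,
}

/// Sort so that the strongest rules come first.
/// `sort_by_key` is a stable sort, so rules with equal strength keep
/// the relative order they had in the input.
pub fn sort_by_strength(rules: &mut [ScoredRule]) {
    rules.sort_by_key(|rule| Reverse(rule.strength));
}
```

A stable sort matters here because magic files are order-sensitive: two rules with the same strength should still be tried in the order they were written.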

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: built-in rules build time compilation fallback #28 — Adds StrengthModifier and related build_helpers/build.rs codegen for serializing strength modifiers.
12 add magic file format detection subtask 143 #25 — Implements similar AST, parsing, and strength application features affecting parser/evaluator surfaces.
feat: Implement text magic parser (issue #11) #16 — Related parser preprocessing and hierarchy construction changes that intersect strength directive handling.

Poem

🐇 I found a tiny directive bright,
I nudge each rule to tip the fight,
A plus, a set, a gentle shove,
I sort the bytes with carrot love,
Hop, hop — magic hums beneath moonlight.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feat: strength calculation & documentation improvements (`#21`)' accurately reflects the main changes: a strength calculation system for magic rules and comprehensive documentation enhancements. |
| Description check | ✅ Passed | The PR description is well-related to the changeset, detailing the strength calculation system implementation, documentation improvements, files changed, and test plan results. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

- Fix is_image function syntax error in GETTING_STARTED.md
- Add strength_modifier field and StrengthModifier type to API docs
- Add strength.rs to module listings and architecture diagrams
- Move "Strength modifiers" from unsupported to recently added
- Fix GIF version labels (7a->87a, 9a->89a)
- Fix JSON output example to match actual MatchResult structure
- Replace sudo recommendation with chmod in CLI troubleshooting
- Fix arrow directions in module-structure diagram
- Add rustdoc example to into_sorted_by_strength function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The stdin-mocking tests use dup/dup2 file descriptor manipulation which is not thread-safe. When tests run in parallel, they can interfere with each other's stdin redirection, causing intermittent failures. Adding --test-threads=1 ensures these tests run serially, preventing the race condition while still generating accurate coverage data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The stdin-mocking tests manipulate process-wide file descriptors using dup/dup2. Even with --test-threads=1, llvm-cov instrumentation can interfere with FD operations. Adding a static mutex ensures exclusive access to stdio FD operations across all tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The dup/dup2 file descriptor manipulation used in stdin-mocking tests is fragile when combined with llvm-cov's instrumentation. This causes spurious test failures in CI coverage runs. Skip these tests when LLVM_PROFILE_FILE is set (indicating llvm-cov). The tests still run with cargo nextest (separate processes) and the core stdin handling logic is tested by non-mocking tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
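
The two safeguards described in these commits, a process-wide lock around file descriptor manipulation and a skip check for coverage runs, can be sketched as below. The helper names `with_fd_lock` and `skip_fd_tests` are hypothetical. Only the `FD_MUTEX` name and the `LLVM_PROFILE_FILE` check come from the commit messages, and the actual dup/dup2 calls are left out.

```rust
use std::sync::{Mutex, MutexGuard, PoisonError};

/// Process-wide lock guarding stdio file descriptor manipulation in tests.
static FD_MUTEX: Mutex<()> = Mutex::new(());

/// Acquire the lock, recovering it if an earlier test panicked while holding it.
fn lock_fds() -> MutexGuard<'static, ()> {
    FD_MUTEX.lock().unwrap_or_else(PoisonError::into_inner)
}

/// Returns true when the process appears to run under llvm-cov,
/// detected by the presence of the LLVM_PROFILE_FILE variable.
pub fn skip_fd_tests() -> bool {
    std::env::var_os("LLVM_PROFILE_FILE").is_some()
}

/// Run `f` while holding the lock; the guard is released when it is
/// dropped at the end of this function, even if `f` panics.
pub fn with_fd_lock<T>(f: impl FnOnce() -> T) -> T {
    let _guard = lock_fds();
    f()
}
```

A test helper would call `skip_fd_tests()` first and return early under coverage, then do its redirection inside `with_fd_lock`. Note that `std::sync::Mutex` is not reentrant, so a helper that already runs inside `with_fd_lock` must not try to take the lock again.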

codecov · 2026-02-06T06:09:58Z

Codecov Report

❌ Patch coverage is 97.74718% with 18 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/parser/mod.rs | 93.63% | 7 Missing ⚠️ |
| src/build_helpers.rs | 66.66% | 6 Missing ⚠️ |
| src/main.rs | 40.00% | 3 Missing ⚠️ |
| src/evaluator/strength.rs | 99.49% | 2 Missing ⚠️ |

Fixes identified by comprehensive PR review:

1. Silent unwrap_or(0) on overflow (CRITICAL):
   - Changed to clamp_to_i32 helper that properly clamps values to i32 range
   - Values exceeding i32 range now clamp instead of silently becoming 0
2. Misleading documentation for negative offsets (CRITICAL):
   - Fixed OffsetSpec::Absolute doc to correctly state negative values are "from end of file" not "before current position"
3. Division by zero silent fallback (IMPORTANT):
   - Added eprintln warning when !:strength /0 is encountered
   - Behavior unchanged (returns base strength) but now visible
4. Unused DEFAULT_STRENGTH constant (IMPORTANT):
   - Removed the unused constant to avoid confusion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
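
Fixes 1 and 3 can be illustrated with a short sketch. It assumes the value being narrowed is an `i64`, which the commit message does not state. `lossy_to_i32` and `divide_strength` are hypothetical names, and `clamp_to_i32` here is a stand-in written for this example rather than the helper from the PR.

```rust
/// The pattern the review flagged: an out-of-range value silently becomes 0.
fn lossy_to_i32(value: i64) -> i32 {
    i32::try_from(value).unwrap_or(0)
}

/// Stand-in for the `clamp_to_i32` helper: out-of-range values pin to the
/// nearest i32 bound instead of collapsing to 0.
fn clamp_to_i32(value: i64) -> i32 {
    let clamped = value.clamp(i64::from(i32::MIN), i64::from(i32::MAX));
    // The clamp above guarantees the value fits, so this cast cannot truncate.
    clamped as i32
}

/// Division with a visible fallback: a zero divisor keeps the base
/// strength (behavior unchanged) but now prints a warning to stderr.
fn divide_strength(base: i32, divisor: i32) -> i32 {
    if divisor == 0 {
        eprintln!("warning: `!:strength /0` ignored, keeping base strength {base}");
        return base;
    }
    // `checked_div` also guards the i32::MIN / -1 overflow case.
    base.checked_div(divisor).unwrap_or(base)
}
```

With these definitions, `lossy_to_i32(i64::MAX)` yields 0 while `clamp_to_i32(i64::MAX)` yields `i32::MAX`, which is the difference the first fix is about.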

Document why with_mocked_stdin and with_invalid_stdin don't acquire FD_MUTEX to prevent future false-positive review findings:

- with_mocked_stdin: Always called from within capture_stdout/capture_stderr which already hold the mutex; adding it would cause deadlock
- with_invalid_stdin: Relies on --test-threads=1 for serialization since it's called directly (not nested inside capture_* functions)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

unclesp1d3r and others added 4 commits February 5, 2026 00:13

docs: add learnings from PR review to AGENTS.md

ad051e8

unclesp1d3r linked an issue Feb 6, 2026 that may be closed by this pull request

Strength Calculation: libmagic Algorithm & !:strength Parsing #21

Closed

10 tasks

dosubot Bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) on Feb 6, 2026

dosubot Bot added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Feb 6, 2026

coderabbitai Bot added the evaluator (Rule evaluation engine and logic) and testing (Test infrastructure and coverage) labels on Feb 6, 2026

unclesp1d3r and others added 4 commits February 5, 2026 23:05

unclesp1d3r self-assigned this Feb 7, 2026

unclesp1d3r merged commit 2ed146d into main Feb 7, 2026
21 checks passed

unclesp1d3r deleted the 21-strength-calculation-libmagic-algorithm-strength-parsing branch February 7, 2026 03:15

github-actions Bot mentioned this pull request Feb 15, 2026

chore: release v0.1.1 #71

Closed

This was referenced Apr 10, 2026

chore: resolve all pending TODO items #212

Merged

feat(evaluator): regex and search types (closes #39) #214

Merged

unclesp1d3r merged 10 commits into main from 21-strength-calculation-libmagic-algorithm-strength-parsing
