feat: strength calculation & documentation improvements (#21) #30

Merged
unclesp1d3r merged 10 commits into main from 21-strength-calculation-libmagic-algorithm-strength-parsing on Feb 7, 2026

Conversation

@unclesp1d3r
Member

Summary

  • Implement strength calculation system for magic rules based on libmagic's apprentice_magic_strength algorithm
  • Add comprehensive documentation with mdbook integration
  • Update AGENTS.md with PR review learnings

Strength Calculation (Issue #21)

Adds a complete strength calculation system to prioritize more specific magic rules during evaluation:

  • StrengthModifier enum: Add, Subtract, Multiply, Divide, Set operations
  • !:strength directive parsing: Parse strength modifiers from magic files
  • Default strength calculation: Based on type specificity, operator type, offset reliability, and value length
  • Rule sorting: sort_rules_by_strength() for evaluation ordering
  • Overflow protection: Safe arithmetic with clamping to [0, 255]
  • 35 unit tests covering all strength calculation scenarios
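
As a rough illustration of the modifier arithmetic and the [0, 255] clamp listed above, the following sketch widens to i64 before operating so that no intermediate result can overflow. The i32 payload on each variant, the function name apply_modifier, and its signature are assumptions made for illustration; they are not copied from src/evaluator/strength.rs.

```rust
/// Hypothetical mirror of the PR's `StrengthModifier` enum (payload type assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrengthModifier {
    Add(i32),
    Subtract(i32),
    Multiply(i32),
    Divide(i32),
    Set(i32),
}

/// Apply a modifier to a base strength and clamp the result to [0, 255].
/// Stand-in for the PR's `apply_strength_modifier()`, whose signature may differ.
pub fn apply_modifier(base: i32, modifier: StrengthModifier) -> i32 {
    // Widen to i64 so add, subtract, and multiply cannot overflow.
    let base = i64::from(base);
    let raw = match modifier {
        StrengthModifier::Add(n) => base + i64::from(n),
        StrengthModifier::Subtract(n) => base - i64::from(n),
        StrengthModifier::Multiply(n) => base * i64::from(n),
        // Division by zero leaves the base strength unchanged.
        StrengthModifier::Divide(0) => base,
        StrengthModifier::Divide(n) => base / i64::from(n),
        StrengthModifier::Set(n) => i64::from(n),
    };
    // After clamping, the value fits in i32, so the cast cannot truncate.
    raw.clamp(0, 255) as i32
}
```

Widening first keeps every arm a plain arithmetic expression; checked or saturating i32 arithmetic would be an equally reasonable choice.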

Documentation Improvements

  • Complete API reference (MagicDatabase, EvaluationConfig, AST types, error handling)
  • Full CLI documentation (options, exit codes, magic file discovery)
  • Comprehensive magic file format guide (offsets, types, operators, nested rules)
  • Mermaid architecture diagrams
  • Standalone quick reference documents

Files Changed

  • src/evaluator/strength.rs (new) - Strength calculation module
  • src/parser/grammar.rs - !:strength directive parsing
  • src/parser/ast.rs - StrengthModifier enum
  • build.rs, src/build_helpers.rs - Build script support
  • docs/ - Comprehensive documentation updates

Test plan

  • All 860 unit tests passing
  • All 115 doc tests passing
  • CI check suite passing
  • Clippy with -D warnings passing
  • cargo fmt verified

🤖 Generated with Claude Code

unclesp1d3r and others added 4 commits February 5, 2026 00:13
Populate .serena/project.yml with project-specific metadata: set project_name to "libmagic-rs" and add placeholders for included_optional_tools, base_modes, default_modes, and fixed_tools (all initialized empty). Comments explain semantics (base_modes/default_modes override global settings, fixed_tools replaces Serena's default toolset and cannot be combined with excluded_tools/included_optional_tools). No functional defaults are enabled — this adds configuration hooks for future project-specific mode/tool customization.
- Add safe string operations guidance (use strip_prefix over slicing)
- Add doc test verification reminder
- Add case-insensitive matching pattern section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update api-reference.md with complete API docs (MagicDatabase,
  EvaluationConfig, EvaluationResult, AST types, error types)
- Update cli-reference.md with full CLI documentation (options,
  exit codes, magic file discovery, troubleshooting)
- Update magic-format.md with comprehensive magic file format guide
  (offsets, types, operators, nested rules, examples)
- Add standalone quick reference documents in docs/ root
- Add Mermaid architecture diagrams (architecture, evaluation-flow,
  error-handling, module-structure)
- Update docs/README.md to reference mdbook as primary documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add strength calculation system based on libmagic's apprentice_magic_strength
algorithm. Strength values are used to prioritize more specific rules during
evaluation.

Key additions:
- StrengthModifier enum (Add, Subtract, Multiply, Divide, Set) in ast.rs
- Parse `!:strength` directives in grammar.rs
- New evaluator/strength.rs module with:
  - calculate_default_strength() based on type, operator, offset, value length
  - apply_strength_modifier() with overflow protection
  - sort_rules_by_strength() for rule ordering
- Build script support for serializing strength modifiers
- 35 unit tests for strength calculation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
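
The rule ordering described in this commit might look like the minimal sketch below. ScoredRule and sort_by_strength_desc are hypothetical stand-ins (the real sort_rules_by_strength() presumably works on MagicRule values), and descending order is an assumption inferred from the stated goal of prioritizing more specific rules.

```rust
/// Hypothetical stand-in for a rule whose strength has already been computed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScoredRule {
    pub message: String,
    pub strength: i32,
}

/// Sort rules so that the strongest come first (descending order is assumed).
pub fn sort_by_strength_desc(rules: &mut [ScoredRule]) {
    // `sort_by` is a stable sort, so rules with equal strength keep
    // their original relative order from the magic file.
    rules.sort_by(|a, b| b.strength.cmp(&a.strength));
}
```

A stable sort is used so that ties fall back to file order, which keeps evaluation deterministic for rules of equal strength.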
unclesp1d3r linked an issue (10 tasks) on Feb 6, 2026 that may be closed by this pull request
dosubot Bot added the size:XXL label Feb 6, 2026
coderabbitai Bot commented Feb 6, 2026

Caution

Review failed

Failed to post review comments

Summary by CodeRabbit

  • New Features

    • Strength modifier support to influence rule weighting during evaluations.
    • Extended per-project configuration: project_name, included_optional_tools, base_modes, default_modes, fixed_tools.
  • Documentation

    • Major docs added and reorganized: comprehensive API reference, architecture guide, CLI reference, getting started, magic format, README, and multiple diagrams.

Walkthrough

Adds StrengthModifier (AST enum) and a new MagicRule field; parser support for !:strength directives; evaluator strength calculation and rule-sorting; build-time codegen/serialization for the new field; public re-exports; test/CI synchronization; new project config fields; extensive documentation additions.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Project | .serena/project.yml | Added public project config fields: project_name, included_optional_tools, base_modes, default_modes, fixed_tools. |
| AST Types & Re-exports | src/parser/ast.rs, src/lib.rs, src/parser/mod.rs | Add StrengthModifier enum and strength_modifier: Option<StrengthModifier> to MagicRule; re-export StrengthModifier via parser and crate root. |
| Parser Grammar & Preprocessing | src/parser/grammar.rs, src/parser/mod.rs | Add parse_strength_directive() and is_strength_directive(); preprocessor recognizes !:strength directives, attaches modifier to LineInfo, and applies it to the next rule (modifier consumed; not inherited by children). |
| Evaluator: Strength Feature | src/evaluator/mod.rs, src/evaluator/strength.rs | New strength module with constants, default-strength calculation, apply_strength_modifier, calculate_rule_strength, sorting helpers, and unit tests; evaluator integrates strength computation and sorts rules. |
| Build / Codegen | build.rs, src/build_helpers.rs | Codegen imports/exports StrengthModifier; serialize strength_modifier via new helper; emit allow(unused_imports) for optional usage; build script rerun directive added; generated BUILTIN_RULES include strength_modifier. |
| Parser Tests & Initialization | src/parser/*, src/evaluator/* | Updated tests and fixtures to initialize strength_modifier (commonly None); added parsing and behavior tests for strength directives and all modifier variants. |
| Docs / CLI / Guides | docs/*, docs/src/*, docs/diagrams/*, AGENTS.md | Large documentation additions and rewrites (API reference, CLI reference, Getting Started, Architecture, Magic Format, diagrams); AGENTS.md adds Rust string handling and doc-test guidance. |
| CI & Test Helpers | .github/workflows/ci.yml, src/main.rs | CI coverage step adds --test-threads=1; process-wide FD_MUTEX added to serialize FD operations in test helpers and skip certain stdin tests under LLVM profiling. |
| Build Artifacts / Generated Rules | BUILTIN_RULES (generated code), build_helpers outputs | Generated MagicRule instances now include strength_modifier initialization and serialization in built-in rules output. |
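
To illustrate the directive parsing summarized for src/parser/grammar.rs, here is a minimal sketch of turning a `!:strength` line into a modifier. The function name parse_strength_line, the i32 payload, and the use of `=` to spell the Set variant are all assumptions; the PR's parse_strength_directive() may accept a different syntax.

```rust
/// Hypothetical mirror of the PR's `StrengthModifier` enum (payload type assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrengthModifier {
    Add(i32),
    Subtract(i32),
    Multiply(i32),
    Divide(i32),
    Set(i32),
}

/// Parse a line such as `!:strength +10`; returns `None` for anything else.
pub fn parse_strength_line(line: &str) -> Option<StrengthModifier> {
    // `strip_prefix` avoids manual slicing, per the AGENTS.md guidance.
    let rest = line.trim().strip_prefix("!:strength")?.trim_start();
    let mut chars = rest.chars();
    // The first character after the keyword is the operator.
    let op = chars.next()?;
    // Whatever remains must be a valid integer, optionally preceded by spaces.
    let value: i32 = chars.as_str().trim().parse().ok()?;
    match op {
        '+' => Some(StrengthModifier::Add(value)),
        '-' => Some(StrengthModifier::Subtract(value)),
        '*' => Some(StrengthModifier::Multiply(value)),
        '/' => Some(StrengthModifier::Divide(value)),
        // Assumed spelling for `Set`; not confirmed by the PR text.
        '=' => Some(StrengthModifier::Set(value)),
        _ => None,
    }
}
```

In the flow described above, a parser would hold the returned modifier as pending state and attach it to the next rule only, clearing it afterwards so that child rules do not inherit it.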

Sequence Diagram(s)

sequenceDiagram
    participant MF as Magic File
    participant P as Parser
    participant AST as AST
    participant E as Evaluator
    participant S as Strength Module

    MF->>P: read line "!:strength <op><n>"
    P->>P: is_strength_directive? → parse_strength_directive
    P-->>P: store pending_strength in LineInfo
    MF->>P: read next rule line
    P->>AST: construct MagicRule(strength_modifier=Some(...))
    AST-->>E: emit MagicRule(s)
    E->>S: calculate_rule_strength(rule)
    S->>S: compute default strength → apply modifier
    S-->>E: return numeric strength
    E->>S: sort_rules_by_strength(rules)
    S-->>E: rules sorted by computed strength

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I found a tiny directive bright,
I nudge each rule to tip the fight,
A plus, a set, a gentle shove,
I sort the bytes with carrot love,
Hop, hop — magic hums beneath moonlight.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feat: strength calculation & documentation improvements (#21)' accurately reflects the main changes: a strength calculation system for magic rules and comprehensive documentation enhancements. |
| Description check | ✅ Passed | The PR description is well-related to the changeset, detailing the strength calculation system implementation, documentation improvements, files changed, and test plan results. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

dosubot Bot added the documentation and enhancement labels Feb 6, 2026
coderabbitai Bot added the evaluator and testing labels Feb 6, 2026
unclesp1d3r and others added 4 commits February 5, 2026 23:05
- Fix is_image function syntax error in GETTING_STARTED.md
- Add strength_modifier field and StrengthModifier type to API docs
- Add strength.rs to module listings and architecture diagrams
- Move "Strength modifiers" from unsupported to recently added
- Fix GIF version labels (7a->87a, 9a->89a)
- Fix JSON output example to match actual MatchResult structure
- Replace sudo recommendation with chmod in CLI troubleshooting
- Fix arrow directions in module-structure diagram
- Add rustdoc example to into_sorted_by_strength function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stdin-mocking tests use dup/dup2 file descriptor manipulation which
is not thread-safe. When tests run in parallel, they can interfere with
each other's stdin redirection, causing intermittent failures.

Adding --test-threads=1 ensures these tests run serially, preventing
the race condition while still generating accurate coverage data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stdin-mocking tests manipulate process-wide file descriptors using
dup/dup2. Even with --test-threads=1, llvm-cov instrumentation can
interfere with FD operations. Adding a static mutex ensures exclusive
access to stdio FD operations across all tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
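
A static mutex of the kind this commit describes could be sketched as follows. FD_MUTEX is the name used in the PR; the with_fd_lock helper and its shape are hypothetical and only illustrate how a test helper might hold the lock while it manipulates file descriptors.

```rust
use std::sync::{Mutex, PoisonError};

/// Process-wide lock guarding stdio file-descriptor manipulation in tests.
static FD_MUTEX: Mutex<()> = Mutex::new(());

/// Hypothetical helper: run `body` while holding the lock.
pub fn with_fd_lock<T>(body: impl FnOnce() -> T) -> T {
    // A panicking test poisons the mutex; recover the guard so that
    // one failing test does not cascade into every later test.
    let _guard = FD_MUTEX.lock().unwrap_or_else(PoisonError::into_inner);
    // The guard is dropped, and the lock released, when this function returns.
    body()
}
```

Because std::sync::Mutex is not reentrant, calling with_fd_lock from inside another with_fd_lock closure on the same thread would deadlock or panic, so nested helpers must rely on the outer caller's lock.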
The dup/dup2 file descriptor manipulation used in stdin-mocking tests
is fragile when combined with llvm-cov's instrumentation. This causes
spurious test failures in CI coverage runs.

Skip these tests when LLVM_PROFILE_FILE is set (indicating llvm-cov).
The tests still run with cargo nextest (separate processes) and the
core stdin handling logic is tested by non-mocking tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
codecov Bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 97.74718% with 18 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/parser/mod.rs | 93.63% | 7 Missing ⚠️ |
| src/build_helpers.rs | 66.66% | 6 Missing ⚠️ |
| src/main.rs | 40.00% | 3 Missing ⚠️ |
| src/evaluator/strength.rs | 99.49% | 2 Missing ⚠️ |

Fixes identified by comprehensive PR review:

1. Silent unwrap_or(0) on overflow (CRITICAL):
   - Changed to clamp_to_i32 helper that properly clamps values to i32 range
   - Values exceeding i32 range now clamp instead of silently becoming 0

2. Misleading documentation for negative offsets (CRITICAL):
   - Fixed OffsetSpec::Absolute doc to correctly state negative values
     are "from end of file" not "before current position"

3. Division by zero silent fallback (IMPORTANT):
   - Added eprintln warning when !:strength /0 is encountered
   - Behavior unchanged (returns base strength) but now visible

4. Unused DEFAULT_STRENGTH constant (IMPORTANT):
   - Removed the unused constant to avoid confusion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
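
The difference between the old unwrap_or(0) behavior and clamping (fix 1 above) can be sketched as follows. The commit only names the clamp_to_i32 helper; the i64 input type and both function bodies here are assumptions for illustration.

```rust
/// Hypothetical version of the `clamp_to_i32` helper named in the commit message.
pub fn clamp_to_i32(value: i64) -> i32 {
    // After clamping, the value fits in i32, so the cast cannot truncate.
    value.clamp(i64::from(i32::MIN), i64::from(i32::MAX)) as i32
}

/// The earlier pattern, shown for contrast: any out-of-range value
/// silently becomes 0 instead of the nearest representable bound.
pub fn convert_with_unwrap_or_zero(value: i64) -> i32 {
    i32::try_from(value).unwrap_or(0)
}
```

With clamping, a very large intermediate strength stays very large, so a rule that overflowed is still ranked as strong rather than dropping to the bottom of the ordering.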
unclesp1d3r self-assigned this Feb 7, 2026
Document why with_mocked_stdin and with_invalid_stdin don't acquire
FD_MUTEX to prevent future false-positive review findings:

- with_mocked_stdin: Always called from within capture_stdout/capture_stderr
  which already hold the mutex; adding it would cause deadlock

- with_invalid_stdin: Relies on --test-threads=1 for serialization since
  it's called directly (not nested inside capture_* functions)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Labels

  • documentation - Improvements or additions to documentation
  • enhancement - New feature or request
  • evaluator - Rule evaluation engine and logic
  • size:XXL - This PR changes 1000+ lines, ignoring generated files.
  • testing - Test infrastructure and coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Strength Calculation: libmagic Algorithm & !:strength Parsing

1 participant