feat: cli enhancements multiple files stdin magic discovery #27

Merged
unclesp1d3r merged 9 commits into main from 18-cli-enhancements-multiple-files-stdin-magic-discovery
Jan 25, 2026

@unclesp1d3r
Member

This pull request introduces several enhancements and new features to the libmagic-rs project, focusing on improved evaluation control, CLI/JSON output capabilities, and code quality standards. The most significant changes are the addition of evaluation timeouts, support for evaluating in-memory buffers, a stub for built-in magic rules, and improved JSON Lines output for multi-file scenarios. There are also updates to the documentation, dependencies, and prompt files for CI and code review automation.

Fixes issue #18

Core Library Enhancements:

  • Added support for evaluation timeouts: rule evaluation runs in a separate thread, and the caller receives a Timeout error if it exceeds the configured limit. This keeps callers from blocking on long-running or hanging evaluations. (src/evaluator/mod.rs, src/lib.rs)
  • Added evaluate_buffer method to MagicDatabase for evaluating in-memory byte buffers directly, enabling stdin and non-file input support. (src/lib.rs)
  • Introduced a stub implementation for built-in magic rules via MagicDatabase::with_builtin_rules and with_builtin_rules_and_config, returning a default "data" result for all inputs. (src/lib.rs)
  • Enhanced file evaluation to handle empty files gracefully by treating them as empty buffers, ensuring consistent results. (src/lib.rs)
  • Added configuration-aware constructors for loading magic rules from files with custom evaluation settings. (src/lib.rs)
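
The thread-plus-timeout approach from the first bullet can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in src/evaluator/mod.rs: `slow_work`, `EvalError`, and `evaluate_with_timeout` are hypothetical names, and a sleep stands in for rule evaluation. The worker is detached, so on timeout it keeps running in the background until it finishes.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch (not the library's LibmagicError).
#[derive(Debug, PartialEq)]
enum EvalError {
    Timeout { timeout_ms: u64 },
    WorkerLost,
}

/// Hypothetical stand-in for rule evaluation: sleeps, then counts non-zero bytes.
fn slow_work(buffer: &[u8], delay: Duration) -> usize {
    thread::sleep(delay);
    buffer.iter().filter(|b| **b != 0).count()
}

/// Runs `slow_work` on a worker thread and waits at most `timeout_ms` for its result.
fn evaluate_with_timeout(
    buffer: Arc<[u8]>,
    delay: Duration,
    timeout_ms: u64,
) -> Result<usize, EvalError> {
    let (tx, rx) = mpsc::channel();
    // Share the buffer with the worker through a reference count instead of copying it.
    let worker_buffer = Arc::clone(&buffer);
    thread::spawn(move || {
        let result = slow_work(&worker_buffer, delay);
        // After a timeout the receiver is gone; a failed send is expected and ignored.
        let _ = tx.send(result);
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(EvalError::Timeout { timeout_ms }),
        // The sender was dropped without sending, e.g. because the worker panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(EvalError::WorkerLost),
    }
}

fn main() {
    let buffer: Arc<[u8]> = Arc::from(vec![1u8, 0, 2]);
    let fast = evaluate_with_timeout(Arc::clone(&buffer), Duration::from_millis(0), 2_000);
    let slow = evaluate_with_timeout(buffer, Duration::from_millis(500), 20);
    println!("fast: {:?}", fast);
    println!("slow: {:?}", slow);
}
```

The design trade-off is that `recv_timeout` bounds how long the caller waits, not how long the work runs.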

Output and CLI Improvements:

  • Added JsonLineOutput struct and format_json_line_output function to produce JSON Lines output, including filename context for each evaluated file, supporting multi-file and streaming scenarios. (src/output/json.rs)
  • Updated dependencies to include clap-stdin for improved CLI stdin handling, and bumped nix version for test/dev dependencies. (Cargo.toml)
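
To illustrate the JSON Lines idea (one self-contained JSON object per line, each carrying its filename), here is a standard-library-only sketch. It is not the PR's `JsonLineOutput` or `format_json_line_output`: the `FileResult` struct, its two fields, and the hand-written escaping are assumptions made for illustration, and the real field names are not shown in this conversation.

```rust
/// Hypothetical per-file record for this sketch (not the PR's JsonLineOutput).
struct FileResult<'a> {
    filename: &'a str,
    description: &'a str,
}

/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one record as a single-line JSON object (no embedded newlines).
fn to_json_line(r: &FileResult<'_>) -> String {
    format!(
        "{{\"filename\":\"{}\",\"description\":\"{}\"}}",
        escape_json(r.filename),
        escape_json(r.description)
    )
}

/// Emits one line per record, so consumers can parse results as they stream in.
fn to_json_lines(results: &[FileResult<'_>]) -> String {
    results.iter().map(|r| to_json_line(r) + "\n").collect()
}

fn main() {
    let results = [
        FileResult { filename: "a.bin", description: "data" },
        FileResult { filename: "-", description: "data" },
    ];
    print!("{}", to_json_lines(&results));
}
```

Because every line is a complete object, a multi-file run can be consumed line by line without waiting for a closing array bracket.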

Documentation and Developer Experience:

  • Added CLAUDE.md, a quick reference guide for building, testing, and contributing to the project, including code standards and current CLI enhancement focus. (CLAUDE.md)
  • Added or updated prompt files for CI checks and code simplicity reviews to automate and standardize code quality and review processes. (.github/prompts/cicheck.prompt.md, .github/prompts/simplicity-review.prompt.md)

These changes collectively improve the flexibility, reliability, and maintainability of the library, especially for CLI and automation use cases.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
feat(lib): implement custom config for loading magic rules
feat(evaluator): enable evaluation with timeout in separate thread
feat(test): add integration tests for multiple file processing and timeout behavior

@unclesp1d3r unclesp1d3r linked an issue Jan 25, 2026 that may be closed by this pull request
20 tasks
@unclesp1d3r unclesp1d3r self-assigned this Jan 25, 2026
@unclesp1d3r unclesp1d3r requested a review from Copilot January 25, 2026 00:45
@coderabbitai

coderabbitai Bot commented Jan 25, 2026

Caution

Review failed

Failed to post review comments

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features

    • Multi-file batch processing, improved stdin handling, and in-memory buffer evaluation
    • Built-in rules mode and per-file evaluation timeout
    • JSON Lines output for multi-file runs and pretty JSON for single-file runs
    • Strict error-handling mode
  • Bug Fixes

    • Better empty-file detection and handling; refined per-file error reporting
  • Documentation

    • New project documentation and review-guidance prompts
  • Chores / Tests

    • Dependency updates and extensive CLI integration test suite added

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds multi-file CLI processing, per-file timeout support, built-in rules loading API, JSON Lines output, and extensive CLI integration tests; evaluator gained a thread-based timeout path while library and CLI APIs were extended for in-memory/file evaluations and new configuration exposure.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI & Docs: .github/prompts/cicheck.prompt.md, .github/prompts/simplicity-review.prompt.md, CLAUDE.md | New CI and review prompt files plus a project documentation file. |
| Dependencies: Cargo.toml | Added clap-stdin = "0.8.0" and bumped dev-dep nix 0.31.0→0.31.1. |
| Evaluator (timeout): src/evaluator/mod.rs | Added thread-based evaluation when timeout_ms is set: clones data into Arcs, spawns a worker thread, uses mpsc + recv_timeout, returns LibmagicError::Timeout on expiry. |
| Core library API: src/lib.rs | New public constructors/methods: with_builtin_rules, with_builtin_rules_and_config, load_from_file_with_config, evaluate_buffer, config; evaluate_file now handles empty files and delegates appropriately. |
| CLI & args: src/main.rs | Reworked CLI to accept multiple inputs (files: Vec<FileOrStdin>), added flags (json, text, magic_file, strict, use_builtin, timeout_ms), to_evaluation_config(), magic_file_candidates(), single DB load, per-file processing and refined exit codes/output formats. |
| JSON output: src/output/json.rs | Added JsonLineOutput struct, constructors new / from_match_results, and format_json_line_output() to produce JSON Lines per file. |
| Integration tests: tests/cli_integration_tests.rs | Large new CLI integration test suite covering canonical tests, multi-file handling, stdin mixing, strict mode, built-in rules, JSON Lines/pretty JSON, timeouts, edge cases, and many helpers. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as CLI (main)
    participant DB as MagicDatabase
    participant Eval as Evaluator
    participant Output as Output Formatter

    User->>CLI: Invoke with files/options
    CLI->>CLI: Parse args, build EvaluationConfig
    CLI->>DB: Load once (from file or builtin)
    loop per file
        CLI->>Eval: evaluate_rules_with_config(buffer, config)
        alt timeout configured
            Eval->>Eval: spawn worker thread (Arc clones)
            Eval->>Eval: worker runs evaluate_rules -> send result via mpsc
            Eval->>Eval: main waits recv_timeout()
            alt result received
                Eval-->>CLI: return EvaluationResult
            else timeout
                Eval-->>CLI: return Timeout error
            end
        else no timeout
            Eval->>Eval: direct evaluate_rules in-thread
            Eval-->>CLI: return EvaluationResult
        end
        CLI->>Output: format per-file result (JSON Lines or text/pretty JSON)
    end
    Output->>User: emit formatted output per run

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • EvilBit-Labs/libmagic-rs PR #5 — Modifies evaluator timeout logic and related error handling (high overlap).
  • EvilBit-Labs/libmagic-rs PR #7 — Touches evaluator, lib, main, and output modules with similar functional changes.
  • EvilBit-Labs/libmagic-rs PR #25 — Adds/adjusts built-in rules loading and API methods used here.

Poem

🐰 I hopped through code to time each file,

Spawned threads and checked each byte a while,
JSON Lines arranged in tidy rows,
Built-in rules where curious logic goes,
Tests applaud with many passing smiles.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly highlights the main CLI enhancements (multiple files, stdin support, magic file discovery) which are primary themes throughout the changeset. |
| Description check | ✅ Passed | The description comprehensively covers the pull request objectives including library enhancements, CLI improvements, output changes, and documentation updates, directly relating to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the enhancement, evaluator, output, and testing labels Jan 25, 2026
Copilot AI left a comment

Pull request overview

Enhances libmagic-rs CLI and core library to support multi-file processing, stdin evaluation, JSON Lines output for streaming/multi-file use, and configurable per-file evaluation timeouts (plus a stub “built-in rules” mode).

Changes:

  • Add multi-file + stdin handling to rmagic, including JSON Lines output for multi-file JSON mode and new CLI flags (--strict, --use-builtin, --timeout-ms).
  • Extend MagicDatabase with evaluate_buffer, config-aware constructors, and a built-in rules stub that returns "data".
  • Implement a timeout path in the evaluator and add extensive CLI integration tests; update dependencies and add contributor/CI prompt docs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/cli_integration_tests.rs | Adds many new CLI integration tests for multi-file/stdin/strict/JSON Lines/timeout behavior. |
| src/output/json.rs | Introduces JSON Lines output struct + formatter to include filename per result. |
| src/main.rs | Implements multi-file iteration, stdin reading, strict exit behavior, magic discovery order, and JSON Lines selection. |
| src/lib.rs | Adds config-aware DB constructors, built-in rules stub, empty-file handling, and buffer evaluation API. |
| src/evaluator/mod.rs | Adds a timeout mode around rule evaluation. |
| Cargo.toml | Adds clap-stdin and bumps nix dev dependency. |
| CLAUDE.md | Adds contributor quick-reference documentation. |
| .github/prompts/simplicity-review.prompt.md | Adds an automation prompt for “simplicity” reviews. |
| .github/prompts/cicheck.prompt.md | Adds an automation prompt for running/fixing CI checks. |

Comment thread src/main.rs
Comment on lines +639 to +647
let saved_stdout = dup(std::io::stdout()).unwrap();
let (read_fd, write_fd) = pipe().unwrap();

dup2_stdout(write_fd).unwrap();

let result = f();

dup2_stdout(saved_stdout).unwrap();

Copilot AI Jan 25, 2026

capture_stdout redirects stdout to a pipe but never closes the original write_fd (or the duplicated saved_stdout) before reading. Because a write end of the pipe remains open, read() may block indefinitely and hang the test. Close/drop write_fd (and the saved fd) at the right times (typically: dup2 -> drop original write fd; after restoring stdout, close saved fd; also close read fd after finishing).
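
The underlying rule is that a reader only sees end-of-file once every handle to the write end has been closed. The sketch below demonstrates that rule with a Unix socket pair from the standard library standing in for the pipe, and `try_clone` standing in for the duplicate descriptor that `dup2` leaves behind. It is an analogy under those assumptions, not the `capture_stdout` helper itself; `capture` is a hypothetical name, and the code is Unix-only.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixStream;

/// Writes `payload` through a duplicated write handle and reads it back to EOF.
fn capture(payload: &[u8]) -> io::Result<Vec<u8>> {
    let (mut reader, writer) = UnixStream::pair()?;

    // Duplicate the write end, analogous to dup2 installing it as stdout.
    let mut installed = writer.try_clone()?;
    // Drop the original handle right away; otherwise it would keep the write end open.
    drop(writer);

    // The code under test writes through the installed handle.
    installed.write_all(payload)?;

    // Close the last write handle, analogous to restoring the saved stdout.
    drop(installed);

    // With no write handle left, read_to_end sees EOF and returns instead of blocking.
    let mut captured = Vec::new();
    reader.read_to_end(&mut captured)?;
    Ok(captured)
}

fn main() -> io::Result<()> {
    let captured = capture(b"hello from the closure\n")?;
    println!("captured {} bytes", captured.len());
    Ok(())
}
```

If either `drop` call were removed, `read_to_end` would wait for an EOF that never arrives, which is the hang described above.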

Copilot uses AI. Check for mistakes.
Comment thread src/main.rs
Comment on lines +668 to +676
let saved_stderr = dup(std::io::stderr()).unwrap();
let (read_fd, write_fd) = pipe().unwrap();

dup2_stderr(write_fd).unwrap();

let result = f();

dup2_stderr(saved_stderr).unwrap();

Copilot AI Jan 25, 2026

capture_stderr has the same pipe lifetime issue as capture_stdout: write_fd is kept open while attempting to read from read_fd, so the read loop can block forever. Drop/close the extra write end after dup2_stderr, and close the saved fd once stderr is restored.

Comment thread src/main.rs
Comment on lines +716 to +726
let saved_stdin = dup(std::io::stdin()).unwrap();
let temp_dir = std::env::temp_dir().join("rmagic_stdin_invalid");
fs::create_dir_all(&temp_dir).unwrap();
let dir_handle = fs::File::open(&temp_dir).unwrap();

dup2_stdin(&dir_handle).unwrap();
let result = f();

dup2_stdin(saved_stdin).unwrap();
let _ = fs::remove_dir_all(&temp_dir);

Copilot AI Jan 25, 2026

with_invalid_stdin uses a fixed directory name under the global temp dir (rmagic_stdin_invalid). Since Rust tests run in parallel, this can race between tests/processes (create/remove collisions). Use a unique temp directory (e.g., tempfile::tempdir() or include a random suffix / PID) to avoid cross-test interference.
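
A standard-library-only way to get a per-call directory name is to combine the process ID with a timestamp. The sketch below assumes a hypothetical helper named `unique_temp_dir`; the atomic counter is an extra safeguard added here for threads that read the same clock value, and is not something the PR is known to use.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Per-process sequence number so two calls never produce the same name.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Creates a fresh directory under the system temp dir and returns its path.
fn unique_temp_dir(prefix: &str) -> io::Result<PathBuf> {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    let name = format!("{}_{}_{}_{}", prefix, process::id(), nanos, seq);
    let path = std::env::temp_dir().join(name);
    // create_dir (unlike create_dir_all) fails if the path already exists,
    // so an unexpected collision surfaces as an error instead of being shared.
    fs::create_dir(&path)?;
    Ok(path)
}

fn main() -> io::Result<()> {
    let dir = unique_temp_dir("rmagic_stdin_invalid")?;
    println!("created {}", dir.display());
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Each test then removes only the directory it created, so parallel tests cannot delete each other's fixtures.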

Comment thread src/evaluator/mod.rs
Comment on lines +512 to +545
// If no timeout is configured, evaluate normally
let Some(timeout_ms) = config.timeout_ms else {
    let mut context = EvaluationContext::new(config);
    return evaluate_rules(rules, buffer, &mut context);
};

// With timeout: spawn evaluation in a thread and wait with timeout
// Clone data needed for the thread
let rules_owned = rules.to_vec();
let buffer_owned = buffer.to_vec();
let config_clone = config.clone();

let (tx, rx) = mpsc::channel();

// Spawn evaluation in separate thread
thread::spawn(move || {
    let mut context = EvaluationContext::new(config_clone);
    let result = evaluate_rules(&rules_owned, &buffer_owned, &mut context);
    let _ = tx.send(result);
});

// Wait for result with timeout
match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
    Ok(result) => result,
    Err(mpsc::RecvTimeoutError::Timeout) => Err(LibmagicError::Timeout { timeout_ms }),
    Err(mpsc::RecvTimeoutError::Disconnected) => {
        // Thread panicked or dropped sender
        Err(LibmagicError::EvaluationError(
            crate::error::EvaluationError::internal_error(
                "Evaluation thread terminated unexpectedly",
            ),
        ))
    }
}

Copilot AI Jan 25, 2026

The timeout implementation spawns a detached thread and returns Timeout on recv_timeout, but the evaluation thread keeps running to completion (it is not cancelled). With multiple timed-out inputs this can accumulate background CPU work and threads, undermining the timeout’s purpose and risking resource exhaustion. Consider implementing timeouts cooperatively (propagate a deadline and check it inside expensive operations like string scans) or running evaluation in a separate process if hard cancellation is required; also avoid cloning rules/buffer (to_vec()) for every evaluation.
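
A cooperative timeout of the kind suggested here could look like the following sketch: the deadline is passed into the expensive operation, which checks it periodically and returns early. Everything in it is hypothetical (`find_with_deadline`, `ScanError`, and the `CHECK_EVERY` interval are illustration names, and a naive byte search stands in for a string scan); it does not reflect how libmagic-rs structures its evaluator.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ScanError {
    Timeout,
}

/// Naive substring search that gives up once `deadline` has passed.
fn find_with_deadline(
    haystack: &[u8],
    needle: &[u8],
    deadline: Instant,
) -> Result<Option<usize>, ScanError> {
    // Check the clock only every CHECK_EVERY positions to keep the overhead low.
    const CHECK_EVERY: usize = 4096;

    if needle.is_empty() {
        return Ok(Some(0));
    }
    for (i, window) in haystack.windows(needle.len()).enumerate() {
        if i % CHECK_EVERY == 0 && Instant::now() >= deadline {
            // Stop doing work instead of letting a background thread run on.
            return Err(ScanError::Timeout);
        }
        if window == needle {
            return Ok(Some(i));
        }
    }
    Ok(None)
}

fn main() {
    let data = b"abcPK\x03\x04rest";
    let deadline = Instant::now() + Duration::from_millis(50);
    println!("{:?}", find_with_deadline(data, b"PK", deadline));
    // A deadline that has already passed aborts at the first check.
    println!("{:?}", find_with_deadline(data, b"PK", Instant::now()));
}
```

Unlike the detached-thread approach, no work continues after the timeout is reported, at the cost of threading the deadline through every long-running loop.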

Comment thread tests/cli_integration_tests.rs Outdated
Comment on lines +63 to +70
/// This is a convenience wrapper around Command::new("cargo").
fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> {
    let output = Command::new("cargo")
        .args(["run", "--quiet", "--"])
        .args(args)
        .output()?;
    Ok(output)
}

Copilot AI Jan 25, 2026

These helpers invoke cargo run for each test case. With many new CLI tests and CI using cargo nextest run (parallel), this adds significant overhead and can create contention/flakiness. Prefer executing the already-built test binary via env!("CARGO_BIN_EXE_rmagic") (or a helper like assert_cmd) instead of spawning nested Cargo builds.

Suggested change
- /// This is a convenience wrapper around Command::new("cargo").
- fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> {
-     let output = Command::new("cargo")
-         .args(["run", "--quiet", "--"])
-         .args(args)
-         .output()?;
-     Ok(output)
- }
+ /// This is a convenience wrapper around invoking the built `rmagic` binary.
+ fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> {
+     let output = Command::new(env!("CARGO_BIN_EXE_rmagic"))
+         .args(args)
+         .output()?;
+     Ok(output)
+ }

- Fix capture_stdout/capture_stderr pipe lifetime issues by closing
  write_fd after dup2 and saved fd after restoring to prevent blocking
- Fix with_invalid_stdin race condition by using unique temp directory
  with PID and timestamp instead of fixed name
- Improve timeout implementation by using Arc instead of cloning
  rules/buffer, and document limitation that thread continues after timeout
- Use env!("CARGO_BIN_EXE_rmagic") in integration tests instead of
  cargo run for better performance in parallel test execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@unclesp1d3r unclesp1d3r enabled auto-merge (squash) January 25, 2026 01:24
@unclesp1d3r unclesp1d3r merged commit 6420ea2 into main Jan 25, 2026
18 of 20 checks passed
@unclesp1d3r unclesp1d3r deleted the 18-cli-enhancements-multiple-files-stdin-magic-discovery branch January 25, 2026 01:24
@github-actions github-actions Bot mentioned this pull request Feb 15, 2026
Labels

enhancement (New feature or request), evaluator (Rule evaluation engine and logic), output (Result formatting and output generation), testing (Test infrastructure and coverage)

Development

Successfully merging this pull request may close these issues.

CLI Enhancements: Multiple Files, Stdin, Magic Discovery

2 participants