feat: cli enhancements multiple files stdin magic discovery by unclesp1d3r · Pull Request #27 · EvilBit-Labs/libmagic-rs

unclesp1d3r · 2026-01-25T00:45:20Z

This pull request introduces several enhancements and new features to the libmagic-rs project, focusing on improved evaluation control, CLI/JSON output capabilities, and code quality standards. The most significant changes are the addition of evaluation timeouts, support for evaluating in-memory buffers, a stub for built-in magic rules, and improved JSON Lines output for multi-file scenarios. There are also updates to the documentation, dependencies, and prompt files for CI and code review automation.

Fixes issue #18

Core Library Enhancements:

Added support for evaluation timeouts by running rule evaluation in a separate thread and returning a Timeout error to the caller if the operation exceeds the configured limit. This keeps callers from blocking on long-running or hanging evaluations. (src/evaluator/mod.rs, src/lib.rs)
Added evaluate_buffer method to MagicDatabase for evaluating in-memory byte buffers directly, enabling stdin and non-file input support. (src/lib.rs)
Introduced a stub implementation for built-in magic rules via MagicDatabase::with_builtin_rules and with_builtin_rules_and_config, returning a default "data" result for all inputs. (src/lib.rs)
Enhanced file evaluation to handle empty files gracefully by treating them as empty buffers, ensuring consistent results. (src/lib.rs)
Added configuration-aware constructors for loading magic rules from files with custom evaluation settings. (src/lib.rs)

Output and CLI Improvements:

Added JsonLineOutput struct and format_json_line_output function to produce JSON Lines output, including filename context for each evaluated file, supporting multi-file and streaming scenarios. (src/output/json.rs)
Updated dependencies to include clap-stdin for improved CLI stdin handling, and bumped nix version for test/dev dependencies. (Cargo.toml)
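
As a rough illustration of the JSON Lines output described above, each evaluated file becomes one self-contained JSON object on its own line, carrying its filename. The sketch below uses only the standard library. The function names and the `filename` / `description` field names are hypothetical and are not taken from `src/output/json.rs`.

```rust
/// Escapes a string for use inside a JSON string literal.
/// Hypothetical helper for this sketch only.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one result as a single JSON Lines record (no trailing newline).
/// Field names are assumptions made for illustration.
fn json_line(filename: &str, description: &str) -> String {
    format!(
        "{{\"filename\":\"{}\",\"description\":\"{}\"}}",
        escape_json(filename),
        escape_json(description)
    )
}
```

Printing `json_line(name, desc)` once per input with `println!` yields one record per line, so a consumer can parse results as they stream in without waiting for the full run.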

Documentation and Developer Experience:

Added CLAUDE.md, a quick reference guide for building, testing, and contributing to the project, including code standards and current CLI enhancement focus. (CLAUDE.md)
Added or updated prompt files for CI checks and code simplicity reviews to automate and standardize code quality and review processes. (.github/prompts/cicheck.prompt.md, .github/prompts/simplicity-review.prompt.md)

These changes collectively improve the flexibility, reliability, and maintainability of the library, especially for CLI and automation use cases.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(lib): implement custom config for loading magic rules
feat(evaluator): enable evaluation with timeout in separate thread
feat(test): add integration tests for multiple file processing and timeout behavior

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

coderabbitai · 2026-01-25T00:45:38Z

Caution

Review failed

Failed to post review comments

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

New Features
- Multi-file batch processing, improved stdin handling, and in-memory buffer evaluation
- Built-in rules mode and per-file evaluation timeout
- JSON Lines output for multi-file runs and pretty JSON for single-file runs
- Strict error-handling mode
Bug Fixes
- Better empty-file detection and handling; refined per-file error reporting
Documentation
- New project documentation and review-guidance prompts
Chores / Tests
- Dependency updates and extensive CLI integration test suite added

Walkthrough

Adds multi-file CLI processing, per-file timeout support, built-in rules loading API, JSON Lines output, and extensive CLI integration tests; evaluator gained a thread-based timeout path while library and CLI APIs were extended for in-memory/file evaluations and new configuration exposure.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI & Docs `.github/prompts/cicheck.prompt.md`, `.github/prompts/simplicity-review.prompt.md`, `CLAUDE.md` | New CI and review prompt files plus a project documentation file. |
| Dependencies `Cargo.toml` | Added `clap-stdin = "0.8.0"` and bumped dev-dep `nix` 0.31.0→0.31.1. |
| Evaluator (timeout) `src/evaluator/mod.rs` | Added thread-based evaluation when `timeout_ms` is set: clones data into Arcs, spawns a worker thread, uses mpsc + `recv_timeout`, returns `LibmagicError::Timeout` on expiry. |
| Core library API `src/lib.rs` | New public constructors/methods: `with_builtin_rules`, `with_builtin_rules_and_config`, `load_from_file_with_config`, `evaluate_buffer`, `config`; `evaluate_file` now handles empty files and delegates appropriately. |
| CLI & args `src/main.rs` | Reworked CLI to accept multiple inputs (`files: Vec<FileOrStdin>`), added flags (`json`, `text`, `magic_file`, `strict`, `use_builtin`, `timeout_ms`), `to_evaluation_config()`, `magic_file_candidates()`, single DB load, per-file processing and refined exit codes/output formats. |
| JSON output `src/output/json.rs` | Added `JsonLineOutput` struct, constructors `new` / `from_match_results`, and `format_json_line_output()` to produce JSON Lines per file. |
| Integration tests `tests/cli_integration_tests.rs` | Large new CLI integration test suite covering canonical tests, multi-file handling, stdin mixing, strict mode, built-in rules, JSON Lines/pretty JSON, timeouts, edge cases, and many helpers. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as CLI (main)
    participant DB as MagicDatabase
    participant Eval as Evaluator
    participant Output as Output Formatter

    User->>CLI: Invoke with files/options
    CLI->>CLI: Parse args, build EvaluationConfig
    CLI->>DB: Load once (from file or builtin)
    loop per file
        CLI->>Eval: evaluate_rules_with_config(buffer, config)
        alt timeout configured
            Eval->>Eval: spawn worker thread (Arc clones)
            Eval->>Eval: worker runs evaluate_rules -> send result via mpsc
            Eval->>Eval: main waits recv_timeout()
            alt result received
                Eval-->>CLI: return EvaluationResult
            else timeout
                Eval-->>CLI: return Timeout error
            end
        else no timeout
            Eval->>Eval: direct evaluate_rules in-thread
            Eval-->>CLI: return EvaluationResult
        end
        CLI->>Output: format per-file result (JSON Lines or text/pretty JSON)
    end
    Output->>User: emit formatted output per run

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

EvilBit-Labs/libmagic-rs PR #5 — Modifies evaluator timeout logic and related error handling (high overlap).
EvilBit-Labs/libmagic-rs PR #7 — Touches evaluator, lib, main, and output modules with similar functional changes.
EvilBit-Labs/libmagic-rs PR #25 — Adds/adjusts built-in rules loading and API methods used here.

Poem

🐰 I hopped through code to time each file,

Spawned threads and checked each byte a while,
JSON Lines arranged in tidy rows,
Built-in rules where curious logic goes,
Tests applaud with many passing smiles.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly highlights the main CLI enhancements (multiple files, stdin support, magic file discovery) which are primary themes throughout the changeset. |
| Description check | ✅ Passed | The description comprehensively covers the pull request objectives including library enhancements, CLI improvements, output changes, and documentation updates, directly relating to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Copilot

Pull request overview

Enhances libmagic-rs CLI and core library to support multi-file processing, stdin evaluation, JSON Lines output for streaming/multi-file use, and configurable per-file evaluation timeouts (plus a stub “built-in rules” mode).

Changes:

Add multi-file + stdin handling to rmagic, including JSON Lines output for multi-file JSON mode and new CLI flags (--strict, --use-builtin, --timeout-ms).
Extend MagicDatabase with evaluate_buffer, config-aware constructors, and a built-in rules stub that returns "data".
Implement a timeout path in the evaluator and add extensive CLI integration tests; update dependencies and add contributor/CI prompt docs.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/cli_integration_tests.rs | Adds many new CLI integration tests for multi-file/stdin/strict/JSON Lines/timeout behavior. |
| src/output/json.rs | Introduces JSON Lines output struct + formatter to include filename per result. |
| src/main.rs | Implements multi-file iteration, stdin reading, strict exit behavior, magic discovery order, and JSON Lines selection. |
| src/lib.rs | Adds config-aware DB constructors, built-in rules stub, empty-file handling, and buffer evaluation API. |
| src/evaluator/mod.rs | Adds a timeout mode around rule evaluation. |
| Cargo.toml | Adds `clap-stdin` and bumps `nix` dev dependency. |
| CLAUDE.md | Adds contributor quick-reference documentation. |
| .github/prompts/simplicity-review.prompt.md | Adds an automation prompt for “simplicity” reviews. |
| .github/prompts/cicheck.prompt.md | Adds an automation prompt for running/fixing CI checks. |

Copilot · 2026-01-25T00:55:03Z

+        let saved_stdout = dup(std::io::stdout()).unwrap();
+        let (read_fd, write_fd) = pipe().unwrap();
+
+        dup2_stdout(write_fd).unwrap();
+
+        let result = f();
+
+        dup2_stdout(saved_stdout).unwrap();
+


capture_stdout redirects stdout to a pipe but never closes the original write_fd (or the duplicated saved_stdout) before reading. Because a write end of the pipe remains open, read() may block indefinitely and hang the test. Close/drop write_fd (and the saved fd) at the right times (typically: dup2 -> drop original write fd; after restoring stdout, close saved fd; also close read fd after finishing).
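
The underlying rule (a reader only sees EOF once every descriptor for the write end is closed) can be shown without touching the real stdout. The sketch below is an analogy, not the test helper itself: it assumes a Unix platform and uses `UnixStream::pair` and `try_clone` from the standard library as stand-ins for the `pipe` and `dup2` calls in the reviewed code. The function name `roundtrip` is hypothetical.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

/// Sends `msg` through a duplicated write end and reads it back until EOF.
fn roundtrip(msg: &[u8]) -> std::io::Result<Vec<u8>> {
    let (mut reader, writer) = UnixStream::pair()?;
    // Stand-in for dup2(write_fd, STDOUT): a second descriptor for the same end.
    let mut dup_writer = writer.try_clone()?;
    // Close the original write end as soon as the duplicate exists.
    drop(writer);
    dup_writer.write_all(msg)?;
    // Stand-in for restoring stdout: the last write descriptor closes here.
    drop(dup_writer);
    // With no write end left open, read_to_end sees EOF and returns.
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok(out)
}
```

If either `drop` were omitted, `read_to_end` would wait for more data that never arrives, which is the hang the comment above describes.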

Copilot · 2026-01-25T00:55:03Z

+        let saved_stderr = dup(std::io::stderr()).unwrap();
+        let (read_fd, write_fd) = pipe().unwrap();
+
+        dup2_stderr(write_fd).unwrap();
+
+        let result = f();
+
+        dup2_stderr(saved_stderr).unwrap();
+


capture_stderr has the same pipe lifetime issue as capture_stdout: write_fd is kept open while attempting to read from read_fd, so the read loop can block forever. Drop/close the extra write end after dup2_stderr, and close the saved fd once stderr is restored.

Copilot · 2026-01-25T00:55:03Z

+        let saved_stdin = dup(std::io::stdin()).unwrap();
+        let temp_dir = std::env::temp_dir().join("rmagic_stdin_invalid");
+        fs::create_dir_all(&temp_dir).unwrap();
+        let dir_handle = fs::File::open(&temp_dir).unwrap();
+
+        dup2_stdin(&dir_handle).unwrap();
+        let result = f();
+
+        dup2_stdin(saved_stdin).unwrap();
+        let _ = fs::remove_dir_all(&temp_dir);
+


with_invalid_stdin uses a fixed directory name under the global temp dir (rmagic_stdin_invalid). Since Rust tests run in parallel, this can race between tests/processes (create/remove collisions). Use a unique temp directory (e.g., tempfile::tempdir() or include a random suffix / PID) to avoid cross-test interference.
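
One way to build a per-call unique directory name with only the standard library is sketched below. It combines the process ID, a timestamp, and an in-process counter. The helper name `unique_temp_dir` is hypothetical, and `tempfile::tempdir()` remains the more robust option if adding that crate is acceptable.

```rust
use std::path::PathBuf;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Distinguishes calls made within the same process and the same clock tick.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Returns a path under the system temp dir that differs on every call.
/// The directory itself is not created here.
fn unique_temp_dir(prefix: &str) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
    std::env::temp_dir().join(format!("{}_{}_{}_{}", prefix, process::id(), nanos, seq))
}
```

The PID separates parallel test processes, while the counter separates parallel tests inside one process even when the clock resolution is coarse.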

Copilot · 2026-01-25T00:55:04Z

+    // If no timeout is configured, evaluate normally
+    let Some(timeout_ms) = config.timeout_ms else {
+        let mut context = EvaluationContext::new(config);
+        return evaluate_rules(rules, buffer, &mut context);
+    };
+
+    // With timeout: spawn evaluation in a thread and wait with timeout
+    // Clone data needed for the thread
+    let rules_owned = rules.to_vec();
+    let buffer_owned = buffer.to_vec();
+    let config_clone = config.clone();
+
+    let (tx, rx) = mpsc::channel();
+
+    // Spawn evaluation in separate thread
+    thread::spawn(move || {
+        let mut context = EvaluationContext::new(config_clone);
+        let result = evaluate_rules(&rules_owned, &buffer_owned, &mut context);
+        let _ = tx.send(result);
+    });
+
+    // Wait for result with timeout
+    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
+        Ok(result) => result,
+        Err(mpsc::RecvTimeoutError::Timeout) => Err(LibmagicError::Timeout { timeout_ms }),
+        Err(mpsc::RecvTimeoutError::Disconnected) => {
+            // Thread panicked or dropped sender
+            Err(LibmagicError::EvaluationError(
+                crate::error::EvaluationError::internal_error(
+                    "Evaluation thread terminated unexpectedly",
+                ),
+            ))
+        }
+    }


The timeout implementation spawns a detached thread and returns Timeout on recv_timeout, but the evaluation thread keeps running to completion (it is not cancelled). With multiple timed-out inputs this can accumulate background CPU work and threads, undermining the timeout’s purpose and risking resource exhaustion. Consider implementing timeouts cooperatively (propagate a deadline and check it inside expensive operations like string scans) or running evaluation in a separate process if hard cancellation is required; also avoid cloning rules/buffer (to_vec()) for every evaluation.
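
A minimal sketch of the cooperative alternative mentioned above: the caller passes a deadline, and the expensive loop checks it periodically and stops on its own. The names `find_with_deadline`, `ScanError`, and `CHECK_EVERY` are hypothetical and do not come from the libmagic-rs evaluator.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum ScanError {
    Timeout,
}

// How many positions to scan between clock reads (an arbitrary choice).
const CHECK_EVERY: usize = 4096;

/// Searches `haystack` for `needle`, giving up once `deadline` has passed.
fn find_with_deadline(
    haystack: &[u8],
    needle: &[u8],
    deadline: Instant,
) -> Result<Option<usize>, ScanError> {
    if needle.is_empty() {
        return Ok(None); // `windows(0)` would panic, so handle this up front.
    }
    for (i, window) in haystack.windows(needle.len()).enumerate() {
        // Reading the clock on every step would dominate the loop cost.
        if i % CHECK_EVERY == 0 && Instant::now() >= deadline {
            return Err(ScanError::Timeout);
        }
        if window == needle {
            return Ok(Some(i));
        }
    }
    Ok(None)
}
```

A caller would compute `Instant::now() + Duration::from_millis(timeout_ms)` once per input and pass it down. Because the work stops inside the loop, no background thread is left running after a timeout.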

Copilot · 2026-01-25T00:55:04Z

+/// This is a convenience wrapper around Command::new("cargo").
+fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> {
+    let output = Command::new("cargo")
+        .args(["run", "--quiet", "--"])
+        .args(args)
+        .output()?;
+    Ok(output)
+}


These helpers invoke cargo run for each test case. With many new CLI tests and CI using cargo nextest run (parallel), this adds significant overhead and can create contention/flakiness. Prefer executing the already-built test binary via env!("CARGO_BIN_EXE_rmagic") (or a helper like assert_cmd) instead of spawning nested Cargo builds.

Suggested change

-/// This is a convenience wrapper around Command::new("cargo").
-fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> {
-    let output = Command::new("cargo")
-        .args(["run", "--quiet", "--"])
-        .args(args)
-        .output()?;
-    Ok(output)
-}
+/// This is a convenience wrapper around invoking the built `rmagic` binary.
+fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> {
+    let output = Command::new(env!("CARGO_BIN_EXE_rmagic"))
+        .args(args)
+        .output()?;
+    Ok(output)
+}

- Fix capture_stdout/capture_stderr pipe lifetime issues by closing write_fd after dup2 and saved fd after restoring to prevent blocking
- Fix with_invalid_stdin race condition by using unique temp directory with PID and timestamp instead of fixed name
- Improve timeout implementation by using Arc instead of cloning rules/buffer, and document limitation that thread continues after timeout
- Use env!("CARGO_BIN_EXE_rmagic") in integration tests instead of cargo run for better performance in parallel test execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
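
The Arc-based timeout pattern and its documented limitation can be sketched in isolation as follows. This is an illustration under assumptions, not the code in `src/evaluator/mod.rs`: `eval_with_timeout` and `EvalError` are hypothetical names, and the "evaluation" is an arbitrary closure over a shared byte buffer.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum EvalError {
    Timeout { timeout_ms: u64 },
    WorkerDied,
}

/// Runs `work` on a worker thread over shared data, waiting at most `timeout_ms`.
fn eval_with_timeout<F>(buffer: &Arc<[u8]>, timeout_ms: u64, work: F) -> Result<usize, EvalError>
where
    F: FnOnce(&[u8]) -> usize + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    // Cloning an Arc only bumps a reference count; the bytes are not copied.
    let shared = Arc::clone(buffer);
    thread::spawn(move || {
        // If the receiver already gave up, the send fails and is ignored.
        let _ = tx.send(work(&shared));
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(value) => Ok(value),
        // The worker is NOT cancelled here; it keeps running detached.
        Err(mpsc::RecvTimeoutError::Timeout) => Err(EvalError::Timeout { timeout_ms }),
        // The sender was dropped without sending, e.g. the worker panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(EvalError::WorkerDied),
    }
}
```

Sharing through `Arc` removes the per-call copy of rules and buffer, but it does not address the other half of the review comment: a timed-out worker still runs to completion in the background.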

unclesp1d3r added 8 commits January 24, 2026 01:01

feat(cli): enhance magic file search order with text file prioritization

858dcd7

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(cli): support multiple file inputs and stdin processing

d6b64fa

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(cli): add support for stdin input and buffer evaluation

cc5f362

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(cli): add stub implementation for built-in magic rules

5953d74

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(cli): add prompts for continuous integration check and simplicity review

663dc46

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(cli): add documentation for CLI enhancements and project structure

64b7619

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

feat(cli): enhance JSON output for multiple files with JSON Lines format

6bfe9f3

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

unclesp1d3r linked an issue Jan 25, 2026 that may be closed by this pull request

CLI Enhancements: Multiple Files, Stdin, Magic Discovery #18

Closed

20 tasks

unclesp1d3r self-assigned this Jan 25, 2026

unclesp1d3r requested a review from Copilot January 25, 2026 00:45

Copilot started reviewing on behalf of unclesp1d3r January 25, 2026 00:45

coderabbitai Bot added the enhancement, evaluator, output, and testing labels Jan 25, 2026

Copilot AI reviewed Jan 25, 2026

unclesp1d3r enabled auto-merge (squash) January 25, 2026 01:24

unclesp1d3r merged commit 6420ea2 into main Jan 25, 2026
18 of 20 checks passed

unclesp1d3r deleted the 18-cli-enhancements-multiple-files-stdin-magic-discovery branch January 25, 2026 01:24

coderabbitai Bot mentioned this pull request Feb 11, 2026

Test infrastructure, compatibility tests, and architecture improvements #31

Merged

5 tasks

github-actions Bot mentioned this pull request Feb 15, 2026

chore: release v0.1.1 #71

Closed

unclesp1d3r merged 9 commits into main from 18-cli-enhancements-multiple-files-stdin-magic-discovery
