feat: cli enhancements multiple files stdin magic discovery#27
Conversation
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…y review Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
feat(lib): implement custom config for loading magic rules feat(evaluator): enable evaluation with timeout in separate thread feat(test): add integration tests for multiple file processing and timeout behavior Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
|
Caution Review failedFailed to post review comments Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds multi-file CLI processing, per-file timeout support, built-in rules loading API, JSON Lines output, and extensive CLI integration tests; evaluator gained a thread-based timeout path while library and CLI APIs were extended for in-memory/file evaluations and new configuration exposure. Changes
Sequence DiagramsequenceDiagram
participant User
participant CLI as CLI (main)
participant DB as MagicDatabase
participant Eval as Evaluator
participant Output as Output Formatter
User->>CLI: Invoke with files/options
CLI->>CLI: Parse args, build EvaluationConfig
CLI->>DB: Load once (from file or builtin)
loop per file
CLI->>Eval: evaluate_rules_with_config(buffer, config)
alt timeout configured
Eval->>Eval: spawn worker thread (Arc clones)
Eval->>Eval: worker runs evaluate_rules -> send result via mpsc
Eval->>Eval: main waits recv_timeout()
alt result received
Eval-->>CLI: return EvaluationResult
else timeout
Eval-->>CLI: return Timeout error
end
else no timeout
Eval->>Eval: direct evaluate_rules in-thread
Eval-->>CLI: return EvaluationResult
end
CLI->>Output: format per-file result (JSON Lines or text/pretty JSON)
end
Output->>User: emit formatted output per run
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
Enhances libmagic-rs CLI and core library to support multi-file processing, stdin evaluation, JSON Lines output for streaming/multi-file use, and configurable per-file evaluation timeouts (plus a stub “built-in rules” mode).
Changes:
- Add multi-file + stdin handling to
rmagic, including JSON Lines output for multi-file JSON mode and new CLI flags (--strict,--use-builtin,--timeout-ms). - Extend
MagicDatabasewithevaluate_buffer, config-aware constructors, and a built-in rules stub that returns"data". - Implement a timeout path in the evaluator and add extensive CLI integration tests; update dependencies and add contributor/CI prompt docs.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/cli_integration_tests.rs | Adds many new CLI integration tests for multi-file/stdin/strict/JSON Lines/timeout behavior. |
| src/output/json.rs | Introduces JSON Lines output struct + formatter to include filename per result. |
| src/main.rs | Implements multi-file iteration, stdin reading, strict exit behavior, magic discovery order, and JSON Lines selection. |
| src/lib.rs | Adds config-aware DB constructors, built-in rules stub, empty-file handling, and buffer evaluation API. |
| src/evaluator/mod.rs | Adds a timeout mode around rule evaluation. |
| Cargo.toml | Adds clap-stdin and bumps nix dev dependency. |
| CLAUDE.md | Adds contributor quick-reference documentation. |
| .github/prompts/simplicity-review.prompt.md | Adds an automation prompt for “simplicity” reviews. |
| .github/prompts/cicheck.prompt.md | Adds an automation prompt for running/fixing CI checks. |
| let saved_stdout = dup(std::io::stdout()).unwrap(); | ||
| let (read_fd, write_fd) = pipe().unwrap(); | ||
|
|
||
| dup2_stdout(write_fd).unwrap(); | ||
|
|
||
| let result = f(); | ||
|
|
||
| dup2_stdout(saved_stdout).unwrap(); | ||
|
|
There was a problem hiding this comment.
capture_stdout redirects stdout to a pipe but never closes the original write_fd (or the duplicated saved_stdout) before reading. Because a write end of the pipe remains open, read() may block indefinitely and hang the test. Close/drop write_fd (and the saved fd) at the right times (typically: dup2 -> drop original write fd; after restoring stdout, close saved fd; also close read fd after finishing).
| let saved_stderr = dup(std::io::stderr()).unwrap(); | ||
| let (read_fd, write_fd) = pipe().unwrap(); | ||
|
|
||
| dup2_stderr(write_fd).unwrap(); | ||
|
|
||
| let result = f(); | ||
|
|
||
| dup2_stderr(saved_stderr).unwrap(); | ||
|
|
There was a problem hiding this comment.
capture_stderr has the same pipe lifetime issue as capture_stdout: write_fd is kept open while attempting to read from read_fd, so the read loop can block forever. Drop/close the extra write end after dup2_stderr, and close the saved fd once stderr is restored.
| let saved_stdin = dup(std::io::stdin()).unwrap(); | ||
| let temp_dir = std::env::temp_dir().join("rmagic_stdin_invalid"); | ||
| fs::create_dir_all(&temp_dir).unwrap(); | ||
| let dir_handle = fs::File::open(&temp_dir).unwrap(); | ||
|
|
||
| dup2_stdin(&dir_handle).unwrap(); | ||
| let result = f(); | ||
|
|
||
| dup2_stdin(saved_stdin).unwrap(); | ||
| let _ = fs::remove_dir_all(&temp_dir); | ||
|
|
There was a problem hiding this comment.
with_invalid_stdin uses a fixed directory name under the global temp dir (rmagic_stdin_invalid). Since Rust tests run in parallel, this can race between tests/processes (create/remove collisions). Use a unique temp directory (e.g., tempfile::tempdir() or include a random suffix / PID) to avoid cross-test interference.
| // If no timeout is configured, evaluate normally | ||
| let Some(timeout_ms) = config.timeout_ms else { | ||
| let mut context = EvaluationContext::new(config); | ||
| return evaluate_rules(rules, buffer, &mut context); | ||
| }; | ||
|
|
||
| // With timeout: spawn evaluation in a thread and wait with timeout | ||
| // Clone data needed for the thread | ||
| let rules_owned = rules.to_vec(); | ||
| let buffer_owned = buffer.to_vec(); | ||
| let config_clone = config.clone(); | ||
|
|
||
| let (tx, rx) = mpsc::channel(); | ||
|
|
||
| // Spawn evaluation in separate thread | ||
| thread::spawn(move || { | ||
| let mut context = EvaluationContext::new(config_clone); | ||
| let result = evaluate_rules(&rules_owned, &buffer_owned, &mut context); | ||
| let _ = tx.send(result); | ||
| }); | ||
|
|
||
| // Wait for result with timeout | ||
| match rx.recv_timeout(Duration::from_millis(timeout_ms)) { | ||
| Ok(result) => result, | ||
| Err(mpsc::RecvTimeoutError::Timeout) => Err(LibmagicError::Timeout { timeout_ms }), | ||
| Err(mpsc::RecvTimeoutError::Disconnected) => { | ||
| // Thread panicked or dropped sender | ||
| Err(LibmagicError::EvaluationError( | ||
| crate::error::EvaluationError::internal_error( | ||
| "Evaluation thread terminated unexpectedly", | ||
| ), | ||
| )) | ||
| } | ||
| } |
There was a problem hiding this comment.
The timeout implementation spawns a detached thread and returns Timeout on recv_timeout, but the evaluation thread keeps running to completion (it is not cancelled). With multiple timed-out inputs this can accumulate background CPU work and threads, undermining the timeout’s purpose and risking resource exhaustion. Consider implementing timeouts cooperatively (propagate a deadline and check it inside expensive operations like string scans) or running evaluation in a separate process if hard cancellation is required; also avoid cloning rules/buffer (to_vec()) for every evaluation.
| /// This is a convenience wrapper around Command::new("cargo"). | ||
| fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> { | ||
| let output = Command::new("cargo") | ||
| .args(["run", "--quiet", "--"]) | ||
| .args(args) | ||
| .output()?; | ||
| Ok(output) | ||
| } |
There was a problem hiding this comment.
These helpers invoke cargo run for each test case. With many new CLI tests and CI using cargo nextest run (parallel), this adds significant overhead and can create contention/flakiness. Prefer executing the already-built test binary via env!("CARGO_BIN_EXE_rmagic") (or a helper like assert_cmd) instead of spawning nested Cargo builds.
| /// This is a convenience wrapper around Command::new("cargo"). | |
| fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> { | |
| let output = Command::new("cargo") | |
| .args(["run", "--quiet", "--"]) | |
| .args(args) | |
| .output()?; | |
| Ok(output) | |
| } | |
| /// This is a convenience wrapper around invoking the built `rmagic` binary. | |
| fn run_cli_with_args(args: &[&str]) -> Result<Output, Box<dyn std::error::Error>> { | |
| let output = Command::new(env!("CARGO_BIN_EXE_rmagic")) | |
| .args(args) | |
| .output()?; | |
| Ok(output) | |
| } | |
| } |
- Fix capture_stdout/capture_stderr pipe lifetime issues by closing
write_fd after dup2 and saved fd after restoring to prevent blocking
- Fix with_invalid_stdin race condition by using unique temp directory
with PID and timestamp instead of fixed name
- Improve timeout implementation by using Arc instead of cloning
rules/buffer, and document limitation that thread continues after timeout
- Use env!("CARGO_BIN_EXE_rmagic") in integration tests instead of
cargo run for better performance in parallel test execution
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This pull request introduces several enhancements and new features to the
libmagic-rsproject, focusing on improved evaluation control, CLI/JSON output capabilities, and code quality standards. The most significant changes are the addition of evaluation timeouts, support for evaluating in-memory buffers, a stub for built-in magic rules, and improved JSON Lines output for multi-file scenarios. There are also updates to the documentation, dependencies, and prompt files for CI and code review automation.Fixes issue #18
Core Library Enhancements:
Timeouterror if the operation exceeds the configured limit. This prevents long-running or hanging evaluations. (src/evaluator/mod.rs,src/lib.rs)evaluate_buffermethod toMagicDatabasefor evaluating in-memory byte buffers directly, enabling stdin and non-file input support. (src/lib.rs)MagicDatabase::with_builtin_rulesandwith_builtin_rules_and_config, returning a default "data" result for all inputs. (src/lib.rs)src/lib.rs)src/lib.rs)Output and CLI Improvements:
JsonLineOutputstruct andformat_json_line_outputfunction to produce JSON Lines output, including filename context for each evaluated file, supporting multi-file and streaming scenarios. (src/output/json.rs)clap-stdinfor improved CLI stdin handling, and bumpednixversion for test/dev dependencies. (Cargo.toml) [1] [2]Documentation and Developer Experience:
CLAUDE.md, a quick reference guide for building, testing, and contributing to the project, including code standards and current CLI enhancement focus. (CLAUDE.md).github/prompts/cicheck.prompt.md,.github/prompts/simplicity-review.prompt.md) [1] [2]These changes collectively improve the flexibility, reliability, and maintainability of the library, especially for CLI and automation use cases.