feat: Implement text magic parser (issue #11) by param-jasani · Pull Request #16 · EvilBit-Labs/libmagic-rs

param-jasani · 2025-12-23T21:12:32Z

Overview

This pull request implements the complete text-based magic file parser for issue #11. The parser now fully reads magic files and converts them into a hierarchical tree of MagicRule structures, handling line preprocessing, rule parsing, hierarchy construction, and error reporting.

Current Modifications

Full implementation of parse_text_magic_file in src/parser/mod.rs.
Line preprocessing: comments, empty lines, continuation lines, and hierarchy level detection.
Rule parsing using existing grammar functions (parse_offset, parse_value, etc.).
Hierarchy construction with proper parent-child relationships and validation for level jumps.
Comprehensive unit tests covering simple rules, hierarchical rules, continuation lines, comments, and error conditions.
Integration tests using sample magic files and real-world examples.

Verification

All tests pass successfully (cargo test).
Manual review confirms correct parsing and hierarchy building on representative inputs.
Code passes cargo clippy -- -D warnings.

Compliance Checklist

Adheres to project coding standards.
Unit and integration tests cover all implemented features.
Documentation updated with examples.
Commits are signed according to repository policy.

Related Issue

#11.

This PR is now ready for review. Feedback is appreciated to confirm alignment with project requirements.

Introduce line preprocessing for text-based magic files, including:
- Skipping full-line comments and empty lines
- Joining continuation lines (backslash-terminated)
- Detecting and stripping hierarchy levels (leading '>')
- Preserving internal whitespace and escape sequences
- Tracking original line numbers for error reporting

This completes the line preprocessing component (Phase 1 of issue EvilBit-Labs#11), preparing cleaned logical lines for subsequent rule parsing and hierarchical AST construction. Includes comprehensive unit tests covering basic rules, continuations, hierarchy, comments, whitespace, and edge cases.

Signed-off-by: param-jasani <jasanip24@gmail.com>
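The comment skipping and hierarchy-level detection described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `split_level` and `is_skippable` are hypothetical names, and the sketch assumes the `>` markers start at column 0 of the logical line.

```rust
/// Splits a logical line into its hierarchy level and the remaining rule text.
/// The level is the number of leading '>' characters; 0 means a top-level rule.
/// (Hypothetical helper, assumed for illustration only.)
fn split_level(line: &str) -> (usize, &str) {
    let level = line.chars().take_while(|&c| c == '>').count();
    // '>' is a single byte in UTF-8, so the char count is also a valid byte offset.
    (level, &line[level..])
}

/// Returns true for lines a preprocessor would drop entirely:
/// empty lines and full-line comments starting with '#'.
/// (Hypothetical helper, assumed for illustration only.)
fn is_skippable(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.is_empty() || trimmed.starts_with('#')
}
```

Only the marker characters are stripped, so whitespace and escape sequences inside the rule text pass through untouched.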

coderabbitai · 2025-12-23T21:12:43Z

Caution

Review failed

Failed to post review comments

Summary by CodeRabbit

New Features
- Added a public API to parse text magic files and produce hierarchical rule trees.
Refactor
- Improved parsing pipeline with robust handling of continuations, comments, empty lines, indent-based hierarchy, and clearer line-number-aware error mapping.
Tests
- Added comprehensive tests for parsing stages, hierarchy construction, edge cases, error scenarios, end-to-end demonstration, and more reliable temporary-file handling in tests.
Chores
- Updated distribution/release configuration and CI workflow behavior for packaging and publish steps.

Walkthrough

Adds a complete text-magic-file parsing pipeline in src/parser/mod.rs: line metadata, preprocessing (comments, continuations, empties), per-line parsing with line-aware errors, stack-based indentation hierarchy, and a public parse_text_magic_file API plus unit tests. Also bumps cargo-dist-version in dist-workspace.toml.

Changes

Cohort / File(s)	Summary
Parser implementation & tests `src/parser/mod.rs`	New parsing pipeline: `LineInfo` tracking, `preprocess_lines` (comments, empty lines, continuations) preserving original line numbers and mapping errors, `parse_magic_rule_line` (grammar parse → `MagicRule` with line-aware errors), `build_rule_hierarchy` (indentation-stack tree assembly, orphan/stack handling), and `pub fn parse_text_magic_file(...)`. Extensive unit tests and an `output_test` demo. Review error mapping, continuation handling, and stack logic.
Distribution config `dist-workspace.toml`	Bumped `cargo-dist-version` from `0.30.0` to `0.30.3`; reorganized targets list and added dist options (installer/releases/archives/attestations/auditable/cyclonedx). No runtime code changes but CI/dist behavior may change.
CI workflow `.github/workflows/release.yml`	Adjusted permissions (removed top-level attestations/id-token), added scoped permissions for `build-local-artifacts`, updated dist installer version to `0.30.3`, and tightened host job condition to require `plan.result == 'success'`.
IO test behavior `src/io/mod.rs`	Tests: scope temporary file creation/write/flush in a block so the file is closed before returning its path (ensures deterministic file close in tests).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Text-based magic file parsing implementation (Task 14 tracking) #13 — Implements the public parse_text_magic_file API and end-to-end text-magic parsing referenced by this issue.
Implement comprehensive text-based magic file parser #11 — Implements the preprocessing, per-line parsing, and hierarchy construction requested by this issue.

Possibly related PRs

feat: Create rmagic cli #7 — Related parser/grammar and AST changes that this new parsing pipeline depends on (types/parsers referenced by parse_magic_rule_line).

Suggested labels

enhancement, testing

Poem

🐰 I nibble at backslashes, stitch lines with care,
I count each number, tuck comments in a nest,
I stack the rules like carrots in a row, there—
A magic file trimmed, all tidy for a test,
Hop — parsing done, and I take a celebratory rest.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: Implement text magic parser (issue #11)' directly and clearly describes the main change: implementing a text-based magic file parser, which aligns with the primary objective of this PR.
Description check	✅ Passed	The description comprehensively covers the PR's scope, modifications, verification steps, and compliance checklist, directly relating to the implementation of the text magic parser and supporting changes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Implements full file-level parsing for text-based magic files, completing the missing orchestration layer between grammar parsing and evaluation. Signed-off-by: param-jasani <jasanip24@gmail.com>

unclesp1d3r · 2026-01-04T18:47:09Z

I apologize for the delay in reviewing your code. I am having a look at it now.

codecov · 2026-01-04T18:51:39Z

Codecov Report

❌ Patch coverage is 98.99160% with 6 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/parser/mod.rs	98.98%	6 Missing ⚠️

…tion
- Removed unnecessary tracking of empty line numbers and simplified line buffer handling in `preprocess_lines`.
- Improved error messages for comment parsing and enhanced clarity in documentation for `MagicRule` and `LineInfo` structs.
- Made `parse_text_magic_file` public to facilitate external access.

These changes enhance the readability and maintainability of the parser code while improving user documentation for better understanding of the parsing process.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

unclesp1d3r · 2026-01-04T19:08:39Z

Overall, this is really solid work, especially if you still consider yourself to be early in your learning journey with Rust. I had to fix a few issues to get it to compile and pass clippy checks, specifically the use of to_owned() vs clone() at line 116. Calling to_owned() there performs an implicit clone, so I changed it to an explicit clone(). The other minor issues concerned inlining variables directly in format strings. I suspect both are just a difference in Rust versions, since I'm on Rust 1.92 and these lints were added in a more recent version of clippy.
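For readers unfamiliar with the two clippy complaints, here is a small sketch of the pattern. The function is hypothetical and not taken from the PR. The lints I believe are involved are `clippy::implicit_clone` and `clippy::uninlined_format_args`, but that is an assumption, since the comment does not name them.

```rust
/// Builds a report string for the first line of input, if any.
/// (Hypothetical function, used only to illustrate the two lint fixes.)
fn first_line_report(lines: &[String], line_no: usize) -> Option<String> {
    // `lines.first()?` yields a `&String`. Calling `.to_owned()` on it would
    // also produce a `String`, but it hides the fact that this is a clone,
    // so the explicit `.clone()` is preferred here.
    let text: String = lines.first()?.clone();
    // Variables are inlined in the format string instead of being passed
    // positionally as in `format!("line {}: {}", line_no, text)`.
    Some(format!("line {line_no}: {text}"))
}
```

Both forms compile to the same behavior; the change is about making the clone visible and keeping the format string readable.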

I'll dig in to the code now.

unclesp1d3r · 2026-01-04T19:39:33Z

+    let mut lines_info: Vec<LineInfo> = Vec::new();
+    let mut line_buf = String::new();
+    let mut cont_ctr: usize = 0;
+    for (i, mut line) in input.lines().enumerate() {


If the input ends with a continuation character (\), the content in line_buf is silently lost. Add a check after the for loop that returns an error if there's still data in line_buf.
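A minimal sketch of the suggested check, written as a standalone function rather than as a patch to `preprocess_lines`. The name `join_continuations` and the `String` error type are assumptions for illustration. It uses a `pending` flag instead of testing whether the buffer is empty, so that an input consisting of a lone backslash is also reported.

```rust
/// Joins backslash-terminated lines into logical lines.
/// Returns an error if the input ends while a continuation is still open.
/// (Hypothetical helper; the PR's real error type is not shown in this thread.)
fn join_continuations(input: &str) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    let mut buf = String::new();
    // True while the previous physical line ended with a backslash.
    let mut pending = false;
    for line in input.lines() {
        if let Some(head) = line.strip_suffix('\\') {
            // Drop the backslash and keep accumulating.
            buf.push_str(head);
            pending = true;
        } else {
            buf.push_str(line);
            out.push(std::mem::take(&mut buf));
            pending = false;
        }
    }
    if pending {
        // Without this check the buffered text would be dropped silently.
        return Err(format!("unterminated line continuation: {buf:?}"));
    }
    Ok(out)
}
```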

unclesp1d3r · 2026-01-04T19:41:21Z

+            cont_ctr += 1;
+            continue;
+        }
+        lines_info.push(LineInfo::new(line_buf.clone(), (i + 1) - cont_ctr, false));


Similar to the other continuation issue, the line-number math doesn't work if you have mixed continuation and non-continuation lines. It's best to track the starting line number of each rule instead of counting continuations.

Also, just as an educational note, you might consider std::mem::take here instead of line_buf.clone() followed by line_buf.clear(). It's a minor simplification and avoids copying the buffer.
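Both suggestions can be shown together in a short sketch. The `Logical` struct and `collect_logical_lines` are hypothetical stand-ins for `LineInfo` and `preprocess_lines`, whose full definitions are not visible in this thread. The sketch ignores comments, blank lines, and the unterminated-continuation case.

```rust
/// A logical line plus the 1-based number of the physical line where it began.
/// (Hypothetical stand-in for the PR's `LineInfo`.)
#[derive(Debug, PartialEq)]
struct Logical {
    text: String,
    start_line: usize,
}

fn collect_logical_lines(input: &str) -> Vec<Logical> {
    let mut out = Vec::new();
    let mut buf = String::new();
    // Set when a logical line begins, cleared when it is emitted.
    let mut start: Option<usize> = None;
    for (i, line) in input.lines().enumerate() {
        // Record the starting line once, instead of subtracting a counter later.
        let start_line = *start.get_or_insert(i + 1);
        if let Some(head) = line.strip_suffix('\\') {
            buf.push_str(head);
            continue;
        }
        buf.push_str(line);
        // mem::take moves the String out and leaves an empty one behind,
        // replacing the clone-then-clear pair without copying the bytes.
        out.push(Logical { text: std::mem::take(&mut buf), start_line });
        start = None;
    }
    out
}
```

Because the start line is captured when a logical line opens, any number of earlier continuations in the file cannot skew later line numbers.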

unclesp1d3r · 2026-01-04T19:42:32Z

+//! use libmagic_rs::parser::parse_text_magic_file;
+//!
+//! let magic_content = r#"
+//! 0 string 0x7fELF ELF executable


This should be escaped. The grammar parser expects \x7f here, not 0x7f.

unclesp1d3r · 2026-01-04T19:43:37Z

+/// ```ignore
+/// use libmagic_rs::parser::parse_text_magic_file;
+///
+/// let magic = r#"0 string 0x7fELF ELF file


Same issue as line 22. \x for hex bytes, not 0x.

In libmagic magic files:

\x7f is the escape sequence for hex bytes in string values (like C string literals)

0x7f is for numeric hex values (e.g., 0x7f as a number = 127)
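The difference between the two notations can be illustrated with a small sketch. These functions are hypothetical and are not the grammar functions from the PR. As a simplification, the decoder requires exactly two hex digits after `\x` and handles no other escapes.

```rust
/// Decodes `\xHH` escapes in a string value into raw bytes.
/// All other characters are copied through as their UTF-8 bytes.
/// (Hypothetical helper; simplified to exactly two hex digits per escape.)
fn decode_string_value(s: &str) -> Result<Vec<u8>, String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && bytes.get(i + 1) == Some(&b'x') {
            let hex = s.get(i + 2..i + 4).ok_or("truncated \\x escape")?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return Err(format!("invalid hex digits in escape: {hex}"));
            }
            out.push(u8::from_str_radix(hex, 16).map_err(|e| e.to_string())?);
            i += 4;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    Ok(out)
}

/// Parses a numeric value, accepting a `0x` prefix for hexadecimal.
/// (Hypothetical helper.)
fn parse_numeric_value(s: &str) -> Option<u64> {
    match s.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16).ok(),
        None => s.parse().ok(),
    }
}
```

Under these assumptions, `\x7fELF` decodes to the four bytes 0x7f, 'E', 'L', 'F', while `0x7fELF` in a string position stays the seven literal characters, which is why the doc examples would not match an ELF header.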

unclesp1d3r · 2026-01-04T19:55:22Z

+///
+/// `Result<Vec<MagicRule>, ParseError>` - Root-level rules with children attached
+///
+/// # Behavior


I love this type of documentation, explaining the behavior. This is just good practice for any file format parser, in my opinion. We should probably also include an explanation of the edge case where a rule starts with a child marker but has no parent.
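To make the edge case concrete, here is a sketch of a stack-based hierarchy builder that rejects a child with no parent. It is not the PR's `build_rule_hierarchy`: the `Node` type, the `String` error, and the choice to return an error rather than skip the rule are all assumptions, and `pop_and_attach` here is only modeled on the helper name mentioned in the commit messages.

```rust
/// Simplified rule node (hypothetical stand-in for `MagicRule`).
#[derive(Debug, PartialEq)]
struct Node {
    text: String,
    children: Vec<Node>,
}

/// Pops the deepest open node and attaches it to its parent, or to the roots.
fn pop_and_attach(stack: &mut Vec<Node>, roots: &mut Vec<Node>) {
    if let Some(done) = stack.pop() {
        match stack.last_mut() {
            Some(parent) => parent.children.push(done),
            None => roots.push(done),
        }
    }
}

/// Builds a tree from (level, text) pairs using a stack of open ancestors.
fn build_tree(lines: &[(usize, &str)]) -> Result<Vec<Node>, String> {
    let mut roots = Vec::new();
    // stack[k] is the currently open node at level k.
    let mut stack: Vec<Node> = Vec::new();
    for &(level, text) in lines {
        // A level deeper than the stack means there is no parent at level - 1:
        // either an orphaned child at the start of the file or a level jump.
        if level > stack.len() {
            return Err(format!("child rule at level {level} has no parent: {text}"));
        }
        // Close every node at this level or deeper before opening the new one.
        while stack.len() > level {
            pop_and_attach(&mut stack, &mut roots);
        }
        stack.push(Node { text: text.to_string(), children: Vec::new() });
    }
    while !stack.is_empty() {
        pop_and_attach(&mut stack, &mut roots);
    }
    Ok(roots)
}
```

With this policy, an input whose first rule is `>0 byte 1 child` fails immediately, which is the behavior the documentation would need to spell out.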

…inuations
- Fixed handling of comments during line continuations to ensure that ongoing rules are discarded correctly when a comment is encountered.
- Updated line number tracking to accurately reflect the starting line of rules, even when empty lines are present in continuations.
- Added unit tests to verify the fixes for both bugs, ensuring robust handling of edge cases in the preprocessing logic.

These changes enhance the reliability of the parser by preventing corruption of rule data and maintaining accurate line number reporting.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Added error handling for unterminated line continuations at the end of input, ensuring that the parser correctly identifies and reports syntax errors.
- Updated line buffer management to utilize `std::mem::take` for improved clarity and safety in handling line data.

These changes enhance the robustness of the line preprocessing logic, preventing potential data corruption and improving error reporting.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

… processing
- Made `parse_text_magic_file` public for external access.
- Improved documentation for error handling in line processing, including unterminated line continuations and orphaned child rules.
- Refactored `build_rule_hierarchy` to utilize a helper function for better readability and maintainability.

These changes improve the clarity of the parser's functionality and enhance the overall robustness of the rule processing logic.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

…tests
- Made `parse_text_magic_file` public for external access.
- Improved documentation for error handling in line processing, including unterminated line continuations and orphaned child rules.
- Refactored `build_rule_hierarchy` to utilize a helper function for better readability.
- Added unit tests for overflow scenarios in decimal and hexadecimal parsing, ensuring robust error handling for large numbers.

These changes improve the clarity and robustness of the parser's functionality, enhancing overall error reporting and processing logic.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Updated documentation to clarify error handling for orphaned child rules and invalid syntax.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

…hensive tests
- Introduced a helper function `pop_and_attach` to simplify the logic for managing the rule hierarchy during parsing.
- Enhanced the `build_rule_hierarchy` function for better readability and maintainability.
- Added unit tests for overflow scenarios in decimal and hexadecimal parsing, ensuring robust error handling for large numbers.
- Implemented tests for edge cases related to line continuations and line number accuracy, improving overall error reporting and parser reliability.

These changes enhance the clarity and robustness of the parser's functionality, ensuring better handling of complex parsing scenarios.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Incremented the cargo-dist-version in the dist-workspace.toml file to reflect the latest version for CI compatibility.

This change ensures that the project uses the most recent version of cargo-dist for distribution tasks.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Restored "attestations" permission to "write" and set "contents" permission to "read" in the release workflow.
- Updated the cargo-dist installer version to 0.30.3 for compatibility.
- Modified the condition for the publishing step to ensure it checks the success of the plan job.

These changes enhance the release workflow's functionality and ensure proper permissions are set for artifact publishing.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Reformatted the target platforms list in the dist-workspace.toml file for improved readability by using a multi-line array format.

This change enhances the clarity of the configuration file, making it easier to manage and update target platforms in the future.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Wrapped the file operations in a block to ensure the file is closed immediately after writing and syncing, improving resource management.

This change enhances the reliability of the `create_temp_file` function by ensuring that file handles are properly released after use.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
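The scoping pattern described in this commit might look roughly like the following. This is a sketch using only the standard library, not the actual test helper from `src/io/mod.rs`. The signature, including the `name` parameter, is an assumption.

```rust
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;

/// Writes `contents` to a file in the system temp directory and returns its path.
/// `name` is a caller-chosen file name (hypothetical parameter for this sketch).
fn create_temp_file(name: &str, contents: &[u8]) -> std::io::Result<PathBuf> {
    let path = std::env::temp_dir().join(name);
    {
        let mut file = File::create(&path)?;
        file.write_all(contents)?;
        file.sync_all()?;
    } // `file` is dropped here, so the handle is closed before the path is returned.
    Ok(path)
}
```

Closing the handle before returning matters most on platforms where an open handle can block a later open or delete of the same path.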

- Updated string literals in multiple test cases to use a more concise format, improving readability and consistency across the test suite.
- This change enhances the clarity of the test inputs, making it easier to understand the expected data structure for parsing.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

param-jasani · 2026-01-05T04:00:30Z

Thanks for the review and the fixes!
Appreciate the kind words and the detailed feedback. I’ll go through the comments and improve myself.

param-jasani marked this pull request as draft December 23, 2025 21:13

param-jasani mentioned this pull request Dec 23, 2025

Implement comprehensive text-based magic file parser #11

Closed

10 tasks

feat(parser): implement comprehensive text-based magic file parser

48d376f

param-jasani marked this pull request as ready for review December 28, 2025 09:41

unclesp1d3r self-requested a review January 4, 2026 18:45

unclesp1d3r reviewed Jan 4, 2026

unclesp1d3r added 11 commits January 4, 2026 15:00

refactor(parser): improve rule hierarchy comment

22b3dff

coderabbitai Bot added enhancement New feature or request testing Test infrastructure and coverage labels Jan 4, 2026

unclesp1d3r merged commit 8429019 into EvilBit-Labs:main Jan 4, 2026
21 of 22 checks passed

unclesp1d3r linked an issue Jan 4, 2026 that may be closed by this pull request

Implement comprehensive text-based magic file parser #11

Closed

10 tasks

This was referenced Jan 23, 2026

feat: parser integration, CI modernization with mise, and Dev Container support #26

Merged

feat: built-in rules build time compilation fallback #28

Merged

This was referenced Feb 6, 2026

feat: strength calculation & documentation improvements (#21) #30

Merged

Test infrastructure, compatibility tests, and architecture improvements #31

Merged

Documentation: comprehensive mdbook rewrite, rustdoc fixes, and test stability #33

Merged

github-actions Bot mentioned this pull request Feb 15, 2026

chore: release v0.1.1 #71

Closed

This was referenced Mar 1, 2026

feat(parser): implement comparison operators #104

Merged

feat(parser): implement bitwise xor not and any value x operators #145

Merged
