
feat: Implement text magic parser (issue #11) #16

Merged
unclesp1d3r merged 14 commits into EvilBit-Labs:main from param-jasani:feat/issue-11-text-magic-parser
Jan 4, 2026

Conversation

@param-jasani
Contributor

param-jasani commented Dec 23, 2025

Overview

This pull request implements the complete text-based magic file parser for issue #11. The parser now fully reads magic files and converts them into a hierarchical tree of MagicRule structures, handling line preprocessing, rule parsing, hierarchy construction, and error reporting.

Current Modifications

  • Full implementation of parse_text_magic_file in src/parser/mod.rs.
  • Line preprocessing: comments, empty lines, continuation lines, and hierarchy level detection.
  • Rule parsing using existing grammar functions (parse_offset, parse_value, etc.).
  • Hierarchy construction with proper parent-child relationships and validation for level jumps.
  • Comprehensive unit tests covering simple rules, hierarchical rules, continuation lines, comments, and error conditions.
  • Integration tests using sample magic files and real-world examples.

Verification

  • All tests pass successfully (cargo test).
  • Manual review confirms correct parsing and hierarchy building on representative inputs.
  • Code passes cargo clippy -- -D warnings.

Compliance Checklist

  • Adheres to project coding standards.
  • Unit and integration tests cover all implemented features.
  • Documentation updated with examples.
  • Commits are signed according to repository policy.

Related Issue

#11.

This PR is now ready for review. Feedback is appreciated to confirm alignment with project requirements.

Introduce line preprocessing for text-based magic files, including:

- Skipping full-line comments and empty lines
- Joining continuation lines (backslash-terminated)
- Detecting and stripping hierarchy levels (leading '>')
- Preserving internal whitespace and escape sequences
- Tracking original line numbers for error reporting

This completes the line preprocessing component (Phase 1 of issue EvilBit-Labs#11),
preparing cleaned logical lines for subsequent rule parsing and
hierarchical AST construction.

Includes comprehensive unit tests covering basic rules, continuations,
hierarchy, comments, whitespace, and edge cases.

Signed-off-by: param-jasani <jasanip24@gmail.com>
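The per-line rules in this commit (skip comments and empty lines, detect and strip the leading '>' markers, leave the rest of the line untouched) can be illustrated with a small classifier. This is a sketch, not the PR's code: LineKind and classify_line are hypothetical names, and it assumes '#' starts a full-line comment and that the hierarchy level equals the number of leading '>' characters. Continuation joining and line-number tracking are left out here.

```rust
/// Classification of one physical line (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum LineKind<'a> {
    /// Blank or whitespace-only line: skipped.
    Empty,
    /// Full-line comment starting with '#': skipped.
    Comment,
    /// A rule line: its hierarchy level and the text after the '>' markers.
    Rule { level: usize, body: &'a str },
}

/// Classifies a line and strips its leading '>' markers.
/// Internal whitespace and escape sequences in `body` are preserved as-is.
fn classify_line(line: &str) -> LineKind<'_> {
    let trimmed = line.trim_start();
    if trimmed.is_empty() {
        return LineKind::Empty;
    }
    if trimmed.starts_with('#') {
        return LineKind::Comment;
    }
    // Each leading '>' adds one nesting level; '>' is one byte, so the
    // count doubles as a byte offset for slicing.
    let level = trimmed.chars().take_while(|&c| c == '>').count();
    LineKind::Rule { level, body: &trimmed[level..] }
}
```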
@coderabbitai

coderabbitai Bot commented Dec 23, 2025

Caution

Review failed

Failed to post review comments

Summary by CodeRabbit

  • New Features

    • Added a public API to parse text magic files and produce hierarchical rule trees.
  • Refactor

    • Improved parsing pipeline with robust handling of continuations, comments, empty lines, indent-based hierarchy, and clearer line-number-aware error mapping.
  • Tests

    • Added comprehensive tests for parsing stages, hierarchy construction, edge cases, error scenarios, end-to-end demonstration, and more reliable temporary-file handling in tests.
  • Chores

    • Updated distribution/release configuration and CI workflow behavior for packaging and publish steps.


Walkthrough

Adds a complete text-magic-file parsing pipeline in src/parser/mod.rs: line metadata, preprocessing (comments, continuations, empties), per-line parsing with line-aware errors, stack-based indentation hierarchy, and a public parse_text_magic_file API plus unit tests. Also bumps cargo-dist-version in dist-workspace.toml.

Changes

  • Parser implementation & tests (src/parser/mod.rs): New parsing pipeline: LineInfo tracking, preprocess_lines (comments, empty lines, continuations) preserving original line numbers and mapping errors, parse_magic_rule_line (grammar parse → MagicRule with line-aware errors), build_rule_hierarchy (indentation-stack tree assembly, orphan/stack handling), and pub fn parse_text_magic_file(...). Extensive unit tests and an output_test demo. Review error mapping, continuation handling, and stack logic.
  • Distribution config (dist-workspace.toml): Bumped cargo-dist-version from 0.30.0 to 0.30.3; reorganized targets list and added dist options (installer/releases/archives/attestations/auditable/cyclonedx). No runtime code changes, but CI/dist behavior may change.
  • CI workflow (.github/workflows/release.yml): Adjusted permissions (removed top-level attestations/id-token), added scoped permissions for build-local-artifacts, updated dist installer version to 0.30.3, and tightened host job condition to require plan.result == 'success'.
  • IO test behavior (src/io/mod.rs): Tests: scope temporary file creation/write/flush in a block so the file is closed before returning its path (ensures deterministic file close in tests).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: Create rmagic cli #7 — Related parser/grammar and AST changes that this new parsing pipeline depends on (types/parsers referenced by parse_magic_rule_line).

Suggested labels

enhancement, testing

Poem

🐰 I nibble at backslashes, stitch lines with care,
I count each number, tuck comments in a nest,
I stack the rules like carrots in a row, there—
A magic file trimmed, all tidy for a test,
Hop — parsing done, and I take a celebratory rest.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title 'feat: Implement text magic parser (issue #11)' directly and clearly describes the main change: implementing a text-based magic file parser, which aligns with the primary objective of this PR.
  • Description check (✅ Passed): The description comprehensively covers the PR's scope, modifications, verification steps, and compliance checklist, directly relating to the implementation of the text magic parser and supporting changes.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

@param-jasani param-jasani marked this pull request as draft December 23, 2025 21:13
Implements full file-level parsing for text-based magic files, completing
the missing orchestration layer between grammar parsing and evaluation.

Signed-off-by: param-jasani <jasanip24@gmail.com>
@param-jasani param-jasani marked this pull request as ready for review December 28, 2025 09:41
@unclesp1d3r unclesp1d3r self-requested a review January 4, 2026 18:45
@unclesp1d3r
Member

I apologize for the delay in reviewing your code. I am having a look at it now.

@codecov

codecov Bot commented Jan 4, 2026

Codecov Report

❌ Patch coverage is 98.99160% with 6 lines in your changes missing coverage. Please review.

Files with missing lines:
  • src/parser/mod.rs: 98.98% patch coverage, 6 lines missing ⚠️


…tion

- Removed unnecessary tracking of empty line numbers and simplified line buffer handling in `preprocess_lines`.
- Improved error messages for comment parsing and enhanced clarity in documentation for `MagicRule` and `LineInfo` structs.
- Made `parse_text_magic_file` public to facilitate external access.

These changes enhance the readability and maintainability of the parser code while improving user documentation for better understanding of the parsing process.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@unclesp1d3r
Member

Overall, this is really solid work, especially if you still consider yourself early in your Rust learning journey. I had to fix a few issues to get it to compile and pass clippy checks. The main one was the use of to_owned() instead of clone() at line 116: to_owned() there causes an implicit clone, so I changed it to an explicit clone(). The other minor issues involved inlining variables directly in format strings (e.g. format!("{x}") rather than format!("{}", x)). I suspect both come down to a difference in Rust versions, since I'm on Rust 1.92 and these checks were added in a more recent version of clippy.

I'll dig into the code now.
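For reference, the two patterns can be illustrated as follows. This is a sketch: Preprocessor and snapshot are hypothetical names, and the lint names clippy::implicit_clone and clippy::uninlined_format_args are assumptions about which checks fired, since the comment does not name them.

```rust
/// Hypothetical stand-in for the preprocessor state; not the PR's type.
struct Preprocessor {
    line_buf: String,
}

impl Preprocessor {
    /// Returns a copy of the buffer and a diagnostic naming its line number.
    fn snapshot(&self, line_no: usize) -> (String, String) {
        // `self.line_buf.to_owned()` also compiles, but on a `String` it is an
        // implicit clone; clippy's `implicit_clone` lint asks for `.clone()`.
        let copy = self.line_buf.clone();
        // Inlined format args instead of `format!("line {}: {}", line_no, copy)`,
        // which clippy's `uninlined_format_args` lint flags.
        let msg = format!("line {line_no}: {copy}");
        (copy, msg)
    }
}
```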

Comment thread src/parser/mod.rs
let mut lines_info: Vec<LineInfo> = Vec::new();
let mut line_buf = String::new();
let mut cont_ctr: usize = 0;
for (i, mut line) in input.lines().enumerate() {
Member


If the input ends with a continuation character (a trailing \), the content in line_buf is silently lost. Add a check after the for loop that returns an error if line_buf still holds data.
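A minimal sketch of the suggested end-of-input check, assuming a trailing \ always means continuation (an escaped backslash at the end of a line is not handled). join_continuations is a hypothetical name; the PR's preprocess_lines also tracks line numbers, comments, and empty lines, which are omitted here. A flag is used rather than testing line_buf for emptiness, so a lone trailing "\" is still reported.

```rust
/// Joins backslash-terminated lines into logical lines (hypothetical sketch).
/// Returns an error if the input ends while a continuation is still open,
/// instead of silently dropping the buffered text.
fn join_continuations(input: &str) -> Result<Vec<String>, String> {
    let mut logical = Vec::new();
    let mut line_buf = String::new();
    // True while the most recent physical line ended with a backslash.
    let mut pending = false;
    for line in input.lines() {
        if let Some(head) = line.strip_suffix('\\') {
            // Continuation: keep the text without the trailing backslash.
            line_buf.push_str(head);
            pending = true;
            continue;
        }
        line_buf.push_str(line);
        logical.push(std::mem::take(&mut line_buf));
        pending = false;
    }
    // Reaching here with `pending` set means the last line ended in '\'.
    if pending {
        return Err(format!("unterminated line continuation: {line_buf:?}"));
    }
    Ok(logical)
}
```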

Comment thread src/parser/mod.rs Outdated
cont_ctr += 1;
continue;
}
lines_info.push(LineInfo::new(line_buf.clone(), (i + 1) - cont_ctr, false));
Member


Similar to the other continuation issue, the line-number math doesn't hold up when continuation and non-continuation lines are mixed. It's better to just track each rule's starting line number instead of counting continuations.

Member


Also, just as an educational note, you might consider std::mem::take here instead of line_buf.clone() followed by line_buf.clear(). Just a minor simplification and improvement.
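The two suggestions in this thread, recording each rule's start line and using std::mem::take, can be sketched together. This is illustrative, not the merged code: Logical and join_with_start_lines are hypothetical names, blank lines between rules are skipped, and comment handling is omitted.

```rust
/// A logical line and the 1-based physical line on which it starts.
#[derive(Debug, PartialEq)]
struct Logical {
    start_line: usize,
    text: String,
}

/// Joins continuations while recording each rule's starting line directly,
/// rather than deriving it from a running counter (hypothetical sketch).
fn join_with_start_lines(input: &str) -> Result<Vec<Logical>, String> {
    let mut out = Vec::new();
    let mut line_buf = String::new();
    // `Some(n)` while a logical line that began on line `n` is being assembled.
    let mut start: Option<usize> = None;
    for (i, line) in input.lines().enumerate() {
        let line_no = i + 1;
        // Blank lines between rules are skipped without affecting numbering.
        if start.is_none() && line.trim().is_empty() {
            continue;
        }
        // Record the start line only when a new logical line begins.
        let start_line = *start.get_or_insert(line_no);
        match line.strip_suffix('\\') {
            Some(head) => line_buf.push_str(head),
            None => {
                line_buf.push_str(line);
                // `mem::take` moves the buffer out and leaves an empty String,
                // replacing the `clone()` + `clear()` pair.
                out.push(Logical { start_line, text: std::mem::take(&mut line_buf) });
                start = None;
            }
        }
    }
    if let Some(n) = start {
        return Err(format!("line {n}: unterminated line continuation"));
    }
    Ok(out)
}
```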

Comment thread src/parser/mod.rs Outdated
//! use libmagic_rs::parser::parse_text_magic_file;
//!
//! let magic_content = r#"
//! 0 string 0x7fELF ELF executable
Member


This should be escaped. The grammar parser expects a \x escape (\x7f), not 0x7f.

Comment thread src/parser/mod.rs Outdated
/// ```ignore
/// use libmagic_rs::parser::parse_text_magic_file;
///
/// let magic = r#"0 string 0x7fELF ELF file
Member


Same issue as line 22. \x for hex bytes, not 0x.

Member


In libmagic magic files:

  • \x7f is the escape sequence for hex bytes in string values (like C string literals)
  • 0x7f is for numeric hex values (e.g., 0x7f as a number = 127)
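The distinction can be shown with two small helpers: one decodes \xHH escapes in a string value into raw bytes, the other parses a numeric literal. Both are hypothetical sketches that handle only these cases; libmagic supports additional escapes and numeric forms, and the project's grammar functions may behave differently.

```rust
/// Decodes `\xHH` escapes in a string value into raw bytes (hypothetical
/// helper; exactly two hex digits are required here).
fn decode_hex_escapes(s: &str) -> Result<Vec<u8>, String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && bytes.get(i + 1) == Some(&b'x') {
            // Take the next two characters and require both to be hex digits.
            let hex = s
                .get(i + 2..i + 4)
                .filter(|h| h.bytes().all(|c| c.is_ascii_hexdigit()))
                .ok_or_else(|| format!("invalid \\x escape at byte {i}"))?;
            out.push(u8::from_str_radix(hex, 16).expect("two hex digits fit in u8"));
            i += 4;
        } else {
            // Any other byte is copied through literally.
            out.push(bytes[i]);
            i += 1;
        }
    }
    Ok(out)
}

/// Parses a numeric value such as `0x7f` or `127` (hypothetical helper).
fn parse_numeric(s: &str) -> Result<u64, std::num::ParseIntError> {
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => s.parse::<u64>(),
    }
}
```

Note that "0x7fELF" used as a string value is just seven literal characters, which is why the doc examples flagged above would not match an ELF header.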

Comment thread src/parser/mod.rs
///
/// `Result<Vec<MagicRule>, ParseError>` - Root-level rules with children attached
///
/// # Behavior
Member


I love this type of documentation explaining the behavior. In my opinion, it's good practice for any file format parser. We should probably also include an explanation for the edge case where a rule starts with a child marker but there's no parent.
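For the orphan edge case, a stack-based builder makes the behavior easy to state: a rule at level 1 or deeper with no rule one level above it has no parent and is rejected. The sketch below is hypothetical. Rule is a simplified stand-in for MagicRule, pop_and_attach only borrows the name of the helper mentioned in later commits (its signature here is assumed), and rejecting level jumps is an assumption based on the PR description's "validation for level jumps".

```rust
/// Simplified stand-in for the PR's `MagicRule` (hypothetical fields).
#[derive(Debug, PartialEq)]
struct Rule {
    level: usize,
    text: String,
    children: Vec<Rule>,
}

/// Pops the top rule and attaches it to the new top of the stack, or to the
/// root list if the stack is now empty (signature assumed).
fn pop_and_attach(stack: &mut Vec<Rule>, roots: &mut Vec<Rule>) {
    if let Some(rule) = stack.pop() {
        match stack.last_mut() {
            Some(parent) => parent.children.push(rule),
            None => roots.push(rule),
        }
    }
}

/// Builds a rule tree from `(level, text)` pairs using a stack of open rules.
fn build_hierarchy(lines: Vec<(usize, String)>) -> Result<Vec<Rule>, String> {
    let mut roots = Vec::new();
    let mut stack: Vec<Rule> = Vec::new();
    for (idx, (level, text)) in lines.into_iter().enumerate() {
        // A rule may be at most one level deeper than the rule above it.
        // A child with nothing above it (empty stack) is an orphan.
        let max_level = stack.last().map_or(0, |r| r.level + 1);
        if level > max_level {
            return Err(format!("rule {idx}: level {level} has no parent at level {}", level - 1));
        }
        // Close every open rule at the same or a deeper level.
        while stack.last().is_some_and(|r| r.level >= level) {
            pop_and_attach(&mut stack, &mut roots);
        }
        stack.push(Rule { level, text, children: Vec::new() });
    }
    // Attach whatever is still open once the input is exhausted.
    while !stack.is_empty() {
        pop_and_attach(&mut stack, &mut roots);
    }
    Ok(roots)
}
```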

…inuations

- Fixed handling of comments during line continuations to ensure that ongoing rules are discarded correctly when a comment is encountered.
- Updated line number tracking to accurately reflect the starting line of rules, even when empty lines are present in continuations.
- Added unit tests to verify the fixes for both bugs, ensuring robust handling of edge cases in the preprocessing logic.

These changes enhance the reliability of the parser by preventing corruption of rule data and maintaining accurate line number reporting.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Added error handling for unterminated line continuations at the end of input, ensuring that the parser correctly identifies and reports syntax errors.
- Updated line buffer management to utilize `std::mem::take` for improved clarity and safety in handling line data.

These changes enhance the robustness of the line preprocessing logic, preventing potential data corruption and improving error reporting.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
… processing

- Made `parse_text_magic_file` public for external access.
- Improved documentation for error handling in line processing, including unterminated line continuations and orphaned child rules.
- Refactored `build_rule_hierarchy` to utilize a helper function for better readability and maintainability.

These changes improve the clarity of the parser's functionality and enhance the overall robustness of the rule processing logic.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…tests

- Made `parse_text_magic_file` public for external access.
- Improved documentation for error handling in line processing, including unterminated line continuations and orphaned child rules.
- Refactored `build_rule_hierarchy` to utilize a helper function for better readability.
- Added unit tests for overflow scenarios in decimal and hexadecimal parsing, ensuring robust error handling for large numbers.

These changes improve the clarity and robustness of the parser's functionality, enhancing overall error reporting and processing logic.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
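The overflow tests mentioned in this commit check that an out-of-range literal produces an error rather than wrapping. A minimal sketch using the standard library's u64::from_str_radix, which reports overflow as IntErrorKind::PosOverflow; parse_u64_checked is a hypothetical name, and only an optional 0x prefix is handled.

```rust
use std::num::IntErrorKind;

/// Parses a decimal or `0x`-prefixed hex literal into `u64`, reporting
/// overflow as a distinct error instead of wrapping (hypothetical helper).
fn parse_u64_checked(s: &str) -> Result<u64, String> {
    let (digits, radix) = match s.strip_prefix("0x") {
        Some(hex) => (hex, 16),
        None => (s, 10),
    };
    u64::from_str_radix(digits, radix).map_err(|e| match e.kind() {
        // The value has valid digits but exceeds u64::MAX.
        IntErrorKind::PosOverflow => format!("{s:?} overflows u64"),
        _ => format!("invalid number {s:?}: {e}"),
    })
}
```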
- Updated documentation to clarify error handling for orphaned child rules and invalid syntax.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…hensive tests

- Introduced a helper function `pop_and_attach` to simplify the logic for managing the rule hierarchy during parsing.
- Enhanced the `build_rule_hierarchy` function for better readability and maintainability.
- Added unit tests for overflow scenarios in decimal and hexadecimal parsing, ensuring robust error handling for large numbers.
- Implemented tests for edge cases related to line continuations and line number accuracy, improving overall error reporting and parser reliability.

These changes enhance the clarity and robustness of the parser's functionality, ensuring better handling of complex parsing scenarios.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Incremented the cargo-dist-version in the dist-workspace.toml file to reflect the latest version for CI compatibility.

This change ensures that the project uses the most recent version of cargo-dist for distribution tasks.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Restored "attestations" permission to "write" and set "contents" permission to "read" in the release workflow.
- Updated the cargo-dist installer version to 0.30.3 for compatibility.
- Modified the condition for the publishing step to ensure it checks the success of the plan job.

These changes enhance the release workflow's functionality and ensure proper permissions are set for artifact publishing.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Reformatted the target platforms list in the dist-workspace.toml file for improved readability by using a multi-line array format.

This change enhances the clarity of the configuration file, making it easier to manage and update target platforms in the future.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Wrapped the file operations in a block to ensure the file is closed immediately after writing and syncing, improving resource management.

This change enhances the reliability of the `create_temp_file` function by ensuring that file handles are properly released after use.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
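The scoping change can be illustrated with a std-only sketch: the File lives in an inner block, so its handle is closed when the block ends, before the path is returned. The body below is an assumption; only the name create_temp_file comes from the commit message, and the real test helper may differ (for example, in how it picks a unique path).

```rust
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;

/// Writes `contents` to `name` in the system temp directory and returns the
/// path (assumed sketch, not the project's helper).
fn create_temp_file(name: &str, contents: &[u8]) -> std::io::Result<PathBuf> {
    let path = std::env::temp_dir().join(name);
    {
        let mut file = File::create(&path)?;
        file.write_all(contents)?;
        // Ask the OS to flush the data to disk before the handle goes away.
        file.sync_all()?;
    } // `file` is dropped here, which closes the handle.
    Ok(path)
}
```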
- Updated string literals in multiple test cases to use a more concise format, improving readability and consistency across the test suite.
- This change enhances the clarity of the test inputs, making it easier to understand the expected data structure for parsing.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@coderabbitai coderabbitai Bot added enhancement New feature or request testing Test infrastructure and coverage labels Jan 4, 2026
@unclesp1d3r unclesp1d3r merged commit 8429019 into EvilBit-Labs:main Jan 4, 2026
21 of 22 checks passed
@unclesp1d3r unclesp1d3r linked an issue Jan 4, 2026 that may be closed by this pull request
10 tasks
@param-jasani
Contributor Author

Thanks for the review and the fixes!
Appreciate the kind words and the detailed feedback. I'll go through the comments and learn from them.


Labels

enhancement New feature or request testing Test infrastructure and coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement comprehensive text-based magic file parser

2 participants