feat: Implement text magic parser (issue #11)#16
Conversation
Introduce line preprocessing for text-based magic files, including: - Skipping full-line comments and empty lines - Joining continuation lines (backslash-terminated) - Detecting and stripping hierarchy levels (leading '>') - Preserving internal whitespace and escape sequences - Tracking original line numbers for error reporting This completes the line preprocessing component (Phase 1 of issue EvilBit-Labs#11), preparing cleaned logical lines for subsequent rule parsing and hierarchical AST construction. Includes comprehensive unit tests covering basic rules, continuations, hierarchy, comments, whitespace, and edge cases. Signed-off-by: param-jasani <jasanip24@gmail.com>
|
Caution Review failedFailed to post review comments Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds a complete text-magic-file parsing pipeline in Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Implements full file-level parsing for text-based magic files, completing the missing orchestration layer between grammar parsing and evaluation. Signed-off-by: param-jasani <jasanip24@gmail.com>
|
I apologize for the delay in reviewing your code. I am having a look at it now. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
…tion - Removed unnecessary tracking of empty line numbers and simplified line buffer handling in `preprocess_lines`. - Improved error messages for comment parsing and enhanced clarity in documentation for `MagicRule` and `LineInfo` structs. - Made `parse_text_magic_file` public to facilitate external access. These changes enhance the readability and maintainability of the parser code while improving user documentation for better understanding of the parsing process. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
|
Overall, this is some really solid work if you still consider yourself to be early in your learning journey with Rust. There were some issues that I had to fix to get it to compile and pass clippy checks, specifically with the use of I'll dig in to the code now. |
| let mut lines_info: Vec<LineInfo> = Vec::new(); | ||
| let mut line_buf = String::new(); | ||
| let mut cont_ctr: usize = 0; | ||
| for (i, mut line) in input.lines().enumerate() { |
There was a problem hiding this comment.
If the input ends with a continuation character (\\), the content in line_buf is silently lost. Add a check to throw an error after the for loop if there's still data in line_buf.
| cont_ctr += 1; | ||
| continue; | ||
| } | ||
| lines_info.push(LineInfo::new(line_buf.clone(), (i + 1) - cont_ctr, false)); |
There was a problem hiding this comment.
Similar to the other continuation issue, the math doesn't work if you have mixed continuation and non-continuation lines. Best to just track your starting line number instead of counting.
There was a problem hiding this comment.
Also, just as an educational note, you might consider std::mem::take here instead of line_buf.clone() and then line_buf.clear(). Just a minor simplication and improvement.
| //! use libmagic_rs::parser::parse_text_magic_file; | ||
| //! | ||
| //! let magic_content = r#" | ||
| //! 0 string 0x7fELF ELF executable |
There was a problem hiding this comment.
This should be escaped. The grammar parser expects \x, not just x
| /// ```ignore | ||
| /// use libmagic_rs::parser::parse_text_magic_file; | ||
| /// | ||
| /// let magic = r#"0 string 0x7fELF ELF file |
There was a problem hiding this comment.
Same issue as line 22. \x for hex bytes, not 0x.
There was a problem hiding this comment.
In libmagic magic files:
\x7fis the escape sequence for hex bytes in string values (like C string literals)0x7fis for numeric hex values (e.g.,0x7fas a number = 127)
| /// | ||
| /// `Result<Vec<MagicRule>, ParseError>` - Root-level rules with children attached | ||
| /// | ||
| /// # Behavior |
There was a problem hiding this comment.
I love this type of documentation, explaining the behavior. This is just good practice for any file format parser, in my opinion. We just probably include an explanation for the edge case when a rule starts with an child marker, but there's no parent.
…inuations - Fixed handling of comments during line continuations to ensure that ongoing rules are discarded correctly when a comment is encountered. - Updated line number tracking to accurately reflect the starting line of rules, even when empty lines are present in continuations. - Added unit tests to verify the fixes for both bugs, ensuring robust handling of edge cases in the preprocessing logic. These changes enhance the reliability of the parser by preventing corruption of rule data and maintaining accurate line number reporting. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Added error handling for unterminated line continuations at the end of input, ensuring that the parser correctly identifies and reports syntax errors. - Updated line buffer management to utilize `std::mem::take` for improved clarity and safety in handling line data. These changes enhance the robustness of the line preprocessing logic, preventing potential data corruption and improving error reporting. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
… processing - Made `parse_text_magic_file` public for external access. - Improved documentation for error handling in line processing, including unterminated line continuations and orphaned child rules. - Refactored `build_rule_hierarchy` to utilize a helper function for better readability and maintainability. These changes improve the clarity of the parser's functionality and enhance the overall robustness of the rule processing logic. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…tests - Made `parse_text_magic_file` public for external access. - Improved documentation for error handling in line processing, including unterminated line continuations and orphaned child rules. - Refactored `build_rule_hierarchy` to utilize a helper function for better readability. - Added unit tests for overflow scenarios in decimal and hexadecimal parsing, ensuring robust error handling for large numbers. These changes improve the clarity and robustness of the parser's functionality, enhancing overall error reporting and processing logic. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Updated documentation to clarify error handling for orphaned child rules and invalid syntax. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…hensive tests - Introduced a helper function `pop_and_attach` to simplify the logic for managing the rule hierarchy during parsing. - Enhanced the `build_rule_hierarchy` function for better readability and maintainability. - Added unit tests for overflow scenarios in decimal and hexadecimal parsing, ensuring robust error handling for large numbers. - Implemented tests for edge cases related to line continuations and line number accuracy, improving overall error reporting and parser reliability. These changes enhance the clarity and robustness of the parser's functionality, ensuring better handling of complex parsing scenarios. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Incremented the cargo-dist-version in the dist-workspace.toml file to reflect the latest version for CI compatibility. This change ensures that the project uses the most recent version of cargo-dist for distribution tasks. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Restored "attestations" permission to "write" and set "contents" permission to "read" in the release workflow. - Updated the cargo-dist installer version to 0.30.3 for compatibility. - Modified the condition for the publishing step to ensure it checks the success of the plan job. These changes enhance the release workflow's functionality and ensure proper permissions are set for artifact publishing. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Reformatted the target platforms list in the dist-workspace.toml file for improved readability by using a multi-line array format. This change enhances the clarity of the configuration file, making it easier to manage and update target platforms in the future. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Wrapped the file operations in a block to ensure the file is closed immediately after writing and syncing, improving resource management. This change enhances the reliability of the `create_temp_file` function by ensuring that file handles are properly released after use. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Updated string literals in multiple test cases to use a more concise format, improving readability and consistency across the test suite. - This change enhances the clarity of the test inputs, making it easier to understand the expected data structure for parsing. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
|
Thanks for the review and the fixes! |
Overview
This pull request implements the complete text-based magic file parser for issue #11. The parser now fully reads magic files and converts them into a hierarchical tree of
MagicRulestructures, handling line preprocessing, rule parsing, hierarchy construction, and error reporting.Current Modifications
parse_text_magic_fileinsrc/parser/mod.rs.parse_offset,parse_value, etc.).Verification
cargo test).cargo clippy -- -D warnings.Compliance Checklist
Related Issue
#11.
This PR is now ready for review. Feedback is appreciated to confirm alignment with project requirements.