feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging#67

Merged
unclesp1d3r merged 5 commits into main from 6-mach-o-section-classification on Nov 11, 2025

@unclesp1d3r
Member

This pull request adds comprehensive support for extracting and classifying load command strings from Mach-O binaries, including library dependency paths and runtime search paths. It introduces a new extraction module, updates the tagging system to better classify extracted strings, and normalizes section weight scoring for Mach-O to a 0.0–1.0 scale for consistency. Extensive tests and documentation have been added to ensure correctness and ease of use.

Addresses issues #6 and #7

Mach-O Load Command Extraction:

  • Added new module src/extraction/macho_load_commands.rs to extract library paths (LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB) and runtime search paths (LC_RPATH) from Mach-O load commands, with robust tagging and test coverage.
  • Updated extraction documentation and module exports to include extract_load_command_strings() and provided usage examples for Mach-O binaries. [1] [2]

Tagging and Classification Improvements:

  • Extended the Tag enum in src/types.rs with new tags: DylibPath, Rpath, RpathVariable, and FrameworkPath, and marked it as #[non_exhaustive] for future extensibility. [1] [2]

Section Weight Normalization and Classification:

  • Normalized Mach-O section weights to a 0.0–1.0 scale and updated logic to classify additional string-relevant sections (__objc_methname, __objc_classname, __ustring) as high-priority for extraction. [1] [2] [3]
  • Updated and expanded tests to validate new section classifications and normalized weights. [1] [2] [3]

Documentation and Fixture Updates:

  • Enhanced fixture documentation in tests/fixtures/README.md to describe Mach-O binary contents and provide instructions for generating test binaries with rpaths and framework dependencies. [1] [2]

…test

- Updated the assertion for the resources field in the PE integration test to clarify that it may be None for minimal binaries, such as those compiled without resource files.
- Removed redundant checks to streamline the test logic while maintaining clarity in the comments.

This change enhances the readability of the test and ensures accurate expectations regarding resource availability in PE binaries.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Updated the section weight calculations in the Mach-O parser to use a normalized scale (0.0-1.0) for consistency with other formats.
- Adjusted weights for various section types, including string data, read-only data, and code sections, to better reflect their likelihood of containing meaningful strings.
- Enhanced section classification to include additional Objective-C related sections.
- Added unit tests to validate the new weight calculations and classifications.

This refactor improves the accuracy of string extraction from Mach-O binaries, aligning it with the established standards for ELF and PE formats.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Introduced a new module for extracting load command strings from Mach-O binaries, including library dependency paths and runtime search paths.
- Enhanced the `Tag` enum with new variants for `DylibPath`, `Rpath`, `RpathVariable`, and `FrameworkPath` to support the new extraction functionality.
- Updated the extraction module documentation with usage examples and detailed descriptions of the extraction process.
- Added integration tests to validate the load command extraction functionality against a Mach-O fixture.

This feature improves the ability to analyze Mach-O binaries by enabling the extraction of meaningful load command strings, which are crucial for understanding library dependencies and runtime behavior.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Added helper functions for extracting and sorting dylib paths, rpaths, and framework paths from load command strings.
- Updated assertions in the Mach-O integration tests to verify the presence of exports and ensure correct tagging of load command strings.
- Introduced snapshot tests for load command string extraction, providing a detailed breakdown of dylib paths, rpaths, and framework paths.
- Enhanced documentation in the fixtures README to clarify the purpose and expected contents of the `test_binary_macho` fixture.

These improvements strengthen the testing framework for Mach-O binaries, ensuring comprehensive validation of load command extraction and classification.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@unclesp1d3r unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request
7 tasks
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Nov 11, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Nov 11, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features

    • Added extraction of Mach-O load command strings, capturing dylib dependencies, framework paths, and runtime paths with automatic classification.
  • Improvements

    • Normalized section weight calculations for enhanced Mach-O binary analysis accuracy.
    • Expanded test coverage and documentation for Mach-O parsing capabilities.
  • Type System

    • Added new classification tags: DylibPath, Rpath, RpathVariable, and FrameworkPath for improved binary metadata tagging.

Walkthrough

Normalized Mach-O section weight scale to 0.0–1.0, expanded string-section classifications, added a new extractor module to pull dylib and rpath strings from Mach-O load commands, introduced four Tag variants, updated docs/fixtures, and added comprehensive Mach-O tests and minor PE test relaxation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Mach-O Section Weight Normalization (src/container/macho.rs) | Converted section weight calculations from 1–10 to 0.0–1.0 scale; reduced weights across SectionType branches; added __objc_methname, __objc_classname, __ustring as StringData in classification; updated related tests. |
| Load Command String Extraction (src/extraction/macho_load_commands.rs) | New module exposing extract_load_command_strings(data: &[u8]) -> Vec<FoundString>: parses single/fat Mach-O binaries, extracts LC_LOAD_DYLIB/LC_LOAD_WEAK_DYLIB/LC_REEXPORT_DYLIB and LC_RPATH strings, classifies tags (DylibPath, Rpath, RpathVariable, FrameworkPath), and returns metadata-rich FoundString entries; includes unit tests and helpers. |
| Extraction Module Exposure (src/extraction/mod.rs) | Added macho_load_commands module and `pub use macho_load_commands::extract_load_command_strings;`; updated examples/docs to showcase new extractor. |
| Type Extensions (src/types.rs) | Marked Tag enum #[non_exhaustive] and added variants DylibPath ("dylib-path"), Rpath ("rpath"), RpathVariable ("rpath-var"), and FrameworkPath ("framework-path"). |
| Mach-O Testing & Documentation (tests/integration_macho.rs, tests/fixtures/README.md) | Added extensive integration tests for load command extraction, tag validation, framework detection, rpath-variable detection, metadata checks, and expanded fixture README with Mach-O load-command guidance and example build commands. |
| PE Test Relaxation (tests/integration_pe.rs) | Relaxed assertion about ContainerInfo.resources presence in test_pe_import_export_extraction to allow None for minimal binaries. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Extractor as extract_load_command_strings()
    participant Goblin as goblin Parser
    participant Classifier as classify_* helpers
    participant Result as Vec<FoundString>

    Caller->>Extractor: data: &[u8]
    Extractor->>Goblin: parse (fat? -> extract arch -> parse single)
    Goblin->>Goblin: iterate load commands
    alt dylib load commands (LC_LOAD_DYLIB / _WEAK / _REEXPORT)
        Goblin->>Classifier: classify_dylib_path(path)
        Classifier-->>Goblin: tags [DylibPath, (FrameworkPath?)]
    else rpath (LC_RPATH)
        Goblin->>Classifier: classify_rpath(path)
        Classifier-->>Goblin: tags [Rpath, (RpathVariable?), (FrameworkPath?)]
    end
    Goblin->>Result: build FoundString (text, tags, encoding, length, source)
    Extractor-->>Caller: Vec<FoundString>

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–30 minutes

  • Areas needing extra attention:
    • src/extraction/macho_load_commands.rs: fat-binary handling, goblin API usage, tag heuristics, and edge-case handling.
    • src/container/macho.rs: consistency and intent of normalized weights and new section classifications.
    • src/types.rs: ensure serde renames and #[non_exhaustive] addition don't break serialization or downstream pattern matches.
    • tests/integration_macho.rs and fixtures: verify fixture assumptions and snapshot stability.

Possibly related issues

Possibly related PRs

Suggested labels

lgtm

Poem

🐰
I hop through bytes with eager paws,
I sniff out dylibs and rpath laws,
Weights trimmed neat, strings found with cheer,
Frameworks sparkle when they appear,
A tiny rabbit clap — hooray! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: Mach-O load command extraction, section weight normalization, and enhanced tagging. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, clearly explaining the new modules, tagging improvements, weight normalization, and documentation updates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2cf74de and d4334c9.

⛔ Files ignored due to path filters (1)
  • tests/snapshots/integration_macho__macho_load_command_strings.snap is excluded by !**/*.snap
📒 Files selected for processing (3)
  • src/extraction/macho_load_commands.rs (1 hunks)
  • tests/fixtures/README.md (3 hunks)
  • tests/integration_macho.rs (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/integration_macho.rs (1)
src/extraction/macho_load_commands.rs (1)
  • extract_load_command_strings (53-74)
src/extraction/macho_load_commands.rs (1)
src/container/macho.rs (2)
  • parse (291-298)
  • extract_architecture_data (335-350)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test-cross-platform (windows-latest, Windows)
  • GitHub Check: CodeQL Analyze
🔇 Additional comments (20)
tests/fixtures/README.md (2)

8-11: LGTM! Documentation clarity improved.

The bullet-point format successfully addresses the past review feedback and makes the fixture description much more readable.


39-73: Excellent documentation for Mach-O fixtures and load commands.

The build instructions are practical and the load command section clearly explains what's being tested and why cross-platform parsing works.

src/extraction/macho_load_commands.rs (8)

1-34: Well-documented module with correct doc example.

The usage example properly handles errors and demonstrates the API effectively. The no_run attribute is appropriate since it requires an external file.


53-74: LGTM! Consistent with existing parser behavior.

The function correctly handles both single and fat Mach-O binaries, and the approach of extracting from the first architecture in fat binaries aligns with the existing MachoParser implementation.
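
To make the "first architecture" behaviour concrete, here is a minimal std-only sketch of locating the first slice in a 32-bit fat (universal) header. It is not the PR's code, which delegates parsing to goblin; `first_fat_arch` and `read_be_u32` are hypothetical names, and the sketch assumes the standard fat header layout (big-endian magic and count, followed by five-field `fat_arch` records).

```rust
/// Big-endian magic number that opens a 32-bit fat (universal) header.
const FAT_MAGIC: u32 = 0xCAFE_BABE;

/// Reads a big-endian u32 at `pos`, or None if the buffer is too short.
fn read_be_u32(data: &[u8], pos: usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = data.get(pos..end)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns (offset, size) of the first architecture slice in a fat binary.
/// Hypothetical helper for illustration only.
fn first_fat_arch(data: &[u8]) -> Option<(u32, u32)> {
    if read_be_u32(data, 0)? != FAT_MAGIC {
        return None; // not a 32-bit fat header
    }
    if read_be_u32(data, 4)? == 0 {
        return None; // header declares no architectures
    }
    // The first fat_arch record starts right after the 8-byte header.
    // Record layout: cputype, cpusubtype, offset, size, align (5 x u32).
    let first = 8;
    let offset = read_be_u32(data, first + 8)?;
    let size = read_be_u32(data, first + 12)?;
    Some((offset, size))
}
```

The returned pair still has to be bounds-checked against the file length before slicing, since a malformed header can point past the end of the buffer.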


94-115: LGTM! Proper metadata for load command strings.

The hardcoded values (offset=0, rva=None, section=None, score=0) are appropriate for load command strings, which don't originate from sections and have different characteristics than section-based strings.
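
A rough sketch of what those defaults look like when building an entry. The field names follow the ones mentioned in this review (text, tags, encoding, length, source, offset, rva, section, score), but the struct layout, the field types, and the `load_command_string` helper are assumptions for illustration, not the crate's actual definitions.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tag {
    DylibPath,
    FilePath,
}

#[derive(Debug, Clone, PartialEq)]
enum Encoding {
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Source {
    LoadCommand,
}

/// Simplified stand-in for the crate's FoundString (types are assumed).
#[derive(Debug, Clone, PartialEq)]
struct FoundString {
    text: String,
    tags: Vec<Tag>,
    encoding: Encoding,
    length: usize,
    source: Source,
    offset: u64,
    rva: Option<u64>,
    section: Option<String>,
    score: i32,
}

/// Hypothetical constructor: load command strings do not come from a
/// section, so the positional fields get placeholder values.
fn load_command_string(text: &str, tags: Vec<Tag>) -> FoundString {
    FoundString {
        text: text.to_string(),
        tags,
        encoding: Encoding::Utf8,
        length: text.len(),
        source: Source::LoadCommand,
        offset: 0,
        rva: None,
        section: None,
        score: 0,
    }
}
```

Calling `load_command_string("/usr/lib/libSystem.B.dylib", vec![Tag::DylibPath, Tag::FilePath])` would yield an entry whose `length` equals the text length and whose `section` and `rva` are `None`.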


122-143: LGTM! Consistent implementation with dylib extraction.

The function follows the same pattern as extract_dylib_strings, ensuring consistency in metadata handling across different load command types.


145-185: LGTM! Clear and correct classification logic.

The classification functions use appropriate string matching for Mach-O conventions. The tag combinations (e.g., DylibPath + FilePath, Rpath + RpathVariable) provide useful categorization for consumers.


187-204: LGTM! Proper overflow protection implemented.

The use of checked_add successfully addresses the past review concern about potential integer overflow, ensuring safe bounds checking.
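
For reference, the overflow-safe pattern can be reduced to a few lines. This is a sketch rather than the merged code: `slice_arch` is a hypothetical name, and it returns `Option` where the reviewed function returns a `Result`.

```rust
/// Returns the `size` bytes starting at `offset`, or None when the range
/// overflows usize or extends past the end of `data`.
fn slice_arch(data: &[u8], offset: usize, size: usize) -> Option<&[u8]> {
    // checked_add yields None instead of wrapping (or panicking in debug).
    let end = offset.checked_add(size)?;
    // get() performs the bounds check and never panics.
    data.get(offset..end)
}
```

Using `get` with a range folds the `end <= data.len()` comparison into the slice lookup, so there is no separate branch to get wrong.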


206-295: Excellent unit test coverage.

The tests cover edge cases (invalid/empty data) and thoroughly validate the utility functions with both positive and negative test cases.


297-367: Well-designed integration tests.

The use of #[ignore] for fixture-dependent tests is appropriate, and the graceful handling when fixtures don't exist prevents test failures. Tag validation ensures the implementation maintains internal consistency.

tests/integration_macho.rs (10)

12-44: Excellent test helpers that reduce duplication.

The filtering and sorting functions provide consistent, deterministic results across tests and improve test readability. The has_rpath_variable helper appropriately mirrors the extraction module logic.


69-84: LGTM! Pragmatic relaxation of export assertions.

The change acknowledges that executables may not consistently export specific symbols, making the test more robust while still validating that exports are found.


143-250: Comprehensive test with excellent coverage.

The test thoroughly validates load command extraction, including metadata, tags, and special cases like framework paths and rpath variables. The diagnostic output aids debugging.


252-317: LGTM! Well-formatted snapshot test.

The format strings have been corrected to avoid trailing whitespace, and the structured output provides clear, human-readable snapshots for regression testing.


319-385: Excellent tag validation ensuring API consistency.

This test enforces critical tag invariants (e.g., DylibPath implies FilePath, mutual exclusion of DylibPath and Rpath) and provides clear error messages when violations occur.


387-447: Thorough bidirectional validation of framework path tagging.

The test correctly validates both that .framework strings are tagged and that non-framework strings are not, preventing false positives and false negatives.


449-508: LGTM! Correct validation of rpath variable detection.

The past review issue regarding the redundant condition has been addressed, and the test now correctly validates the presence or absence of the RpathVariable tag with helpful diagnostic output.


510-526: Essential edge case coverage.

Testing empty and invalid data ensures the API degrades gracefully without panicking, which is critical for robustness.


528-582: Useful categorization and validation of dylib paths.

The test validates the FilePath tag invariant and provides insightful categorization into system, framework, and other libraries.


584-624: Thorough metadata validation with helpful future-proofing note.

The test validates all critical metadata fields and includes a useful comment about offset/rva fields being unspecified for load commands, which helps future maintainers understand intentional design choices.



@dosubot

dosubot Bot commented Nov 11, 2025

Documentation Updates

1 document(s) were updated by changes in this PR:

stringy


@dosubot dosubot Bot added the area:analyzer Binary analyzer functionality label Nov 11, 2025
@coderabbitai coderabbitai Bot changed the title from "6 mach o section classification @coderabbitai" to "6 mach o section classification Normalize section weights and extract load command strings" Nov 11, 2025
@coderabbitai coderabbitai Bot added documentation Improvements or additions to documentation enhancement New feature or request lang:rust Rust implementation size:M This PR changes 30-99 lines, ignoring generated files. labels Nov 11, 2025
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. size:XXL This PR changes 1000+ lines, ignoring generated files. labels Nov 11, 2025
@unclesp1d3r
Member Author

@coderabbitai Fix the PR title.

@coderabbitai coderabbitai Bot changed the title from "6 mach o section classification Normalize section weights and extract load command strings" to "feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging" Nov 11, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Nov 11, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the current PR title
gh pr view 67 --json title,url --jq '.title'

# Also check if there's a standard format by looking at recent PRs
gh pr list --limit 5 --state merged --json number,title --jq '.[] | "#\(.number): \(.title)"'

Length of output: 500


🏁 Script executed:

#!/bin/bash
# Update the PR title to follow conventional commit format and be more descriptive
gh pr edit 67 --title "feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging"

Length of output: 59


@unclesp1d3r: I've fixed the PR title.

Old title:
6 mach o section classification Normalize section weights and extract load command strings

New title:
feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging

The title now follows the conventional commit format (with feat: prefix) and is properly capitalized, matching the style of other recent PRs in the repository like #66.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/types.rs (1)

13-47: Flag: #[non_exhaustive] here breaks existing consumers

Tag is part of the public API. Flipping it to #[non_exhaustive] forces every downstream exhaustive match to add a wildcard arm, so current consumers will stop compiling unless we ship a breaking release. Either coordinate a major version bump (even in 0.x terms) or hold off on this annotation so we don’t strand users mid-release.
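
The downstream impact can be illustrated with a small sketch. The variant names come from this PR; the trimmed-down enum and the `describe` function are hypothetical. Note that within the defining crate the compiler still accepts exhaustive matches; the wildcard arm is only mandatory for code in other crates.

```rust
/// Trimmed-down stand-in for the crate's Tag enum.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    FilePath,
    DylibPath,
    Rpath,
    RpathVariable,
    FrameworkPath,
}

/// Hypothetical consumer-side function showing the required match shape.
pub fn describe(tag: Tag) -> &'static str {
    match tag {
        Tag::DylibPath => "library dependency",
        Tag::Rpath => "runtime search path",
        Tag::RpathVariable => "path containing an @-variable",
        Tag::FrameworkPath => "framework bundle path",
        // Downstream crates must keep a wildcard arm once the enum is
        // #[non_exhaustive]; it also absorbs variants added later.
        _ => "other",
    }
}
```

Consumers that previously listed every variant without a `_` arm are the ones that stop compiling, which is why the annotation counts as a breaking change.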

🧹 Nitpick comments (2)
src/extraction/macho_load_commands.rs (1)

145-171: Also flag @-based dylib paths as RpathVariable

Dylib load commands often carry @rpath/@loader_path/@executable_path prefixes (e.g. framework stubs), but classify_dylib_path currently ignores them so those strings miss the new RpathVariable tag. Mirroring the check you have in classify_rpath keeps tagging consistent and makes it easier to surface the placeholders regardless of which load command they appear in.

 fn classify_dylib_path(path: &str) -> Vec<Tag> {
-    let mut tags = vec![Tag::DylibPath, Tag::FilePath];
+    let mut tags = vec![Tag::DylibPath, Tag::FilePath];
+
+    if contains_rpath_variable(path) {
+        tags.push(Tag::RpathVariable);
+    }
 
     if is_framework_path(path) {
         tags.push(Tag::FrameworkPath);
     }
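
Put together, the suggested classifier might look like the following self-contained sketch. The function names mirror the ones in the diff above, but the helper bodies are assumptions (plain substring checks), not the module's verified implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    DylibPath,
    FilePath,
    RpathVariable,
    FrameworkPath,
}

/// Assumed implementation: true if the path uses a dyld placeholder.
fn contains_rpath_variable(path: &str) -> bool {
    ["@rpath", "@loader_path", "@executable_path"]
        .iter()
        .any(|var| path.contains(*var))
}

/// Assumed implementation: true if the path points into a framework bundle.
fn is_framework_path(path: &str) -> bool {
    path.contains(".framework")
}

/// Classifier with the reviewer's extra RpathVariable check applied.
fn classify_dylib_path(path: &str) -> Vec<Tag> {
    let mut tags = vec![Tag::DylibPath, Tag::FilePath];

    if contains_rpath_variable(path) {
        tags.push(Tag::RpathVariable);
    }

    if is_framework_path(path) {
        tags.push(Tag::FrameworkPath);
    }

    tags
}
```

With this change a path such as `@rpath/Foo.framework/Foo` would carry all four tags, while `/usr/lib/libSystem.B.dylib` keeps only `DylibPath` and `FilePath`.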
tests/integration_macho.rs (1)

578-581: Simplify redundant logical condition.

Since system_libraries is a filtered subset of dylib_paths, the condition !system_libraries.is_empty() || !dylib_paths.is_empty() is logically equivalent to just !dylib_paths.is_empty().

-    assert!(
-        !system_libraries.is_empty() || !dylib_paths.is_empty(),
-        "Should find at least some system libraries or dylib dependencies"
-    );
+    assert!(
+        !dylib_paths.is_empty(),
+        "Should find at least some dylib dependencies"
+    );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 08cd31b and 2cf74de.

⛔ Files ignored due to path filters (1)
  • tests/snapshots/integration_macho__macho_load_command_strings.snap is excluded by !**/*.snap
📒 Files selected for processing (7)
  • src/container/macho.rs (7 hunks)
  • src/extraction/macho_load_commands.rs (1 hunks)
  • src/extraction/mod.rs (2 hunks)
  • src/types.rs (2 hunks)
  • tests/fixtures/README.md (3 hunks)
  • tests/integration_macho.rs (4 hunks)
  • tests/integration_pe.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/extraction/macho_load_commands.rs (1)
src/container/macho.rs (2)
  • parse (291-298)
  • extract_architecture_data (335-350)
tests/integration_macho.rs (1)
src/extraction/macho_load_commands.rs (1)
  • extract_load_command_strings (49-70)
src/container/macho.rs (2)
src/container/elf.rs (2)
  • classify_section (54-81)
  • calculate_section_weight (25-51)
src/container/pe.rs (2)
  • classify_section (139-178)
  • calculate_section_weight (113-136)
src/extraction/mod.rs (1)
src/extraction/macho_load_commands.rs (1)
  • extract_load_command_strings (49-70)
🪛 LanguageTool
tests/fixtures/README.md

[uncategorized] ~8-~8: Possible missing comma found.
Context: ...64 Mach-O binary (contains typical load commands including LC_LOAD_DYLIB for system libr...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
tests/fixtures/README.md

59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔍 Remote MCP Linear

Based on the gathered context, I have sufficient information to provide a comprehensive review summary. Let me compile the findings:

PR Review Context Summary

Key Changes Overview

This PR implements Mach-O load command string extraction and normalizes section weight scoring. The changes span four main areas:

1. Mach-O Section Weight Normalization

  • Weight scale normalized from 1-10 to 0.0-1.0 across all section types
  • String-related sections (__cstring, __const, __cfstring) re-weighted onto the new scale rather than simply divided by ten (e.g., 10.0→1.0, 9.0→0.7)
  • New classifications for Objective-C sections (__objc_methname, __objc_classname, __ustring)
  • Non-string data weights reduced significantly (ReadOnlyData: 7.0→0.4, Code: 1.0→0.1)
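
A minimal sketch of a weight table on the normalized scale, using only the example values quoted above. The `SectionType` variants shown, the `section_weight` name, and the value for `Other` are assumptions; the real parser may distinguish more section types and use different numbers.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SectionType {
    StringData,
    ReadOnlyData,
    Code,
    Other,
}

/// Maps a section type to a weight in the 0.0..=1.0 range.
fn section_weight(kind: SectionType) -> f32 {
    match kind {
        SectionType::StringData => 1.0,   // quoted example: 10.0 -> 1.0
        SectionType::ReadOnlyData => 0.4, // quoted example: 7.0 -> 0.4
        SectionType::Code => 0.1,         // quoted example: 1.0 -> 0.1
        SectionType::Other => 0.0,        // assumed value, not from the PR
    }
}
```

The 7.0→0.4 mapping shows that the change is a re-weighting and not just a change of units: read-only data drops relative to string data as well as in absolute terms.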

2. New Load Command Extraction Module

The PR adds src/extraction/macho_load_commands.rs with:

  • Extraction of dylib paths from LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB
  • Runtime path extraction from LC_RPATH commands
  • Tagging system with DylibPath, FilePath, FrameworkPath, Rpath, RpathVariable variants
  • Support for framework detection and @-variable identification

3. Tag Enum Extension

New Tag enum variants added:

  • DylibPath (dylib-path), Rpath (rpath), RpathVariable (rpath-var), FrameworkPath (framework-path)
  • Enum marked as #[non_exhaustive] for future extensibility
  • Serde serialization mappings for JSON output

4. Comprehensive Test Coverage

Integration tests validate:

  • Load command extraction with source/encoding verification
  • Tag integrity across DylibPath, Rpath, FrameworkPath classifications
  • Framework and @-variable detection accuracy
  • Empty/invalid input handling
  • Metadata correctness (section=None, UTF-8 encoding, LoadCommand source)

Strategic Context

This work advances PROD-275 (MVP Pipeline Epic) by completing Mach-O-specific extraction capabilities. It directly implements requirements from PROD-298 (Load Command Parsing) and PROD-294 (Import/Export Integration). The section weight normalization aligns with foundation work in PROD-237-239 (ELF/PE section classification standards).

Integration Points

The PR integrates with existing infrastructure:

  • Uses goblin crate for Mach-O binary parsing
  • Extends FoundString data structure for metadata tracking
  • Plugs into modular extraction pipeline architecture
  • Compatible with section-based string extraction workflow.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test-cross-platform (macos-latest, macOS)
  • GitHub Check: test-cross-platform (windows-latest, Windows)
  • GitHub Check: CodeQL Analyze
🔇 Additional comments (13)
src/container/macho.rs (1)

95-101: Comprehensive Objective-C coverage.

Capturing __objc_methname, __objc_classname, and __ustring as StringData keeps the classifier aligned with real-world string reservoirs—great attention to detail.
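
As an illustration of name-based classification, the sketch below treats the three newly added section names as string data. Grouping `__cstring` with them is an assumption based on the summary elsewhere in this review, and the two-variant `SectionType` and the `classify_section` signature are simplified stand-ins for the real parser.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SectionType {
    StringData,
    Other,
}

/// Classifies a Mach-O section by name alone (simplified stand-in).
fn classify_section(name: &str) -> SectionType {
    match name {
        // __cstring is assumed; the other three are the sections this
        // PR adds to the string-data group.
        "__cstring" | "__objc_methname" | "__objc_classname" | "__ustring" => {
            SectionType::StringData
        }
        _ => SectionType::Other,
    }
}
```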

tests/integration_pe.rs (1)

32-33: Good call-out on optional resources

The note makes the intent around ContainerInfo::resources much clearer for the minimal PE fixture. Thanks for documenting that nuance.

src/extraction/mod.rs (1)

14-38: Example update looks solid

The refreshed docs and example nicely surface the new Mach-O extractor alongside the existing PE helpers.

tests/integration_macho.rs (10)

12-44: Well-structured test helpers.

The helper functions provide clean abstractions for filtering and sorting load command strings by tag type. The consistent pattern and deterministic sorting (by text) ensure reliable test assertions.
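
A helper of that shape could be sketched as follows. The `FoundString` here is cut down to two fields and `strings_with_tag` is a hypothetical name; the actual test helpers are per-tag functions whose signatures are not shown in this thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    DylibPath,
    Rpath,
}

/// Cut-down stand-in for the crate's FoundString.
#[derive(Debug, Clone)]
struct FoundString {
    text: String,
    tags: Vec<Tag>,
}

/// Returns the strings carrying `tag`, sorted by text so that test
/// assertions and snapshots are deterministic.
fn strings_with_tag<'a>(strings: &'a [FoundString], tag: Tag) -> Vec<&'a FoundString> {
    let mut hits: Vec<&FoundString> = strings
        .iter()
        .filter(|s| s.tags.contains(&tag))
        .collect();
    hits.sort_by(|a, b| a.text.cmp(&b.text));
    hits
}
```

Sorting by text matters mainly for the snapshot test, where the order of load commands in the binary should not cause spurious diffs.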


69-84: Appropriate relaxation of export assertions.

The change from checking specific symbol names to verifying non-empty exports is sensible, as Mach-O executable symbol export behavior can vary. The inline comment clearly documents the rationale.


143-250: Comprehensive load command extraction test.

This test thoroughly validates the extraction pipeline: source/encoding correctness, tag presence, framework path relationships, and rpath variable detection. The diagnostic output aids debugging and the assertion coverage is appropriate.


252-315: Effective snapshot test for regression protection.

The snapshot test provides human-readable output with categorization (frameworks, @-variables) and count summaries. Limiting to 20 items per category with overflow indication balances detail with snapshot size.


317-383: Thorough tag relationship validation.

This test validates critical invariants: DylibPath↔FilePath, RpathVariable↔Rpath, FrameworkPath↔(DylibPath|Rpath), and mutual exclusivity of DylibPath/Rpath. The comprehensive assertions with descriptive messages ensure tag integrity.
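
One way to read those invariants is as a set of implications plus one exclusion, sketched below. `check_tag_invariants` is a hypothetical function, and treating each arrow as a one-way implication (for example, RpathVariable requires Rpath but not the reverse) is an interpretation of the comment, not the test's verified logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    DylibPath,
    FilePath,
    Rpath,
    RpathVariable,
    FrameworkPath,
}

/// Checks the tag relationships described above for one string's tags.
fn check_tag_invariants(tags: &[Tag]) -> Result<(), String> {
    let has = |t: Tag| tags.contains(&t);

    if has(Tag::DylibPath) && !has(Tag::FilePath) {
        return Err("DylibPath requires FilePath".to_string());
    }
    if has(Tag::RpathVariable) && !has(Tag::Rpath) {
        return Err("RpathVariable requires Rpath".to_string());
    }
    if has(Tag::FrameworkPath) && !(has(Tag::DylibPath) || has(Tag::Rpath)) {
        return Err("FrameworkPath requires DylibPath or Rpath".to_string());
    }
    if has(Tag::DylibPath) && has(Tag::Rpath) {
        return Err("DylibPath and Rpath are mutually exclusive".to_string());
    }
    Ok(())
}
```

If dylib paths were later allowed to carry RpathVariable, as the nitpick in this review proposes, the second rule would need to be relaxed accordingly.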


385-445: Bidirectional framework path validation.

The test validates both positive and negative cases: strings with .framework must have FrameworkPath tags, and strings without must not. The categorization by dylib/rpath frameworks provides useful diagnostic context.


447-508: Comprehensive rpath variable detection test.

The bidirectional validation ensures RpathVariable tags are correctly applied based on the presence of @-variables. The diagnostic output detailing which specific variables are found (@rpath, @executable_path, @loader_path) aids debugging.


510-526: Good defensive test for invalid input.

This test ensures graceful error handling for both empty and invalid input, verifying the function returns an empty vector without panicking. Essential coverage for robustness.


528-582: Solid dylib path classification test.

The test validates FilePath tag presence on all DylibPath strings and categorizes paths by type (system, framework, other). The categorization logic and diagnostic output are well-structured.


584-624: Thorough metadata validation test.

This test validates critical metadata fields: section (None for load commands), length (matches text), source (LoadCommand), and encoding (UTF-8). The comment acknowledging unspecified offset/rva values allows for future enhancements without breaking existing tests.

coderabbitai[bot]
coderabbitai Bot previously approved these changes Nov 11, 2025

Copilot AI left a comment


Pull Request Overview

This pull request implements comprehensive Mach-O load command string extraction with enhanced tagging and normalized section weight scoring. The changes add support for extracting library dependency paths and runtime search paths from Mach-O binaries, expand the tagging system with Mach-O-specific classifications, and normalize Mach-O section weights to a 0.0-1.0 scale (while ELF and PE still use 1-10).

Key Changes:

  • Added macho_load_commands.rs module with extract_load_command_strings() function to extract dylib paths (LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB) and runtime search paths (LC_RPATH) from Mach-O load commands
  • Extended Tag enum with four new tags: DylibPath, Rpath, RpathVariable, and FrameworkPath, marked enum as #[non_exhaustive] for future extensibility
  • Normalized Mach-O section weights to 0.0-1.0 scale and added classification for __objc_methname, __objc_classname, and __ustring sections

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| src/extraction/macho_load_commands.rs | New module implementing load command string extraction with classification and tagging logic for dylib paths, rpaths, and framework paths |
| src/extraction/mod.rs | Updated module documentation and exports to expose extract_load_command_strings() function |
| src/types.rs | Added four new Tag variants for Mach-O load commands and marked Tag enum as non_exhaustive |
| src/container/macho.rs | Normalized section weights to 0.0-1.0 scale, added classification for additional Objective-C sections, updated tests |
| tests/integration_macho.rs | Added comprehensive test suite for load command extraction including snapshot tests, tag validation, and edge case handling |
| tests/integration_pe.rs | Removed tautological assertion checking resources field existence |
| tests/fixtures/README.md | Enhanced documentation describing Mach-O load command contents and build instructions |
| tests/snapshots/integration_macho__macho_load_command_strings.snap | Snapshot test output for Mach-O load command string extraction |


Comment thread src/extraction/macho_load_commands.rs Outdated
Comment thread src/extraction/macho_load_commands.rs Outdated
Comment on lines +191 to +192
    if offset + size <= data.len() {
        Ok(&data[offset..offset + size])

Copilot AI Nov 11, 2025


Potential integer overflow when adding offset + size. If both values are large, this could overflow before the comparison with data.len(). Use checked arithmetic:

let offset = arch.offset as usize;
let size = arch.size as usize;

if let Some(end) = offset.checked_add(size) {
    if end <= data.len() {
        return Ok(&data[offset..end]);
    }
}
Err(())
Suggested change
- if offset + size <= data.len() {
-     Ok(&data[offset..offset + size])
+ if let Some(end) = offset.checked_add(size) {
+     if end <= data.len() {
+         Ok(&data[offset..end])
+     } else {
+         Err(())
+     }
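
The checked-arithmetic pattern in this comment can also be folded into a small standalone helper. This is a sketch only: `checked_slice` is a hypothetical name, and it returns `Option` rather than the `Result<_, ()>` used in the reviewed code.

```rust
/// Hypothetical helper: returns `data[offset..offset + size]`, or `None` when
/// the end offset overflows `usize` or falls outside `data`.
fn checked_slice(data: &[u8], offset: usize, size: usize) -> Option<&[u8]> {
    // `checked_add` yields `None` on overflow instead of wrapping or panicking.
    let end = offset.checked_add(size)?;
    // `get` performs the bounds check and returns `None` if the range is out of range.
    data.get(offset..end)
}
```

Using `slice::get` with a range keeps the bounds check and the slicing in one call, so no code path can index with an unvalidated `end`.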

Comment thread tests/integration_macho.rs
Comment thread tests/fixtures/README.md Outdated
Comment thread tests/integration_macho.rs Outdated
Comment thread tests/integration_macho.rs
@unclesp1d3r
Member Author

@copilot open a new pull request to apply changes based on the comments in this thread

Contributor

Copilot AI commented Nov 11, 2025

@unclesp1d3r I've opened a new pull request, #103, to work on those changes. Once the pull request is ready, I'll request review from you.

…103)

* Initial plan

* Apply code review fixes: error handling, integer overflow, formatting

Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>
@coderabbitai coderabbitai Bot added the lgtm label (This PR has been approved by a maintainer) Nov 11, 2025

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@unclesp1d3r unclesp1d3r merged commit baf4526 into main Nov 11, 2025
17 of 18 checks passed
@unclesp1d3r unclesp1d3r deleted the 6-mach-o-section-classification branch November 11, 2025 05:25
@unclesp1d3r unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request
8 tasks
unclesp1d3r added a commit that referenced this pull request Feb 25, 2026
…rmalization and enhanced tagging (#67)

* refactor(tests): Simplify resource field assertion in PE integration test

- Updated the assertion for the resources field in the PE integration test to clarify that it may be None for minimal binaries, such as those compiled without resource files.
- Removed redundant checks to streamline the test logic while maintaining clarity in the comments.

This change enhances the readability of the test and ensures accurate expectations regarding resource availability in PE binaries.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* refactor(macho): Normalize section weights for Mach-O format

- Updated the section weight calculations in the Mach-O parser to use a normalized scale (0.0-1.0) for consistency with other formats.
- Adjusted weights for various section types, including string data, read-only data, and code sections, to better reflect their likelihood of containing meaningful strings.
- Enhanced section classification to include additional Objective-C related sections.
- Added unit tests to validate the new weight calculations and classifications.

This refactor improves the accuracy of string extraction from Mach-O binaries, aligning it with the established standards for ELF and PE formats.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* feat(macho): Add load command string extraction for Mach-O binaries

- Introduced a new module for extracting load command strings from Mach-O binaries, including library dependency paths and runtime search paths.
- Enhanced the `Tag` enum with new variants for `DylibPath`, `Rpath`, `RpathVariable`, and `FrameworkPath` to support the new extraction functionality.
- Updated the extraction module documentation with usage examples and detailed descriptions of the extraction process.
- Added integration tests to validate the load command extraction functionality against a Mach-O fixture.

This feature improves the ability to analyze Mach-O binaries by enabling the extraction of meaningful load command strings, which are crucial for understanding library dependencies and runtime behavior.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* feat(tests): Enhance Mach-O load command extraction tests

- Added helper functions for extracting and sorting dylib paths, rpaths, and framework paths from load command strings.
- Updated assertions in the Mach-O integration tests to verify the presence of exports and ensure correct tagging of load command strings.
- Introduced snapshot tests for load command string extraction, providing a detailed breakdown of dylib paths, rpaths, and framework paths.
- Enhanced documentation in the fixtures README to clarify the purpose and expected contents of the `test_binary_macho` fixture.

These improvements strengthen the testing framework for Mach-O binaries, ensuring comprehensive validation of load command extraction and classification.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* fix: Address code review feedback for Mach-O load command extraction (#103)

* Initial plan

* Apply code review fixes: error handling, integer overflow, formatting

Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

---------

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

Labels

  • area:analyzer (Binary analyzer functionality)
  • documentation (Improvements or additions to documentation)
  • enhancement (New feature or request)
  • lang:rust (Rust implementation)
  • lgtm (This PR has been approved by a maintainer)
  • size:XXL (This PR changes 1000+ lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Extract and Classify Strings from Mach-O Load Commands
  • Mach-O Section Classification

3 participants