feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging by unclesp1d3r · Pull Request #67 · EvilBit-Labs/Stringy

unclesp1d3r · 2025-11-11T03:49:52Z

This pull request adds comprehensive support for extracting and classifying load command strings from Mach-O binaries, including library dependency paths and runtime search paths. It introduces a new extraction module, updates the tagging system to better classify extracted strings, and normalizes section weight scoring for Mach-O to a 0.0–1.0 scale for consistency. Extensive tests and documentation have been added to ensure correctness and ease of use.

Addresses issues #6 and #7

Mach-O Load Command Extraction:

Added new module src/extraction/macho_load_commands.rs to extract library paths (LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB) and runtime search paths (LC_RPATH) from Mach-O load commands, with robust tagging and test coverage.
Updated extraction documentation and module exports to include extract_load_command_strings() and provided usage examples for Mach-O binaries. [1] [2]
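As a rough illustration of which load commands the extractor cares about, the sketch below groups raw command numbers into dylib-bearing and rpath-bearing kinds. The constant values are the ones published in Apple's `<mach-o/loader.h>`; the `LoadCommandKind` enum and `load_command_kind` function are hypothetical names for this sketch only, since the PR itself relies on goblin's parsed load commands rather than raw numbers.

```rust
/// Flag OR-ed into load commands that dyld is required to understand.
const LC_REQ_DYLD: u32 = 0x8000_0000;
const LC_LOAD_DYLIB: u32 = 0xc;
const LC_LOAD_WEAK_DYLIB: u32 = 0x18 | LC_REQ_DYLD;
const LC_RPATH: u32 = 0x1c | LC_REQ_DYLD;
const LC_REEXPORT_DYLIB: u32 = 0x1f | LC_REQ_DYLD;

/// Hypothetical grouping used only in this sketch (not part of the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoadCommandKind {
    /// Carries a library dependency path.
    Dylib,
    /// Carries a runtime search path.
    Rpath,
    /// Anything the extractor would skip.
    Other,
}

/// Map a raw `cmd` field to the kind of string it would contribute.
fn load_command_kind(cmd: u32) -> LoadCommandKind {
    match cmd {
        LC_LOAD_DYLIB | LC_LOAD_WEAK_DYLIB | LC_REEXPORT_DYLIB => LoadCommandKind::Dylib,
        LC_RPATH => LoadCommandKind::Rpath,
        _ => LoadCommandKind::Other,
    }
}
```

Commands outside these four (for example segment or symbol table commands) fall through to `Other` and would contribute no strings.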

Tagging and Classification Improvements:

Extended the Tag enum in src/types.rs with new tags: DylibPath, Rpath, RpathVariable, and FrameworkPath, and marked it as #[non_exhaustive] for future extensibility. [1] [2]

Section Weight Normalization and Classification:

Normalized Mach-O section weights to a 0.0–1.0 scale and updated logic to classify additional string-relevant sections (__objc_methname, __objc_classname, __ustring) as high-priority for extraction. [1] [2] [3]
Updated and expanded tests to validate new section classifications and normalized weights. [1] [2] [3]

Documentation and Fixture Updates:

Enhanced fixture documentation in tests/fixtures/README.md to describe Mach-O binary contents and provide instructions for generating test binaries with rpaths and framework dependencies. [1] [2]

…test

- Updated the assertion for the resources field in the PE integration test to clarify that it may be None for minimal binaries, such as those compiled without resource files.
- Removed redundant checks to streamline the test logic while maintaining clarity in the comments.

This change enhances the readability of the test and ensures accurate expectations regarding resource availability in PE binaries.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Updated the section weight calculations in the Mach-O parser to use a normalized scale (0.0-1.0) for consistency with other formats.
- Adjusted weights for various section types, including string data, read-only data, and code sections, to better reflect their likelihood of containing meaningful strings.
- Enhanced section classification to include additional Objective-C related sections.
- Added unit tests to validate the new weight calculations and classifications.

This refactor improves the accuracy of string extraction from Mach-O binaries, aligning it with the established standards for ELF and PE formats.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Introduced a new module for extracting load command strings from Mach-O binaries, including library dependency paths and runtime search paths.
- Enhanced the `Tag` enum with new variants for `DylibPath`, `Rpath`, `RpathVariable`, and `FrameworkPath` to support the new extraction functionality.
- Updated the extraction module documentation with usage examples and detailed descriptions of the extraction process.
- Added integration tests to validate the load command extraction functionality against a Mach-O fixture.

This feature improves the ability to analyze Mach-O binaries by enabling the extraction of meaningful load command strings, which are crucial for understanding library dependencies and runtime behavior.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Added helper functions for extracting and sorting dylib paths, rpaths, and framework paths from load command strings.
- Updated assertions in the Mach-O integration tests to verify the presence of exports and ensure correct tagging of load command strings.
- Introduced snapshot tests for load command string extraction, providing a detailed breakdown of dylib paths, rpaths, and framework paths.
- Enhanced documentation in the fixtures README to clarify the purpose and expected contents of the `test_binary_macho` fixture.

These improvements strengthen the testing framework for Mach-O binaries, ensuring comprehensive validation of load command extraction and classification.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

coderabbitai · 2025-11-11T03:50:01Z

Summary by CodeRabbit

New Features
- Added extraction of Mach-O load command strings, capturing dylib dependencies, framework paths, and runtime paths with automatic classification.
Improvements
- Normalized section weight calculations for enhanced Mach-O binary analysis accuracy.
- Expanded test coverage and documentation for Mach-O parsing capabilities.
Type System
- Added new classification tags: DylibPath, Rpath, RpathVariable, and FrameworkPath for improved binary metadata tagging.

Walkthrough

Normalized Mach-O section weight scale to 0.0–1.0, expanded string-section classifications, added a new extractor module to pull dylib and rpath strings from Mach-O load commands, introduced four Tag variants, updated docs/fixtures, and added comprehensive Mach-O tests and minor PE test relaxation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Mach-O Section Weight Normalization `src/container/macho.rs` | Converted section weight calculations from 1–10 to 0.0–1.0 scale; reduced weights across SectionType branches; added `__objc_methname`, `__objc_classname`, `__ustring` as StringData in classification; updated related tests. |
| Load Command String Extraction `src/extraction/macho_load_commands.rs` | New module `extract_load_command_strings(data: &[u8]) -> Vec<FoundString>`: parses single/fat Mach-O binaries, extracts LC_LOAD_DYLIB/LC_LOAD_WEAK_DYLIB/LC_REEXPORT_DYLIB and LC_RPATH strings, classifies tags (DylibPath, Rpath, RpathVariable, FrameworkPath), and returns metadata-rich FoundString entries; includes unit tests and helpers. |
| Extraction Module Exposure `src/extraction/mod.rs` | Added `macho_load_commands` module and `pub use macho_load_commands::extract_load_command_strings;`; updated examples/docs to showcase new extractor. |
| Type Extensions `src/types.rs` | Marked `Tag` enum `#[non_exhaustive]` and added variants `DylibPath` (`"dylib-path"`), `Rpath` (`"rpath"`), `RpathVariable` (`"rpath-var"`), and `FrameworkPath` (`"framework-path"`). |
| Mach-O Testing & Documentation `tests/integration_macho.rs`, `tests/fixtures/README.md` | Added extensive integration tests for load command extraction, tag validation, framework detection, rpath-variable detection, metadata checks, and expanded fixture README with Mach-O load-command guidance and example build commands. |
| PE Test Relaxation `tests/integration_pe.rs` | Relaxed assertion about `ContainerInfo.resources` presence in `test_pe_import_export_extraction` to allow None for minimal binaries. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Extractor as extract_load_command_strings()
    participant Goblin as goblin Parser
    participant Classifier as classify_* helpers
    participant Result as Vec<FoundString>

    Caller->>Extractor: data: &[u8]
    Extractor->>Goblin: parse (fat? -> extract arch -> parse single)
    Goblin->>Goblin: iterate load commands
    alt dylib load commands (LC_LOAD_DYLIB / _WEAK / _REEXPORT)
        Goblin->>Classifier: classify_dylib_path(path)
        Classifier-->>Goblin: tags [DylibPath, (FrameworkPath?)]
    else rpath (LC_RPATH)
        Goblin->>Classifier: classify_rpath(path)
        Classifier-->>Goblin: tags [Rpath, (RpathVariable?), (FrameworkPath?)]
    end
    Goblin->>Result: build FoundString (text, tags, encoding, length, source)
    Extractor-->>Caller: Vec<FoundString>

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–30 minutes

Areas needing extra attention:
- src/extraction/macho_load_commands.rs: fat-binary handling, goblin API usage, tag heuristics, and edge-case handling.
- src/container/macho.rs: consistency and intent of normalized weights and new section classifications.
- src/types.rs: ensure serde renames and #[non_exhaustive] addition don't break serialization or downstream pattern matches.
- tests/integration_macho.rs and fixtures: verify fixture assumptions and snapshot stability.

Possibly related issues

Extract and Classify Strings from Mach-O Load Commands #7 — Implements Mach-O load-command string extraction and tagging (dylib/rpath), matching this PR’s extractor and Tag additions.
Mach-O Section Classification #6 — Adjusts Mach-O section classification and per-section weights; overlaps with the normalized weights and new section classifications here.
Parse Mach-O Load Commands for Dynamic Library and Framework Path Extraction #63 — Requests load-command extraction for Mach-O; this PR implements that functionality and related tests.

Possibly related PRs

feat: Adds section-weighting and improves ELF imports #45 — Related changes to Mach-O section weighting and SectionInfo weight normalization; overlaps code-wise with src/container/macho.rs edits.

Suggested labels

lgtm

Poem

🐰
I hop through bytes with eager paws,
I sniff out dylibs and rpath laws,
Weights trimmed neat, strings found with cheer,
Frameworks sparkle when they appear,
A tiny rabbit clap — hooray! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: Mach-O load command extraction, section weight normalization, and enhanced tagging. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, clearly explaining the new modules, tagging improvements, weight normalization, and documentation updates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2cf74de and d4334c9.

⛔ Files ignored due to path filters (1)

tests/snapshots/integration_macho__macho_load_command_strings.snap is excluded by !**/*.snap

📒 Files selected for processing (3)

src/extraction/macho_load_commands.rs (1 hunks)
tests/fixtures/README.md (3 hunks)
tests/integration_macho.rs (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

tests/integration_macho.rs (1)

src/extraction/macho_load_commands.rs (1)

extract_load_command_strings (53-74)

src/extraction/macho_load_commands.rs (1)

src/container/macho.rs (2)

parse (291-298)

extract_architecture_data (335-350)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: test-cross-platform (windows-latest, Windows)
GitHub Check: CodeQL Analyze

🔇 Additional comments (20)

tests/fixtures/README.md (2)

8-11: LGTM! Documentation clarity improved.

The bullet-point format successfully addresses the past review feedback and makes the fixture description much more readable.

39-73: Excellent documentation for Mach-O fixtures and load commands.

The build instructions are practical and the load command section clearly explains what's being tested and why cross-platform parsing works.

src/extraction/macho_load_commands.rs (8)

1-34: Well-documented module with correct doc example.

The usage example properly handles errors and demonstrates the API effectively. The no_run attribute is appropriate since it requires an external file.

53-74: LGTM! Consistent with existing parser behavior.

The function correctly handles both single and fat Mach-O binaries, and the approach of extracting from the first architecture in fat binaries aligns with the existing MachoParser implementation.
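For readers unfamiliar with fat (universal) binaries, the following is a minimal sketch of what "extracting the first architecture" involves at the byte level. It assumes the 32-bit fat header layout from `<mach-o/fat.h>` (big-endian magic `0xcafebabe`, an architecture count, then 20-byte `fat_arch` records). The names `read_u32_be` and `first_arch_slice` are hypothetical; the PR delegates this work to goblin and the existing `extract_architecture_data` helper.

```rust
/// Magic number of a 32-bit fat header, stored big-endian on disk.
const FAT_MAGIC: u32 = 0xcafe_babe;

/// Read a big-endian u32 at `at`, or None if it would run past the buffer.
fn read_u32_be(data: &[u8], at: usize) -> Option<u32> {
    let end = at.checked_add(4)?;
    let b = data.get(at..end)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Return the bytes of the first architecture slice in a fat binary.
/// Hypothetical helper for illustration; not the PR's implementation.
fn first_arch_slice(data: &[u8]) -> Option<&[u8]> {
    if read_u32_be(data, 0)? != FAT_MAGIC {
        return None;
    }
    // nfat_arch must be at least 1 for a first slice to exist.
    if read_u32_be(data, 4)? == 0 {
        return None;
    }
    // The first fat_arch record starts at byte 8:
    // cputype (4), cpusubtype (4), offset (4), size (4), align (4).
    let offset = read_u32_be(data, 16)? as usize;
    let size = read_u32_be(data, 20)? as usize;
    let end = offset.checked_add(size)?;
    data.get(offset..end)
}
```

Using `slice::get` with a checked end offset means truncated or malformed headers yield `None` instead of a panic, which matches the graceful-degradation behaviour the tests in this PR expect.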

94-115: LGTM! Proper metadata for load command strings.

The hardcoded values (offset=0, rva=None, section=None, score=0) are appropriate for load command strings, which don't originate from sections and have different characteristics than section-based strings.
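To make the metadata choice concrete, here is a small sketch of how a load command string might be packaged. The field names follow the ones mentioned in this review (offset, rva, section, score, length, source, encoding), but the struct layout, field types, and the `load_command_string` constructor are assumptions for illustration and are not copied from the crate.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tag {
    DylibPath,
    FilePath,
}

#[derive(Debug, Clone, PartialEq)]
enum Encoding {
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum StringSource {
    LoadCommand,
}

/// Simplified stand-in for the crate's FoundString (types are assumed).
#[derive(Debug, Clone, PartialEq)]
struct FoundString {
    text: String,
    encoding: Encoding,
    offset: u64,
    rva: Option<u64>,
    section: Option<String>,
    length: u32,
    tags: Vec<Tag>,
    score: i32,
    source: StringSource,
}

/// Build an entry for a string that came from a load command rather
/// than from a section, so section-relative fields are left empty.
fn load_command_string(text: &str, tags: Vec<Tag>) -> FoundString {
    FoundString {
        text: text.to_string(),
        encoding: Encoding::Utf8,
        offset: 0,     // not tracked for load commands
        rva: None,     // no virtual address applies
        section: None, // load commands live outside sections
        length: text.len() as u32,
        tags,
        score: 0, // scoring is left to later pipeline stages
        source: StringSource::LoadCommand,
    }
}
```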

122-143: LGTM! Consistent implementation with dylib extraction.

The function follows the same pattern as extract_dylib_strings, ensuring consistency in metadata handling across different load command types.

145-185: LGTM! Clear and correct classification logic.

The classification functions use appropriate string matching for Mach-O conventions. The tag combinations (e.g., DylibPath + FilePath, Rpath + RpathVariable) provide useful categorization for consumers.

187-204: LGTM! Proper overflow protection implemented.

The use of checked_add successfully addresses the past review concern about potential integer overflow, ensuring safe bounds checking.
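The reason `checked_add` matters here can be shown in isolation. In the sketch below, `slice_checked` is a hypothetical stand-alone version of the bounds check (the `Result<_, ()>` shape mirrors the snippet quoted later in this thread). With plain `+`, an overflowing sum panics in debug builds and wraps to a small value in release builds, where it could slip past the `<= data.len()` comparison.

```rust
/// Return `data[offset..offset + size]` only if the range is valid.
/// Hypothetical helper; the error type mirrors the reviewed snippet.
fn slice_checked(data: &[u8], offset: usize, size: usize) -> Result<&[u8], ()> {
    match offset.checked_add(size) {
        // The sum did not overflow and the end lies inside the buffer.
        Some(end) if end <= data.len() => Ok(&data[offset..end]),
        // Either the addition overflowed or the range is out of bounds.
        _ => Err(()),
    }
}
```

For example, `usize::MAX.wrapping_add(2)` evaluates to `1`, which is why a wrapped sum would look like a valid end offset for any non-empty buffer.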

206-295: Excellent unit test coverage.

The tests cover edge cases (invalid/empty data) and thoroughly validate the utility functions with both positive and negative test cases.

297-367: Well-designed integration tests.

The use of #[ignore] for fixture-dependent tests is appropriate, and the graceful handling when fixtures don't exist prevents test failures. Tag validation ensures the implementation maintains internal consistency.

tests/integration_macho.rs (10)

12-44: Excellent test helpers that reduce duplication.

The filtering and sorting functions provide consistent, deterministic results across tests and improve test readability. The has_rpath_variable helper appropriately mirrors the extraction module logic.

69-84: LGTM! Pragmatic relaxation of export assertions.

The change acknowledges that executables may not consistently export specific symbols, making the test more robust while still validating that exports are found.

143-250: Comprehensive test with excellent coverage.

The test thoroughly validates load command extraction, including metadata, tags, and special cases like framework paths and rpath variables. The diagnostic output aids debugging.

252-317: LGTM! Well-formatted snapshot test.

The format strings have been corrected to avoid trailing whitespace, and the structured output provides clear, human-readable snapshots for regression testing.

319-385: Excellent tag validation ensuring API consistency.

This test enforces critical tag invariants (e.g., DylibPath implies FilePath, mutual exclusion of DylibPath and Rpath) and provides clear error messages when violations occur.

387-447: Thorough bidirectional validation of framework path tagging.

The test correctly validates both that .framework strings are tagged and that non-framework strings are not, preventing false positives and false negatives.

449-508: LGTM! Correct validation of rpath variable detection.

The past review issue regarding the redundant condition has been addressed, and the test now correctly validates the presence or absence of the RpathVariable tag with helpful diagnostic output.

510-526: Essential edge case coverage.

Testing empty and invalid data ensures the API degrades gracefully without panicking, which is critical for robustness.

528-582: Useful categorization and validation of dylib paths.

The test validates the FilePath tag invariant and provides insightful categorization into system, framework, and other libraries.

584-624: Thorough metadata validation with helpful future-proofing note.

The test validates all critical metadata fields and includes a useful comment about offset/rva fields being unspecified for load commands, which helps future maintainers understand intentional design choices.

dosubot · 2025-11-11T03:50:59Z

Documentation Updates

1 document(s) were updated by changes in this PR:

stringy

Binary Section Weighting and Symbol Extraction (View Changes)

unclesp1d3r · 2025-11-11T03:53:53Z

@coderabbitai Fix the PR title.

coderabbitai · 2025-11-11T03:54:34Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the current PR title
gh pr view 67 --json title,url --jq '.title'

# Also check if there's a standard format by looking at recent PRs
gh pr list --limit 5 --state merged --json number,title --jq '.[] | "#\(.number): \(.title)"'

Length of output: 500

🏁 Script executed:

#!/bin/bash
# Update the PR title to follow conventional commit format and be more descriptive
gh pr edit 67 --title "feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging"

Length of output: 59

@unclesp1d3r: I've fixed the PR title.

Old title:
6 mach o section classification Normalize section weights and extract load command strings

New title:
feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging

The title now follows the conventional commit format (with feat: prefix) and is properly capitalized, matching the style of other recent PRs in the repository like #66.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/types.rs (1)

13-47: Flag: #[non_exhaustive] here breaks existing consumers

Tag is part of the public API. Flipping it to #[non_exhaustive] forces every downstream exhaustive match to add a wildcard arm, so current consumers will stop compiling unless we ship a breaking release. Either coordinate a major version bump (even in 0.x terms) or hold off on this annotation so we don’t strand users mid-release.
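A short sketch of what downstream code has to look like once the attribute is in place. The enum below is a trimmed, hypothetical copy of `Tag` (only the variants and string names mentioned in this PR), and `describe` is an illustrative function, not part of the crate. Note that the compiler only enforces the wildcard arm in other crates; inside the defining crate an exhaustive match still compiles.

```rust
/// Trimmed stand-in for the crate's Tag enum.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    FilePath,
    DylibPath,
    Rpath,
    RpathVariable,
    FrameworkPath,
}

/// What a downstream consumer's match would need to look like.
pub fn describe(tag: Tag) -> &'static str {
    match tag {
        Tag::DylibPath => "dylib-path",
        Tag::Rpath => "rpath",
        Tag::RpathVariable => "rpath-var",
        Tag::FrameworkPath => "framework-path",
        // Mandatory in downstream crates for a #[non_exhaustive] enum:
        // covers variants this code does not know about yet.
        _ => "other",
    }
}
```

Consumers that already carry a wildcard arm are unaffected; only matches that listed every variant explicitly need the extra arm.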

🧹 Nitpick comments (2)

src/extraction/macho_load_commands.rs (1)
145-171: Also flag @-based dylib paths as RpathVariable

Dylib load commands often carry @rpath/@loader_path/@executable_path prefixes (e.g. framework stubs), but classify_dylib_path currently ignores them so those strings miss the new RpathVariable tag. Mirroring the check you have in classify_rpath keeps tagging consistent and makes it easier to surface the placeholders regardless of which load command they appear in.
 fn classify_dylib_path(path: &str) -> Vec<Tag> {
-    let mut tags = vec![Tag::DylibPath, Tag::FilePath];
+    let mut tags = vec![Tag::DylibPath, Tag::FilePath];
+
+    if contains_rpath_variable(path) {
+        tags.push(Tag::RpathVariable);
+    }
 
     if is_framework_path(path) {
         tags.push(Tag::FrameworkPath);
     }
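Put together as a self-contained sketch, the suggested behaviour could look like the following. The bodies of `contains_rpath_variable` and `is_framework_path` are assumptions (simple substring checks for the three `@` placeholders and for `.framework`); the PR's real helpers may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    DylibPath,
    FilePath,
    RpathVariable,
    FrameworkPath,
}

/// Assumed implementation: look for any dyld placeholder in the path.
fn contains_rpath_variable(path: &str) -> bool {
    ["@rpath", "@loader_path", "@executable_path"]
        .iter()
        .any(|var| path.contains(*var))
}

/// Assumed implementation: framework bundles contain ".framework".
fn is_framework_path(path: &str) -> bool {
    path.contains(".framework")
}

/// Classify a dylib path, including the proposed RpathVariable tag.
fn classify_dylib_path(path: &str) -> Vec<Tag> {
    let mut tags = vec![Tag::DylibPath, Tag::FilePath];

    if contains_rpath_variable(path) {
        tags.push(Tag::RpathVariable);
    }

    if is_framework_path(path) {
        tags.push(Tag::FrameworkPath);
    }

    tags
}
```

One thing to weigh before adopting this: the tag-relationship test described elsewhere in this review expects `RpathVariable` to appear together with `Rpath`, so tagging dylib paths this way would presumably require relaxing that invariant.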
tests/integration_macho.rs (1)
578-581: Simplify redundant logical condition.

Since system_libraries is a filtered subset of dylib_paths, the condition !system_libraries.is_empty() || !dylib_paths.is_empty() is logically equivalent to just !dylib_paths.is_empty().
-    assert!(
-        !system_libraries.is_empty() || !dylib_paths.is_empty(),
-        "Should find at least some system libraries or dylib dependencies"
-    );
+    assert!(
+        !dylib_paths.is_empty(),
+        "Should find at least some dylib dependencies"
+    );
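The equivalence can be checked mechanically. In the sketch below, `system_libraries` is a hypothetical filter (the `/usr/lib/` and `/System/` prefixes are an assumption about what counts as a system library), and `conditions_agree` compares the original and simplified conditions for a given input.

```rust
/// Hypothetical filter: keep only paths that look like system libraries.
fn system_libraries<'a>(dylib_paths: &[&'a str]) -> Vec<&'a str> {
    dylib_paths
        .iter()
        .copied()
        .filter(|p| p.starts_with("/usr/lib/") || p.starts_with("/System/"))
        .collect()
}

/// True when the original and simplified assertions give the same answer.
fn conditions_agree(dylib_paths: &[&str]) -> bool {
    let system = system_libraries(dylib_paths);
    let original = !system.is_empty() || !dylib_paths.is_empty();
    let simplified = !dylib_paths.is_empty();
    original == simplified
}
```

Because a filtered subset can only be non-empty when its source is non-empty, the first disjunct never changes the outcome, whatever predicate the filter uses.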

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 08cd31b and 2cf74de.

⛔ Files ignored due to path filters (1)

tests/snapshots/integration_macho__macho_load_command_strings.snap is excluded by !**/*.snap

📒 Files selected for processing (7)

src/container/macho.rs (7 hunks)
src/extraction/macho_load_commands.rs (1 hunks)
src/extraction/mod.rs (2 hunks)
src/types.rs (2 hunks)
tests/fixtures/README.md (3 hunks)
tests/integration_macho.rs (4 hunks)
tests/integration_pe.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

src/extraction/macho_load_commands.rs (1)

src/container/macho.rs (2)

parse (291-298)

extract_architecture_data (335-350)

tests/integration_macho.rs (1)

src/extraction/macho_load_commands.rs (1)

extract_load_command_strings (49-70)

src/container/macho.rs (2)

src/container/elf.rs (2)

classify_section (54-81)

calculate_section_weight (25-51)

src/container/pe.rs (2)

classify_section (139-178)

calculate_section_weight (113-136)

src/extraction/mod.rs (1)

src/extraction/macho_load_commands.rs (1)

extract_load_command_strings (49-70)

🪛 LanguageTool

tests/fixtures/README.md

[uncategorized] ~8-~8: Possible missing comma found.
Context: ...64 Mach-O binary (contains typical load commands including LC_LOAD_DYLIB for system libr...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)

tests/fixtures/README.md

59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔍 Remote MCP Linear

PR Review Context Summary

Key Changes Overview

This PR implements Mach-O load command string extraction and normalizes section weight scoring. The changes span four main areas:

1. Mach-O Section Weight Normalization

Weight scale normalized from 1-10 to 0.0-1.0 across all section types
String-related sections (__cstring, __const, __cfstring) reduced proportionally (e.g., 10.0→1.0, 9.0→0.7)
New classifications for Objective-C sections (__objc_methname, __objc_classname, __ustring)
Non-string data weights reduced significantly (ReadOnlyData: 7.0→0.4, Code: 1.0→0.1)
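A minimal sketch of the normalized scheme, using only the figures quoted in this summary (StringData at 1.0, ReadOnlyData at 0.4, Code at 0.1). The `Other` variant and its weight are placeholders, the function names are borrowed from the code graph listing above but their signatures here are assumptions, and the mapping covers only the section names this PR explicitly calls out as string data.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SectionType {
    StringData,
    ReadOnlyData,
    Code,
    /// Placeholder for every section kind not covered by this sketch.
    Other,
}

/// Classify a Mach-O section name (partial mapping for illustration).
fn classify_section(name: &str) -> SectionType {
    match name {
        "__cstring" | "__objc_methname" | "__objc_classname" | "__ustring" => {
            SectionType::StringData
        }
        "__text" => SectionType::Code,
        _ => SectionType::Other,
    }
}

/// Weight on the normalized 0.0 to 1.0 scale.
fn section_weight(section_type: SectionType) -> f32 {
    match section_type {
        SectionType::StringData => 1.0,   // was 10.0 on the old scale
        SectionType::ReadOnlyData => 0.4, // was 7.0
        SectionType::Code => 0.1,         // was 1.0
        SectionType::Other => 0.0,        // placeholder, not from the PR
    }
}
```

Keeping every weight inside 0.0 to 1.0 lets later scoring stages combine section weights across formats without per-format rescaling, which is the consistency goal stated in the PR description.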

2. New Load Command Extraction Module

The PR adds src/extraction/macho_load_commands.rs with:

Extraction of dylib paths from LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB
Runtime path extraction from LC_RPATH commands
Tagging system with DylibPath, FilePath, FrameworkPath, Rpath, RpathVariable variants
Support for framework detection and @-variable identification

3. Tag Enum Extension

New Tag enum variants added:

DylibPath (dylib-path), Rpath (rpath), RpathVariable (rpath-var), FrameworkPath (framework-path)
Enum marked as #[non_exhaustive] for future extensibility
Serde serialization mappings for JSON output

4. Comprehensive Test Coverage

Integration tests validate:

Load command extraction with source/encoding verification
Tag integrity across DylibPath, Rpath, FrameworkPath classifications
Framework and @-variable detection accuracy
Empty/invalid input handling
Metadata correctness (section=None, UTF-8 encoding, LoadCommand source)

Strategic Context

This work advances PROD-275 (MVP Pipeline Epic) by completing Mach-O-specific extraction capabilities. It directly implements requirements from PROD-298 (Load Command Parsing) and PROD-294 (Import/Export Integration). The section weight normalization aligns with foundation work in PROD-237-239 (ELF/PE section classification standards).

Integration Points

The PR integrates with existing infrastructure:

Uses goblin crate for Mach-O binary parsing
Extends FoundString data structure for metadata tracking
Plugs into modular extraction pipeline architecture
Compatible with section-based string extraction workflow

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: test-cross-platform (macos-latest, macOS)
GitHub Check: test-cross-platform (windows-latest, Windows)
GitHub Check: CodeQL Analyze

🔇 Additional comments (13)

src/container/macho.rs (1)

95-101: Comprehensive Objective-C coverage.

Capturing __objc_methname, __objc_classname, and __ustring as StringData keeps the classifier aligned with real-world string reservoirs—great attention to detail.

tests/integration_pe.rs (1)

32-33: Good call-out on optional resources

The note makes the intent around ContainerInfo::resources much clearer for the minimal PE fixture. Thanks for documenting that nuance.

src/extraction/mod.rs (1)

14-38: Example update looks solid

The refreshed docs and example nicely surface the new Mach-O extractor alongside the existing PE helpers.

tests/integration_macho.rs (10)

12-44: Well-structured test helpers.

The helper functions provide clean abstractions for filtering and sorting load command strings by tag type. The consistent pattern and deterministic sorting (by text) ensure reliable test assertions.
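For context, a helper of this shape might look roughly like the sketch below. `FoundString` is reduced to the two fields the helper needs, and `strings_with_tag` is a hypothetical name; the actual helpers in `tests/integration_macho.rs` are not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    DylibPath,
    Rpath,
    FrameworkPath,
}

/// Reduced stand-in for the crate's FoundString.
#[derive(Debug, Clone)]
struct FoundString {
    text: String,
    tags: Vec<Tag>,
}

/// Collect the strings carrying `tag`, sorted by text so that test
/// output and snapshots stay deterministic.
fn strings_with_tag(strings: &[FoundString], tag: Tag) -> Vec<&FoundString> {
    let mut matches: Vec<&FoundString> = strings
        .iter()
        .filter(|s| s.tags.contains(&tag))
        .collect();
    matches.sort_by(|a, b| a.text.cmp(&b.text));
    matches
}
```

Sorting by text rather than relying on load command order keeps assertions stable even if the fixture is rebuilt with a different link order.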

69-84: Appropriate relaxation of export assertions.

The change from checking specific symbol names to verifying non-empty exports is sensible, as Mach-O executable symbol export behavior can vary. The inline comment clearly documents the rationale.

143-250: Comprehensive load command extraction test.

This test thoroughly validates the extraction pipeline: source/encoding correctness, tag presence, framework path relationships, and rpath variable detection. The diagnostic output aids debugging and the assertion coverage is appropriate.

252-315: Effective snapshot test for regression protection.

The snapshot test provides human-readable output with categorization (frameworks, @-variables) and count summaries. Limiting to 20 items per category with overflow indication balances detail with snapshot size.
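The truncation idea is easy to sketch. The output format below (header with a count, indented items, a trailing "... and N more" line) is an assumption chosen for illustration; only the 20-item cap and the overflow indication come from the review text. `format_category` is a hypothetical name.

```rust
/// Maximum number of entries printed per category.
const MAX_ITEMS: usize = 20;

/// Render one snapshot category, truncating long lists.
fn format_category(title: &str, items: &[&str]) -> String {
    let mut out = format!("{} ({}):\n", title, items.len());
    for item in items.iter().take(MAX_ITEMS) {
        out.push_str("  ");
        out.push_str(item);
        out.push('\n');
    }
    // Indicate how many entries were left out, if any.
    if items.len() > MAX_ITEMS {
        out.push_str(&format!("  ... and {} more\n", items.len() - MAX_ITEMS));
    }
    out
}
```

Printing the total count in the header means a change in the number of extracted strings still shows up in the snapshot diff even when the extra entries fall past the cap.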

317-383: Thorough tag relationship validation.

This test validates critical invariants: DylibPath↔FilePath, RpathVariable↔Rpath, FrameworkPath↔(DylibPath|Rpath), and mutual exclusivity of DylibPath/Rpath. The comprehensive assertions with descriptive messages ensure tag integrity.
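The invariants listed above can be written down as a single checker. This sketch reads each relationship as an implication (for example, DylibPath requires FilePath), which is an interpretation of the review text; `check_tag_invariants` and the error strings are hypothetical and not taken from the test file.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    DylibPath,
    FilePath,
    Rpath,
    RpathVariable,
    FrameworkPath,
}

/// Verify the tag relationships for one extracted string.
fn check_tag_invariants(tags: &[Tag]) -> Result<(), &'static str> {
    let has = |t: Tag| tags.contains(&t);

    if has(Tag::DylibPath) && !has(Tag::FilePath) {
        return Err("DylibPath requires FilePath");
    }
    if has(Tag::RpathVariable) && !has(Tag::Rpath) {
        return Err("RpathVariable requires Rpath");
    }
    if has(Tag::FrameworkPath) && !(has(Tag::DylibPath) || has(Tag::Rpath)) {
        return Err("FrameworkPath requires DylibPath or Rpath");
    }
    if has(Tag::DylibPath) && has(Tag::Rpath) {
        return Err("DylibPath and Rpath are mutually exclusive");
    }
    Ok(())
}
```

Returning a message per violated rule keeps the failure output specific, similar to the descriptive assertion messages the review praises.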

385-445: Bidirectional framework path validation.

The test validates both positive and negative cases: strings with .framework must have FrameworkPath tags, and strings without must not. The categorization by dylib/rpath frameworks provides useful diagnostic context.

447-508: Comprehensive rpath variable detection test.

The bidirectional validation ensures RpathVariable tags are correctly applied based on the presence of @-variables. The diagnostic output detailing which specific variables are found (@rpath, @executable_path, @loader_path) aids debugging.

510-526: Good defensive test for invalid input.

This test ensures graceful error handling for both empty and invalid input, verifying the function returns an empty vector without panicking. Essential coverage for robustness.

528-582: Solid dylib path classification test.

The test validates FilePath tag presence on all DylibPath strings and categorizes paths by type (system, framework, other). The categorization logic and diagnostic output are well-structured.

584-624: Thorough metadata validation test.

This test validates critical metadata fields: section (None for load commands), length (matches text), source (LoadCommand), and encoding (UTF-8). The comment acknowledging unspecified offset/rva values allows for future enhancements without breaking existing tests.

Copilot

Pull Request Overview

This pull request implements comprehensive Mach-O load command string extraction with enhanced tagging and normalized section weight scoring. The changes add support for extracting library dependency paths and runtime search paths from Mach-O binaries, expand the tagging system with Mach-O-specific classifications, and normalize Mach-O section weights to a 0.0-1.0 scale (while ELF and PE still use 1-10).

Key Changes:

Added macho_load_commands.rs module with extract_load_command_strings() function to extract dylib paths (LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB) and runtime search paths (LC_RPATH) from Mach-O load commands
Extended Tag enum with four new tags: DylibPath, Rpath, RpathVariable, and FrameworkPath, marked enum as #[non_exhaustive] for future extensibility
Normalized Mach-O section weights to 0.0-1.0 scale and added classification for __objc_methname, __objc_classname, and __ustring sections

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/extraction/macho_load_commands.rs | New module implementing load command string extraction with classification and tagging logic for dylib paths, rpaths, and framework paths |
| src/extraction/mod.rs | Updated module documentation and exports to expose `extract_load_command_strings()` function |
| src/types.rs | Added four new Tag variants for Mach-O load commands and marked Tag enum as non_exhaustive |
| src/container/macho.rs | Normalized section weights to 0.0-1.0 scale, added classification for additional Objective-C sections, updated tests |
| tests/integration_macho.rs | Added comprehensive test suite for load command extraction including snapshot tests, tag validation, and edge case handling |
| tests/integration_pe.rs | Removed tautological assertion checking resources field existence |
| tests/fixtures/README.md | Enhanced documentation describing Mach-O load command contents and build instructions |
| tests/snapshots/integration_macho__macho_load_command_strings.snap | Snapshot test output for Mach-O load command string extraction |

Copilot · 2025-11-11T04:24:14Z

+    if offset + size <= data.len() {
+        Ok(&data[offset..offset + size])


Potential integer overflow when adding offset + size. If both values are large, this could overflow before the comparison with data.len(). Use checked arithmetic:

    let offset = arch.offset as usize;
    let size = arch.size as usize;
    if let Some(end) = offset.checked_add(size) {
        if end <= data.len() {
            return Ok(&data[offset..end]);
        }
    }
    Err(())

Suggested change

-    if offset + size <= data.len() {
-        Ok(&data[offset..offset + size])
+    if let Some(end) = offset.checked_add(size) {
+        if end <= data.len() {
+            Ok(&data[offset..end])
+        } else {
+            Err(())
+        }

unclesp1d3r · 2025-11-11T04:24:59Z

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot · 2025-11-11T04:25:08Z

@unclesp1d3r I've opened a new pull request, #103, to work on those changes. Once the pull request is ready, I'll request review from you.

…103) * Initial plan * Apply code review fixes: error handling, integer overflow, formatting Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…rmalization and enhanced tagging (#67)

* refactor(tests): Simplify resource field assertion in PE integration test

- Updated the assertion for the resources field in the PE integration test to clarify that it may be None for minimal binaries, such as those compiled without resource files.
- Removed redundant checks to streamline the test logic while maintaining clarity in the comments.

This change enhances the readability of the test and ensures accurate expectations regarding resource availability in PE binaries.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* refactor(macho): Normalize section weights for Mach-O format

- Updated the section weight calculations in the Mach-O parser to use a normalized scale (0.0-1.0) for consistency with other formats.
- Adjusted weights for various section types, including string data, read-only data, and code sections, to better reflect their likelihood of containing meaningful strings.
- Enhanced section classification to include additional Objective-C related sections.
- Added unit tests to validate the new weight calculations and classifications.

This refactor improves the accuracy of string extraction from Mach-O binaries, aligning it with the established standards for ELF and PE formats.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* feat(macho): Add load command string extraction for Mach-O binaries

- Introduced a new module for extracting load command strings from Mach-O binaries, including library dependency paths and runtime search paths.
- Enhanced the `Tag` enum with new variants for `DylibPath`, `Rpath`, `RpathVariable`, and `FrameworkPath` to support the new extraction functionality.
- Updated the extraction module documentation with usage examples and detailed descriptions of the extraction process.
- Added integration tests to validate the load command extraction functionality against a Mach-O fixture.

This feature improves the ability to analyze Mach-O binaries by enabling the extraction of meaningful load command strings, which are crucial for understanding library dependencies and runtime behavior.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* feat(tests): Enhance Mach-O load command extraction tests

- Added helper functions for extracting and sorting dylib paths, rpaths, and framework paths from load command strings.
- Updated assertions in the Mach-O integration tests to verify the presence of exports and ensure correct tagging of load command strings.
- Introduced snapshot tests for load command string extraction, providing a detailed breakdown of dylib paths, rpaths, and framework paths.
- Enhanced documentation in the fixtures README to clarify the purpose and expected contents of the `test_binary_macho` fixture.

These improvements strengthen the testing framework for Mach-O binaries, ensuring comprehensive validation of load command extraction and classification.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* fix: Address code review feedback for Mach-O load command extraction (#103)

  * Initial plan
  * Apply code review fixes: error handling, integer overflow, formatting

  Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

  ---------

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

---------

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
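The tagging described in the commit message can be illustrated with a minimal sketch. This is not the code from `src/extraction/macho_load_commands.rs`: the `Source` enum, the `classify` function, and the matching rules (a `.framework/` substring, the dyld `@executable_path` and `@loader_path` prefixes) are assumptions made for illustration. Only the four `Tag` variant names and the `#[non_exhaustive]` attribute come from the PR description.

```rust
/// Hypothetical mirror of the four tag variants named in the PR.
/// The real `Tag` enum in src/types.rs contains other variants as well.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    DylibPath,
    Rpath,
    RpathVariable,
    FrameworkPath,
}

/// Which family of load command a string came from (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    /// LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, or LC_REEXPORT_DYLIB
    Dylib,
    /// LC_RPATH
    Rpath,
}

/// Assigns tags to one load command string.
/// The matching rules here are assumptions, not the project's logic.
pub fn classify(source: Source, path: &str) -> Vec<Tag> {
    let mut tags = Vec::new();
    match source {
        Source::Dylib => {
            tags.push(Tag::DylibPath);
            // Framework binaries live inside a `Name.framework/` bundle.
            if path.contains(".framework/") {
                tags.push(Tag::FrameworkPath);
            }
        }
        Source::Rpath => {
            tags.push(Tag::Rpath);
            // dyld substitutes these prefixes at load time.
            let variables = ["@executable_path", "@loader_path"];
            if variables.iter().any(|v| path.starts_with(v)) {
                tags.push(Tag::RpathVariable);
            }
        }
    }
    tags
}
```

Under these assumptions, `classify(Source::Rpath, "@executable_path/../Frameworks")` yields both `Rpath` and `RpathVariable`, while a plain `/usr/local/lib` rpath yields only `Rpath`.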

unclesp1d3r added 4 commits November 10, 2025 20:04

unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request

Mach-O Section Classification #6

Closed

7 tasks

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Nov 11, 2025

dosubot Bot added the area:analyzer Binary analyzer functionality label Nov 11, 2025

coderabbitai Bot changed the title ~~6 mach o section classification @coderabbitai~~ 6 mach o section classification Normalize section weights and extract load command strings Nov 11, 2025

coderabbitai Bot added documentation Improvements or additions to documentation enhancement New feature or request lang:rust Rust implementation size:M This PR changes 30-99 lines, ignoring generated files. labels Nov 11, 2025

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. size:XXL This PR changes 1000+ lines, ignoring generated files. labels Nov 11, 2025

coderabbitai Bot changed the title ~~6 mach o section classification Normalize section weights and extract load command strings~~ feat: Implement Mach-O load command extraction with section weight normalization and enhanced tagging Nov 11, 2025

coderabbitai Bot reviewed Nov 11, 2025

coderabbitai Bot previously approved these changes Nov 11, 2025

unclesp1d3r requested a review from Copilot November 11, 2025 04:18

unclesp1d3r self-assigned this Nov 11, 2025

Copilot started reviewing on behalf of unclesp1d3r November 11, 2025 04:18 View session

Copilot finished reviewing on behalf of unclesp1d3r November 11, 2025 04:23

Copilot AI reviewed Nov 11, 2025

Copilot AI mentioned this pull request Nov 11, 2025

fix: Address code review feedback for Mach-O load command extraction #103

Merged

unclesp1d3r dismissed coderabbitai[bot]’s stale review via d4334c9 November 11, 2025 04:43

coderabbitai Bot added the lgtm This PR has been approved by a maintainer label Nov 11, 2025

coderabbitai Bot approved these changes Nov 11, 2025

Copilot AI reviewed Nov 11, 2025

unclesp1d3r merged commit baf4526 into main Nov 11, 2025
17 of 18 checks passed

unclesp1d3r deleted the 6-mach-o-section-classification branch November 11, 2025 05:25

unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request

Extract and Classify Strings from Mach-O Load Commands #7

Closed

8 tasks

unclesp1d3r mentioned this pull request Nov 11, 2025

Extract and Classify Strings from Mach-O Load Commands #7

Closed

8 tasks

coderabbitai Bot mentioned this pull request Nov 13, 2025

Add String Extraction Framework with Noise Filtering and Benchmarks #108

Merged

-    if offset + size <= data.len() {
-        Ok(&data[offset..offset + size])
+    if let Some(end) = offset.checked_add(size) {
+        if end <= data.len() {
+            Ok(&data[offset..end])
+        } else {
+            Err(())
+        }
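The suggested change guards against `offset + size` overflowing `usize`, which would panic in debug builds and wrap in release builds before the bounds check runs. A minimal standalone sketch of the same idea follows. The helper name `read_slice` and its signature are hypothetical, since the enclosing function is not shown in the diff, and `slice::get` stands in for the explicit `end <= data.len()` comparison.

```rust
/// Returns `data[offset..offset + size]`, or `Err(())` if the range
/// overflows `usize` or extends past the end of `data`.
/// `read_slice` is a hypothetical name used only for this sketch.
pub fn read_slice(data: &[u8], offset: usize, size: usize) -> Result<&[u8], ()> {
    // `checked_add` returns `None` on overflow instead of wrapping or panicking.
    let end = offset.checked_add(size).ok_or(())?;
    // `slice::get` returns `None` when the range is out of bounds.
    data.get(offset..end).ok_or(())
}
```

With a four-byte buffer, `read_slice(&data, 1, 2)` returns the two middle bytes, while `read_slice(&data, 3, 2)` and `read_slice(&data, usize::MAX, 2)` both return `Err(())`.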

unclesp1d3r commented Nov 11, 2025

coderabbitai Bot commented Nov 11, 2025 • edited

Other AI code review bot(s) detected

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

Pre-merge checks and finishing touches

dosubot Bot commented Nov 11, 2025 • edited

unclesp1d3r commented Nov 11, 2025

coderabbitai Bot commented Nov 11, 2025

coderabbitai Bot left a comment

PR Review Context Summary

Key Changes Overview

Strategic Context

Integration Points

Copilot AI left a comment

Pull Request Overview

Reviewed Changes

Copilot AI Nov 11, 2025

unclesp1d3r commented Nov 11, 2025

Copilot AI commented Nov 11, 2025

Copilot AI left a comment

3 participants
