feat: Implement comprehensive PE support with section classification, import/export parsing, and resource extraction#66

Merged: unclesp1d3r merged 5 commits into main from 3-implement-pe-section-classification-and-importexport-table-parsing on Nov 11, 2025

@unclesp1d3r
Member

This pull request adds comprehensive PE (Portable Executable) support: resource extraction with full VERSIONINFO, STRINGTABLE, and MANIFEST parsing, plus section classification. It also introduces new benchmarks for PE parsing and updates the documentation to reflect these enhancements. Additionally, the CI/CD workflows move to a newer Rust toolchain, two GitHub Actions in the release workflow are pinned to earlier major versions, and the pelite dependency is added to support PE resource parsing.

Directly addresses issues #3, #4, and #5.

PE Resource Extraction and Section Classification

  • Implemented full PE resource extraction: VERSIONINFO, STRINGTABLE, and MANIFEST parsing, with robust error handling and comprehensive unit/integration tests. All extracted strings are tagged and metadata is included.
  • Enhanced PE section classification logic to assign weights based on string likelihood, improving prioritization for string extraction.
  • Updated documentation to detail resource extraction, section weighting, and limitations/future enhancements for PE parsing.
  • Added pelite dependency to Cargo.toml for PE resource parsing support.
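
The section weighting described in the list above can be pictured roughly as follows. This is only a sketch: the function name `section_weight`, the section names it matches, and every numeric weight are assumptions made for illustration, not values taken from this PR. The flag constants are the standard PE/COFF section characteristics.

```rust
// Section characteristic flags from the PE/COFF specification.
const IMAGE_SCN_CNT_CODE: u32 = 0x0000_0020;
const IMAGE_SCN_CNT_INITIALIZED_DATA: u32 = 0x0000_0040;
const IMAGE_SCN_MEM_EXECUTE: u32 = 0x2000_0000;
const IMAGE_SCN_MEM_WRITE: u32 = 0x8000_0000;

/// Hypothetical weight: a higher value means the section is more likely
/// to contain meaningful strings. The numbers are illustrative only.
fn section_weight(name: &str, characteristics: u32) -> f32 {
    // Executable sections rarely hold useful strings, so rank them lowest.
    let executable =
        (characteristics & (IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_CODE)) != 0;
    if executable {
        return 1.0;
    }
    let writable = (characteristics & IMAGE_SCN_MEM_WRITE) != 0;
    let initialized = (characteristics & IMAGE_SCN_CNT_INITIALIZED_DATA) != 0;
    match name {
        ".rdata" => 10.0,          // read-only data: string literals
        ".rsrc" => 9.0,            // resources: version info, string tables
        ".data" if writable => 5.0, // mutable data: mixed content
        ".data" => 7.0,
        _ if initialized => 4.0,   // unknown but initialized data
        _ => 2.0,
    }
}
```

A caller could sort sections by this weight in descending order (for example with `f32::total_cmp`) so that the extractor visits `.rdata` and `.rsrc` before code sections.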

Benchmarks and Testing

  • Added new PE parsing benchmarks in benches/pe.rs, covering full parse, import extraction, and export extraction using test fixtures.
  • Registered new PE benchmark in Cargo.toml.

Documentation and Status Updates

  • Updated README.md and spec/task files to reflect completed PE resource extraction features (Phase 1 and 2), including implementation notes and usage examples.

CI/CD and Workflow Updates

  • Updated Rust toolchain in all GitHub workflow files from version 1.90 to 1.91.0 for improved compatibility and features.
  • Downgraded actions/upload-artifact and actions/attest-build-provenance to v4 and v2 respectively in release workflow for stability.

Other Minor Changes

  • Updated ELF parser to explicitly set ordinal: None for imports/exports, clarifying ELF symbol handling.

- Introduced a new benchmark for PE parsing in `benches/pe.rs` to evaluate performance.
- Enhanced the PE parser to include import and export ordinal extraction, improving accuracy in symbol handling.
- Updated documentation to reflect new features and extraction capabilities.
- Added snapshot tests for PE symbol extraction to ensure consistent output.

This commit improves the performance measurement and accuracy of the PE parser, facilitating better analysis of Portable Executable files.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Added support for extracting resource metadata from PE binaries using the pelite library.
- Introduced new types for resource metadata, including ResourceMetadata and ResourceType.
- Updated ContainerInfo to include an optional resources field for storing extracted resource data.
- Refactored PE parser to utilize pelite for resource extraction while maintaining goblin for general PE structure parsing.
- Added integration tests to verify resource extraction functionality and ensure robustness.

This commit improves the ability to analyze PE binaries by enabling the extraction of meaningful resource information, which is crucial for comprehensive string analysis.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Updated the Rust toolchain version from 1.90 to 1.91.0 in multiple GitHub Actions workflows, including CI, CodeQL, Copilot setup, documentation, and security workflows.
- Ensured consistency in the toolchain version used across all workflows to leverage the latest features and improvements.

This update enhances the development environment by utilizing the most recent stable Rust version.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Finalized the implementation of resource enumeration and metadata extraction for PE binaries, including VERSIONINFO, STRINGTABLE, and MANIFEST resources.
- Updated documentation to reflect the completion of Phase 1, detailing the capabilities of the resource extraction framework.
- Enhanced unit tests to cover edge cases and ensure robust handling of various resource scenarios.
- Improved error handling and added comprehensive test coverage for resource detection and extraction.

This commit significantly enhances the ability to analyze PE binaries by providing detailed resource metadata, laying the groundwork for future string extraction capabilities.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

- Finalized the implementation of string extraction from PE resources, including VERSIONINFO, STRINGTABLE, and MANIFEST.
- Enhanced the extraction process with UTF-16LE decoding utilities and comprehensive unit and integration tests.
- Updated documentation to reflect the capabilities of the new extraction features and provided usage examples.
- Improved error handling to ensure graceful degradation during extraction failures.

This commit significantly enhances the ability to extract meaningful strings from PE binaries, facilitating better analysis and understanding of resource content.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
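
The UTF-16LE decoding utilities mentioned in the last commit message are not shown in this conversation. As a rough idea of what such a helper can look like using only the standard library, here is a minimal sketch. The name `decode_utf16le` and its policy choices (stop at the first NUL, replace invalid surrogates, ignore a trailing odd byte) are assumptions for illustration and may differ from the PR's actual code.

```rust
/// Decode a UTF-16LE byte slice into a `String`.
///
/// Assumed policy for this sketch:
/// - decoding stops at the first NUL code unit,
/// - unpaired surrogates become U+FFFD (lossy decoding),
/// - a trailing odd byte is ignored.
fn decode_utf16le(bytes: &[u8]) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2) // drops a dangling final byte
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .take_while(|&unit| unit != 0)
        .collect();
    String::from_utf16_lossy(&units)
}
```

Lossy decoding fits the "graceful degradation" goal stated above: malformed resource data yields replacement characters rather than an error that aborts extraction.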
@unclesp1d3r unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request
5 tasks
@dosubot dosubot Bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) on Nov 11, 2025
@unclesp1d3r unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request
10 tasks
@coderabbitai
Contributor

coderabbitai Bot commented Nov 11, 2025

Caution

Review failed

Failed to post review comments

Summary by CodeRabbit

Release Notes

  • New Features

    • PE resource extraction now supports VERSIONINFO, STRINGTABLE, and MANIFEST resource types
    • Resource enumeration API with language, size, and metadata details
    • Enhanced import/export data including ordinal information and forwarded export detection
  • Tests

    • Expanded PE resource extraction and symbol handling test coverage

Walkthrough

This PR implements PE resource extraction capabilities including VERSIONINFO, STRINGTABLE, and MANIFEST parsing, updates Rust toolchain versions across CI workflows, introduces the pelite dependency, extends container types with ordinal support and resource metadata, and adds comprehensive tests and benchmarks.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| GitHub Actions Toolchain Updates | `.github/workflows/ci.yml`, `.github/workflows/codeql.yml`, `.github/workflows/copilot-setup-steps.yml`, `.github/workflows/docs.yml`, `.github/workflows/security.yml` | Bumped Rust toolchain version from 1.90 to 1.91.0 across multiple CI workflows; `security.yml` additionally fixed indentation structure of `cargo-deny-action` step. |
| GitHub Actions Downgrades | `.github/workflows/release.yml` | Downgraded `actions/upload-artifact` from v5 to v4 and `actions/attest-build-provenance` from v3 to v2. |
| Dependency & Build Configuration | `Cargo.toml` | Added `pelite = "0.10"` dependency; added PE benchmark target `[[bench]] name = "pe"` with `harness = false`. |
| Task Tracking & Documentation | `.kiro/specs/stringy-binary-analyzer/tasks.md`, `README.md`, `docs/src/binary-formats.md` | Marked PE section classification and resource extraction tasks as complete; added PE resource enumeration (Phase 1) to development status; expanded binary-formats documentation with enhanced import/export extraction and Phase 2 resource extraction details including section weighting and implementation hooks. |
| Core Type Definitions | `src/types.rs` | Marked `ContainerInfo` as `#[non_exhaustive]`; added `resources` field (`Option<Vec>`); introduced `ContainerInfo::new()` constructor; added `ordinal` field to `ImportInfo`; introduced `ResourceType` enum (`VersionInfo`, `StringTable`, `Manifest`, `Other`), `ResourceMetadata`, `ResourceStringTable`, and `ResourceStringEntry` structs; added error conversions for pelite types. |
| Container Parsing Updates | `src/container/elf.rs`, `src/container/macho.rs`, `src/container/pe.rs` | Updated ELF and Mach-O parsers to set `ordinal: None` for imports/exports and use `ContainerInfo::new()` constructor; enhanced PE parser with ordinal synthesis for imports, ordinal computation for exports, forwarded export detection, and resource extraction via `pe_resources`; improved section classification to account for `MEM_EXECUTE` flag. |
| Module Exports | `src/lib.rs`, `src/extraction/mod.rs` | Added `pub mod extraction` and `pub mod output`; added re-exports for `ResourceMetadata`, `ResourceStringEntry`, `ResourceStringTable`, `ResourceType`; created new `pe_resources` submodule with public use statements. |
| PE Resource Extraction Module | `src/extraction/pe_resources.rs` | New module implementing Phase 1/2 resource extraction with public functions: `extract_resources()`, `extract_version_info_strings()`, `extract_string_table_strings()`, `extract_manifest_strings()`, `extract_resource_strings()`; includes UTF-16LE decoding, manifest encoding detection, comprehensive error handling, and extensive unit/integration tests. |
| Benchmarks | `benches/pe.rs` | New benchmark file with three Criterion-based benchmarks: `bench_pe_full_parse`, `bench_pe_parse_with_imports`, `bench_pe_parse_with_exports` using test fixture `test_binary_pe.exe`. |
| Test Fixtures & Documentation | `tests/fixtures/README.md`, `tests/fixtures/test_binary_with_resources.c`, `tests/fixtures/test_binary_with_resources.rc` | Added documentation and resource testing section; added new C source file with exported and helper functions; added Windows resource script defining VERSIONINFO and RT_STRING (STRINGTABLE) resources. |
| Integration Tests | `tests/integration_pe.rs` | Added comprehensive PE resource and symbol extraction tests including snapshot tests for PE imports/exports, resource enumeration and extraction, version info and string table parsing, resource string extraction, and fixture validation with error guidance. |
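
For readers unfamiliar with the STRINGTABLE format referenced in the summary above: a PE RT_STRING resource is stored in blocks of 16 entries, each entry being a 16-bit length (in UTF-16 code units) followed by that many UTF-16LE code units, and the string ID is derived from the block ID and the entry index. The sketch below decodes one such block with the standard library only. The function name `parse_string_block` and its return shape are hypothetical; the PR itself uses pelite and its own `extract_string_table_strings()`, whose internals are not shown here.

```rust
/// Decode one RT_STRING block into `(string_id, text)` pairs.
///
/// Layout per the Windows resource format: 16 entries, each a little-endian
/// u16 length (in UTF-16 code units) followed by that many UTF-16LE units.
/// A zero length marks an unused slot. Truncated input ends parsing quietly.
fn parse_string_block(block_id: u16, data: &[u8]) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut pos = 0usize;
    for index in 0..16u32 {
        // Read the length prefix; stop if the data ends early.
        let Some(len_bytes) = data.get(pos..pos + 2) else { break };
        let len = u16::from_le_bytes([len_bytes[0], len_bytes[1]]) as usize;
        pos += 2;
        if len == 0 {
            continue; // empty slot, no string stored for this ID
        }
        let Some(raw) = data.get(pos..pos + len * 2) else { break };
        pos += len * 2;
        let units: Vec<u16> = raw
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        // Block IDs start at 1, so block 1 holds string IDs 0..=15.
        let string_id = u32::from(block_id).saturating_sub(1) * 16 + index;
        out.push((string_id, String::from_utf16_lossy(&units)));
    }
    out
}
```

Returning whatever was decoded before a truncation, rather than failing, is one way to get the graceful degradation the PR describes; a stricter parser could return a `Result` instead.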

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PeParser
    participant extract_imports
    participant extract_exports
    participant pe_resources
    participant ContainerInfo

    User->>PeParser: parse(pe_data)
    PeParser->>extract_imports: extract_imports(pe)
    extract_imports->>extract_imports: Synthesize ordinal names<br/>Populate ordinal field
    extract_imports-->>PeParser: Vec<ImportInfo>
    PeParser->>extract_exports: extract_exports(pe)
    extract_exports->>extract_exports: Compute ordinals<br/>Detect forwarded exports<br/>Annotate names
    extract_exports-->>PeParser: Vec<ExportInfo>
    PeParser->>pe_resources: extract_resources(data)
    pe_resources->>pe_resources: Parse resource directory<br/>Decode VERSIONINFO/STRINGTABLE<br/>Extract MANIFEST
    pe_resources-->>PeParser: Vec<ResourceMetadata>
    PeParser->>ContainerInfo: ContainerInfo::new(..., resources)
    ContainerInfo-->>User: ContainerInfo with<br/>imports, exports,<br/>resources, ordinals
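
The ordinal and forwarded-export steps in the diagram rest on a few rules from the PE format: an import thunk with its high bit set is an import by ordinal, an export's ordinal is the export directory's ordinal base plus its index in the export address table, and an export whose RVA points inside the export directory itself is a forwarder string rather than code. The helpers below sketch those rules for the 32-bit case. All function names and the `ORDINAL_n` placeholder format are assumptions for illustration; the PR's parser builds on goblin and its exact naming is not shown in this conversation.

```rust
/// High bit of a 32-bit import thunk marks an import by ordinal
/// (IMAGE_ORDINAL_FLAG32 in the Windows headers).
const IMAGE_ORDINAL_FLAG32: u32 = 0x8000_0000;

/// Return the ordinal if a PE32 thunk describes an import by ordinal.
fn import_ordinal(thunk: u32) -> Option<u16> {
    if (thunk & IMAGE_ORDINAL_FLAG32) != 0 {
        Some((thunk & 0xFFFF) as u16) // ordinal lives in the low 16 bits
    } else {
        None // thunk is an RVA to a hint/name entry instead
    }
}

/// Hypothetical placeholder name for an import that has no name.
fn synthesized_import_name(ordinal: u16) -> String {
    format!("ORDINAL_{ordinal}")
}

/// Export ordinal = ordinal base + index into the export address table.
/// Returns `None` on arithmetic overflow from malformed input.
fn export_ordinal(ordinal_base: u32, eat_index: u32) -> Option<u32> {
    ordinal_base.checked_add(eat_index)
}

/// An export is forwarded when its RVA falls inside the export directory.
fn is_forwarded_export(export_rva: u32, dir_rva: u32, dir_size: u32) -> bool {
    export_rva >= dir_rva && export_rva - dir_rva < dir_size
}
```

PE32+ images use a 64-bit thunk with the flag in bit 63, so a real parser needs a second variant of `import_ordinal` for that case.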

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • src/extraction/pe_resources.rs — Dense logic with resource parsing, UTF-16LE decoding, multiple resource-type handlers, and ~300+ lines of comprehensive unit/integration tests; requires careful review of resource tree traversal and error handling paths
  • src/types.rs — Significant public API surface changes (ContainerInfo::new constructor, new resource-related types, ImportInfo.ordinal); impacts all downstream parser implementations
  • src/container/{elf,macho,pe}.rs — Cross-file API changes affecting container construction and ordinal/resource population; PE parser has the most logic density with section classification, forwarded export handling, and resource integration
  • tests/integration_pe.rs — Extensive test additions with snapshot assertions and fixture handling; validates both new and existing parsing paths
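
To make the API-surface point about src/types.rs concrete: marking a struct `#[non_exhaustive]` prevents other crates from building it with a struct literal, which is why a constructor such as `ContainerInfo::new()` becomes necessary. The sketch below shows that pattern in miniature. Only the type names, the `ordinal` and `resources` field names, and the four `ResourceType` variants come from the review summary; every other field, the constructor's parameter list, and `from_type_id` are assumptions for illustration. The numeric IDs are the standard Windows values for RT_STRING (6), RT_VERSION (16), and RT_MANIFEST (24).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceType {
    VersionInfo,
    StringTable,
    Manifest,
    Other,
}

impl ResourceType {
    /// Hypothetical helper: map a numeric Windows resource type ID.
    pub fn from_type_id(id: u16) -> Self {
        match id {
            6 => Self::StringTable,  // RT_STRING
            16 => Self::VersionInfo, // RT_VERSION
            24 => Self::Manifest,    // RT_MANIFEST
            _ => Self::Other,
        }
    }
}

/// Field set assumed for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResourceMetadata {
    pub resource_type: ResourceType,
    pub language: u16,
    pub size: usize,
}

/// `ordinal` is `None` for formats without ordinals (ELF, Mach-O).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ImportInfo {
    pub name: String,
    pub ordinal: Option<u16>,
}

/// Other crates cannot use a struct literal here, so fields can be
/// added later without breaking them; they must call `new` instead.
#[non_exhaustive]
#[derive(Debug, Clone)]
pub struct ContainerInfo {
    pub imports: Vec<ImportInfo>,
    pub resources: Option<Vec<ResourceMetadata>>,
}

impl ContainerInfo {
    pub fn new(
        imports: Vec<ImportInfo>,
        resources: Option<Vec<ResourceMetadata>>,
    ) -> Self {
        Self { imports, resources }
    }
}
```

The trade-off is that adding a constructor parameter is itself a breaking change, so this pattern mainly protects against breakage from pattern matching and literal construction downstream.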

Possibly related issues

  • Addresses objective of #4: Implements PE resource extraction module with public APIs for resource metadata and string extraction
  • Addresses objective of #57: Provides PE resource extraction integration into container parsing pipeline with VERSIONINFO/STRINGTABLE/MANIFEST support
  • Addresses objective of #5: Adds comprehensive PE resource string extraction with extract_resource_strings and related public types
  • Addresses objective of #40: Implements PE resource extraction and import/export enhancements (ordinals, forwarded exports) described in epic
  • Addresses objective of #59: Populates import/export ordinals and exposes extraction modules/resources needed for import/export surfacing
  • Addresses objective of #3: Implements PE section classification logic and import/export parsing including ordinal handling
  • Addresses objective of #56: Implements PE resource-directory parsing and string extraction with pelite integration

Possibly related PRs

  • PR #55: Modifies ELF import/export extraction and ContainerInfo construction in src/container/elf.rs, directly affected by this PR's API changes
  • PR #51: Modifies ELF export/import extraction logic in src/container/elf.rs; shares ordinal field additions and ContainerInfo constructor changes
  • PR #45: Modifies same container parsing files (src/container/*.rs) and src/types.rs; shares changes to SectionInfo, ImportInfo, ExportInfo, and ContainerInfo construction

Suggested labels

enhancement, lang:rust, github_actions

Poem

🐰 PE resources now sing with glee,
Pelite helps us parse with three—
VERSIONINFO, STRINGTABLE, MANIFEST flow,
Ordinals tallied from deep in the elf-rock below!
Benchmarks bounce and fixtures fulfill the feast, 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly addresses the main objective: PE section classification and import/export table parsing implementation. |
| Description check | ✅ Passed | The description provides comprehensive coverage of all major changes including PE resource extraction, section classification, benchmarks, documentation, and CI/CD updates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@unclesp1d3r unclesp1d3r linked an issue Nov 11, 2025 that may be closed by this pull request
13 tasks
@dosubot dosubot Bot added the dependencies, documentation, enhancement, and github_actions labels on Nov 11, 2025
@dosubot

dosubot Bot commented Nov 11, 2025

Documentation Updates

2 document(s) were updated by changes in this PR:

stringy

@coderabbitai coderabbitai Bot added the lang:rust label (Rust implementation) on Nov 11, 2025
@unclesp1d3r
Member Author

@coderabbitai Fix the PR title

@coderabbitai coderabbitai Bot changed the title from "3 implement pe section classification and importexport table parsing" to "feat: Implement comprehensive PE support with section classification, import/export parsing, and resource extraction" on Nov 11, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Nov 11, 2025

@unclesp1d3r I'll update the PR title to better reflect the comprehensive changes in this pull request.

The new title better describes the full scope of this PR, which includes:

  • PE section classification with weight-based prioritization
  • Import/export table parsing with ordinal support
  • Resource extraction (VERSIONINFO, STRINGTABLE, MANIFEST)
  • Supporting infrastructure (pelite dependency, benchmarks, tests, documentation)

🏁 Script executed:

#!/bin/bash
# Update the PR title to be more descriptive and properly formatted
gh pr edit 66 --title "feat: Implement comprehensive PE support with section classification, import/export parsing, and resource extraction"

Length of output: 59


@unclesp1d3r ✅ PR title has been successfully updated!

New title: feat: Implement comprehensive PE support with section classification, import/export parsing, and resource extraction

The updated title now:

  • Follows conventional commit format with the feat: prefix
  • Properly capitalizes and formats the description
  • Captures the comprehensive scope of changes including section classification, import/export parsing, and resource extraction
  • Removes the issue number prefix for better readability

@unclesp1d3r unclesp1d3r merged commit 08cd31b into main Nov 11, 2025
18 of 19 checks passed
@unclesp1d3r unclesp1d3r deleted the 3-implement-pe-section-classification-and-importexport-table-parsing branch November 11, 2025 00:56
unclesp1d3r added a commit that referenced this pull request Feb 25, 2026
… import/export parsing, and resource extraction (#66)

* feat(pe): Add PE benchmark and enhance import/export extraction

* feat(pe): Enhance resource extraction in PE binaries

* chore(ci): Update Rust toolchain version to 1.91.0 across workflows

* feat(pe): Complete Phase 1 of PE resource extraction

* feat(pe): Complete Phase 2 of PE resource string extraction

---------

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

Labels

  • dependencies: Pull requests that update a dependency file
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • github_actions: Pull requests that update GitHub Actions code
  • lang:rust: Rust implementation
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.
