Add pelite dependency for PE resource extraction (VERSIONINFO, STRINGTABLE) #4

@unclesp1d3r

Summary

Implement structured extraction of PE resources, specifically VERSIONINFO and STRINGTABLE data, which contain rich metadata and user-facing strings often crucial for malware analysis, software identification, and reverse engineering workflows.

Background & Context

Windows PE (Portable Executable) binaries embed structured resources in the .rsrc section containing valuable string data:

  • VERSIONINFO: File metadata including ProductName, FileDescription, CompanyName, LegalCopyright, FileVersion, and ProductVersion
  • STRINGTABLE: Localized UI strings, error messages, and application text
  • Other resources: Dialog templates, menus, and accelerator tables (future work)

While the current implementation uses goblin for PE parsing, its resource section support is limited to raw byte access. The pelite crate provides a pure Rust PE parser with comprehensive resource parsing capabilities, making structured extraction straightforward.

Implementation Plan

Phase 1: Foundation (This Issue)

  1. Add pelite dependency to Cargo.toml
    • Version: Latest stable (0.10.x recommended)
    • Evaluate if pelite should replace or complement goblin

  2. Create resource extraction module at src/extraction/pe_resources.rs
    • Define resource types enum (VersionInfo, StringTable, etc.)
    • Implement resource enumeration using pelite's resource directory walker
    • Add error handling for malformed resource sections

  3. Extend existing types in src/types.rs
    • Add ResourceMetadata struct for VERSIONINFO fields
    • Add ResourceStringTable struct for STRINGTABLE entries
    • Extend FoundString to include resource context

  4. Integration with PE parser (src/container/pe.rs)
    • Add resource section parsing to PeParser::parse()
    • Populate ContainerInfo with resource metadata
    • Ensure existing section classification still works
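
As a starting point for steps 2 and 3, the new types might look roughly like the sketch below. Only the names ResourceMetadata and ResourceStringTable come from this issue; ResourceKind, every field name, and the set helper are assumptions for illustration. The numeric values are the standard Win32 RT_* resource type IDs.

```rust
/// Resource categories from the plan, keyed by the standard Win32 RT_* type IDs.
/// `ResourceKind` is a hypothetical name; the issue only asks for "a resource types enum".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceKind {
    Menu,        // RT_MENU = 4
    Dialog,      // RT_DIALOG = 5
    StringTable, // RT_STRING = 6
    Accelerator, // RT_ACCELERATOR = 9
    VersionInfo, // RT_VERSION = 16
    Other(u16),  // anything not handled yet
}

impl ResourceKind {
    /// Map a numeric resource type ID to a kind.
    pub fn from_type_id(id: u16) -> Self {
        match id {
            4 => Self::Menu,
            5 => Self::Dialog,
            6 => Self::StringTable,
            9 => Self::Accelerator,
            16 => Self::VersionInfo,
            other => Self::Other(other),
        }
    }
}

/// VERSIONINFO string fields. All are optional because real binaries
/// frequently omit some of them. Field names are assumptions.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ResourceMetadata {
    pub product_name: Option<String>,
    pub file_description: Option<String>,
    pub company_name: Option<String>,
    pub legal_copyright: Option<String>,
    pub file_version: Option<String>,
    pub product_version: Option<String>,
}

impl ResourceMetadata {
    /// Store one VERSIONINFO key/value pair. Returns false for keys this
    /// sketch does not track, so callers can decide how to handle them.
    pub fn set(&mut self, key: &str, value: &str) -> bool {
        let slot = match key {
            "ProductName" => &mut self.product_name,
            "FileDescription" => &mut self.file_description,
            "CompanyName" => &mut self.company_name,
            "LegalCopyright" => &mut self.legal_copyright,
            "FileVersion" => &mut self.file_version,
            "ProductVersion" => &mut self.product_version,
            _ => return false,
        };
        *slot = Some(value.to_owned());
        true
    }
}

/// STRINGTABLE entries for a single language. Layout is an assumption.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ResourceStringTable {
    pub language_id: u16,
    pub entries: Vec<(u32, String)>, // (string ID, decoded text)
}
```

Keeping every VERSIONINFO field as an Option avoids inventing empty strings for missing keys, which matters when the output is used for software identification.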

Phase 2: Resource Extraction (Follow-up)

  • Implement VERSIONINFO extraction with key-value pair parsing
  • Implement STRINGTABLE extraction with locale handling
  • Add Unicode and ANSI string decoding for resource data
  • Map extracted strings to FoundString with StringSource::ResourceString
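
For the STRINGTABLE part of Phase 2, the decoding step could be sketched as follows. It relies on the documented resource layout: each block holds 16 strings, each stored as a little-endian u16 length (in UTF-16 code units) followed by that many code units, and the string ID is (block ID - 1) * 16 + index. The function and error names are hypothetical, and the sketch uses only the standard library rather than pelite.

```rust
/// Errors raised while decoding one STRINGTABLE block (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum StringTableError {
    /// The block ended before a length prefix or string body was complete.
    Truncated { offset: usize },
    /// Block IDs start at 1, so 0 cannot be mapped to string IDs.
    InvalidBlockId,
}

/// Decode one STRINGTABLE block into (string ID, text) pairs.
/// Empty slots (length 0) are skipped. Invalid UTF-16 is replaced lossily.
pub fn decode_string_block(
    block_id: u16,
    data: &[u8],
) -> Result<Vec<(u32, String)>, StringTableError> {
    if block_id == 0 {
        return Err(StringTableError::InvalidBlockId);
    }
    let base = (u32::from(block_id) - 1) * 16;
    let mut out = Vec::new();
    let mut pos = 0usize;

    for index in 0..16u32 {
        // Length prefix: number of UTF-16 code units, not bytes.
        let prefix = data
            .get(pos..pos + 2)
            .ok_or(StringTableError::Truncated { offset: pos })?;
        let units = usize::from(u16::from_le_bytes([prefix[0], prefix[1]]));
        pos += 2;

        // String body: `units` little-endian code units, no terminator.
        let body = data
            .get(pos..pos + units * 2)
            .ok_or(StringTableError::Truncated { offset: pos })?;
        pos += units * 2;

        if units == 0 {
            continue; // unused slot in this block
        }
        let utf16: Vec<u16> = body
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        out.push((base + index, String::from_utf16_lossy(&utf16)));
    }
    Ok(out)
}
```

Returning an error on truncation, instead of the strings decoded so far, is a design choice for this sketch; a more forgiving variant could return the partial result alongside the error.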

Phase 3: Testing & Documentation (Follow-up)

  • Unit tests with sample PE resource data
  • Integration tests with real-world PE binaries
  • Document resource extraction architecture
  • Add examples to README

Technical Considerations

  • Dual parser strategy: Consider using pelite specifically for resource extraction while keeping goblin for section/import/export parsing
  • Performance: Resource parsing should be optional via CLI flag (e.g., --extract-resources)
  • Malformed binaries: Gracefully handle corrupted resource directories (common in packed/obfuscated malware)
  • Memory safety: pelite operates on byte slices; ensure proper bounds checking
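
To illustrate the malformed-binary and bounds-checking points, here is a minimal sketch of reading one resource directory table defensively. The layout follows the PE format (a 16-byte header whose last two u16 fields count named and ID entries, followed by 8-byte entries whose second field has its high bit set when it points at a subdirectory). All type and function names, and the MAX_ENTRIES cap, are assumptions for illustration; this is not how pelite implements its walker.

```rust
/// Errors for a defensive directory read (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum DirError {
    /// A read of `needed` bytes at `offset` would leave the resource section.
    OutOfBounds { offset: usize, needed: usize },
    /// The header claims more entries than the sanity cap allows.
    TooManyEntries(usize),
}

/// One entry of a resource directory table.
#[derive(Debug, PartialEq, Eq)]
pub struct DirEntry {
    pub name_or_id: u32,       // raw first field (name offset or integer ID)
    pub target_offset: u32,    // relative to the start of the resource section
    pub is_subdirectory: bool, // high bit of the second field
}

const DIR_HEADER_LEN: usize = 16;
const DIR_ENTRY_LEN: usize = 8;
const MAX_ENTRIES: usize = 4096; // arbitrary sanity cap, an assumption

/// Borrow `len` bytes at `offset`, failing instead of panicking or wrapping.
fn read_slice(data: &[u8], offset: usize, len: usize) -> Result<&[u8], DirError> {
    offset
        .checked_add(len)
        .and_then(|end| data.get(offset..end))
        .ok_or(DirError::OutOfBounds { offset, needed: len })
}

/// Read the directory table located at `offset` inside the resource section.
pub fn read_directory(rsrc: &[u8], offset: usize) -> Result<Vec<DirEntry>, DirError> {
    let header = read_slice(rsrc, offset, DIR_HEADER_LEN)?;
    let named = usize::from(u16::from_le_bytes([header[12], header[13]]));
    let ids = usize::from(u16::from_le_bytes([header[14], header[15]]));
    let count = named + ids;
    if count > MAX_ENTRIES {
        return Err(DirError::TooManyEntries(count));
    }

    let mut entries = Vec::with_capacity(count);
    for i in 0..count {
        // Cannot overflow: the header read above proved offset + 16 is in range.
        let at = offset + DIR_HEADER_LEN + i * DIR_ENTRY_LEN;
        let raw = read_slice(rsrc, at, DIR_ENTRY_LEN)?;
        let name_or_id = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
        let target = u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]);
        entries.push(DirEntry {
            name_or_id,
            target_offset: target & 0x7FFF_FFFF,
            is_subdirectory: (target & 0x8000_0000) != 0,
        });
    }
    Ok(entries)
}
```

A full walker would also need a depth limit and cycle detection, since a corrupted entry can point back at its own directory.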

Success Criteria

  • pelite dependency added and compiling
  • Basic resource enumeration working for .rsrc section
  • Framework for VERSIONINFO and STRINGTABLE extraction in place
  • No regressions in existing PE parsing functionality
  • Unit tests covering resource enumeration edge cases

Related Work

  • Milestone: v0.1 (MVP)
  • Depends on: Existing PE parser infrastructure
  • Enables: Future work on dialog/menu resource parsing, icon extraction

@traycerai branch:3-implement-pe-section-classification-and-importexport-table-parsing
