Implement PE Section Classification and Import/Export Table Parsing #3

@unclesp1d3r

Summary

Enhance the PE (Portable Executable) parser to classify sections by how likely they are to contain strings, and implement import/export table parsing to extract symbol names and metadata from Windows executables.

Context

PE binaries structure data differently than ELF or Mach-O formats. Windows executables typically store:

  • Read-only strings in .rdata (read-only data) sections - high value for string extraction
  • Initialized data in .data sections - lower priority, often contains runtime state
  • Import tables listing DLL dependencies and function names - valuable for understanding program behavior
  • Export tables defining public API surfaces - critical for DLL analysis

Currently, StringyMcStringFace lacks PE-specific intelligence to prioritize these sections appropriately, which means:

  • We treat all sections equally, missing optimization opportunities
  • We don't extract import/export symbols that provide high-signal strings
  • UTF-16LE strings common in PE binaries aren't prioritized correctly
  • Section scoring doesn't account for PE-specific characteristics

Proposed Solution

1. Section Classification Enhancement

Implement section weight assignment based on PE characteristics:

// Proposed section scoring for PE
fn classify_pe_section(section: &PESection) -> SectionWeight {
    match section.name.as_str() {
        ".rdata" | ".text" => SectionWeight::High,   // Read-only sections, likely strings
        ".rsrc" => SectionWeight::Medium,            // Resources, may contain strings
        ".data" => SectionWeight::Low,               // Writable, runtime state
        ".bss" | ".reloc" => SectionWeight::VeryLow, // Unlikely to contain strings
        _ => SectionWeight::Medium,                  // Default for unknown sections
    }
}
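
The scoring above can be tried in isolation with a minimal sketch like the following. The `SectionWeight` definition here is a hypothetical stand-in for the project's existing enum (its real variants may differ), and the function takes a plain section name rather than a `PESection`, which is an assumption made to keep the example self-contained. The `section_name` helper is also illustrative; it relies only on the fact that PE section headers store names in a fixed 8-byte, NUL-padded field.

```rust
/// Relative likelihood that a section holds useful strings.
/// Hypothetical stand-in for the project's existing `SectionWeight` enum.
/// Variants are declared lowest to highest so the derived `Ord` ranks them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SectionWeight {
    VeryLow,
    Low,
    Medium,
    High,
}

/// Decode the fixed 8-byte, NUL-padded name field of a PE section header.
/// Returns an empty string if the bytes are not valid UTF-8.
pub fn section_name(raw: &[u8; 8]) -> &str {
    let len = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    std::str::from_utf8(&raw[..len]).unwrap_or("")
}

/// Map a PE section name to a weight, mirroring the proposed table.
pub fn classify_pe_section(name: &str) -> SectionWeight {
    match name {
        ".rdata" | ".text" => SectionWeight::High,
        ".rsrc" => SectionWeight::Medium,
        ".data" => SectionWeight::Low,
        ".bss" | ".reloc" => SectionWeight::VeryLow,
        // Unknown or custom sections fall back to a neutral weight.
        _ => SectionWeight::Medium,
    }
}
```

Deriving `Ord` lets callers sort sections by weight directly, for example to scan high-weight sections first.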

2. Import/Export Table Parsing

Extract symbol names from PE import and export directories:

  • Parse import directory to extract DLL names and imported function names
  • Parse export directory to extract exported function names (for DLL analysis)
  • Tag extracted symbols with appropriate metadata (source: ImportTable, source: ExportTable)
  • Assign high scores to these strings as they represent high-confidence identifiers
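
As a rough illustration of the tagging and scoring steps listed above, the sketch below takes symbol names that have already been read from the import and export directories and wraps them with a source tag and a fixed score. Everything here is hypothetical: `FoundString`, `SYMBOL_SCORE`, `tag_symbols`, and `collect_pe_symbols` are names invented for this example, and `StringSource` is a stand-in for the project's enum with the two proposed variants added. How the names are obtained from goblin is deliberately left out.

```rust
/// Where an extracted string came from.
/// Stand-in for the project's `StringSource`, with the two proposed variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringSource {
    SectionData,
    ImportTable,
    ExportTable,
}

/// An extracted string with its provenance and score (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FoundString {
    pub text: String,
    pub source: StringSource,
    pub score: u32,
}

/// Assumed fixed score for symbol-table strings; the real scale may differ.
const SYMBOL_SCORE: u32 = 100;

/// Tag each non-empty name with the given source and the symbol score.
/// Empty names (for example, placeholders for ordinal-only entries) are skipped.
pub fn tag_symbols<'a, I>(names: I, source: StringSource) -> Vec<FoundString>
where
    I: IntoIterator<Item = &'a str>,
{
    names
        .into_iter()
        .filter(|name| !name.is_empty())
        .map(|name| FoundString {
            text: name.to_owned(),
            source,
            score: SYMBOL_SCORE,
        })
        .collect()
}

/// Combine DLL names, imported function names, and exported function names.
/// DLL names are tagged as `ImportTable` because they come from the import directory.
pub fn collect_pe_symbols(
    dlls: &[&str],
    imports: &[&str],
    exports: &[&str],
) -> Vec<FoundString> {
    let mut out = tag_symbols(dlls.iter().copied(), StringSource::ImportTable);
    out.extend(tag_symbols(imports.iter().copied(), StringSource::ImportTable));
    out.extend(tag_symbols(exports.iter().copied(), StringSource::ExportTable));
    out
}
```

Passing empty slices returns an empty vector, which covers binaries with no import or export directory.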

3. Integration Points

  • Update PEParser in crates/stringy-analyzer/src/parsers/pe.rs
  • Extend SectionInfo to include PE-specific weight heuristics
  • Add import/export extraction to the symbol extraction pipeline
  • Ensure UTF-16LE detection prioritizes .rdata sections
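
For the UTF-16LE point, a minimal sketch of what detection plus section-aware scoring could look like is shown below. It is not the project's existing detector: `extract_utf16le` and `score_utf16le` are hypothetical names, the scanner only recognizes printable ASCII code units at even offsets, and the multipliers are placeholder values chosen only so that `.rdata` outranks `.data`.

```rust
/// Extract runs of printable ASCII characters encoded as UTF-16LE.
/// Simplification: only even byte offsets are scanned, and any code unit
/// outside 0x20..=0x7E ends the current run.
pub fn extract_utf16le(data: &[u8], min_chars: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    for pair in data.chunks_exact(2) {
        let unit = u16::from_le_bytes([pair[0], pair[1]]);
        if (0x20..=0x7e).contains(&unit) {
            // Safe narrowing: the range check guarantees an ASCII value.
            current.push(unit as u8 as char);
        } else if current.len() >= min_chars {
            // Run ended and is long enough: keep it and start a new one.
            out.push(std::mem::take(&mut current));
        } else {
            current.clear();
        }
    }
    // Keep a trailing run that reaches the end of the buffer.
    if current.len() >= min_chars {
        out.push(current);
    }
    out
}

/// Score a UTF-16LE string by length, scaled by the section it came from.
/// The multipliers are assumptions for illustration only.
pub fn score_utf16le(text: &str, section_name: &str) -> u32 {
    let multiplier = match section_name {
        ".rdata" => 3,
        ".data" => 1,
        _ => 2,
    };
    text.len() as u32 * multiplier
}
```

A production scanner would likely also try odd offsets and non-ASCII code units; this version is kept short to show only the prioritization idea.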

Technical Requirements

Requirement 1.2: Section classification by string likelihood
Requirement 1.4: Import/export table parsing

Dependencies

  • goblin PE parser capabilities
  • Existing SectionWeight enum may need extension
  • StringSource enum needs ImportTable and ExportTable variants

Performance Considerations

  • Import/export parsing is typically fast (small tables)
  • Section classification is O(n) where n = number of sections (usually < 10)
  • No significant performance impact expected

Acceptance Criteria

  • Implement section weight classification for PE-specific sections (.rdata, .data, .rsrc, .text, .bss, .reloc)
  • Parse PE import directory and extract DLL names and imported function names
  • Parse PE export directory and extract exported function names
  • Add ImportTable and ExportTable variants to StringSource enum
  • Assign appropriate scores to import/export strings (high priority)
  • Add unit tests for section classification logic
  • Add integration tests using sample PE binaries (e.g., a simple DLL with exports)
  • cargo clippy -- -D warnings passes without errors
  • Add benchmarks with criterion for PE parsing performance
  • Use insta for snapshot testing of import/export extraction results
  • Update justfile recipes for PE-specific test cases
  • Ensure CI pipeline passes all checks
  • Document PE-specific behavior in module-level docs

Test Cases

Unit Tests

  • Section weight assignment for known PE section names
  • Section weight for unknown/custom section names
  • Empty import/export table handling

Integration Tests

  • Parse a PE binary with imports (e.g., kernel32.dll functions)
  • Parse a DLL with exports
  • Verify UTF-16LE strings from .rdata score higher than .data

Snapshot Tests (insta)

  • Import table extraction output
  • Export table extraction output
  • Section classification results


Task-ID: stringy-analyzer/pe-section-classification
Requirements: 1.2, 1.4
Milestone: v0.1
@traycerai branch:3-implement-pe-section-classification-and-importexport-table-parsing
