Summary
Enhance the PE (Portable Executable) parser to intelligently classify sections based on string likelihood and implement import/export table parsing to extract meaningful symbols and metadata from Windows executables.
Context
PE binaries structure data differently than ELF or Mach-O formats. Windows executables typically store:
- Read-only strings in .rdata (read-only data) sections - high value for string extraction
- Initialized data in .data sections - lower priority, often contains runtime state
- Import tables listing DLL dependencies and function names - valuable for understanding program behavior
- Export tables defining public API surfaces - critical for DLL analysis
Currently, StringyMcStringFace lacks PE-specific intelligence to prioritize these sections appropriately, which means:
- We treat all sections equally, so low-value sections get the same priority as .rdata
- We don't extract import/export symbols that provide high-signal strings
- UTF-16LE strings, which are common in PE binaries, aren't prioritized correctly
- Section scoring doesn't account for PE-specific characteristics
Proposed Solution
1. Section Classification Enhancement
Implement section weight assignment based on PE characteristics:
// Proposed section scoring for PE
fn classify_pe_section(section: &PESection) -> SectionWeight {
    match section.name.as_str() {
        ".rdata" | ".text" => SectionWeight::High,   // Read-only, likely strings
        ".rsrc" => SectionWeight::Medium,            // Resources, may contain strings
        ".data" => SectionWeight::Low,               // Writable, runtime state
        ".bss" | ".reloc" => SectionWeight::VeryLow, // Unlikely to contain strings
        _ => SectionWeight::Medium,                  // Default for unknown sections
    }
}
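The heading mentions PE characteristics, while the proposed function matches on names only. A minimal, self-contained sketch of one way to combine the two is shown here. The `SectionWeight` definition and the `classify_by_name_and_flags` helper are hypothetical stand-ins, not the project's actual types, and the fallback rules for unknown names are an assumption. The flag values come from the PE/COFF specification.

```rust
// Hypothetical stand-in for the project's SectionWeight; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum SectionWeight {
    VeryLow,
    Low,
    Medium,
    High,
}

// Section characteristic flags as defined in the PE/COFF specification.
const IMAGE_SCN_CNT_UNINITIALIZED_DATA: u32 = 0x0000_0080;
const IMAGE_SCN_MEM_WRITE: u32 = 0x8000_0000;

/// Classify by well-known name first; for unknown names, fall back to the
/// section's characteristics flags (assumed policy, not part of the proposal).
fn classify_by_name_and_flags(name: &str, characteristics: u32) -> SectionWeight {
    match name {
        ".rdata" | ".text" => SectionWeight::High,
        ".rsrc" => SectionWeight::Medium,
        ".data" => SectionWeight::Low,
        ".bss" | ".reloc" => SectionWeight::VeryLow,
        // Unknown name holding only uninitialized data: nothing to extract on disk.
        _ if characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA != 0 => SectionWeight::VeryLow,
        // Unknown writable section: treat like .data.
        _ if characteristics & IMAGE_SCN_MEM_WRITE != 0 => SectionWeight::Low,
        // Everything else keeps the proposal's default.
        _ => SectionWeight::Medium,
    }
}
```

Checking names before flags keeps the behavior of the proposed table unchanged for standard sections and only refines the default for custom ones.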
2. Import/Export Table Parsing
Extract symbol names from PE import and export directories:
- Parse import directory to extract DLL names and imported function names
- Parse export directory to extract exported function names (for DLL analysis)
- Tag extracted symbols with appropriate metadata (source: ImportTable, source: ExportTable)
- Assign high scores to these strings as they represent high-confidence identifiers
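The tagging step in the list above could look roughly like the following sketch. It assumes the parser has already produced (DLL, function) pairs for imports and optional names for exports. `StringSource` here is a hypothetical stand-in containing only the two proposed variants, and `TaggedSymbol`, `tag_imports`, and `tag_exports` are illustrative names, not existing project APIs.

```rust
// Hypothetical stand-in: only the two variants this issue proposes.
#[derive(Debug, Clone, PartialEq, Eq)]
enum StringSource {
    ImportTable,
    ExportTable,
}

// Illustrative record type; the project's real string record may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TaggedSymbol {
    text: String,
    source: StringSource,
    /// DLL providing an imported function; None for DLL names and exports.
    library: Option<String>,
}

/// Turn (dll, function) pairs into tagged symbols, emitting each DLL name once.
fn tag_imports(imports: &[(&str, &str)]) -> Vec<TaggedSymbol> {
    let mut out = Vec::new();
    let mut seen_dlls: Vec<&str> = Vec::new();
    for &(dll, func) in imports {
        if !seen_dlls.contains(&dll) {
            seen_dlls.push(dll);
            out.push(TaggedSymbol {
                text: dll.to_string(),
                source: StringSource::ImportTable,
                library: None,
            });
        }
        // Imports by ordinal have no name; skip empty names instead of emitting noise.
        if !func.is_empty() {
            out.push(TaggedSymbol {
                text: func.to_string(),
                source: StringSource::ImportTable,
                library: Some(dll.to_string()),
            });
        }
    }
    out
}

/// Exports can be ordinal-only, so names are optional and unnamed ones are dropped.
fn tag_exports(exports: &[Option<&str>]) -> Vec<TaggedSymbol> {
    exports
        .iter()
        .flatten()
        .map(|name| TaggedSymbol {
            text: name.to_string(),
            source: StringSource::ExportTable,
            library: None,
        })
        .collect()
}
```

Both helpers return an empty vector for empty input, which matches the empty import/export table case listed under the unit tests.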
3. Integration Points
- Update PEParser in crates/stringy-analyzer/src/parsers/pe.rs
- Extend SectionInfo to include PE-specific weight heuristics
- Add import/export extraction to the symbol extraction pipeline
- Ensure UTF-16LE detection prioritizes .rdata sections
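For the UTF-16LE point, a minimal sketch of what a detector might do on a section's raw bytes is given here. It only recognizes printable ASCII characters stored as UTF-16LE (a printable byte followed by 0x00), which is a simplifying assumption. The function name `find_utf16le_ascii` is hypothetical and does not refer to existing project code.

```rust
/// Find runs of printable ASCII characters encoded as UTF-16LE.
/// Returns (byte offset, decoded text) for each run of at least `min_chars` characters.
fn find_utf16le_ascii(data: &[u8], min_chars: usize) -> Vec<(usize, String)> {
    let mut found = Vec::new();
    let mut current = String::new();
    let mut start = 0;
    let mut i = 0;
    while i + 1 < data.len() {
        let (lo, hi) = (data[i], data[i + 1]);
        if hi == 0 && (0x20..=0x7e).contains(&lo) {
            // Printable ASCII code unit: extend the current run.
            if current.is_empty() {
                start = i;
            }
            current.push(lo as char);
            i += 2;
        } else {
            // Run ended: keep it if long enough, otherwise discard it.
            if current.len() >= min_chars {
                found.push((start, std::mem::take(&mut current)));
            } else {
                current.clear();
            }
            // Step one byte so runs starting at odd offsets are still found.
            i += 1;
        }
    }
    if current.len() >= min_chars {
        found.push((start, current));
    }
    found
}
```

A caller could run this over .rdata before other sections, or scale the resulting scores by the section weight. Either choice is a design decision this sketch leaves open.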
Technical Requirements
Requirement 1.2: Section classification by string likelihood
Requirement 1.4: Import/export table parsing
Dependencies
- goblin PE parser capabilities
- Existing SectionWeight enum may need extension
- StringSource enum needs ImportTable and ExportTable variants
Performance Considerations
- Import/export parsing is typically fast (small tables)
- Section classification is O(n) where n = number of sections (usually < 10)
- No significant performance impact expected
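Acceptance Criteria
- PE sections classified by string likelihood (.rdata, .data, .rsrc, .text, .bss, .reloc)
- Add ImportTable and ExportTable variants to StringSource enum
- cargo clippy -- -D warnings passes without errors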
Test Cases
Unit Tests
- Section weight assignment for known PE section names
- Section weight for unknown/custom section names
- Empty import/export table handling
Integration Tests
- Parse a PE binary with imports (e.g., kernel32.dll functions)
- Parse a DLL with exports
- Verify UTF-16LE strings from .rdata score higher than .data
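The last check in this list could be expressed against a scoring function along these lines. Everything in this sketch is an assumption made for illustration: the `SectionWeight` and `Encoding` stand-ins, the `score` function, and all numeric values. The project's real scoring may be structured quite differently.

```rust
// Hypothetical stand-ins for project types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SectionWeight {
    VeryLow,
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    Ascii,
    Utf16Le,
}

/// Illustrative score: a base value from the section weight, a small bonus for
/// UTF-16LE strings in high-weight sections, and a capped length bonus.
fn score(weight: SectionWeight, encoding: Encoding, char_len: usize) -> u32 {
    let base = match weight {
        SectionWeight::VeryLow => 5,
        SectionWeight::Low => 20,
        SectionWeight::Medium => 50,
        SectionWeight::High => 80,
    };
    let wide_bonus = match (weight, encoding) {
        (SectionWeight::High, Encoding::Utf16Le) => 10,
        _ => 0,
    };
    // Cap the length bonus so long strings cannot outrank the section weight.
    let length_bonus = char_len.min(10) as u32;
    base + wide_bonus + length_bonus
}
```

With a function of this shape, the integration test reduces to asserting that the same UTF-16LE string scores higher with the .rdata weight (High) than with the .data weight (Low).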
Snapshot Tests (insta)
- Import table extraction output
- Export table extraction output
- Section classification results
Related Work
- goblin PE parser foundation
References
Task-ID: stringy-analyzer/pe-section-classification
Requirements: 1.2, 1.4
Milestone: v0.1
@traycerai branch:3-implement-pe-section-classification-and-importexport-table-parsing