Context
Windows PE (Portable Executable) binaries contain a .rsrc section with a hierarchical resource directory structure that holds rich metadata and embedded strings. These resources are currently not being extracted by Stringy, resulting in missed opportunities to identify:
- Version Information: Product names, company names, file descriptions, copyright strings
- Manifest Files: Embedded XML manifests with assembly information and dependencies
- String Tables: Localized strings stored in STRINGTABLE resources
- Dialog Boxes: Control text, window titles, and UI strings
- Menu Definitions: Menu item text and tooltips
The PE resource directory follows a three-level tree structure:
- Type: Resource type (RT_VERSION, RT_MANIFEST, RT_STRING, RT_DIALOG, etc.)
- Name: Resource identifier (numeric ID or string name)
- Language: Language identifier (a 16-bit LANGID, e.g. 0x0409 for en-US)
Resources commonly use UTF-16LE encoding, requiring proper string decoding beyond ASCII/UTF-8 extraction.
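As a rough illustration of the three-level addressing described above, one way to model a resource path in Rust is sketched below. The names ResourceId and ResourcePath are hypothetical and are not taken from the Stringy codebase; the numeric constants are the standard Win32 IDs for the resource types this issue targets.

```rust
/// Identifier used at the Type and Name levels: a numeric ID or a string name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResourceId {
    Id(u16),
    Name(String),
}

// Predefined Win32 resource type IDs for the types discussed in this issue.
pub const RT_MENU: u16 = 4;
pub const RT_DIALOG: u16 = 5;
pub const RT_STRING: u16 = 6;
pub const RT_VERSION: u16 = 16;
pub const RT_MANIFEST: u16 = 24;

/// One leaf of the tree, addressed by its Type -> Name -> Language path.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResourcePath {
    pub type_id: ResourceId,
    pub name: ResourceId,
    pub language: u16,
}

impl ResourcePath {
    /// True when the Type level is the given predefined numeric type.
    pub fn is_type(&self, rt: u16) -> bool {
        self.type_id == ResourceId::Id(rt)
    }
}
```

A handler dispatch could then match on is_type(RT_VERSION), is_type(RT_STRING), and so on, while still carrying string-named types through untouched.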
Proposed Solution
Implementation Approach
Extend the PE container parser (src/container/pe.rs) to parse the resource directory structure and extract strings from high-value resource types:
1. Resource Directory Parsing
- Locate the .rsrc section using existing section enumeration
- Parse the three-level resource directory tree (Type → Name → Language)
- Navigate resource data entries to locate actual resource data
- Handle both numeric and string resource identifiers
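The directory-parsing steps above can be sketched as follows. This is a minimal, dependency-free illustration that works on the raw bytes of the resource section and is not tied to goblin's types. It relies on the documented layout (a 16-byte IMAGE_RESOURCE_DIRECTORY header followed by 8-byte entries, with the high bit marking a name offset or a subdirectory offset). The function and type names are assumptions made for this example.

```rust
/// Reads a little-endian u16 at `off`, or None when out of bounds.
fn read_u16(data: &[u8], off: usize) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Reads a little-endian u32 at `off`, or None when out of bounds.
fn read_u32(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

const HIGH_BIT: u32 = 0x8000_0000;

#[derive(Debug, PartialEq, Eq)]
pub enum EntryTarget {
    /// Offset of a child directory, relative to the start of the resource section.
    Directory(u32),
    /// Offset of an IMAGE_RESOURCE_DATA_ENTRY, relative to the same base.
    Data(u32),
}

#[derive(Debug, PartialEq, Eq)]
pub struct DirEntry {
    /// Raw Name field: high bit set means "offset to a string name",
    /// otherwise the low 16 bits are a numeric ID.
    pub name_or_id: u32,
    pub target: EntryTarget,
}

/// Parses one directory table located at `dir_off` inside `rsrc`.
/// Returns None if the table is truncated.
pub fn parse_directory(rsrc: &[u8], dir_off: usize) -> Option<Vec<DirEntry>> {
    // The header is 16 bytes; the two entry counts sit at offsets 12 and 14.
    let named = read_u16(rsrc, dir_off.checked_add(12)?)? as usize;
    let ids = read_u16(rsrc, dir_off.checked_add(14)?)? as usize;
    let mut entries = Vec::with_capacity(named + ids);
    for i in 0..named + ids {
        let off = dir_off.checked_add(16 + i * 8)?;
        let name_or_id = read_u32(rsrc, off)?;
        let raw = read_u32(rsrc, off + 4)?;
        let target = if raw & HIGH_BIT != 0 {
            EntryTarget::Directory(raw & !HIGH_BIT)
        } else {
            EntryTarget::Data(raw)
        };
        entries.push(DirEntry { name_or_id, target });
    }
    Some(entries)
}
```

Returning Option rather than panicking keeps a truncated or malformed table from aborting the rest of the extraction.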
2. Resource Type Handlers
Implement extraction for the following Win32 resource types:
RT_VERSION (Type 16) - Version Information
- Parse the VS_VERSIONINFO structure
- Extract StringFileInfo fields:
  - FileDescription
  - ProductName
  - CompanyName
  - LegalCopyright
  - FileVersion
  - ProductVersion
  - InternalName
  - OriginalFilename
- Tag with Version and Resource tags
- Assign high relevance scores (8-10)
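A possible shape for the version-information walker is sketched below. It relies on the documented node layout shared by VS_VERSIONINFO, StringFileInfo, StringTable, and String (wLength, wValueLength, wType, a null-terminated UTF-16 key, then a 4-byte-aligned value and children). The function names and the depth limit are assumptions for this example, and real binaries may need extra tolerance for malformed lengths.

```rust
const MAX_DEPTH: u8 = 8; // arbitrary guard against pathological nesting

fn read_u16(data: &[u8], off: usize) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Rounds `off` up to the next 4-byte boundary.
fn align4(off: usize) -> usize {
    (off + 3) & !3
}

/// Visits the node at `start` and collects (key, value) pairs from text leaves.
fn walk(data: &[u8], start: usize, depth: u8, out: &mut Vec<(String, String)>) -> Option<()> {
    if depth > MAX_DEPTH {
        return None;
    }
    let len = read_u16(data, start)? as usize; // wLength
    let value_len = read_u16(data, start + 2)? as usize; // wValueLength
    let w_type = read_u16(data, start + 4)?; // 1 = text, 0 = binary
    let end = start.checked_add(len)?;
    if len < 6 || end > data.len() {
        return None;
    }
    let node = &data[..end]; // never read past this node

    // szKey: null-terminated UTF-16LE string right after the 6-byte header.
    let mut units = Vec::new();
    let mut off = start + 6;
    loop {
        let u = read_u16(node, off)?;
        off += 2;
        if u == 0 {
            break;
        }
        units.push(u);
    }
    let key = String::from_utf16_lossy(&units);
    let value_off = align4(off);

    if w_type == 1 && value_len > 0 {
        // Text leaf: take at most wValueLength code units, stopping at the first NUL.
        let value: Vec<u16> = node
            .get(value_off..)?
            .chunks_exact(2)
            .take(value_len)
            .map(|c| u16::from_le_bytes([c[0], c[1]]))
            .take_while(|&u| u != 0)
            .collect();
        out.push((key, String::from_utf16_lossy(&value)));
        return Some(());
    }

    // Container (or binary value): children follow the value, each 4-byte aligned.
    let mut child = align4(value_off + value_len);
    while child + 6 <= end {
        let child_len = read_u16(node, child)? as usize;
        if child_len < 6 {
            break; // malformed length; stop rather than loop forever
        }
        let _ = walk(node, child, depth + 1, out); // a bad child does not abort siblings
        child = align4(child + child_len);
    }
    Some(())
}

/// Returns every (key, value) string pair found in an RT_VERSION resource.
pub fn version_strings(data: &[u8]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let _ = walk(data, 0, 0, &mut out);
    out
}
```

The walker does not hard-code key names, so fields beyond the eight listed above (for example Comments) would also be returned; filtering to the listed fields can happen in the caller.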
RT_MANIFEST (Type 24) - Application Manifest
- Extract embedded XML manifest
- Parse manifest for assembly names, dependencies, and configuration strings
- Tag with Manifest and Resource tags
- Assign high relevance scores (7-9)
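For the manifest handler, a deliberately naive sketch is shown below. It assumes the manifest is UTF-8 (optionally with a byte-order mark) and pulls attribute values with plain string scanning instead of a real XML parser. Both function names are hypothetical, and a production implementation would likely use an XML library instead.

```rust
/// Decodes manifest bytes as UTF-8, dropping a leading BOM if present.
/// Invalid sequences are replaced rather than treated as fatal.
pub fn manifest_text(data: &[u8]) -> String {
    let body = data.strip_prefix(&[0xEF, 0xBB, 0xBF]).unwrap_or(data);
    String::from_utf8_lossy(body).trim().to_string()
}

/// Collects the values of every `attr="..."` occurrence in `xml`.
/// Naive scan: no entity decoding, double-quoted values only.
pub fn attribute_values(xml: &str, attr: &str) -> Vec<String> {
    let needle = format!("{}=\"", attr);
    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(pos) = rest.find(&needle) {
        // Require whitespace before the attribute so "name" does not match "typename".
        let standalone = rest[..pos]
            .chars()
            .next_back()
            .map_or(true, char::is_whitespace);
        rest = &rest[pos + needle.len()..];
        match rest.find('"') {
            Some(end) => {
                if standalone {
                    out.push(rest[..end].to_string());
                }
                rest = &rest[end + 1..];
            }
            None => break, // unterminated attribute value
        }
    }
    out
}
```

Calling attribute_values(&xml, "name") on a typical manifest would yield the assembly identity names, including those of declared dependencies.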
RT_STRING (Type 6) - String Tables
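- Parse STRINGTABLE blocks (16 strings per block)
- Extract length-prefixed UTF-16LE strings (each entry is a 16-bit character count followed by that many code units; entries are not guaranteed to be null-terminated)
- Tag with the Resource tag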
- Assign medium-high relevance scores (6-8)
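A string-table block can be decoded with a short loop. In the Win32 resource format each block holds 16 entries, every entry is a 16-bit length followed by that many UTF-16LE code units, and the string ID is derived from the block's resource ID. The sketch below assumes the block's numeric Name-level ID is passed in as block_id; the function name is hypothetical.

```rust
/// Decodes one RT_STRING block into (string ID, text) pairs.
/// Empty slots (length 0) are skipped; a truncated block yields what was read so far.
pub fn parse_string_block(block_id: u16, data: &[u8]) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut off = 0usize;
    for index in 0..16u32 {
        // Each entry starts with a u16 count of UTF-16 code units.
        let len = match data.get(off..off + 2) {
            Some(b) => u16::from_le_bytes([b[0], b[1]]) as usize,
            None => break,
        };
        off += 2;
        let text = match data.get(off..off + len * 2) {
            Some(b) => b,
            None => break,
        };
        off += len * 2;
        if len == 0 {
            continue;
        }
        let units: Vec<u16> = text
            .chunks_exact(2)
            .map(|c| u16::from_le_bytes([c[0], c[1]]))
            .collect();
        // Block N holds string IDs (N - 1) * 16 through (N - 1) * 16 + 15.
        let id = u32::from(block_id).saturating_sub(1) * 16 + index;
        out.push((id, String::from_utf16_lossy(&units)));
    }
    out
}
```

Keeping the computed string ID alongside the text makes it possible to correlate a string with LoadString call sites later, if that ever becomes useful.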
RT_DIALOG (Type 5) - Dialog Box Templates
- Parse the DLGTEMPLATE header and the DLGITEMTEMPLATE entries that follow it
- Extract window titles, control text, and tooltips
- Handle both DIALOG and DIALOGEX formats
- Tag with the Resource tag
- Assign medium relevance scores (5-7)
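As a starting point for the dialog handler, the sketch below extracts only the dialog title and shows how the classic and extended formats can be told apart. An extended template begins with the words 0x0001 and 0xFFFF, and its fixed header is 26 bytes instead of 18. Control items are not parsed here, and the function names are assumptions for this example.

```rust
fn read_u16(data: &[u8], off: usize) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Reads a null-terminated UTF-16LE string; returns it with the offset past the NUL.
fn read_wstr(data: &[u8], mut off: usize) -> Option<(String, usize)> {
    let mut units = Vec::new();
    loop {
        let u = read_u16(data, off)?;
        off += 2;
        if u == 0 {
            break;
        }
        units.push(u);
    }
    Some((String::from_utf16_lossy(&units), off))
}

/// Skips a "string or ordinal" field: 0x0000 = absent, 0xFFFF = ordinal follows,
/// anything else = null-terminated string.
fn skip_sz_or_ord(data: &[u8], off: usize) -> Option<usize> {
    match read_u16(data, off)? {
        0x0000 => Some(off + 2),
        0xFFFF => Some(off + 4),
        _ => read_wstr(data, off).map(|(_, next)| next),
    }
}

/// Returns the title of a dialog template, for both DIALOG and DIALOGEX resources.
pub fn dialog_title(data: &[u8]) -> Option<String> {
    // DLGTEMPLATEEX starts with dlgVer == 1 and signature == 0xFFFF.
    let is_ex = read_u16(data, 0)? == 1 && read_u16(data, 2)? == 0xFFFF;
    let header = if is_ex { 26 } else { 18 };
    let after_menu = skip_sz_or_ord(data, header)?;
    let after_class = skip_sz_or_ord(data, after_menu)?;
    let (title, _) = read_wstr(data, after_class)?;
    Some(title)
}
```

Extending this to control text would mean continuing past the optional font fields and reading each item, which is where most of the edge cases live.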
RT_MENU (Type 4) - Menu Resources
- Parse MENUITEMTEMPLATE structures
- Extract menu item text strings
- Tag with the Resource tag
- Assign medium relevance scores (5-7)
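Menu text extraction might look like the sketch below. It handles only the standard (version 0) template: a 4-byte MENUITEMTEMPLATEHEADER followed by a flat sequence of items, where popup items omit the ID word and MF_END marks the last item of a level. Extended (MENUEX) templates are out of scope here, and the function name and depth limit are assumptions.

```rust
const MF_POPUP: u16 = 0x0010;
const MF_END: u16 = 0x0080;
const MAX_DEPTH: usize = 32; // arbitrary guard against runaway nesting

fn read_u16(data: &[u8], off: usize) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_wstr(data: &[u8], mut off: usize) -> Option<(String, usize)> {
    let mut units = Vec::new();
    loop {
        let u = read_u16(data, off)?;
        off += 2;
        if u == 0 {
            break;
        }
        units.push(u);
    }
    Some((String::from_utf16_lossy(&units), off))
}

/// Collects the non-empty item texts of a standard menu template, in file order.
pub fn menu_strings(data: &[u8]) -> Vec<String> {
    let mut out = Vec::new();
    // Header: version (must be 0) and the size of any extra header bytes.
    let header_size = match (read_u16(data, 0), read_u16(data, 2)) {
        (Some(0), Some(n)) => n as usize,
        _ => return out,
    };
    let mut off = 4 + header_size;
    // One flag per open menu level: true when closing it also closes its parent.
    let mut open_levels = vec![true];
    while !open_levels.is_empty() && open_levels.len() <= MAX_DEPTH {
        let flags = match read_u16(data, off) {
            Some(f) => f,
            None => break,
        };
        off += 2;
        let is_popup = flags & MF_POPUP != 0;
        if !is_popup {
            off += 2; // normal items carry a menu ID; popups do not
        }
        let (text, next) = match read_wstr(data, off) {
            Some(t) => t,
            None => break,
        };
        off = next;
        if !text.is_empty() {
            out.push(text); // separators have empty text and are skipped
        }
        let is_last = flags & MF_END != 0;
        if is_popup {
            open_levels.push(is_last);
        } else if is_last {
            while let Some(closes_parent) = open_levels.pop() {
                if !closes_parent {
                    break;
                }
            }
        }
    }
    out
}
```

The returned strings keep their ampersand accelerators (for example "&File"); whether to strip them is a presentation decision left to the caller.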
3. String Encoding
- Implement UTF-16LE decoding (resources typically use UTF-16)
- Handle null-terminated wide strings
- Convert to UTF-8 for storage in ExtractedString
- Integrate with existing encoding detection if mixed encodings exist
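The decoding step itself needs only the standard library. The sketch below stops at the first NUL, ignores a trailing odd byte, and replaces unpaired surrogates with U+FFFD instead of failing; the function name is hypothetical.

```rust
/// Decodes UTF-16LE bytes into a Rust (UTF-8) String, stopping at the first NUL.
/// Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
pub fn decode_utf16le(bytes: &[u8]) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|c| u16::from_le_bytes([c[0], c[1]]))
        .take_while(|&u| u != 0)
        .collect();
    String::from_utf16_lossy(&units)
}
```

Lossy decoding fits a string-extraction tool, where a partially readable string is usually more useful than a discarded one.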
4. Integration Points
- Add resource extraction method to PeParser implementation
- Call during container parsing in parse() method
- Store extracted strings with StringSource::ResourceString
- Apply semantic tags: Resource, Version, Manifest
- Integrate into existing section weighting/scoring system
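To make the tagging and scoring rules concrete, here is a small sketch of how a decoded string could be wrapped. The type and field definitions below are stand-ins written for this example; the real ExtractedString and StringSource types in Stringy may be shaped differently. The scores are simply the lower bound of each range proposed above.

```rust
// Stand-in types for illustration; not the actual Stringy definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Resource,
    Version,
    Manifest,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringSource {
    ResourceString,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExtractedString {
    pub text: String,
    pub source: StringSource,
    pub tags: Vec<Tag>,
    pub score: u8,
}

/// Maps a numeric resource type to tags and a base relevance score.
/// Returns None for resource types this issue does not cover.
pub fn classify(resource_type: u16, text: String) -> Option<ExtractedString> {
    let (tags, score) = match resource_type {
        16 => (vec![Tag::Version, Tag::Resource], 8),  // RT_VERSION
        24 => (vec![Tag::Manifest, Tag::Resource], 7), // RT_MANIFEST
        6 => (vec![Tag::Resource], 6),                 // RT_STRING
        4 | 5 => (vec![Tag::Resource], 5),             // RT_MENU, RT_DIALOG
        _ => return None,
    };
    Some(ExtractedString {
        text,
        source: StringSource::ResourceString,
        tags,
        score,
    })
}
```

Keeping the type-to-score mapping in one function makes it easy to tune the ranges without touching the individual handlers.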
5. Library Consideration
Option A: Use pelite crate (specialized PE parser with resource support)
- Pros: Handles resource parsing complexity, well-tested
- Cons: Adds dependency, may need integration work
Option B: Extend current goblin-based parser manually
- Pros: No new dependencies, full control
- Cons: More complex implementation, need to handle edge cases
Recommendation: Start with Option B (manual parsing), leveraging the existing goblin::pe::PE structures, and add pelite later if the complexity warrants it.
Acceptance Criteria
- Version information fields are extracted (FileDescription, ProductName, CompanyName, etc.)
- STRINGTABLE resources are extracted with UTF-16LE decoding
- Semantic tags are applied (Resource, Version, Manifest)
- Documentation is updated in docs/src/binary-formats.md
Implementation Notes
- Resource extraction should be optional/non-blocking (gracefully handle parse failures)
- Maintain performance: avoid deep recursion, limit resource tree depth
- Consider adding CLI flag --resources to enable/disable resource extraction
- Test with real-world Windows binaries (e.g., notepad.exe, calc.exe)
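The first two notes (fail gracefully, avoid deep recursion) can be combined in an iterative walk with explicit limits. The sketch below collects the offsets of resource data entries from the raw section bytes; the limits and the function name are assumptions, and any truncated or cyclic structure simply results in fewer leaves and not in an error or a hang.

```rust
const MAX_LEVEL: usize = 3; // Type, Name, Language
const MAX_ENTRIES: usize = 4096; // arbitrary cap on total entries examined

fn read_u16(data: &[u8], off: usize) -> Option<u16> {
    let b = data.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Walks the resource tree without recursion and returns the section-relative
/// offsets of all data entries that were reachable within the limits.
pub fn collect_data_entries(rsrc: &[u8]) -> Vec<u32> {
    let mut leaves = Vec::new();
    let mut stack = vec![(0usize, 1usize)]; // (directory offset, level)
    let mut examined = 0usize;
    while let Some((dir, level)) = stack.pop() {
        // Entry counts live at offsets 12 and 14 of the 16-byte directory header.
        let count = match (read_u16(rsrc, dir + 12), read_u16(rsrc, dir + 14)) {
            (Some(named), Some(ids)) => named as usize + ids as usize,
            _ => continue, // truncated directory: skip it, keep going
        };
        for i in 0..count {
            examined += 1;
            if examined > MAX_ENTRIES {
                return leaves; // bound total work on hostile inputs
            }
            let raw = match read_u32(rsrc, dir + 16 + i * 8 + 4) {
                Some(v) => v,
                None => break,
            };
            if raw & 0x8000_0000 != 0 {
                // Subdirectory: follow only while within the three expected levels,
                // which also breaks self-referencing cycles.
                if level < MAX_LEVEL {
                    stack.push(((raw & 0x7FFF_FFFF) as usize, level + 1));
                }
            } else {
                leaves.push(raw);
            }
        }
    }
    leaves
}
```

Because the function always returns a (possibly empty) list, the caller can treat resource extraction as best-effort and continue with the rest of the PE analysis.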
Related Issues
Note: This issue consolidates and supersedes #5 and #56 with a more comprehensive approach.
References