Summary
Enhance the existing ELF import/export extraction to comprehensively parse the dynamic section, extract library dependencies (DT_NEEDED entries), map symbols to their originating libraries, and improve symbol classification beyond just functions.
Context
The current ELF parser in src/container/elf.rs provides basic import/export extraction by analyzing the dynamic symbol table (.dynsym). However, it has several limitations:
- No library mapping: Imports don't identify which shared library they come from (DT_NEEDED entries are not parsed)
- Limited symbol types: Only extracts function symbols (STT_FUNC), missing data objects, TLS variables, etc.
- Incomplete dynamic section parsing: The dynamic section contains additional metadata (version requirements, symbol versioning, etc.) that isn't currently extracted
- Missing symbol visibility: Doesn't account for symbol visibility (STV_DEFAULT, STV_HIDDEN, etc.)
This enhancement extends the existing functionality to provide more complete import/export analysis, matching the comprehensiveness of the PE parser implementation.
Technical Background
ELF Dynamic Section Structure:
- Contains DT_NEEDED entries that specify required shared libraries
- Includes DT_SYMTAB, DT_STRTAB, and DT_HASH/DT_GNU_HASH for symbol resolution
- May contain version information via DT_VERNEED, DT_VERDEF, and DT_VERSYM
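To make the DT_NEEDED lookup concrete, here is a minimal std-only sketch. It assumes an ELF64 little-endian file whose raw `.dynamic` and `.dynstr` bytes have already been sliced out. The helper names (`needed_libraries`, `str_at`, `read_u64_le`) are hypothetical and are not part of src/container/elf.rs; the real implementation would likely work from the parsed `Elf` value instead.

```rust
// Dynamic tags used below (values from the ELF gABI).
const DT_NULL: u64 = 0;
const DT_NEEDED: u64 = 1;

/// Read a little-endian u64 from the first 8 bytes of `bytes`.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Look up a NUL-terminated string at `offset` in a string table.
/// Returns None if the offset is out of range, unterminated, or not UTF-8.
fn str_at(strtab: &[u8], offset: usize) -> Option<String> {
    let tail = strtab.get(offset..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    String::from_utf8(tail[..end].to_vec()).ok()
}

/// Hypothetical helper: collect DT_NEEDED names from raw ELF64
/// little-endian `.dynamic` bytes and the matching `.dynstr` bytes.
fn needed_libraries(dynamic: &[u8], dynstr: &[u8]) -> Vec<String> {
    let mut libs = Vec::new();
    // Each Elf64_Dyn entry is 16 bytes: d_tag (8 bytes) then d_val (8 bytes).
    // d_tag is formally signed, but the tags used here are small and positive.
    for entry in dynamic.chunks_exact(16) {
        let tag = read_u64_le(&entry[0..8]);
        let val = read_u64_le(&entry[8..16]);
        if tag == DT_NULL {
            break; // DT_NULL terminates the dynamic array
        }
        if tag == DT_NEEDED {
            // d_val is an offset into the dynamic string table
            if let Some(name) = str_at(dynstr, val as usize) {
                libs.push(name);
            }
        }
    }
    libs
}
```

Entries after the DT_NULL terminator are ignored, and a DT_NEEDED entry whose string offset cannot be resolved is skipped rather than treated as an error.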
Symbol Classification:
- Imports: Undefined symbols (SHN_UNDEF) that need to be resolved at link/load time
- Exports: Defined symbols with global/weak binding available for other modules
- Symbol types: Functions (STT_FUNC), objects (STT_OBJECT), TLS (STT_TLS), IFuncs (STT_GNU_IFUNC)
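The classification rules above can be sketched directly from the raw symbol fields. The constants are the standard ELF values (STT_GNU_IFUNC is a GNU extension); the `SymbolKind` and `Role` enums and both function names are hypothetical illustrations, not existing types in the parser.

```rust
// Standard ELF constant values.
const SHN_UNDEF: u16 = 0;
const STB_GLOBAL: u8 = 1;
const STB_WEAK: u8 = 2;
const STT_OBJECT: u8 = 1;
const STT_FUNC: u8 = 2;
const STT_TLS: u8 = 6;
const STT_GNU_IFUNC: u8 = 10;
const STV_INTERNAL: u8 = 1;
const STV_HIDDEN: u8 = 2;

#[derive(Debug, PartialEq)]
enum SymbolKind {
    Function,
    Object,
    Tls,
    IFunc,
    Other,
}

#[derive(Debug, PartialEq)]
enum Role {
    Import,
    Export,
    Neither,
}

/// The symbol type lives in the low 4 bits of st_info.
fn symbol_kind(st_info: u8) -> SymbolKind {
    match st_info & 0xf {
        STT_FUNC => SymbolKind::Function,
        STT_OBJECT => SymbolKind::Object,
        STT_TLS => SymbolKind::Tls,
        STT_GNU_IFUNC => SymbolKind::IFunc,
        _ => SymbolKind::Other,
    }
}

/// Classify a symbol as import, export, or neither.
fn symbol_role(st_info: u8, st_other: u8, st_shndx: u16) -> Role {
    let binding = st_info >> 4; // binding is the high 4 bits of st_info
    let visibility = st_other & 0x3; // visibility is the low 2 bits of st_other
    if binding != STB_GLOBAL && binding != STB_WEAK {
        return Role::Neither; // local symbols are never imported or exported
    }
    if st_shndx == SHN_UNDEF {
        return Role::Import; // undefined: resolved at link/load time
    }
    if visibility == STV_HIDDEN || visibility == STV_INTERNAL {
        return Role::Neither; // defined but not visible to other modules
    }
    Role::Export
}
```

For example, `st_info = 0x12` (global binding, function type) with `st_shndx = 0` is classified as an imported function, while the same `st_info` with a nonzero section index and default visibility is an export.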
Proposed Solution
Implementation Steps
- Parse DT_NEEDED entries from dynamic section
  - Extract all required library names
  - Build a mapping of libraries for symbol attribution
- Enhance import extraction (extract_imports)
  - Keep current undefined symbol detection
  - Extend to handle all symbol types (not just STT_FUNC):
    - STT_OBJECT (data objects)
    - STT_TLS (thread-local storage)
    - STT_GNU_IFUNC (indirect functions)
  - Attempt to map symbols to libraries using version information when available
  - Preserve symbol visibility and binding information
- Enhance export extraction (extract_exports)
  - Include all globally visible defined symbols
  - Add support for weak symbols (STB_WEAK)
  - Include symbol type information in ExportInfo
  - Filter out hidden symbols (STV_HIDDEN, STV_INTERNAL)
- Extend data structures if needed
  - May need to extend ImportInfo to include symbol type and version
  - May need to extend ExportInfo to include symbol type and binding
- Add comprehensive unit tests
  - Test DT_NEEDED extraction with mock ELF data
  - Test symbol classification for various types
  - Test edge cases (weak symbols, versioned symbols, hidden symbols)
  - Use insta snapshots for symbol extraction results
- Add integration tests with real binaries
  - Test with sample ELF binaries (e.g., a /bin/ls equivalent)
  - Verify correct library mapping
  - Validate symbol counts and types
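One possible approach to the "map symbols to libraries using version information" step is sketched below. It assumes the DT_VERNEED records and the per-symbol DT_VERSYM values have already been decoded. The `VersionNeed` struct and both function names are hypothetical. Symbols without a version requirement cannot be attributed this way and stay unmapped.

```rust
use std::collections::HashMap;

/// Bit 15 of a versym entry is the "hidden" flag; the low 15 bits are the index.
const VERSYM_HIDDEN: u16 = 0x8000;

/// Hypothetical, already-decoded view of one DT_VERNEED record.
struct VersionNeed {
    /// Library name (vn_file resolved through the string table).
    library: String,
    /// (vna_other, vna_name) pairs: version index and version name.
    versions: Vec<(u16, String)>,
}

/// Build a lookup from version index to (library, version name).
fn version_index_map(needs: &[VersionNeed]) -> HashMap<u16, (String, String)> {
    let mut map = HashMap::new();
    for need in needs {
        for &(index, ref name) in &need.versions {
            map.insert(index, (need.library.clone(), name.clone()));
        }
    }
    map
}

/// Resolve the versym value of one dynamic symbol to its library, if any.
fn library_for_symbol<'a>(
    versym: u16,
    map: &'a HashMap<u16, (String, String)>,
) -> Option<&'a (String, String)> {
    // Indices 0 (local) and 1 (global, unversioned) never appear in the map,
    // so such symbols yield None and stay unattributed.
    map.get(&(versym & !VERSYM_HIDDEN))
}
```

With a record for libc.so.6 that lists version index 3 as GLIBC_2.34, a symbol whose versym value is 3 resolves to `("libc.so.6", "GLIBC_2.34")`, while a symbol with versym 1 resolves to `None`.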
Code Structure
// New helper method
fn extract_needed_libraries(&self, elf: &Elf) -> Vec<String> {
    // Parse DT_NEEDED entries from elf.dynamic
    todo!()
}

// Enhanced import extraction
fn extract_imports(&self, elf: &Elf, libraries: &[String]) -> Vec<ImportInfo> {
    // Current logic + handle all symbol types + library mapping
    todo!()
}

// Enhanced export extraction
fn extract_exports(&self, elf: &Elf) -> Vec<ExportInfo> {
    // Current logic + handle weak symbols + filter hidden
    todo!()
}
Requirements
4.2, 4.3
Acceptance Criteria
- ✅ Parse DT_NEEDED entries and extract library dependencies
- ✅ Map imported symbols to their originating libraries where possible
- ✅ Extract all relevant symbol types (functions, objects, TLS, IFuncs)
- ✅ Properly classify symbols as imports vs exports based on definition and binding
- ✅ Handle weak symbols and symbol visibility correctly
- ✅ Add comprehensive unit tests with cargo test passing
- ✅ Add integration tests with real ELF binaries
- ✅ Use insta for snapshot testing of symbol extraction
- ✅ Ensure cargo clippy -- -D warnings passes
- ✅ Update documentation in docs/ explaining the enhanced extraction
- ✅ Add benchmarks with criterion if performance-sensitive paths are added
- ✅ Update justfile recipes if new test targets are added
- ✅ Ensure CI passes
Dependencies
Task-ID
stringy-analyzer/elf-import-export-extraction