Enhance ELF Dynamic Symbol Extraction with Library Mapping and Comprehensive Symbol Classification #2

@unclesp1d3r

Summary

Enhance the existing ELF import/export extraction to comprehensively parse the dynamic section, extract library dependencies (DT_NEEDED entries), map symbols to their originating libraries, and improve symbol classification beyond just functions.

Context

The current ELF parser in src/container/elf.rs provides basic import/export extraction by analyzing the dynamic symbol table (.dynsym). However, it has several limitations:

  1. No library mapping: Imports don't identify which shared library they come from (DT_NEEDED entries are not parsed)
  2. Limited symbol types: Only extracts function symbols (STT_FUNC), missing data objects, TLS variables, etc.
  3. Incomplete dynamic section parsing: The dynamic section contains additional metadata (version requirements, symbol versioning, etc.) that isn't currently extracted
  4. Missing symbol visibility: Doesn't account for symbol visibility (STV_DEFAULT, STV_HIDDEN, etc.)

This enhancement extends the existing functionality to provide more complete import/export analysis, matching the comprehensiveness of the PE parser implementation.

Technical Background

ELF Dynamic Section Structure:

  • Contains DT_NEEDED entries that specify required shared libraries
  • Includes DT_SYMTAB, DT_STRTAB, and DT_HASH/DT_GNU_HASH for symbol resolution
  • May contain version information via DT_VERNEED, DT_VERDEF, and DT_VERSYM
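As a rough illustration of the DT_NEEDED lookup, the sketch below walks a raw dynamic section and resolves each DT_NEEDED value as an offset into the dynamic string table. It assumes a little-endian ELF64 layout (16-byte entries: an 8-byte d_tag followed by an 8-byte d_val) and that both byte slices have already been located. The names `needed_libraries` and `read_str` are hypothetical and not part of src/container/elf.rs; a real implementation would more likely go through whatever ELF parsing layer the project already uses.

```rust
// Tag values from the ELF specification.
const DT_NULL: i64 = 0;
const DT_NEEDED: i64 = 1;

/// Read a NUL-terminated string starting at `offset` in a string table.
/// Returns None if the offset is out of range or no terminator is found.
fn read_str(strtab: &[u8], offset: usize) -> Option<String> {
    let tail = strtab.get(offset..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    Some(String::from_utf8_lossy(&tail[..end]).into_owned())
}

/// Collect DT_NEEDED names from a little-endian ELF64 dynamic section.
/// (Hypothetical helper; assumes `dynamic` and `dynstr` were already sliced
/// out of the file.)
fn needed_libraries(dynamic: &[u8], dynstr: &[u8]) -> Vec<String> {
    let mut libs = Vec::new();
    for entry in dynamic.chunks_exact(16) {
        let mut tag = [0u8; 8];
        let mut val = [0u8; 8];
        tag.copy_from_slice(&entry[..8]);
        val.copy_from_slice(&entry[8..]);
        let tag = i64::from_le_bytes(tag);
        let val = u64::from_le_bytes(val);

        // DT_NULL terminates the dynamic array; anything after it is ignored.
        if tag == DT_NULL {
            break;
        }
        if tag == DT_NEEDED {
            // Malformed offsets are skipped rather than treated as fatal.
            if let Some(name) = read_str(dynstr, val as usize) {
                libs.push(name);
            }
        }
    }
    libs
}
```

Names come back in the order the entries appear, so the resulting list preserves the order recorded in the binary.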

Symbol Classification:

  • Imports: Undefined symbols (st_shndx == SHN_UNDEF) that need to be resolved at link/load time
  • Exports: Defined symbols with global/weak binding available for other modules
  • Symbol types: Functions (STT_FUNC), objects (STT_OBJECT), TLS (STT_TLS), IFuncs (STT_GNU_IFUNC)
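The classification rules above can be sketched directly against the raw symbol fields. In an ELF symbol entry, the binding is the high nibble of st_info, the type is the low nibble, and the visibility is the low two bits of st_other. The enum and function names below (`SymbolKind`, `SymbolRole`, `symbol_kind`, `symbol_role`) are assumptions made for illustration, not existing types in the project.

```rust
// Constants from the ELF specification (STT_GNU_IFUNC is a GNU extension).
const SHN_UNDEF: u16 = 0;
const STB_GLOBAL: u8 = 1;
const STB_WEAK: u8 = 2;
const STT_OBJECT: u8 = 1;
const STT_FUNC: u8 = 2;
const STT_TLS: u8 = 6;
const STT_GNU_IFUNC: u8 = 10;
const STV_INTERNAL: u8 = 1;
const STV_HIDDEN: u8 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SymbolKind {
    Function,
    Object,
    Tls,
    IFunc,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SymbolRole {
    Import,
    Export,
    Skip,
}

/// The symbol type is the low nibble of st_info.
fn symbol_kind(st_info: u8) -> SymbolKind {
    match st_info & 0x0f {
        STT_FUNC => SymbolKind::Function,
        STT_OBJECT => SymbolKind::Object,
        STT_TLS => SymbolKind::Tls,
        STT_GNU_IFUNC => SymbolKind::IFunc,
        _ => SymbolKind::Other,
    }
}

/// Decide whether a dynamic symbol is an import, an export, or neither.
fn symbol_role(st_info: u8, st_other: u8, st_shndx: u16) -> SymbolRole {
    let binding = st_info >> 4; // high nibble of st_info
    let visibility = st_other & 0x03; // low two bits of st_other

    // Local symbols (including the null symbol at index 0) are neither.
    if binding != STB_GLOBAL && binding != STB_WEAK {
        return SymbolRole::Skip;
    }
    // Undefined global/weak symbols must be resolved elsewhere: imports.
    if st_shndx == SHN_UNDEF {
        return SymbolRole::Import;
    }
    // Defined but hidden/internal symbols are not visible to other modules.
    if visibility == STV_HIDDEN || visibility == STV_INTERNAL {
        return SymbolRole::Skip;
    }
    SymbolRole::Export
}
```

In this sketch STV_PROTECTED symbols still count as exports, since they remain visible to other modules; whether the project wants to flag them separately is a design decision left open here.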

Proposed Solution

Implementation Steps

  1. Parse DT_NEEDED entries from dynamic section

    • Extract all required library names
    • Keep the resulting library list available for attributing imported symbols
  2. Enhance import extraction (extract_imports)

    • Keep current undefined symbol detection
    • Extend to handle all symbol types (not just STT_FUNC):
      • STT_OBJECT (data objects)
      • STT_TLS (thread-local storage)
      • STT_GNU_IFUNC (indirect functions)
    • Attempt to map symbols to libraries using version information when available (DT_VERSYM indices resolved against DT_VERNEED entries, which name the providing file)
    • Preserve symbol visibility and binding information
  3. Enhance export extraction (extract_exports)

    • Include all globally visible defined symbols
    • Add support for weak symbols (STB_WEAK)
    • Include symbol type information in ExportInfo
    • Filter out hidden symbols (STV_HIDDEN, STV_INTERNAL)
  4. Extend data structures if needed

    • May need to extend ImportInfo to include symbol type and version
    • May need to extend ExportInfo to include symbol type and binding
  5. Add comprehensive unit tests

    • Test DT_NEEDED extraction with mock ELF data
    • Test symbol classification for various types
    • Test edge cases (weak symbols, versioned symbols, hidden symbols)
    • Use insta snapshots for symbol extraction results
  6. Add integration tests with real binaries

    • Test with sample ELF binaries (e.g., /bin/ls or an equivalent)
    • Verify correct library mapping
    • Validate symbol counts and types
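The version-based library mapping mentioned in step 2 could look roughly like the sketch below. It assumes the version-needed records have already been decoded into a simple structure: each record names a file and carries a list of (index, version name) pairs, where the index corresponds to the value stored in a symbol's versym slot. `VersionNeed`, `build_version_index`, and `library_for_symbol` are hypothetical names used only for illustration. Symbols whose versym index is 0 (local) or 1 (global, unversioned) carry no version requirement, so this approach cannot attribute them and returns None.

```rust
use std::collections::HashMap;

/// Bit 15 of a versym entry is the "hidden" flag; the rest is the index.
const VERSYM_HIDDEN: u16 = 0x8000;
/// Indices 0 (local) and 1 (global) are reserved and carry no version.
const VER_NDX_GLOBAL: u16 = 1;

/// Hypothetical decoded form of one version-needed record.
struct VersionNeed {
    /// File name the requirement refers to, e.g. "libc.so.6".
    file: String,
    /// (version index, version name) pairs listed under this file.
    versions: Vec<(u16, String)>,
}

/// Build a lookup from version index to (library file, version name).
fn build_version_index(needs: &[VersionNeed]) -> HashMap<u16, (String, String)> {
    let mut index = HashMap::new();
    for need in needs {
        for (ndx, version) in &need.versions {
            index.insert(*ndx, (need.file.clone(), version.clone()));
        }
    }
    index
}

/// Resolve a symbol's versym value to the library and version it requires.
fn library_for_symbol(
    versym: u16,
    index: &HashMap<u16, (String, String)>,
) -> Option<&(String, String)> {
    let ndx = versym & !VERSYM_HIDDEN; // strip the hidden flag
    if ndx <= VER_NDX_GLOBAL {
        return None; // unversioned: cannot be attributed this way
    }
    index.get(&ndx)
}
```

For binaries without version information, every import resolves to None here, so callers would need to treat the library as unknown rather than guess.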

Code Structure

// New helper method
fn extract_needed_libraries(&self, elf: &Elf) -> Vec<String> {
    // Parse DT_NEEDED entries from elf.dynamic
}

// Enhanced import extraction
fn extract_imports(&self, elf: &Elf, libraries: &[String]) -> Vec<ImportInfo> {
    // Current logic + handle all symbol types + library mapping
}

// Enhanced export extraction
fn extract_exports(&self, elf: &Elf) -> Vec<ExportInfo> {
    // Current logic + handle weak symbols + filter hidden
}
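If ImportInfo does need to grow (step 4), one low-friction option is to add the new fields as optional values with defaults, so that existing construction sites can opt in gradually. The struct below is a hypothetical stand-in named `ImportInfoSketch`; the real ImportInfo fields are not shown in this issue, so every field name here is an assumption.

```rust
/// Hypothetical symbol type tag; `Unknown` is the default for callers
/// that do not yet populate it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SymbolType {
    Unknown,
    Function,
    Object,
    Tls,
    IFunc,
}

impl Default for SymbolType {
    fn default() -> Self {
        SymbolType::Unknown
    }
}

/// Hypothetical stand-in for an extended ImportInfo.
#[derive(Debug, Clone, PartialEq, Default)]
struct ImportInfoSketch {
    /// Symbol name as found in the dynamic string table.
    name: String,
    /// Providing library, when it could be determined.
    library: Option<String>,
    /// Symbol type derived from st_info.
    symbol_type: SymbolType,
    /// Required version name, when version information is present.
    version: Option<String>,
}

/// Convenience constructor for callers that only know the symbol name.
fn import_from_name(name: &str) -> ImportInfoSketch {
    ImportInfoSketch {
        name: name.to_string(),
        ..Default::default()
    }
}
```

Using Option for the library and version keeps "not determined" distinct from an empty string, which matters for binaries without version information.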

Requirements

4.2, 4.3

Acceptance Criteria

  • ✅ Parse DT_NEEDED entries and extract library dependencies
  • ✅ Map imported symbols to their originating libraries where possible
  • ✅ Extract all relevant symbol types (functions, objects, TLS, IFuncs)
  • ✅ Properly classify symbols as imports vs exports based on definition and binding
  • ✅ Handle weak symbols and symbol visibility correctly
  • ✅ Add comprehensive unit tests with cargo test passing
  • ✅ Add integration tests with real ELF binaries
  • ✅ Use insta for snapshot testing of symbol extraction
  • ✅ Ensure cargo clippy -- -D warnings passes
  • ✅ Update documentation in docs/ explaining the enhanced extraction
  • ✅ Add benchmarks with criterion if performance-sensitive paths are added
  • ✅ Update justfile recipes if new test targets are added
  • ✅ Ensure CI passes

Dependencies

Task-ID

stringy-analyzer/elf-import-export-extraction
