Overview
Extract meaningful strings and metadata from DWARF debug sections present in non-stripped binaries. DWARF debug information contains high-confidence, developer-chosen identifiers that provide significant value for binary analysis even when traditional symbols are unavailable.
Background: What is DWARF?
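DWARF (Debugging With Attributed Record Formats) is a standardized debugging data format used by compilers to embed source-level information into binaries. It is the dominant debug format for ELF (Linux/Unix) and Mach-O (macOS) binaries compiled with the -g flag. On macOS, the linker usually leaves the DWARF data in the object files or in a separate .dSYM bundle rather than in the final executable, and Mach-O names the sections __debug_info, __debug_str, and so on.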
Key DWARF Sections
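- .debug_info: Core debugging data containing DIEs (Debug Information Entries) with function/variable names, types, and source locations
- .debug_str: String table holding the names and other strings that .debug_info references by offset (short strings may instead be stored inline in .debug_info)
- .debug_line: Line number program mapping machine instructions to source file paths and line numbers
- .debug_abbrev: Abbreviation declarations describing DIE structure (needed for parsing .debug_info)
- .debug_types (DWARF 4), .debug_str_offsets and .debug_line_str (DWARF 5): Additional sections. .debug_types holds type units, .debug_str_offsets maps string indices to .debug_str offsets, and .debug_line_str holds file and directory names for DWARF 5 line tables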
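For string extraction, the key detail is how .debug_info and .debug_str relate. A DIE usually stores a name in one of three ways:

- as an offset into .debug_str (DW_FORM_strp)
- in DWARF 5, as an index through .debug_str_offsets (DW_FORM_strx)
- occasionally inline in .debug_info itself (DW_FORM_string)

The std-only sketch below illustrates the first two lookups. The function names are hypothetical, and the code assumes 32-bit DWARF with little-endian data. In the proposed design, gimli performs these lookups itself (for example through `Dwarf::attr_string`), so this only shows what the sections contain.

```rust
/// Returns every non-empty NUL-terminated string in a `.debug_str` section,
/// paired with its byte offset (the value a DW_FORM_strp attribute would hold).
pub fn debug_str_entries(section: &[u8]) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &b) in section.iter().enumerate() {
        if b == 0 {
            if i > start {
                let text = String::from_utf8_lossy(&section[start..i]).into_owned();
                out.push((start, text));
            }
            start = i + 1; // next string begins after the terminator
        }
    }
    out
}

/// Reads the NUL-terminated string starting at `offset` (a DW_FORM_strp lookup).
pub fn read_cstr(section: &[u8], offset: usize) -> Option<&str> {
    let tail = section.get(offset..)?;
    let len = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..len]).ok()
}

/// Resolves a DWARF 5 DW_FORM_strx index to a `.debug_str` offset.
/// Assumes 32-bit DWARF (4-byte entries) and little-endian data;
/// `base` is the unit's DW_AT_str_offsets_base.
pub fn strx_to_offset(str_offsets: &[u8], base: usize, index: usize) -> Option<usize> {
    let pos = base.checked_add(index.checked_mul(4)?)?;
    let end = pos.checked_add(4)?;
    let b = str_offsets.get(pos..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}
```

A whole-section scan like `debug_str_entries` already recovers most names. However, it loses the link back to the DIE, which is what tells you whether a string is a function, a type, or a file path. Walking .debug_info adds that context.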
Value Proposition
High-Confidence Identifiers
DWARF data is structured and intentional—not arbitrary byte patterns. Extracting from DWARF provides:
- Function names: handle_authentication, decrypt_payload, validate_license
- Variable names: secret_key, api_endpoint, config_path
- Type information: struct UserCredentials, class DatabaseConnection
- Source file paths: /home/builder/project/src/crypto/aes.c, revealing project structure
Intelligence Value
- Build environment disclosure: Absolute paths expose developer machines, CI systems, directory structures
- Code organization: Source file names reveal architectural patterns and module boundaries
- Semantic context: Variable/function names provide intent rather than raw string literals
- Reverse engineering: Symbolic names dramatically speed up understanding of compiled code
Complementary to Existing Extraction
DWARF augments rather than replaces existing string extraction:
- Symbols: DWARF works when .symtab is stripped but debug sections remain
- Section data: DWARF names often don't appear in .rodata (they live in .debug_str or inline in .debug_info)
- Imports/exports: DWARF reveals internal implementation details, not just API boundaries
Proposed Implementation
1. Add gimli Dependency
```toml
[dependencies]
gimli = "0.29"
object = "0.36"  # Parses ELF/Mach-O/PE containers and exposes the raw DWARF sections that gimli reads
```
Rationale: gimli is the de facto Rust DWARF parser. It underpins addr2line, which the Rust standard library's backtrace support uses for symbolization.
2. Extend Existing Types
In src/types.rs:
- StringSource::DebugInfo already exists ✅
- Add new Tag variants:
```rust
pub enum Tag {
    // ... existing tags
    DwarfSymbol,   // Function/variable name from DW_AT_name
    DwarfFilePath, // Source path from DW_AT_comp_dir, or DW_AT_decl_file resolved via the line-table file list
    DwarfType,     // Type name (DW_AT_name) of DW_TAG_structure_type / DW_TAG_class_type DIEs
}
```
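One way to decide which variant a DW_AT_name receives is to key off the tag of the DIE that carries it. The sketch below makes two substitutions:

- A hypothetical standalone `DwarfTag` enum stands in for the real `Tag`.
- Raw DW_TAG_* codes from the DWARF specification are used directly. gimli exposes the same values as typed constants such as `gimli::DW_TAG_subprogram`.

Which DIE kinds to include is a judgment call. Parameters (DW_TAG_formal_parameter) and struct members (DW_TAG_member) are simply left out here.

```rust
/// Hypothetical mirror of the three proposed Tag variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DwarfTag {
    DwarfSymbol,
    DwarfFilePath,
    DwarfType,
}

// DW_TAG_* codes as defined by the DWARF specification.
pub const DW_TAG_CLASS_TYPE: u16 = 0x02;
pub const DW_TAG_COMPILE_UNIT: u16 = 0x11;
pub const DW_TAG_STRUCTURE_TYPE: u16 = 0x13;
pub const DW_TAG_SUBPROGRAM: u16 = 0x2e;
pub const DW_TAG_VARIABLE: u16 = 0x34;

/// Chooses the tag for a DW_AT_name found on a DIE with the given DW_TAG code.
pub fn tag_for_die(die_tag: u16) -> Option<DwarfTag> {
    match die_tag {
        DW_TAG_SUBPROGRAM | DW_TAG_VARIABLE => Some(DwarfTag::DwarfSymbol),
        DW_TAG_STRUCTURE_TYPE | DW_TAG_CLASS_TYPE => Some(DwarfTag::DwarfType),
        // A compile unit's DW_AT_name is its primary source file.
        DW_TAG_COMPILE_UNIT => Some(DwarfTag::DwarfFilePath),
        _ => None,
    }
}
```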
3. Create DWARF Extraction Module
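New file: src/extraction/dwarf.rs
```rust
// Outline only: BinaryFormat, FoundString and Result are assumed to be the project's existing types.
use gimli::{Dwarf, EndianSlice, RunTimeEndian};
use object::{Object, ObjectSection};

pub struct DwarfExtractor {
    skip_dwarf: bool,
    max_section_size: Option<usize>,
}

impl DwarfExtractor {
    pub fn extract(&self, data: &[u8], format: BinaryFormat) -> Result<Vec<FoundString>> {
        if self.skip_dwarf {
            return Ok(Vec::new());
        }
        // 1. Parse the container with object::File::parse(data) and load each DWARF section,
        //    treating missing sections as empty (stripped binaries then yield nothing);
        //    bail out early if .debug_info exceeds max_section_size
        // 2. Build a gimli::Dwarf via Dwarf::load, borrowing sections as EndianSlice<RunTimeEndian>
        // 3. Iterate compilation units with dwarf.units() and their DIEs with unit.entries()
        // 4. Extract DW_AT_name, DW_AT_comp_dir and DW_AT_decl_file (resolved via the line program)
        // 5. Return FoundString values with high scores (90+)
        todo!("DWARF extraction")
    }
}
```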
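Step 4 is less direct for file paths than for names. DW_AT_decl_file is an index into the line program's file table, not a string. Each file entry names a file and a directory, and relative directories are resolved against the unit's DW_AT_comp_dir. File indices are 1-based up to DWARF 4 and 0-based in DWARF 5.

The minimal std-only sketch below covers only the final joining step. It assumes the directory and file names have already been read, and `resolve_source_path` is a hypothetical helper.

```rust
use std::path::PathBuf;

/// Joins a line-program file entry into a full source path (hypothetical helper).
/// `comp_dir` is the unit's DW_AT_comp_dir; `include_dir` is the entry's directory,
/// or None when the entry uses the compilation directory itself.
/// Path::join keeps an absolute component as-is, discarding what came before it.
pub fn resolve_source_path(comp_dir: &str, include_dir: Option<&str>, file: &str) -> PathBuf {
    let mut path = PathBuf::from(comp_dir);
    if let Some(dir) = include_dir {
        path = path.join(dir);
    }
    path.join(file)
}
```

Because `Path::join` replaces the base when the joined component is absolute, system headers such as /usr/include/stdio.h keep their own path instead of being nested under the build directory.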
4. Integration Points
In src/container/elf.rs:
- Classify DWARF sections correctly (already done: SectionType::Debug for .debug_* ✅)
- Ensure DWARF sections are included in ContainerInfo
In src/extraction/mod.rs:
- Import and instantiate DwarfExtractor
- Call after section-based extraction
- Coordinate with deduplication logic
5. CLI Integration
In src/main.rs (when CLI is implemented):
```rust
use clap::Parser; // assumes clap 4 with the "derive" feature

#[derive(Parser)]
struct Args {
    // ... existing args
    /// Skip DWARF debug section extraction (faster for large binaries)
    #[arg(long)]
    skip_dwarf: bool,
    /// Maximum DWARF section size to process (in MB)
    #[arg(long, default_value = "100")]
    max_dwarf_size: usize,
}
```
Technical Considerations
Performance
- DWARF parsing is CPU-intensive: a single .debug_info section can be 50-200 MB in optimized Rust binaries with full debug info
- Mitigation strategies:
  - --skip-dwarf flag for performance-critical scenarios
  - Size limits (--max-dwarf-size) to skip enormous sections
  - Lazy parsing: walk compilation units and read names from their DIEs, without building full type graphs
  - Skip .debug_types initially (less string content, high complexity)
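The two flags combine into a simple pre-parse gate. This sketch rests on two assumptions the proposal does not specify: that the gate looks only at the size of .debug_info, and that the CLI's "MB" means MiB. The function name is hypothetical. In the extractor, the same byte count would populate `max_section_size`.

```rust
/// Bytes per MB; treating the CLI's "MB" as MiB is an assumption.
const BYTES_PER_MB: u64 = 1024 * 1024;

/// Decides whether DWARF should be parsed at all, given the CLI flags
/// and the size of the .debug_info section (0 when it is absent).
pub fn should_parse_dwarf(skip_dwarf: bool, debug_info_len: u64, max_dwarf_size_mb: u64) -> bool {
    if skip_dwarf || debug_info_len == 0 {
        // Opted out, or no DWARF present (e.g. after strip --strip-debug).
        return false;
    }
    debug_info_len <= max_dwarf_size_mb.saturating_mul(BYTES_PER_MB)
}
```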
Binary Size Realities
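- Stripped binaries (strip --strip-debug): No DWARF sections → graceful no-op
- Partially stripped binaries: strip --strip-unneeded also removes debug sections (it implies --strip-debug), so DWARF remains only after more selective stripping (e.g. removing just .symtab) or in separate debug files produced by objcopy --only-keep-debug → full extraction
- Debug builds: Massive DWARF sections (10-100x code size) → sampling or limits needed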
Scoring Strategy
Assign high scores since DWARF strings are definitionally meaningful:
- Function/variable names (DW_AT_name): Score 95
- Source file paths (DW_AT_comp_dir, DW_AT_decl_file): Score 90
- Type names (DW_TAG_structure_type): Score 92
These should rank above most .rodata strings but below critical items like hardcoded URLs/keys.
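Expressed as code, the scoring rule is a fixed lookup per tag. In the sketch below:

- A hypothetical `DwarfTag` enum stands in for the proposed Tag variants.
- A ranking helper (also hypothetical) sorts by highest score first and breaks ties alphabetically, so output order is deterministic.

```rust
/// Hypothetical mirror of the proposed Tag variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DwarfTag {
    DwarfSymbol,
    DwarfFilePath,
    DwarfType,
}

/// Scores from the proposal: names 95, type names 92, source paths 90.
pub fn dwarf_score(tag: DwarfTag) -> u8 {
    match tag {
        DwarfTag::DwarfSymbol => 95,
        DwarfTag::DwarfType => 92,
        DwarfTag::DwarfFilePath => 90,
    }
}

/// Attaches scores and sorts highest-first, breaking ties alphabetically.
pub fn rank(items: Vec<(String, DwarfTag)>) -> Vec<(String, u8)> {
    let mut scored: Vec<(String, u8)> = items
        .into_iter()
        .map(|(text, tag)| (text, dwarf_score(tag)))
        .collect();
    scored.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored
}
```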
Deduplication
- Problem: A string like parse_config may appear in:
  - .debug_str (DWARF)
  - .symtab (symbol table)
  - .dynstr (dynamic symbols)
  - .rodata (error messages mentioning the function)
- Solution: Deduplication by (text, encoding) tuple in final output, preserving the highest score and all relevant tags.
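A minimal sketch of that merge follows. It is keyed on (text, encoding) and keeps each key's first-occurrence position. `Hit` and `Encoding` are hypothetical stand-ins for FoundString and the project's encoding type, and tags are modeled as plain strings for brevity.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the project's string-encoding type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Encoding {
    Ascii,
    Utf8,
    Utf16Le,
}

/// Hypothetical stand-in for FoundString.
#[derive(Debug, Clone)]
pub struct Hit {
    pub text: String,
    pub encoding: Encoding,
    pub score: u8,
    pub tags: BTreeSet<String>,
}

/// Merges hits that share (text, encoding): keeps the highest score and the
/// union of all tags. Output order follows each key's first occurrence.
pub fn dedup(hits: Vec<Hit>) -> Vec<Hit> {
    let mut index: HashMap<(String, Encoding), usize> = HashMap::new();
    let mut out: Vec<Hit> = Vec::new();
    for hit in hits {
        let key = (hit.text.clone(), hit.encoding.clone());
        match index.get(&key).copied() {
            Some(i) => {
                let kept = &mut out[i];
                kept.score = kept.score.max(hit.score);
                kept.tags.extend(hit.tags);
            }
            None => {
                index.insert(key, out.len());
                out.push(hit);
            }
        }
    }
    out
}
```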
Security & Privacy
- Build path disclosure: DWARF exposes absolute paths like /home/alice/secret-project/src/
- This is intentional for binary analysis—users examining untrusted binaries want this intel
- Document in output that DWARF reveals build environment details
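Implementation Checklist
- Add gimli and object dependencies to Cargo.toml
- Create src/extraction/dwarf.rs module
- Add DwarfSymbol, DwarfFilePath, DwarfType tags to Tag enum
- Implement DwarfExtractor::extract() method
- Parse .debug_info compilation units
- Extract DW_AT_name attributes → FoundString with Tag::DwarfSymbol
- Extract DW_AT_comp_dir / DW_AT_decl_file → Tag::DwarfFilePath
- Extract type names → Tag::DwarfType
- Add --skip-dwarf and --max-dwarf-size CLI flags
- Document the DWARF flags in --help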
Testing Strategy
Unit Tests
- Mock DWARF sections with known DIEs
- Verify extraction of specific DW_AT_name values
- Test score assignment logic
Integration Tests
Create test fixtures:
- tests/fixtures/hello_debug: Minimal C program compiled with -g
- tests/fixtures/hello_stripped: Same binary with strip --strip-debug
- tests/fixtures/rust_debug: Rust binary with full debug info (tests demangling + DWARF)
Expected behavior:
- hello_debug: Extract function names (main; callees such as printf only if the compiler emits declaration DIEs for them) and the source path
- hello_stripped: Gracefully skip, no errors
- rust_debug: Extract Rust function names, handle DWARF 4/5 versions
References
- gimli crate documentation
Success Criteria
- ✅ Extract all DW_AT_name attributes from .debug_info
- ✅ Extract source file paths from line number programs
- ✅ Handle DWARF versions 2-5 gracefully
- ✅ --skip-dwarf flag works correctly
- ✅ No crashes on malformed DWARF (handle gimli parse errors)
- ✅ Deduplication prevents duplicate strings from DWARF + symtab
- ✅ Documentation includes DWARF extraction examples