Overview
Complete the integration of import/export symbol extraction into the main string extraction and ranking pipeline. The foundation already exists in the container parsers (ELF, PE, Mach-O), but these symbols need to be:
- Surfaced as searchable/filterable strings
- Classified with semantic tags for API categories
- Integrated into the scoring algorithm
- Exposed via CLI filtering options
Current Implementation Status
✅ Already Implemented
- Parser Infrastructure (src/container/{elf,pe,macho}.rs): extract_imports() and extract_exports() methods exist
  - ELF: Reads .dynsym and .symtab sections
  - PE: Parses Import Directory Table and Export Address Table
  - Mach-O: Extracts from symbol table (undefined = imports, defined = exports)
- Data Structures (src/types.rs):
  - ImportInfo, ExportInfo, ContainerInfo
  - Tag::Import, Tag::Export enum variants
  - StringSource::ImportName, StringSource::ExportName
❌ Not Yet Implemented
- Main extraction pipeline to surface imports/exports as FoundString objects
- Semantic classification of API categories (crypto, network, file I/O, etc.)
- Integration into scoring/ranking algorithm with adjustable weights
- CLI flags: --imports, --exports, --symbols
- Test coverage for import/export extraction
Proposed Solution
1. Pipeline Integration (src/extraction/mod.rs)
Convert ImportInfo/ExportInfo from container parsers into FoundString objects:
- Set source field to StringSource::ImportName or StringSource::ExportName
- Apply base tags: Tag::Import or Tag::Export
- Preserve library/ordinal metadata as context
- Calculate RVA/offset from address field
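The conversion steps above could look roughly like the following. This is a minimal sketch, not the project's code: the struct fields (name, library, address, ordinal, text, rva, context), the helper describe(), the function symbols_to_strings(), and the image_base parameter are all assumptions made for illustration, since the real definitions in src/types.rs are not shown here. It also assumes the address field holds a virtual address, so the RVA is obtained by subtracting the image base.

```rust
// Hypothetical, simplified stand-ins for the types in src/types.rs.
// The real structs may have different fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringSource {
    ImportName,
    ExportName,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Import,
    Export,
}

#[derive(Debug, Clone)]
pub struct ImportInfo {
    pub name: String,
    pub library: Option<String>,
    pub address: Option<u64>,
    pub ordinal: Option<u16>,
}

#[derive(Debug, Clone)]
pub struct ExportInfo {
    pub name: String,
    pub address: u64,
    pub ordinal: Option<u16>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FoundString {
    pub text: String,
    pub source: StringSource,
    pub tags: Vec<Tag>,
    pub rva: Option<u64>,
    pub context: Option<String>,
}

// Fold library name and ordinal into one context string (hypothetical format).
fn describe(library: Option<&str>, ordinal: Option<u16>) -> Option<String> {
    match (library, ordinal) {
        (Some(lib), Some(ord)) => Some(format!("{}#{}", lib, ord)),
        (Some(lib), None) => Some(lib.to_string()),
        (None, Some(ord)) => Some(format!("#{}", ord)),
        (None, None) => None,
    }
}

// Turn parser output into FoundString entries: imports first, then exports.
// Assumption: addresses are virtual addresses, so RVA = address - image_base.
pub fn symbols_to_strings(
    imports: &[ImportInfo],
    exports: &[ExportInfo],
    image_base: u64,
) -> Vec<FoundString> {
    let import_strings = imports.iter().map(|imp| FoundString {
        text: imp.name.clone(),
        source: StringSource::ImportName,
        tags: vec![Tag::Import],
        rva: imp.address.map(|a| a.saturating_sub(image_base)),
        context: describe(imp.library.as_deref(), imp.ordinal),
    });
    let export_strings = exports.iter().map(|exp| FoundString {
        text: exp.name.clone(),
        source: StringSource::ExportName,
        tags: vec![Tag::Export],
        rva: Some(exp.address.saturating_sub(image_base)),
        context: describe(None, exp.ordinal),
    });
    import_strings.chain(export_strings).collect()
}
```

If the parsers already report RVAs rather than virtual addresses, passing an image_base of 0 leaves the values unchanged.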
2. Semantic Classification (src/classification/mod.rs)
Implement API categorization based on symbol name patterns:
- Crypto APIs: AES*, RSA*, SHA*, CryptEncrypt, EVP_*, CCCrypt*
- Network APIs: socket, connect, recv, send, WSA*, getaddrinfo
- File I/O: fopen, CreateFile, ReadFile, open, read, write
- Process APIs: CreateProcess, execve, fork, WinExec
- Registry: RegOpenKey, RegSetValue, etc. (Windows-specific)
Add corresponding tag variants to Tag enum (e.g., Tag::CryptoApi, Tag::NetworkApi).
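One way the name-pattern matching might be structured is a small rule table with prefix and exact-name lists per category. This is a sketch under assumptions: only Tag::CryptoApi and Tag::NetworkApi are named in this proposal, so the FileApi, ProcessApi, and RegistryApi variants, the Rule struct, and the RULES table are hypothetical. Treating Windows names such as CreateFile and RegOpenKey as prefixes (to catch CreateFileW, RegOpenKeyExW) and stripping one leading underscore for Mach-O symbol names are also choices made here, not confirmed behavior.

```rust
// Hypothetical subset of the Tag enum; only CryptoApi and NetworkApi
// appear in the proposal, the other variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    CryptoApi,
    NetworkApi,
    FileApi,
    ProcessApi,
    RegistryApi,
}

// One classification rule: a symbol matches if it starts with any prefix
// or equals any exact name.
struct Rule {
    tag: Tag,
    prefixes: &'static [&'static str],
    exact: &'static [&'static str],
}

// Patterns taken from the category list above. Rules are checked in order.
const RULES: &[Rule] = &[
    Rule {
        tag: Tag::CryptoApi,
        prefixes: &["AES", "RSA", "SHA", "EVP_", "CCCrypt"],
        exact: &["CryptEncrypt"],
    },
    Rule {
        tag: Tag::NetworkApi,
        prefixes: &["WSA"],
        exact: &["socket", "connect", "recv", "send", "getaddrinfo"],
    },
    Rule {
        tag: Tag::FileApi,
        prefixes: &["CreateFile", "ReadFile"],
        exact: &["fopen", "open", "read", "write"],
    },
    Rule {
        tag: Tag::ProcessApi,
        prefixes: &["CreateProcess"],
        exact: &["execve", "fork", "WinExec"],
    },
    Rule {
        tag: Tag::RegistryApi,
        prefixes: &["RegOpenKey", "RegSetValue"],
        exact: &[],
    },
];

// Return the first matching category tag, or None for unclassified symbols.
pub fn classify_api_category(symbol: &str) -> Option<Tag> {
    // Assumption: drop one leading underscore so Mach-O names like
    // "_fopen" match the same patterns as "fopen".
    let name = symbol.strip_prefix('_').unwrap_or(symbol);
    RULES
        .iter()
        .find(|rule| {
            rule.exact.iter().any(|e| *e == name)
                || rule.prefixes.iter().any(|p| name.starts_with(*p))
        })
        .map(|rule| rule.tag)
}
```

Prefix patterns this short are coarse. SHA*, for example, also matches Windows shell functions such as SHAddToRecentDocs, so a real implementation would likely need an allowlist or library-aware matching.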
3. Ranking Integration
Add configurable weights to scoring algorithm:
- Base score for all imports/exports (e.g., +5)
- Bonus for suspicious/high-value APIs (e.g., +15 for crypto, +10 for network)
- Consider frequency (rare imports score higher)
- Configurable via scoring parameters
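The weighting scheme above might be expressed as a small parameter struct plus a pure scoring function. This is a sketch only: the defaults for base, crypto, and network come from the example values in the list, while SymbolWeights, ApiCategory, the rarity_bonus and rare_below fields, and the idea of measuring frequency as an occurrence count from some reference set are all assumptions, since the proposal does not say how frequency would be measured.

```rust
// Hypothetical category type used only for this scoring sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiCategory {
    Crypto,
    Network,
    Other,
}

// Tunable weights. Field names are assumptions.
#[derive(Debug, Clone)]
pub struct SymbolWeights {
    pub base: i32,          // applied to every import/export
    pub crypto_bonus: i32,  // extra for crypto APIs
    pub network_bonus: i32, // extra for network APIs
    pub rarity_bonus: i32,  // extra for rarely seen symbols (assumed value)
    pub rare_below: u32,    // occurrence count under which a symbol is "rare"
}

impl Default for SymbolWeights {
    fn default() -> Self {
        SymbolWeights {
            base: 5,           // "+5" example from the proposal
            crypto_bonus: 15,  // "+15 for crypto"
            network_bonus: 10, // "+10 for network"
            rarity_bonus: 5,   // not specified in the proposal
            rare_below: 3,     // not specified in the proposal
        }
    }
}

// Score one symbol: base + category bonus + optional rarity bonus.
// `occurrences` is how often the symbol appears in some reference set;
// where that count comes from is left open here.
pub fn score_symbol(category: ApiCategory, occurrences: u32, w: &SymbolWeights) -> i32 {
    let category_bonus = match category {
        ApiCategory::Crypto => w.crypto_bonus,
        ApiCategory::Network => w.network_bonus,
        ApiCategory::Other => 0,
    };
    let rarity = if occurrences < w.rare_below {
        w.rarity_bonus
    } else {
        0
    };
    w.base + category_bonus + rarity
}
```

Keeping the weights in a plain struct with a Default impl means a later config file loader would only need to populate the struct, without touching the scoring function.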
4. CLI Enhancement (src/main.rs)
Add filtering flags to clap parser:
#[arg(long, help = "Show only imported symbols")]
imports: bool,
#[arg(long, help = "Show only exported symbols")]
exports: bool,
#[arg(long, help = "Show all symbols (imports + exports)")]
symbols: bool,
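How the three flags combine is not spelled out above, so the following sketch picks one plausible reading: with no flag set nothing is filtered, --symbols is equivalent to --imports plus --exports, and any set flag hides non-symbol strings. SymbolFilter, its keeps() method, and the StringSource::Other variant are hypothetical names used for illustration; the sketch uses only the standard library and mirrors the flag fields rather than depending on clap.

```rust
// Hypothetical stand-in for the real StringSource enum; `Other` represents
// every non-symbol source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringSource {
    ImportName,
    ExportName,
    Other,
}

// Mirrors the three boolean CLI flags.
#[derive(Debug, Clone, Copy, Default)]
pub struct SymbolFilter {
    pub imports: bool,
    pub exports: bool,
    pub symbols: bool,
}

impl SymbolFilter {
    // Decide whether a string from `source` should be shown.
    pub fn keeps(&self, source: StringSource) -> bool {
        let want_imports = self.imports || self.symbols;
        let want_exports = self.exports || self.symbols;
        // Assumption: with no symbol flag given, output is unfiltered.
        if !want_imports && !want_exports {
            return true;
        }
        match source {
            StringSource::ImportName => want_imports,
            StringSource::ExportName => want_exports,
            StringSource::Other => false,
        }
    }
}
```

A caller could then apply it as `strings.retain(|s| filter.keeps(s.source))` after extraction, assuming each found string exposes its source.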
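Acceptance Criteria
- Extracted symbols are tagged with Tag::Import or Tag::Export
- --imports, --exports, --symbols filter output correctly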
Implementation Notes
- Reuse existing parser methods; no changes needed to src/container/* files
- Focus on src/extraction/mod.rs for pipeline integration
- Consider adding a classify_api_category() helper in src/classification/mod.rs
- Scoring weights should be configurable (future-proof for config file support)
- Output format should clearly distinguish imports from exports (e.g., library name for imports)
Testing Strategy
- Unit Tests: API pattern matching for crypto/network/file APIs
- Integration Tests:
  - tests/integration_elf.rs: Verify libc.so.6 imports (e.g., malloc, fopen)
  - tests/integration_pe.rs: Verify kernel32.dll imports (e.g., CreateFileW)
  - tests/integration_macho.rs: Verify libSystem.B.dylib symbols
- Snapshot Tests: Capture expected output with --imports and --exports flags
References