fix: Resolve integration test failures and enhance static binary export extraction#51
Conversation
|
Caution Review failedThe pull request is closed. Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Summary by CodeRabbit
WalkthroughELF export collection logic now deduplicates symbol entries using a HashSet to track seen names. Exports are gathered from both dynamic and static symbol tables. Dynamic symbol handling skips empty names; static symbol handling includes FUNC, OBJECT, and NOTYPE symbol types from the static table and deduplicates against previously seen names. Ordinal assignment remains None. Changes
Sequence DiagramsequenceDiagram
participant Collector as Export Collector
participant DynSym as Dynamic Symbol Table
participant StaticSym as Static Symbol Table
participant Dedup as HashSet (Seen Names)
participant Result as Exports
Collector->>DynSym: Iterate dynamic symbols
DynSym-->>Collector: Symbol names
rect rgb(220, 240, 255)
note over Collector,Dedup: Process Dynamic Symbols
Collector->>Collector: Skip empty names
Collector->>Dedup: Check if seen
alt Not seen
Dedup-->>Collector: Add to set
Collector->>Result: Append export
else Already seen
Dedup-->>Collector: Skip
end
end
Collector->>StaticSym: Iterate static symbols
StaticSym-->>Collector: FUNC/OBJECT/NOTYPE symbols
rect rgb(240, 220, 255)
note over Collector,Dedup: Process Static Symbols
Collector->>Dedup: Check if seen
alt Not seen
Dedup-->>Collector: Add to set
Collector->>Result: Append export
else Already seen
Dedup-->>Collector: Skip (deduplicate)
end
end
Result-->>Collector: Final deduplicated exports
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related issues
Poem
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>
…targeted-string-extraction' into copilot/sub-pr-45 Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
There was a problem hiding this comment.
Pull Request Overview
This PR enhances ELF export extraction to handle both dynamically linked and statically linked binaries. The main improvements include support for weak symbols, empty name filtering, and extraction from the regular symbol table (.symtab) in addition to the dynamic symbol table (.dynsyms).
Key Changes:
- Added support for weak symbol bindings (STB_WEAK) in addition to global symbols
- Added empty name filtering to avoid exporting unnamed symbols
- Implemented regular symbol table (.symtab) parsing to extract exports from statically linked binaries
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/integration_elf.rs | Updated comment to clarify that fallback gcc compilation produces dynamically linked binaries |
| src/container/elf.rs | Enhanced export extraction to support weak symbols, filter empty names, and parse regular symbol table for static binary exports |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if let Some(name) = elf.strtab.get_at(sym.st_name) { | ||
| if !name.is_empty() { | ||
| // Avoid duplicates from dynamic symbol table | ||
| if !exports.iter().any(|exp| exp.name == name) { |
There was a problem hiding this comment.
The duplicate detection logic has O(n²) time complexity. Each time you check !exports.iter().any(|exp| exp.name == name), you iterate through all existing exports. For binaries with many symbols, this can be slow.
Consider using a HashSet to track seen names for O(1) lookups:
use std::collections::HashSet;
// At the start of extract_exports
let mut seen_names = HashSet::new();
// When adding exports from dynsyms:
if !name.is_empty() && seen_names.insert(name.to_string()) {
exports.push(ExportInfo { ... });
}
// When adding exports from syms:
if !name.is_empty() && seen_names.insert(name.to_string()) {
exports.push(ExportInfo { ... });
}There was a problem hiding this comment.
Optimized duplicate detection using HashSet for O(1) lookups. The seen_names HashSet now tracks all exported names, eliminating the O(n²) iteration through the exports vector. Commit: 9943114
|
@copilot apply changes based on the comments in this thread |
Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>
|
@coderabbitai Enhance this issue body with additional context and a proposed solution. Improve the issue title and assign the proper labels and other metadata. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 44610 🏁 Script executed: Length of output: 28423 🏁 Script executed: Length of output: 7111 @unclesp1d3r: I couldn't locate a specific issue related to the review thread you referenced. The review comment is about the HashSet optimization for duplicate detection in Could you please clarify which issue you'd like me to enhance? You can:
Once you let me know the specific issue number, I'll be happy to enhance it with additional context, improve the title, and assign appropriate labels and metadata! |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
a145734
into
1-implement-intelligent-elf-section-classification-for-targeted-string-extraction
* docs(steering): Adds development standards and workflow Adds a comprehensive development standards document to standardize Rust code quality, error handling, testing, tooling, documentation, dependency selection, and performance practices across the project. Defines concrete expectations (e.g., no unsafe in application code, clippy warnings as errors, rustfmt, Just-based tasks, >85% test coverage, use of thiserror/anyhow, preferred crates, benchmarking and profiling guidance) to improve CI enforcement, consistency, contributor onboarding, and reduce regressions and performance issues. * docs(steering): Updates testing and performance guidance Refines the testing and tooling section to focus on project-specific needs, clarifying intended uses for insta and criterion in binary analysis workflows. Removes generic build command listings to keep the guide concise and actionable for contributors. Adds concrete performance recommendations: use memmap2 for large binaries, defer expensive features via lazy evaluation, compile and cache regexes once, and skip irrelevant sections during extraction to improve speed and memory usage. Makes onboarding and optimization guidance more practical for the binary-analysis project. * docs(specs): expand stringy-binary-analyzer task list Updates the task checklist to more fully specify the foundational project structure, core data types, and container parsing work to guide implementation. Adds a detailed module hierarchy and definitions for core types (string records, encodings, semantic tags, container/section metadata), and introduces a comprehensive error-handling framework and Result alias. Expands format-detection planning by specifying a ContainerParser trait and parser stubs for ELF/PE/Mach-O with basic section enumeration and unit tests. Clarifies requirement mappings and refines checklist items to better sequence future work. Improves planning clarity and prepares the codebase for subsequent implementation of section classification and extraction features. * chore(ci): Normalizes GitHub Actions YAML Normalizes indentation and spacing across GitHub Actions workflow files to a consistent 2-space style. Trims trailing whitespace and fixes end-of-file newline in lint and tooling configuration files, and removes an extra blank line in the task runner. Improves readability, reduces noise in diffs, and helps avoid formatting-related lint warnings; no functional behavior changes. * feat(container): add section-weight heuristics and improve ELF import extraction Add heuristics to score sections by likelihood of containing meaningful strings and improve ELF symbol extraction to better identify imports. Section weighting is introduced for ELF, Mach-O and PE parsers so string extraction can prioritize likely string-bearing sections (e.g., rodata, __cstring, .rdata) over code or debug segments. Section classification remains based on name and flags, but a numeric weight now accompanies section metadata to guide downstream analysis. ELF symbol handling is improved by expanding import detection to consider both dynamic and regular symbol tables, accepting global/weak bindings and multiple symbol types, skipping empty names, and providing a placeholder hook to derive library info from DT_NEEDED entries (returns None for accuracy until deeper relocation analysis is added). These changes increase reliability of import/export discovery and tagging. Additionally, parsers are re-exported for easier access and a suite of unit and integration tests were added to validate section weighting, classification, and symbol filtering behavior. Minor docs/task checklist and linter formatting tweaks are included. Future enhancement: map symbols to specific libraries by analyzing PLT/GOT/relocations for more precise library attribution. * docs: Updates docs to reflect core implementation Updates documentation to accurately represent current project progress and capabilities. Clarifies that the core container parsing, section classification (with weighting), and import/export symbol extraction are implemented and type-safe, while string extraction engines, semantic classification, ranking, output formatters, and advanced CLI features remain in development. Adds architecture and usage details (trait-based parser design, section weight system, data model) and updates CLI/quickstart guidance to show basic command support. Provides clearer guidance for contributors and users so expectations align with the codebase and to reduce confusion about implemented vs planned features. * 📝 CodeRabbit Chat: integration_elf: add static import/export test; rename dynamic test * 📝 CodeRabbit Chat: Add static ELF import/export extraction test * fix: Remove duplicate tests and correct static/dynamic ELF test behavior (#52) * Initial plan * fix: Remove duplicate tests and fix static/dynamic ELF test assertions Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * chore: Update cargo-dist to 0.30.2 Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * fix: Resolve integration test failures and enhance static binary export extraction (#51) * Initial plan * fix: CI test failures and improve ELF export extraction Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * perf: optimize export deduplication with HashSet (O(1) vs O(n)) Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> Co-authored-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> * fix: Skip ELF integration tests on non-Unix platforms (#54) * Initial plan * fix: Skip ELF integration tests on Windows (non-Unix platforms) Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * Verify review fixes already applied in commit b27473d (#53) Initial plan Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> --------- Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* docs(steering): Adds development standards and workflow Adds a comprehensive development standards document to standardize Rust code quality, error handling, testing, tooling, documentation, dependency selection, and performance practices across the project. Defines concrete expectations (e.g., no unsafe in application code, clippy warnings as errors, rustfmt, Just-based tasks, >85% test coverage, use of thiserror/anyhow, preferred crates, benchmarking and profiling guidance) to improve CI enforcement, consistency, contributor onboarding, and reduce regressions and performance issues. * docs(steering): Updates testing and performance guidance Refines the testing and tooling section to focus on project-specific needs, clarifying intended uses for insta and criterion in binary analysis workflows. Removes generic build command listings to keep the guide concise and actionable for contributors. Adds concrete performance recommendations: use memmap2 for large binaries, defer expensive features via lazy evaluation, compile and cache regexes once, and skip irrelevant sections during extraction to improve speed and memory usage. Makes onboarding and optimization guidance more practical for the binary-analysis project. * docs(specs): expand stringy-binary-analyzer task list Updates the task checklist to more fully specify the foundational project structure, core data types, and container parsing work to guide implementation. Adds a detailed module hierarchy and definitions for core types (string records, encodings, semantic tags, container/section metadata), and introduces a comprehensive error-handling framework and Result alias. Expands format-detection planning by specifying a ContainerParser trait and parser stubs for ELF/PE/Mach-O with basic section enumeration and unit tests. Clarifies requirement mappings and refines checklist items to better sequence future work. Improves planning clarity and prepares the codebase for subsequent implementation of section classification and extraction features. * chore(ci): Normalizes GitHub Actions YAML Normalizes indentation and spacing across GitHub Actions workflow files to a consistent 2-space style. Trims trailing whitespace and fixes end-of-file newline in lint and tooling configuration files, and removes an extra blank line in the task runner. Improves readability, reduces noise in diffs, and helps avoid formatting-related lint warnings; no functional behavior changes. * feat(container): add section-weight heuristics and improve ELF import extraction Add heuristics to score sections by likelihood of containing meaningful strings and improve ELF symbol extraction to better identify imports. Section weighting is introduced for ELF, Mach-O and PE parsers so string extraction can prioritize likely string-bearing sections (e.g., rodata, __cstring, .rdata) over code or debug segments. Section classification remains based on name and flags, but a numeric weight now accompanies section metadata to guide downstream analysis. ELF symbol handling is improved by expanding import detection to consider both dynamic and regular symbol tables, accepting global/weak bindings and multiple symbol types, skipping empty names, and providing a placeholder hook to derive library info from DT_NEEDED entries (returns None for accuracy until deeper relocation analysis is added). These changes increase reliability of import/export discovery and tagging. Additionally, parsers are re-exported for easier access and a suite of unit and integration tests were added to validate section weighting, classification, and symbol filtering behavior. Minor docs/task checklist and linter formatting tweaks are included. Future enhancement: map symbols to specific libraries by analyzing PLT/GOT/relocations for more precise library attribution. * docs: Updates docs to reflect core implementation Updates documentation to accurately represent current project progress and capabilities. Clarifies that the core container parsing, section classification (with weighting), and import/export symbol extraction are implemented and type-safe, while string extraction engines, semantic classification, ranking, output formatters, and advanced CLI features remain in development. Adds architecture and usage details (trait-based parser design, section weight system, data model) and updates CLI/quickstart guidance to show basic command support. Provides clearer guidance for contributors and users so expectations align with the codebase and to reduce confusion about implemented vs planned features. * 📝 CodeRabbit Chat: integration_elf: add static import/export test; rename dynamic test * 📝 CodeRabbit Chat: Add static ELF import/export extraction test * fix: Remove duplicate tests and correct static/dynamic ELF test behavior (#52) * Initial plan * fix: Remove duplicate tests and fix static/dynamic ELF test assertions Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * chore: Update cargo-dist to 0.30.2 Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * fix: Resolve integration test failures and enhance static binary export extraction (#51) * Initial plan * fix: CI test failures and improve ELF export extraction Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * perf: optimize export deduplication with HashSet (O(1) vs O(n)) Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> Co-authored-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> * fix: Skip ELF integration tests on non-Unix platforms (#54) * Initial plan * fix: Skip ELF integration tests on Windows (non-Unix platforms) Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com> * Verify review fixes already applied in commit b27473d (#53) Initial plan Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> --------- Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.