fix: Resolve integration test failures and enhance static binary export extraction by Copilot · Pull Request #51 · EvilBit-Labs/Stringy

Copilot · 2025-11-09T21:55:41Z

Fix integration test compilation errors (duplicate function name, missing imports, duplicate test attribute)
Add tempfile dependency for test temp directory management
Fix dynamic test to compile without -static flag for proper import detection
Enhance ELF export extraction to check both dynamic and regular symbol tables
Support both global and weak bindings for exports (matching import logic)
Add symbol type filtering for exports (functions, objects, notype)
Fix formatting issues
Update cargo-dist version to 0.30.2 and regenerate release workflow
Optimize duplicate detection with a HashSet: O(1) average-case lookups instead of an O(n) scan per symbol (O(n²) overall)


coderabbitai · 2025-11-09T21:55:49Z

Caution

Review failed

The pull request is closed.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

Bug Fixes
- Enhanced ELF export handling to eliminate duplicate symbols and ensure more comprehensive symbol coverage from multiple sources.

Walkthrough

ELF export collection logic now deduplicates symbol entries using a HashSet to track seen names. Exports are gathered from both dynamic and static symbol tables. Dynamic symbol handling skips empty names; static symbol handling includes FUNC, OBJECT, and NOTYPE symbol types from the static table and deduplicates against previously seen names. Ordinal assignment remains None.
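The two-pass collection described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code in `src/container/elf.rs`: the `ExportInfo` struct here is a hypothetical stand-in with only a name and an ordinal, and the symbol tables are modelled as plain slices of names rather than parsed ELF data.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the project's export record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExportInfo {
    pub name: String,
    pub ordinal: Option<u32>,
}

/// Collect exports from the dynamic table first, then the regular table,
/// skipping empty names and any name that was already recorded.
pub fn collect_exports<'a>(dynamic: &[&'a str], regular: &[&'a str]) -> Vec<ExportInfo> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut exports = Vec::new();

    // Pass 1: dynamic symbol table.
    for &name in dynamic {
        if !name.is_empty() && seen.insert(name) {
            exports.push(ExportInfo { name: name.to_string(), ordinal: None });
        }
    }

    // Pass 2: regular symbol table, deduplicated against pass 1.
    for &name in regular {
        if !name.is_empty() && seen.insert(name) {
            exports.push(ExportInfo { name: name.to_string(), ordinal: None });
        }
    }

    exports
}
```

Because results are appended to a `Vec` and the `HashSet` is used only for membership checks, the output keeps first-seen order: dynamic symbols come first, followed by any regular-table symbols not already present.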

Changes

Cohort / File(s)	Summary
ELF symbol table deduplication `src/container/elf.rs`	Introduces HashSet-based deduplication for export collection. Extends symbol gathering to both dynamic and static symbol tables. Dynamic symbols now skip empty names; static symbols filter by type (FUNC, OBJECT, NOTYPE) and use regular string table. Duplicate entries eliminated across both sources.
Test documentation `tests/integration_elf.rs`	Comment clarification noting that the regular gcc path is attempted for dynamically linked ELF output in cross-compiler fallback scenarios.

Sequence Diagram

sequenceDiagram
    participant Collector as Export Collector
    participant DynSym as Dynamic Symbol Table
    participant StaticSym as Static Symbol Table
    participant Dedup as HashSet (Seen Names)
    participant Result as Exports

    Collector->>DynSym: Iterate dynamic symbols
    DynSym-->>Collector: Symbol names
    
    rect rgb(220, 240, 255)
    note over Collector,Dedup: Process Dynamic Symbols
    Collector->>Collector: Skip empty names
    Collector->>Dedup: Check if seen
    alt Not seen
        Dedup-->>Collector: Add to set
        Collector->>Result: Append export
    else Already seen
        Dedup-->>Collector: Skip
    end
    end

    Collector->>StaticSym: Iterate static symbols
    StaticSym-->>Collector: FUNC/OBJECT/NOTYPE symbols
    
    rect rgb(240, 220, 255)
    note over Collector,Dedup: Process Static Symbols
    Collector->>Dedup: Check if seen
    alt Not seen
        Dedup-->>Collector: Add to set
        Collector->>Result: Append export
    else Already seen
        Dedup-->>Collector: Skip (deduplicate)
    end
    end

    Result-->>Collector: Final deduplicated exports

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Key areas requiring attention:
- Verify HashSet logic correctly deduplicates symbols across dynamic and static tables
- Confirm FUNC, OBJECT, and NOTYPE symbol type filtering is exhaustive and correct for the use case
- Validate that static symbol table string table lookup (regular string table vs. dynamic) is correct
- Ensure the two-pass collection doesn't introduce performance concerns or incorrect symbol ordering
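On the string table point: a symbol's `st_name` field is only an offset, so it has to be resolved against the string table that belongs to the symbol's own table (the dynamic string table for dynamic symbols, the regular string table for regular symbols). The sketch below uses a hypothetical helper, `name_at`, and hand-built byte arrays to show how the same offset can yield different names in different tables; it does not use the project's ELF parser.

```rust
/// Resolve a NUL-terminated name at `offset` in a raw string table.
/// Returns None if the offset is out of range, the entry is not
/// terminated, or the bytes are not valid UTF-8.
pub fn name_at(strtab: &[u8], offset: usize) -> Option<&str> {
    let tail = strtab.get(offset..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..end]).ok()
}
```

With a dynamic string table of `b"\0puts\0main\0"` and a regular string table of `b"\0main\0helper\0"`, offset 1 resolves to `puts` in the first and `main` in the second, which is why mixing up the two tables would silently produce wrong export names.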

Possibly related issues

Enhance ELF Dynamic Symbol Extraction with Library Mapping and Comprehensive Symbol Classification #2: Both changes modify ELF export extraction logic by gathering from .dynsym and static symbol tables, adding OBJECT/NOTYPE symbol type handling, and implementing deduplication mechanisms for symbol collection.

Poem

🐰 Hops through symbol tables with glee,
Deduplicating FUNC, OBJECT, what a spree!
Dynamic and static now work as one,
No duplicate exports—the cleanup is done! ✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between b27473d and 9943114.

📒 Files selected for processing (2)

src/container/elf.rs (2 hunks)
tests/integration_elf.rs (1 hunks)



dosubot · 2025-11-09T22:40:24Z

Related Documentation

Checked 9 published document(s) in 1 knowledge base(s). No updates required.


Copilot

Pull Request Overview

This PR enhances ELF export extraction to handle both dynamically linked and statically linked binaries. The main improvements are support for weak symbols, filtering of empty names, and extraction from the regular symbol table (.symtab) in addition to the dynamic symbol table (.dynsym).

Key Changes:

Added support for weak symbol bindings (STB_WEAK) in addition to global symbols
Added empty name filtering to avoid exporting unnamed symbols
Implemented regular symbol table (.symtab) parsing to extract exports from statically linked binaries
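As a rough illustration of the binding and type checks listed above, the sketch below decodes `st_info` the way the ELF specification defines it (binding in the high four bits, type in the low four) and accepts global or weak symbols of type FUNC, OBJECT, or NOTYPE with a non-empty name. The `Sym` struct and `is_export_candidate` function are hypothetical names for this example. The `st_shndx != SHN_UNDEF` check is an assumption about how exports are told apart from imports; the PR text does not say whether the real code performs it.

```rust
// Constant values as defined by the ELF specification.
pub const STB_GLOBAL: u8 = 1;
pub const STB_WEAK: u8 = 2;
pub const STT_NOTYPE: u8 = 0;
pub const STT_OBJECT: u8 = 1;
pub const STT_FUNC: u8 = 2;
pub const SHN_UNDEF: u16 = 0;

/// Hypothetical, trimmed-down symbol entry for illustration only.
pub struct Sym {
    pub st_info: u8,
    pub st_shndx: u16,
}

impl Sym {
    /// Binding lives in the high four bits of st_info.
    pub fn bind(&self) -> u8 {
        self.st_info >> 4
    }

    /// Type lives in the low four bits of st_info.
    pub fn kind(&self) -> u8 {
        self.st_info & 0x0f
    }
}

/// True if the symbol looks like an export under the rules sketched above.
pub fn is_export_candidate(sym: &Sym, name: &str) -> bool {
    let bound = matches!(sym.bind(), STB_GLOBAL | STB_WEAK);
    let typed = matches!(sym.kind(), STT_NOTYPE | STT_OBJECT | STT_FUNC);
    // Assumption: a defined symbol (not SHN_UNDEF) is an export, not an import.
    let defined = sym.st_shndx != SHN_UNDEF;
    bound && typed && defined && !name.is_empty()
}
```

Local symbols (binding 0) and section or file symbols (types 3 and 4) are rejected by these checks, which matters mostly for `.symtab`, where such entries are common.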

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
tests/integration_elf.rs	Updated comment to clarify that fallback gcc compilation produces dynamically linked binaries
src/container/elf.rs	Enhanced export extraction to support weak symbols, filter empty names, and parse regular symbol table for static binary exports


Copilot · 2025-11-09T22:41:52Z

+                if let Some(name) = elf.strtab.get_at(sym.st_name) {
+                    if !name.is_empty() {
+                        // Avoid duplicates from dynamic symbol table
+                        if !exports.iter().any(|exp| exp.name == name) {


The duplicate detection logic has O(n²) time complexity. Each time you check !exports.iter().any(|exp| exp.name == name), you iterate through all existing exports. For binaries with many symbols, this can be slow.

Consider using a HashSet to track seen names for O(1) lookups:

    use std::collections::HashSet;

    // At the start of extract_exports
    let mut seen_names = HashSet::new();

    // When adding exports from dynsyms:
    if !name.is_empty() && seen_names.insert(name.to_string()) {
        exports.push(ExportInfo { ... });
    }

    // When adding exports from syms:
    if !name.is_empty() && seen_names.insert(name.to_string()) {
        exports.push(ExportInfo { ... });
    }

Optimized duplicate detection using HashSet for O(1) lookups. The seen_names HashSet now tracks all exported names, eliminating the O(n²) iteration through the exports vector. Commit: 9943114
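To make the trade-off in this thread concrete, here is a small self-contained comparison of the two strategies. The function names `dedup_linear` and `dedup_hashed` are made up for this sketch, and the inputs are plain name slices rather than ELF symbols. Unlike the suggestion above, which stores owned `String` keys, the hashed variant borrows the names, which is only possible because the input slice outlives the set.

```rust
use std::collections::HashSet;

/// Original approach: scan everything kept so far for each candidate,
/// which costs O(n) per name and O(n^2) overall.
pub fn dedup_linear(names: &[&str]) -> Vec<String> {
    let mut kept: Vec<String> = Vec::new();
    for &name in names {
        if !name.is_empty() && !kept.iter().any(|k| k.as_str() == name) {
            kept.push(name.to_string());
        }
    }
    kept
}

/// HashSet approach: `insert` returns false when the value is already
/// present, so membership test and insertion happen in one call.
pub fn dedup_hashed<'a>(names: &[&'a str]) -> Vec<String> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut kept = Vec::new();
    for &name in names {
        if !name.is_empty() && seen.insert(name) {
            kept.push(name.to_string());
        }
    }
    kept
}
```

Both functions return the same names in the same first-seen order; only the cost of the membership check differs.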

unclesp1d3r · 2025-11-09T22:42:14Z

@copilot apply changes based on the comments in this thread


unclesp1d3r · 2025-11-09T22:46:18Z

@coderabbitai Enhance this issue body with additional context and a proposed solution. Improve the issue title and assign the proper labels and other metadata.

coderabbitai · 2025-11-09T22:48:04Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Get the PR review comments for review ID 3440404299
gh api /repos/EvilBit-Labs/StringyMcStringFace/pulls/51/reviews/3440404299/comments --jq '.[] | {path: .path, body: .body, line: .line}'

# Also get the PR details to understand context
gh pr view 51 --repo EvilBit-Labs/StringyMcStringFace --json title,body,comments,reviews

# List recent issues to see if one was created from this review
gh issue list --repo EvilBit-Labs/StringyMcStringFace --limit 10 --json number,title,body,url

Length of output: 44610

🏁 Script executed:

#!/bin/bash

# Search for any issue that might reference the PR or review
gh issue list --repo EvilBit-Labs/StringyMcStringFace --limit 50 --json number,title,body,url,labels | \
  jq '.[] | select(.body | contains("51") or contains("HashSet") or contains("duplicate") or contains("3440404299"))'

# Also check if there are any issues created very recently (today)
gh issue list --repo EvilBit-Labs/StringyMcStringFace --state all --limit 5 --json number,title,createdAt,url | jq '.'

# Search the PR for linked issues
gh pr view 51 --repo EvilBit-Labs/StringyMcStringFace --json body,comments --jq '.comments[] | select(.body | contains("#") or contains("issue"))'

Length of output: 28423

🏁 Script executed:

#!/bin/bash

# Check the PR body for any linked issues
gh pr view 51 --repo EvilBit-Labs/StringyMcStringFace --json body --jq '.body'

# Search for any issues created today or very recently
gh issue list --repo EvilBit-Labs/StringyMcStringFace --state all --json number,title,body,createdAt,url --limit 20 | \
  jq '.[] | select(.createdAt > "2025-11-09")'

# Also check if there are any open issues
gh issue list --repo EvilBit-Labs/StringyMcStringFace --state open --json number,title,createdAt,url | jq '.'

# Try to find issues with keywords related to the optimization
gh issue list --repo EvilBit-Labs/StringyMcStringFace --search "duplicate OR HashSet OR O(n) OR optimization OR export" --json number,title,body,url --limit 10

Length of output: 7111

@unclesp1d3r: I couldn't locate a specific issue related to the review thread you referenced. The review comment is about the HashSet optimization for duplicate detection in src/container/elf.rs, which has already been implemented and marked as complete in PR #51.

Could you please clarify which issue you'd like me to enhance? You can:

- Provide the issue number (e.g., #13, "Implement String Deduplication with Metadata Preservation", if that's the one)
- Link directly to the issue you're viewing
- Ask me to create a new issue to document the HashSet optimization that was completed in this PR

Once you let me know the specific issue number, I'll be happy to enhance it with additional context, improve the title, and assign appropriate labels and metadata!

unclesp1d3r · 2025-11-09T22:48:42Z

@coderabbitai full review

coderabbitai · 2025-11-09T22:48:50Z

✅ Actions performed

Full review triggered.

* docs(steering): Adds development standards and workflow

  Adds a comprehensive development standards document to standardize Rust code quality, error handling, testing, tooling, documentation, dependency selection, and performance practices across the project. Defines concrete expectations (e.g., no unsafe in application code, clippy warnings as errors, rustfmt, Just-based tasks, >85% test coverage, use of thiserror/anyhow, preferred crates, benchmarking and profiling guidance) to improve CI enforcement, consistency, contributor onboarding, and reduce regressions and performance issues.

* docs(steering): Updates testing and performance guidance

  Refines the testing and tooling section to focus on project-specific needs, clarifying intended uses for insta and criterion in binary analysis workflows. Removes generic build command listings to keep the guide concise and actionable for contributors. Adds concrete performance recommendations: use memmap2 for large binaries, defer expensive features via lazy evaluation, compile and cache regexes once, and skip irrelevant sections during extraction to improve speed and memory usage. Makes onboarding and optimization guidance more practical for the binary-analysis project.

* docs(specs): expand stringy-binary-analyzer task list

  Updates the task checklist to more fully specify the foundational project structure, core data types, and container parsing work to guide implementation. Adds a detailed module hierarchy and definitions for core types (string records, encodings, semantic tags, container/section metadata), and introduces a comprehensive error-handling framework and Result alias. Expands format-detection planning by specifying a ContainerParser trait and parser stubs for ELF/PE/Mach-O with basic section enumeration and unit tests. Clarifies requirement mappings and refines checklist items to better sequence future work. Improves planning clarity and prepares the codebase for subsequent implementation of section classification and extraction features.

* chore(ci): Normalizes GitHub Actions YAML

  Normalizes indentation and spacing across GitHub Actions workflow files to a consistent 2-space style. Trims trailing whitespace and fixes end-of-file newline in lint and tooling configuration files, and removes an extra blank line in the task runner. Improves readability, reduces noise in diffs, and helps avoid formatting-related lint warnings; no functional behavior changes.

* feat(container): add section-weight heuristics and improve ELF import extraction

  Add heuristics to score sections by likelihood of containing meaningful strings and improve ELF symbol extraction to better identify imports. Section weighting is introduced for ELF, Mach-O and PE parsers so string extraction can prioritize likely string-bearing sections (e.g., rodata, __cstring, .rdata) over code or debug segments. Section classification remains based on name and flags, but a numeric weight now accompanies section metadata to guide downstream analysis. ELF symbol handling is improved by expanding import detection to consider both dynamic and regular symbol tables, accepting global/weak bindings and multiple symbol types, skipping empty names, and providing a placeholder hook to derive library info from DT_NEEDED entries (returns None for accuracy until deeper relocation analysis is added). These changes increase reliability of import/export discovery and tagging. Additionally, parsers are re-exported for easier access and a suite of unit and integration tests were added to validate section weighting, classification, and symbol filtering behavior. Minor docs/task checklist and linter formatting tweaks are included. Future enhancement: map symbols to specific libraries by analyzing PLT/GOT/relocations for more precise library attribution.

* docs: Updates docs to reflect core implementation

  Updates documentation to accurately represent current project progress and capabilities. Clarifies that the core container parsing, section classification (with weighting), and import/export symbol extraction are implemented and type-safe, while string extraction engines, semantic classification, ranking, output formatters, and advanced CLI features remain in development. Adds architecture and usage details (trait-based parser design, section weight system, data model) and updates CLI/quickstart guidance to show basic command support. Provides clearer guidance for contributors and users so expectations align with the codebase and to reduce confusion about implemented vs planned features.

* 📝 CodeRabbit Chat: integration_elf: add static import/export test; rename dynamic test

* 📝 CodeRabbit Chat: Add static ELF import/export extraction test

* fix: Remove duplicate tests and correct static/dynamic ELF test behavior (#52)

  * Initial plan
  * fix: Remove duplicate tests and fix static/dynamic ELF test assertions
    Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>
  * chore: Update cargo-dist to 0.30.2
    Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

* fix: Resolve integration test failures and enhance static binary export extraction (#51)

  * Initial plan
  * fix: CI test failures and improve ELF export extraction
    Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>
  * perf: optimize export deduplication with HashSet (O(1) vs O(n))
    Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

  ---------
  Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>
  Co-authored-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

* fix: Skip ELF integration tests on non-Unix platforms (#54)

  * Initial plan
  * fix: Skip ELF integration tests on Windows (non-Unix platforms)
    Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

  ---------
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

* Verify review fixes already applied in commit b27473d (#53)

  Initial plan

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

---------
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

Initial plan

f432f79

Copilot AI assigned Copilot and unclesp1d3r Nov 9, 2025

Copilot AI mentioned this pull request Nov 9, 2025

feat: Adds section-weighting and improves ELF imports #45

Merged

Copilot started work on behalf of unclesp1d3r November 9, 2025 21:56

fix: CI test failures and improve ELF export extraction

12f5bae

Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add section weighting and improve ELF imports~~ fix: Resolve integration test failures and enhance static binary export extraction Nov 9, 2025

Copilot AI requested a review from unclesp1d3r November 9, 2025 22:21

Copilot finished work on behalf of unclesp1d3r November 9, 2025 22:21

Merge branch '1-implement-intelligent-elf-section-classification-for-…

399814e

…targeted-string-extraction' into copilot/sub-pr-45 Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>

unclesp1d3r marked this pull request as ready for review November 9, 2025 22:39

dosubot Bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Nov 9, 2025

unclesp1d3r requested a review from Copilot November 9, 2025 22:40

dosubot Bot added the bug (Something isn't working), enhancement (New feature or request), github_actions (Pull requests that update GitHub Actions code), and lang:rust (Rust implementation) labels Nov 9, 2025

Copilot AI reviewed Nov 9, 2025


Copilot started work on behalf of unclesp1d3r November 9, 2025 22:42

perf: optimize export deduplication with HashSet (O(1) vs O(n))

9943114

Co-authored-by: unclesp1d3r <251112+unclesp1d3r@users.noreply.github.com>

unclesp1d3r approved these changes Nov 9, 2025


dosubot Bot added the lgtm label (This PR has been approved by a maintainer) Nov 9, 2025

unclesp1d3r merged commit a145734 into 1-implement-intelligent-elf-section-classification-for-targeted-string-extraction Nov 9, 2025
7 checks passed

unclesp1d3r deleted the copilot/sub-pr-45 branch November 9, 2025 22:49

Copilot AI requested a review from unclesp1d3r November 9, 2025 22:49

Copilot stopped work on behalf of unclesp1d3r due to an error November 9, 2025 22:50
Copilot has encountered an error. See logs for additional details.

coderabbitai Bot mentioned this pull request Nov 10, 2025

Enhance ELF symbol extraction with comprehensive type support and visibility filtering #55

Merged

unclesp1d3r merged 4 commits into 1-implement-intelligent-elf-section-classification-for-targeted-string-extraction from copilot/sub-pr-45

3 participants
