diff --git a/.cursor/commands/ci_check.md b/.cursor/commands/ci_check.md
new file mode 100644
index 0000000..83138a0
--- /dev/null
+++ b/.cursor/commands/ci_check.md
@@ -0,0 +1,30 @@
+# CI Check
+
+## Description
+
+Ensure code changes pass all CI checks before merging.
+
+## Steps
+
+1. First, run `just ci-check` to identify any failures
+2. Analyze the output to understand what specific checks are failing. If everything passes, you’re done.
+3. Make minimal, targeted fixes to address ONLY the failing checks:
+   - For formatting issues: run `just format`
+   - For linting issues (clippy): fix the specific violations reported (rerun with `just lint-rust` / `just lint-rust-min`)
+   - For compilation/type errors: fix the underlying Rust code until `just check` (or `cargo check`) succeeds
+   - For test failures: fix the failing tests or underlying code (verify with `just test` or `just test-ci`)
+   - For dependency security/advisory issues: run `just audit` (cargo-audit) and/or update `Cargo.toml`, then run `cargo update`
+   - For license/compliance issues: run `just deny` and address cargo-deny findings
+4. After making fixes, run `just ci-check` again to verify all checks pass
+5. If any checks still fail, repeat steps 2-4 until all checks pass
+6. Provide a summary of what was fixed and confirm that `just ci-check` now passes completely
+
+Keep changes minimal and focused - only fix what's actually causing the CI failures. Do not make unnecessary refactoring or style changes beyond what's required to pass the checks.
+
+## Completion Checklist
+
+- [ ] Code conforms to Stringy project rules and standards
+- [ ] Tests pass (`just test`)
+- [ ] Linting is clean (`just lint`)
+- [ ] Full CI validation passes (`just ci-check`)
+- [ ] A short summary of what was done is reported
diff --git a/.cursor/commands/code_rabbit.md b/.cursor/commands/code_rabbit.md
new file mode 100644
index 0000000..5641f5e
--- /dev/null
+++ b/.cursor/commands/code_rabbit.md
@@ -0,0 +1,28 @@
+# CodeRabbit Review
+
+## Description
+
+Use CodeRabbit to identify issues and follow its recommendations in the current code branch.
+
+## Steps
+
+1. Run `coderabbit --prompt-only` and let it take as long as it needs to identify issues with this code branch. It will output a large list of recommended fixes and considerations.
+2. Evaluate the fixes and considerations. Fix critical and major issues only; ignore the nits.
+3. Once those changes are implemented, run CodeRabbit CLI one more time to make sure we addressed all the critical issues and didn't introduce any additional bugs.
+4. Do not change branches or mess with `git` at all. Just run the coderabbit tool, examine its output, fix its findings, and run it again to make sure you fixed everything.
+5. Then run `just ci-check` to make sure you didn't break anything and, if it does not complete without failures, fix those problems.
+6. Run the loop (running coderabbit->fixing its recommendations->running `just ci-check`->fixing any failures) at most twice.
+7. If the second run finds no critical issues, ignore the nits and you're done.
+8. Give me a summary of everything that was completed and why.
+
+## Completion Checklist
+
+- [ ] Code conforms to Stringy project rules and standards
+- [ ] Tests pass (`just test`)
+- [ ] Linting is clean (`just lint`)
+- [ ] Full CI validation passes (`just ci-check`)
+- [ ] A short summary of what was done is reported
+- [ ] CodeRabbit issues have been addressed
+- [ ] CodeRabbit was run no more than twice
+- [ ] No unnecessary changes were made beyond addressing critical issues
+- [ ] No changes to git branches or history were made
diff --git a/.cursor/commands/code_review.md b/.cursor/commands/code_review.md
new file mode 100644
index 0000000..e17df87
--- /dev/null
+++ b/.cursor/commands/code_review.md
@@ -0,0 +1,62 @@
+# Code Review
+
+## Description
+
+Analyze diff for code quality issues and apply safe improvements while preserving public APIs.
+
+## Focus Categories
+
+Analyze only the changed files (diff scope) and improve them while preserving public APIs. Focus categories: (1) Code Smells (large/duplicate/complex) (2) Design Patterns (traits, builder, newtype, factory) (3) Best Practices (Rust 2024, project conventions) (4) Readability (naming, structure, cohesion) (5) Maintainability (modularization, clarity) (6) Performance (binary parsing, memory usage, allocation, zero-copy operations) (7) Type Safety (strong types, avoid needless Option/Result layering) (8) Error Handling (thiserror context, no silent failures). Context: Stringy = zero-warnings, CLI-first, memory conscious, synchronous binary analysis. Prefer clear + correct over clever.
+
+## Steps
+
+1. Collect diff file list. 2. Analyze per focus category. 3. Classify each finding: `safe-edit` (apply now), `deferred`, `requires-approval`. 4. Auto-apply only `safe-edit` (mechanical, internal, non-breaking, warning removal, correctness, error handling improvements). 5. Run `just lint` then `just test`. On failure: isolate failing hunk, revert it, re-run, document skip. 6. Generate report (summary table, applied edits + rationale, deferred backlog, approval-needed with risks, next-step roadmap). 7. Output unified diff (never commit). If zero safe edits: state "No safe automatic edits applied" and still output full report.
+
+## Auto-Edit Constraints (Strict)
+
+- Scope: Only diff-related files
+- Gates: Must pass `just lint` + tests
+- User Control: Never commit/stage
+- Public API: No signature/visibility/export changes
+- Validation: Always run quality gates before reporting
+
+## Critical Requirements
+
+- Actionable suggestions (code examples when clearer)
+- Auto-apply only clearly safe internal fixes
+- Prioritize runtime correctness, safety, type rigor, security posture
+- Preserve all public APIs (no signature/visibility changes)
+- Avoid cleverness; optimize for clarity & maintainability
+
+## Repo Rules (Reinforced)
+
+Zero warnings (clippy -D warnings) | No unsafe | Precise typing | Trait-based parsers | `thiserror` for errors | CLI-first | Memory efficient | Zero-copy parsing where possible | rustdoc for all public APIs
+
+---
+
+## Execution Checklist
+
+1 Diff scan 2 Analyze 3 Classify 4 Safe edits applied 5 Gates pass 6 Report (summary/applied/deferred/approval-needed/roadmap) 7 Output diff. On blocker: report + remediation guidance.
+
+## Quick Reference Matrix
+
+Category -> Examples of Safe Edits:
+
+- Smells: remove dead code, split oversized internal fn (no visibility change)
+- Patterns: introduce small private helper or trait impl internally
+- Best Practices: use zero-copy parsing, efficient string extraction
+- Readability: rename local vars (non-public), add rustdoc/examples
+- Maintainability: extract internal module (keep re-export stable)
+- Performance: eliminate needless clone, memoize constant, bound Vec growth, use slice-based operations
+- Type Safety: replace stringly-typed or boolean flags with a small private enum
+- Error Handling: add context via error messages, convert generic String errors to structured variants if already internal
+
+If ambiguity arises, classify the finding as `deferred` instead of applying it.
+
+## Completion Checklist
+
+- [ ] Code conforms to Stringy project rules and standards
+- [ ] Tests pass (`just test`)
+- [ ] Linting is clean (`just lint`)
+- [ ] Full CI validation passes (`just ci-check`)
+- [ ] A short summary of what was done is reported
diff --git a/.cursor/commands/performance_tuning.md b/.cursor/commands/performance_tuning.md
new file mode 100644
index 0000000..20603e0
--- /dev/null
+++ b/.cursor/commands/performance_tuning.md
@@ -0,0 +1,83 @@
+# Performance Tuning
+
+## Description
+
+Analyze diff for performance, apply safe micro-optimizations, produce report.
+
+## FOCUS CATEGORIES
+
+Analyze ONLY changed files (diff scope) for runtime performance characteristics while preserving correctness, public APIs, and security constraints. Apply only clearly safe micro-optimizations.
+
+01. Algorithmic Complexity (unnecessary O(n^2), repeated scans, avoidable clones)
+02. Allocation Behavior (temporary allocations, Vec growth patterns, reserve vs push, string churn)
+03. Binary Parsing Efficiency (zero-copy operations, memory-mapped files for large binaries, efficient section iteration)
+04. I/O Efficiency (redundant reads, memory-mapped I/O for large files, efficient file handling)
+05. Data Structures (better fit: map vs vec scan, small vec, newtype for clarity/perf)
+06. Caching & Reuse (recomputing constants, repeated serialization, repeated formatting)
+07. Hot Path Error Handling (avoidable string formatting, cheap early exits)
+08. String Extraction (efficient UTF-8/UTF-16 parsing, slice-based operations, avoid unnecessary allocations)
+09. Memory Footprint (unbounded growth, retain vs shrink_to_fit decisions, large temporary clones, memory-mapped files)
+10. Instrumentation (where benchmarks would help future perf investigations)
+
+## Steps
+
+1 Diff list → 2 Perf analysis per category → 3 Classify (`safe-edit` / `deferred` / `requires-approval`) → 4 Apply only mechanical, behavior-preserving micro-optimizations (e.g., remove redundant clone, pre-allocate capacity, use zero-copy parsing, optimize string extraction) → 5 Run `just lint` & `just test` → 6 Revert failing hunk if gates fail → 7 Report (summary, applied, deferred, approval-needed, perf notes, next steps) → 8 Output unified diff (no commit).
+
+If zero safe edits: state "No safe performance edits applied" and still produce full report.
+
+## SAFE PERFORMANCE EDIT EXAMPLES
+
+- Replace `clone()` with reference when ownership not required
+- Preallocate Vec with `with_capacity` when length is known
+- Convert repeated `format!` in loop to pre-built prefix + push_str
+- Hoist constant regex / hashers / serializers
+- Short-circuit early on empty input slices
+- Use iterators instead of temporary Vec collects where the semantics are unchanged
+- Use slice-based string extraction to avoid allocations
+- Use memory-mapped files for large binary processing
+- Prefer zero-copy parsing with goblin
+- Avoid converting to String just to log when `Display` exists
+
+## AUTO-EDIT CONSTRAINTS (STRICT)
+
+Scope: diff-only | Gates: `just lint` + tests must pass | No commits | No public signature/visibility changes | Validate after edits | No semantic changes
+
+## CRITICAL REQUIREMENTS
+
+- Do not trade readability or security for micro perf
+- Never introduce unsafe
+- Provide benchmarks only as recommendations (do not add heavy harness automatically)
+- Defer structural refactors (module splits) unless trivial & internal
+- Avoid premature caching introducing invalidation complexity
+
+## REPO RULES (REINFORCED)
+
+Zero warnings | No unsafe | Precise typing | Trait-based parsers | thiserror for errors | CLI-first | Memory efficiency | Zero-copy parsing | rustdoc for public APIs
+
+## EXECUTION CHECKLIST
+
+1 Diff scan 2 Analyze perf 3 Classify 4 Apply safe micro-optimizations 5 Gates pass 6 Report 7 Output diff | On blocker: report with remediation guidance.
+
+## QUICK PERFORMANCE MATRIX
+
+Category → Sample Safe Edit:
+
+- Complexity → Replace nested loop with `HashSet` membership check
+- Allocation → Pre-size Vec for known iteration length
+- Binary Parsing → Use memory-mapped files for large binaries, zero-copy section access
+- I/O → Use memory-mapped I/O for large file processing
+- Data Structure → Use `SmallVec` for typical \<=8 elements (internal)
+- Caching → Hoist constant serialization of static JSON template
+- String Extraction → Use slice-based operations, avoid unnecessary String allocations
+- Memory Footprint → Replace accumulating Vec with sliding window bound, use memory-mapped files
+- Instrumentation → Add benchmark tests for hot path performance
+
+Ambiguous? Defer and document.
+
+## Completion Checklist
+
+- [ ] Code conforms to Stringy project rules and standards
+- [ ] Tests pass (`just test`)
+- [ ] Linting is clean (`just lint`)
+- [ ] Full CI validation passes (`just ci-check`)
+- [ ] A short summary of what was done is reported
diff --git a/.cursor/commands/security_hardening.md b/.cursor/commands/security_hardening.md
new file mode 100644
index 0000000..f87edcc
--- /dev/null
+++ b/.cursor/commands/security_hardening.md
@@ -0,0 +1,82 @@
+# Security Hardening
+
+## Description
+
+Analyze diff for security posture, apply safe internal hardening edits, produce report.
+
+Analyze ONLY changed files (diff scope) for security posture and apply clearly safe hardening improvements while preserving all public APIs.
+
+## FOCUS CATEGORIES
+
+01. Memory Safety (no unsafe code, no added unsafe, boundary adherence)
+02. Input Validation & Parsing (CLI args, binary format detection, paths) – reject invalid early, no silent defaults
+03. Data Handling (no secrets logged, path validation, safe binary parsing, bounds checking)
+04. Binary Parsing Safety (validate offsets, check bounds, handle malformed binaries gracefully)
+05. Error Handling & Logging Hygiene (no sensitive leakage, structured context, no println! for operational info)
+06. Dependency & Surface Minimization (avoid unnecessary crates/features, dead code removal)
+07. Defense-in-Depth Opportunities (bounds checking, resource limits, memory usage bounds)
+08. Security Regression Risks (stubs flagged, TODOs categorized, unimplemented sections clearly documented)
+09. Supply Chain & Build Hygiene (forbid unsafe, clippy -D warnings, deny unknown features)
+10. File I/O Safety (validate file paths, handle large files safely, prevent path traversal)
+
+## Steps
+
+1 Diff list → 2 Security analysis per category → 3 Classify findings (`safe-edit` / `deferred` / `requires-approval`) → 4 Apply only mechanical non-breaking hardening edits (logging normalization, path validation + bound checks, converting println!/eprintln! to proper error handling, adding `#[deny(unsafe_code)]` locally if missing, adding missing error context, bounds checking for binary parsing) → 5 Run `just lint` & `just test` → 6 Revert any failing hunk → 7 Report (summary, applied, deferred, approval-needed, risk notes, roadmap) → 8 Output unified diff (no commit).
+
+If zero safe edits: state "No safe security edits applied" and still emit full report.
+
+## SAFE HARDENING EDIT EXAMPLES
+
+- Replace `println!/eprintln!` with proper error handling and structured output
+- Add bounds checking for binary parsing operations
+- Inline guard clauses for obvious panics or unchecked unwraps (if internal)
+- Validate file paths and prevent path traversal
+- Remove dead code exposing potential attack surface
+- Strengthen error messages (no raw system paths if sensitive)
+- Add length / size / iteration bounds for unbounded growth structures
+- Replace stringly-typed mode flags with private enums
+- Ensure all public API doc comments mention security considerations where relevant
+- Validate binary format headers before parsing
+- Check section offsets and sizes before accessing binary data
+
+## AUTO-EDIT CONSTRAINTS (STRICT)
+
+Scope: diff-only | Gates: `just lint` + tests must pass | No commits | No public signature/visibility changes | Validate after edits
+
+## CRITICAL REQUIREMENTS
+
+- Preserve functional behavior while reducing risk
+- No new dependencies unless strictly necessary for safety
+- Avoid speculative rewrites—minimal surface change
+- Avoid perf regressions; if added checks are non-trivial, mark them as deferred
+- Do not mask existing errors—surface with context instead
+
+## REPO RULES (REINFORCED)
+
+Zero warnings | No unsafe | Precise typing | Trait-based parsers | thiserror for errors | CLI-first | Memory efficiency | Safe binary parsing | Path validation | rustdoc for public APIs
+
+## EXECUTION CHECKLIST
+
+1 Diff scan 2 Analyze security 3 Classify 4 Apply safe hardening edits 5 Gates pass 6 Report 7 Output diff | On blocker: report with remediation.
+
+## QUICK SECURITY MATRIX
+
+Category → Sample Safe Edit:
+
+- Memory Safety → Remove unsafe code, add bounds checking
+- Input Validation → Add numeric range check before use, validate binary format headers
+- Data Handling → Validate file paths, check bounds before binary access
+- Binary Parsing → Add offset/size validation, handle malformed binaries gracefully
+- Error Handling → Replace raw error chain with safe error messages
+- Resource Bounds → Add comment + bound to vector growth pattern, limit memory usage
+- Stub Sections → Mark with `SECURITY_TODO:` prefix for tracking
+
+Ambiguous? Defer and document.
+
+## Completion Checklist
+
+- [ ] Code conforms to Stringy project rules and standards
+- [ ] Tests pass (`just test`)
+- [ ] Linting is clean (`just lint`)
+- [ ] Full CI validation passes (`just ci-check`)
+- [ ] A short summary of what was done is reported
diff --git a/.cursor/commands/update_llmstxt.md b/.cursor/commands/update_llmstxt.md
new file mode 100644
index 0000000..0c81cb6
--- /dev/null
+++ b/.cursor/commands/update_llmstxt.md
@@ -0,0 +1,388 @@
+# Update LLMs.txt File
+
+## Description
+
+Update the `llms.txt` file in the root folder to reflect changes in documentation or specifications, following the llms.txt specification.
+
+## Steps
+
+> SCOPE LIMITATION (IMPORTANT – READ FIRST)
+>
+> When executing this prompt you MUST restrict all repository content analysis exclusively to the already-provided attachments for `#file:../../docs` (including their subpaths such as `docs/src/`), and root-level documentation files. Treat those attachments as complete and authoritative for the purpose of updating `llms.txt`.
+>
+> DO NOT attempt to read, open, or re-scan any other project files (e.g., `.github/` instructions, source code, lockfiles, coverage reports) even if tools are available. Avoid recursive or repeated attempts to fetch additional instruction files. If a step below would normally "scan the repo", interpret it narrowly: only enumerate Markdown files inside the provided `docs` attachment tree plus top-level root Markdown files that are already known (you may assume `README.md`, `LICENSE`, `SECURITY.md` exist without re-reading them unless their content is required for a description — which it is not).
+>
+> If a tool invocation would cause broader traversal, SKIP it and proceed using the attachment lists. This prevents infinite loops and unnecessary I/O. The change detection algorithm should operate purely on:
+>
+> 1. Root-level Markdown files (assumed: `README.md`, `SECURITY.md`, `LICENSE`)
+> 2. `docs/src/**/*.md`
+>
+> Ignore generated HTML in `docs/book/` and any non-Markdown assets. Treat them as excluded artifacts automatically. Do not add them to `llms.txt`.
+
+Update the existing `llms.txt` file in the root of the repository to reflect changes in documentation, specifications, or repository structure. This file provides high-level guidance to large language models (LLMs) on where to find relevant content for understanding the repository's purpose and specifications.
+
+---
+
+## TL;DR (Quick Start for the Coding Agent)
+
+Perform these steps in order; do not skip validation:
+
+1. Read existing `/llms.txt` (if missing, treat as new file creation).
+2. Enumerate repo docs: top-level `*.md`, `docs/**`, key `Cargo.toml` metadata, security & contribution files.
+3. Detect additions / removals vs current file (simple set diff on relative paths referenced).
+4. Classify candidate files using Inclusion Heuristics (see below).
+5. Draft updated sections preserving required structure (H1, optional blockquote, H2 category lists).
+6. Ensure link syntax `[Readable Name](relative/path.md): concise description`.
+7. Run internal validation checklist (structure, dead links, redundancy, ordering, diff sanity).
+8. Output ONLY the new `llms.txt` file content (no commentary) when executing the update.
+
+If any critical invariant fails (see Invariants) you MUST adjust before finalizing.
+
+---
+
+## Primary Directive
+
+Update the existing `llms.txt` file to maintain accuracy and compliance with the llms.txt specification while reflecting current repository structure and content. The file must remain optimized for LLM consumption while staying human-readable.
+
+NOTE ON SCOPE ENFORCEMENT: All subsequent references to "scan", "enumerate", or "discover" files are to be interpreted under the Scope Limitation above. Do not widen scope.
+
+## Analysis and Planning Phase
+
+Before updating the `llms.txt` file, you must complete a thorough analysis:
+
+### Step 1: Review Current File and Specification
+
+- Read the existing `llms.txt` file to understand current structure, if it exists yet
+- Review the official llms.txt specification to ensure continued compliance
+- Identify areas that may need updates based on repository changes
+
+### Step 2: Repository Structure Analysis
+
+- Examine the current repository structure using appropriate tools
+- Compare current structure with what's documented in existing `llms.txt`
+- Identify new directories, files, or documentation that should be included
+- Note any removed or relocated files that need to be updated
+
+### Step 3: Content Discovery and Change Detection
+
+- Identify new README files and their locations
+- Find new documentation files (`.md` files in `/docs/`, `/spec/`, etc.)
+- Locate new specification files and their purposes
+- Discover new configuration files and their relevance
+- Find new example files and code samples
+- Identify any changes to existing documentation structure
+
+### Step 4: Create Update Plan
+
+Based on your analysis, create a structured plan that includes:
+
+- Changes needed to maintain accuracy
+- New files to be added to the llms.txt
+- Outdated references to be removed or updated
+- Organizational improvements to maintain clarity
+
+## Implementation Requirements
+
+### Format Compliance
+
+The updated `llms.txt` file must maintain this exact structure per the specification:
+
+1. **H1 Header**: Single line with repository/project name (required)
+2. **Blockquote Summary**: Brief description in blockquote format (optional but recommended)
+3. **Additional Details**: Zero or more markdown sections without headings for context
+4. **File List Sections**: Zero or more H2 sections containing markdown lists of links
+
+### Content Requirements
+
+#### Required Elements
+
+- **Project Name**: Clear, descriptive title as H1
+- **Summary**: Concise blockquote explaining the repository's purpose
+- **Key Files**: Essential files organized by category (H2 sections)
+
+#### File Link Format
+
+Each file link must follow: `[descriptive-name](relative-url): optional description`
+
+#### Section Organization
+
+Organize files into logical H2 sections such as:
+
+- **Documentation**: Core documentation files
+- **Specifications**: Technical specifications and requirements
+- **Examples**: Sample code and usage examples
+- **Configuration**: Setup and configuration files
+- **Optional**: Secondary files (special meaning - can be skipped for shorter context)
+
+### Content Guidelines
+
+#### Language and Style
+
+- Use concise, clear, unambiguous language
+- Avoid jargon without explanation
+- Write for both human and LLM readers
+- Be specific and informative in descriptions
+
+#### File Selection Criteria
+
+Include files that:
+
+- Explain the repository's purpose and scope
+- Provide essential technical documentation
+- Show usage examples and patterns
+- Define interfaces and specifications
+- Contain configuration and setup instructions
+
+Exclude files that:
+
+- Are purely implementation details
+- Contain redundant information
+- Are build artifacts or generated content
+- Are not relevant to understanding the project
+
+---
+
+## Inclusion / Exclusion Heuristics
+
+Score each candidate (keep those scoring >= 2 unless intentionally excluded):
+
+| Criterion | +1 Signal |
+| ---------------------- | -------------------------------------------------- |
+| Orientation Value | Explains purpose, architecture, security model |
+| Specification | Defines contracts, limits, protocols, data formats |
+| Operator Critical | Install, deploy, config, security hardening |
+| Cross-Cutting Policy | Contribution, security, licensing, threat model |
+| Representative Example | Shows canonical usage or pattern |
+
+Negative Exclusion Signals (any one usually drops): vendor lock file, autogenerated, code-only without explanatory context, temporary / experimental docs.
+
+Prefer the smallest representative set when many similar files exist (e.g., keep `architecture.md` but not all derived slide decks or exports).
+
+---
+
+## Invariants (MUST Always Hold)
+
+01. File name EXACTLY `llms.txt` at repo root.
+02. Single leading H1 only (no multiple H1s).
+03. All links are relative paths that exist at commit time.
+04. No absolute filesystem paths, no external HTTP links inside file list sections (context isolation).
+05. No duplicate file references.
+06. Descriptions \<= 140 chars, imperative/concise, no trailing periods unless multiple sentences needed.
+07. Section order: Documentation → Specifications → Examples → Configuration → Optional (omit empty ones without leaving gaps).
+08. Deterministic ordering inside sections (alphabetical by display name unless logical order is strongly beneficial; if logical order used, it must be consistent and minimal).
+09. Do not include compiled artifacts, coverage reports, `target/`, lockfiles (unless spec interest), or large binary assets.
+10. Preserve semantic meaning: do not rewrite project intent.
+
+---
+
+## Output Contract
+
+When finalizing, produce ONLY the complete desired contents of `/llms.txt` (no extra markdown fences, no commentary, no diff). If no update is needed (no changes), explicitly state "NO CHANGE" instead of re-emitting identical content (optimization for agents that may skip writes).
+
+---
+
+## Change Detection Algorithm (Deterministic)
+
+1. Parse existing file, extract referenced relative paths.
+2. Glob for candidate docs: `*.md` in root, `docs/**/*.md`, security & community files (LICENSE, SECURITY.md, CONTRIBUTING.md, CODE_OF_CONDUCT\* if present).
+3. Build two sets: CURRENT_REFERENCED, CANDIDATES.
+4. NEW = CANDIDATES − CURRENT_REFERENCED filtered through heuristics.
+5. STALE = CURRENT_REFERENCED − CANDIDATES (verify truly removed vs renamed via simple name match search).
+6. For renames, map old → new path and update entry in place (preserve description with minor adjustments).
+7. Recompute categories; if a section becomes empty, drop it.
+8. Produce updated ordered lists, ensuring invariants.
+
+---
+
+## Failure Modes & Recovery
+
+| Failure | Mitigation |
+| ------------------------- | ------------------------------------------------------------------------------- |
+| Dead link introduced | Re-scan path; if file newly added but uncommitted, note and omit until present. |
+| Overly verbose list | Apply heuristics; collapse by linking an umbrella doc instead of every subpage. |
+| Missing critical spec | Escalate by adding under Specifications with concise description. |
+| Duplicate classification | Keep in first most appropriate section; remove from others. |
+| Section bloat (>15 items) | Split logically or prune low-signal entries (score \<2). |
+
+---
+
+## Non-Goals
+
+- Not a full index of source code.
+- Not a changelog replacement.
+- Not a substitute for inline docs.
+- Avoid summarizing file contents; only describe purpose.
+
+---
+
+## Minimal vs Comprehensive Example (Illustrative)
+
+Minimal (small repo): 1 H1, 1 blockquote, 1 Documentation section with 3–6 items. Comprehensive (larger repo like this): Up to 5 sections, each ≤ 12 items, Optional section is last and may be omitted in constrained contexts.
+
+---
+
+## Validation Checklist (Condensed)
+
+Run prior to output:
+
+1. Structure: Single H1, ordered sections, no empty lists.
+2. Links: All referenced relative paths exist (limit scope to root \*.md, docs/src/\*\*/\*.md).
+3. No duplicates: A file appears in exactly one section.
+4. Descriptions: \<=140 chars, concise, no trailing period unless multi-sentence.
+5. Coverage: Include `README.md`, core `docs/src/*.md` (architecture, installation, usage, binary formats).
+6. Exclusions: Omit generated `docs/book/**`, binaries, coverage, `target/`, lockfiles.
+7. Ordering: Deterministic (alphabetical or intentional logical grouping documented in comments if used).
+8. Delta sanity: NEW and STALE sets evaluated under constrained scope.
+
+---
+
+## Execution Steps
+
+### Step 1: Current State Analysis
+
+1. Read the existing `llms.txt` file thoroughly
+2. Examine the current repository structure completely
+3. Compare existing file references with actual repository content
+4. Identify outdated, missing, or incorrect references
+5. Note any structural issues with the current file
+
+### Step 2: Content Planning
+
+1. Determine if the primary purpose statement needs updates
+2. Review and update the summary blockquote if needed
+3. Plan additions for new files and directories
+4. Plan removals for outdated or moved content
+5. Reorganize sections if needed for better clarity
+
+### Step 3: File Updates
+
+1. Update the existing `llms.txt` file in the repository root
+2. Maintain compliance with the exact format specification
+3. Add new file references with appropriate descriptions
+4. Remove or update outdated references
+5. Ensure all links are valid relative paths
+
+### Step 4: Validation
+
+1. Verify continued compliance with specification
+2. Check that all links are valid and accessible
+3. Ensure the file still serves as an effective LLM navigation tool
+4. Confirm the file remains both human and machine readable
+
+## Quality Assurance
+
+### Format Validation
+
+- ✅ H1 header with project name
+- ✅ Blockquote summary (if included)
+- ✅ H2 sections for file lists
+- ✅ Proper markdown link format
+- ✅ No broken or invalid links
+- ✅ Consistent formatting throughout
+
+### Content Validation
+
+- ✅ Clear, unambiguous language
+- ✅ Comprehensive coverage of essential files
+- ✅ Logical organization of content
+- ✅ Appropriate file descriptions
+- ✅ Serves as effective LLM navigation tool
+
+### Specification Compliance
+
+- ✅ Follows format exactly
+- ✅ Uses required markdown structure
+- ✅ Implements optional sections appropriately
+- ✅ File located at repository root (`/llms.txt`)
+
+## Update Strategy
+
+### Addition Process
+
+When adding new content:
+
+1. Identify the appropriate section for new files
+2. Create clear, descriptive names for links
+3. Write concise but informative descriptions
+4. Maintain alphabetical or logical ordering within sections
+5. Consider if new sections are needed for new content types
+
+### Removal Process
+
+When removing outdated content:
+
+1. Verify files are actually removed or relocated
+2. Check if relocated files should be updated rather than removed
+3. Remove entire sections if they become empty
+4. Update cross-references if needed
+
+### Reorganization Process
+
+When restructuring content:
+
+1. Maintain logical flow from general to specific
+2. Keep essential documentation in primary sections
+3. Move secondary content to "Optional" section if appropriate
+4. Ensure new organization improves LLM navigation
+
+Example structure for `llms.txt`:
+
+```markdown
+# [Repository Name]
+
+> [Concise description of the repository's purpose and scope]
+
+[Optional additional context paragraphs without headings]
+
+## Documentation
+
+- [Main README](README.md): Primary project documentation and getting started guide
+- [Contributing Guide](CONTRIBUTING.md): Guidelines for contributing to the project
+- [Code of Conduct](CODE_OF_CONDUCT.md): Community guidelines and expectations
+
+## Specifications
+
+- [Technical Specification](spec/technical-spec.md): Detailed technical requirements and constraints
+- [API Specification](spec/api-spec.md): Interface definitions and data contracts
+
+## Examples
+
+- [Basic Example](examples/basic-usage.md): Simple usage demonstration
+- [Advanced Example](examples/advanced-usage.md): Complex implementation patterns
+
+## Configuration
+
+- [Setup Guide](docs/setup.md): Installation and configuration instructions
+- [Deployment Guide](docs/deployment.md): Production deployment guidelines
+
+## Optional
+
+- [Architecture Documentation](docs/architecture.md): Detailed system architecture
+- [Design Decisions](docs/decisions.md): Historical design decision records
+```
+
+Note: The above example block uses illustrative file paths that may not exist in this repository; they are placeholders to demonstrate formatting only.
+
+## Success Criteria
+
+The updated `llms.txt` file should:
+
+1. Accurately reflect the current repository structure and content
+2. Maintain compliance with the llms.txt specification
+3. Provide clear navigation to essential documentation
+4. Remove outdated or incorrect references
+5. Include new important files and documentation
+6. Maintain logical organization for easy LLM consumption
+7. Use clear, unambiguous language throughout
+8. Continue to serve both human and machine readers effectively
+
+---
+
+## Agent Implementation Notes
+
+- Prefer idempotent operations: if no change is required, respond with "NO CHANGE".
+- If the changes are small (≤3 edits), still re-emit the full file (atomic replace model).
+- Use stable naming: convert file names to Title Case (minus extensions) unless they contain a proper noun or acronym (e.g., "API", "IPC").
+- For Rust workspace crates, generally include only the root `README.md` or the high-level `lib.rs` docs if they act as a specification (otherwise omit code internals).
+- Security-critical docs (`SECURITY.md`, threat models) are ALWAYS included unless empty.
+- If both `docs/` and `spec/` contain overlapping material, prefer placing normative protocol details under Specifications, conceptual overviews under Documentation.
diff --git a/.cursor/commands/work_next_task.md b/.cursor/commands/work_next_task.md
new file mode 100644
index 0000000..e4b30c1
--- /dev/null
+++ b/.cursor/commands/work_next_task.md
@@ -0,0 +1,42 @@
+# Work Next Task
+
+## Description
+
+Work on the next unchecked task in the current task list.
+
+## Steps
+
+1. Read the entire currently open task list document before beginning. Do not skip this step.
+2. **Gather Context**: Before starting work on the task:
+ - If the task list is in a folder, check for `requirements.md` and `design.md` files in the same directory and read them for essential context
+ - If the task item contains a link to a GitHub issue, examine the issue thoroughly for additional context, acceptance criteria, and potential solutions
+ - Review any referenced documentation or specifications
+3. Identify the next unchecked task in the checklist. The task will typically have an associated GitHub issue linked to it, with additional context and a potential solution that should be reviewed as well.
+
+> ⚠️ Important: Some tasks may appear implemented but are still unchecked. You must verify that each task meets all project standards. "Complete" means the code is fully implemented, idiomatic, tested, lint-free, follows Stringy's architecture, and aligns with all coding and architectural rules.
+
+### Task Execution Process
+
+- Review the codebase to determine whether the task is already complete **according to project standards**.
+- If the task is not fully compliant:
+ - Make necessary code changes using idiomatic, maintainable approaches following Stringy's patterns.
+ - Run `just fmt` to apply formatting rules.
+ - Add or update tests to ensure correctness.
+ - Run the test suites:
+ - `just test`
+ - Fix any failing tests.
+ - Run the linters:
+ - `just lint`
+ - Fix all linter issues.
+- Run `just ci-check` to confirm the full codebase passes comprehensive CI validation (format, lint, test, build, audit).
+
+## Completion Checklist
+
+- [ ] Code conforms to Stringy project rules and standards
+- [ ] Tests pass (`just test`)
+- [ ] Linting is clean (`just lint`)
+- [ ] Full CI validation passes (`just ci-check`)
+- [ ] Task is marked complete in the checklist
+- [ ] A short summary of what was done is reported
+
+> Update the current task list with any items that are implemented and need test coverage, checking off items that have implemented tests. ❌ Do **not** commit or check in any code. ⏸️ Do **not** begin another task. ✅ Stop and wait for further instruction after completing this task.
diff --git a/.cursor/rules/rust/cargo-toml.mdc b/.cursor/rules/rust/cargo-toml.mdc
new file mode 100644
index 0000000..3866bdb
--- /dev/null
+++ b/.cursor/rules/rust/cargo-toml.mdc
@@ -0,0 +1,81 @@
+---
+globs: Cargo.toml
+---
+
+# Cargo.toml Standards for Stringy
+
+## Package Configuration
+
+- Use **Rust 2024 Edition** (MSRV: 1.91+) as specified in the package
+- Single crate structure (not a workspace)
+- Enforce lint policy via `[lints.rust]` configuration
+ - Forbid unsafe code globally
+ - Deny all warnings to preserve code quality
+
+Example `Cargo.toml` structure:
+
+```toml
+[package]
+name = "stringy"
+version = "0.1.0"
+edition = "2024"
+authors = ["UncleSp1d3r "]
+description = "A smarter alternative to the strings command that leverages format-specific knowledge"
+license = "Apache-2.0"
+repository = "https://github.com/EvilBit-Labs/StringyMcStringFace"
+homepage = "http://evilbitlabs.io/StringyMcStringFace/"
+keywords = ["binary", "strings", "analysis", "reverse-engineering", "malware"]
+categories = ["command-line-utilities", "development-tools"]
+
+[lib]
+name = "stringy"
+path = "src/lib.rs"
+
+[[bin]]
+name = "stringy"
+path = "src/main.rs"
+
+[lints.rust]
+unsafe_code = "forbid"
+warnings = "deny"
+```
+
+## Dependencies
+
+- **Core dependencies**:
+ - `clap = { version = "4.5", features = ["derive"] }` - CLI argument parsing
+ - `goblin = "0.10"` - Binary format parsing (ELF, PE, Mach-O)
+ - `serde = { version = "1.0", features = ["derive"] }` - Serialization
+ - `serde_json = "1.0"` - JSON output
+ - `thiserror = "2.0"` - Structured error types
+
+- **Dev dependencies**:
+ - `criterion = "0.7"` - Benchmarking
+ - `insta = "1.0"` - Snapshot testing
+ - `tempfile = "3.8"` - Temporary file handling in tests
+
+## Build Profiles
+
+- Use `[profile.dist]` for distribution builds with LTO:
+
+```toml
+[profile.dist]
+inherits = "release"
+lto = "thin"
+```
+
+## Benchmarks
+
+- Define benchmarks in `[[bench]]` sections:
+
+```toml
+[[bench]]
+name = "elf"
+harness = false
+```
+
+## Package Metadata
+
+- Include proper license (Apache-2.0)
+- Provide clear description for binary analysis tool
+- Include relevant keywords for discoverability
diff --git a/.cursor/rules/rust/configuration-management.mdc b/.cursor/rules/rust/configuration-management.mdc
new file mode 100644
index 0000000..521d507
--- /dev/null
+++ b/.cursor/rules/rust/configuration-management.mdc
@@ -0,0 +1,80 @@
+---
+globs: **/config*.rs,**/*config*.rs
+alwaysApply: false
+---
+
+# Configuration Management Standards for Stringy
+
+## Configuration Architecture
+
+Stringy uses **CLI arguments only** for configuration via `clap`:
+
+- **Command-line flags** (only source of configuration)
+- **No configuration files** - all options specified via CLI
+- **No environment variables** - use CLI flags instead
+- **No hierarchical configuration** - simple argument parsing
+
+## CLI Configuration
+
+Define CLI arguments using `clap` with derive macros:
+
+```rust
+use clap::Parser;
+use std::path::PathBuf;
+
+#[derive(Parser)]
+#[command(name = "stringy")]
+#[command(about = "Extract meaningful strings from binary files")]
+struct Cli {
+ /// Input binary file to analyze
+ #[arg(value_name = "FILE")]
+ input: PathBuf,
+
+ /// Minimum string length
+ #[arg(short, long, default_value_t = 4)]
+ min_len: usize,
+
+ /// Output format (json, text, yara)
+ #[arg(short, long, default_value = "text")]
+ format: String,
+
+ /// Only extract strings matching specific tags
+ #[arg(long)]
+ only: Option<Vec<String>>,
+}
+```
+
+## Configuration Validation
+
+Validate CLI arguments:
+
+```rust
+impl Cli {
+ pub fn validate(&self) -> Result<(), StringyError> {
+ if self.min_len < 1 {
+ return Err(StringyError::ConfigError(
+ "Minimum string length must be at least 1".to_string()
+ ));
+ }
+
+ let valid_formats = ["json", "text", "yara"];
+ if !valid_formats.contains(&self.format.as_str()) {
+ return Err(StringyError::ConfigError(
+ format!("Invalid output format: {}. Must be one of: {:?}",
+ self.format, valid_formats)
+ ));
+ }
+
+ Ok(())
+ }
+}
+```
+
+## No File-Based Configuration
+
+Stringy intentionally does not support configuration files to keep it simple and portable:
+
+- All configuration comes from CLI arguments
+- No need to manage config file locations
+- Works the same way across all environments
+- Easier to use in scripts and pipelines
diff --git a/.cursor/rules/rust/error-handling-patterns.mdc b/.cursor/rules/rust/error-handling-patterns.mdc
new file mode 100644
index 0000000..d5f8778
--- /dev/null
+++ b/.cursor/rules/rust/error-handling-patterns.mdc
@@ -0,0 +1,249 @@
+---
+globs: **/*.rs
+alwaysApply: false
+---
+
+# Error Handling Patterns for Stringy
+
+## Error Handling Philosophy
+
+Stringy uses structured error handling with clear error boundaries:
+
+- **Structured Errors**: Use `thiserror` for all error types with derive macros
+- **Error Context**: Provide detailed context in error messages (offsets, section names, file paths)
+- **Error Boundaries**: Implement proper error boundaries for different components (parsing, extraction, classification)
+- **Recovery Strategies**: Continue processing when possible, provide partial results
+
+## Error Type Definition
+
+Define structured error types using thiserror:
+
+```rust
+use thiserror::Error;
+
+#[derive(Debug, Error)]
+pub enum StringyError {
+ #[error("Unsupported file format")]
+ UnsupportedFormat,
+
+ #[error("File I/O error: {0}")]
+ IoError(#[from] std::io::Error),
+
+ #[error("Binary parsing error: {0}")]
+ ParseError(String),
+
+ #[error("Invalid encoding in string at offset {offset}")]
+ EncodingError { offset: u64 },
+
+ #[error("Configuration error: {0}")]
+ ConfigError(String),
+
+ #[error("Memory mapping error: {0}")]
+ MemoryMapError(String),
+}
+
+// Convert from goblin errors
+impl From<goblin::error::Error> for StringyError {
+ fn from(err: goblin::error::Error) -> Self {
+ StringyError::ParseError(err.to_string())
+ }
+}
+```
+
+## Error Context
+
+Provide detailed error context:
+
+```rust
+fn parse_elf_section(data: &[u8], section_name: &str) -> Result<SectionInfo, StringyError> {
+ let section = goblin::elf::Elf::parse(data)
+ .map_err(|e| StringyError::ParseError(
+ format!("Failed to parse ELF section '{}': {}", section_name, e)
+ ))?;
+
+ // ... parsing logic
+ Ok(section_info)
+}
+```
+
+## Component-Specific Error Handling
+
+Implement error boundaries for different components:
+
+```rust
+// Container parsing errors
+#[derive(Debug, Error)]
+pub enum ContainerError {
+ #[error("Failed to detect binary format")]
+ FormatDetectionFailed,
+
+ #[error("Unsupported binary format: {format:?}")]
+ UnsupportedFormat { format: BinaryFormat },
+
+ #[error("Section parsing failed: {section}: {error}")]
+ SectionParseFailed { section: String, error: String },
+}
+
+// String extraction errors
+#[derive(Debug, Error)]
+pub enum ExtractionError {
+ #[error("Invalid encoding at offset {offset}")]
+ InvalidEncoding { offset: u64 },
+
+ #[error("String extraction failed: {0}")]
+ ExtractionFailed(String),
+}
+
+// Classification errors
+#[derive(Debug, Error)]
+pub enum ClassificationError {
+ #[error("Tagging failed: {0}")]
+ TaggingFailed(String),
+}
+```
+
+## Error Recovery Patterns
+
+Implement graceful degradation:
+
+```rust
+fn extract_strings_from_sections(
+ data: &[u8],
+ sections: &[SectionInfo]
+) -> Vec<FoundString> {
+ let mut found_strings = Vec::new();
+
+ for section in sections {
+ match extract_strings_from_section(data, section) {
+ Ok(strings) => found_strings.extend(strings),
+ Err(e) => {
+ eprintln!("Warning: Failed to extract strings from section '{}': {}",
+ section.name, e);
+ // Continue with other sections
+ }
+ }
+ }
+
+ found_strings
+}
+```
+
+## Error Logging
+
+Implement structured error logging:
+
+```rust
+fn handle_parsing_error(error: StringyError, context: &str) {
+ match error {
+ StringyError::UnsupportedFormat => {
+ eprintln!("Error: {} - Unsupported binary format", context);
+ }
+ StringyError::ParseError(msg) => {
+ eprintln!("Error: {} - Parse error: {}", context, msg);
+ }
+ StringyError::EncodingError { offset } => {
+ eprintln!("Error: {} - Invalid encoding at offset 0x{:x}", context, offset);
+ }
+ StringyError::IoError(e) => {
+ eprintln!("Error: {} - I/O error: {}", context, e);
+ }
+ _ => {
+ eprintln!("Error: {} - {}", context, error);
+ }
+ }
+}
+```
+
+## Error Propagation
+
+Use proper error propagation patterns:
+
+```rust
+// Use ? operator for early returns
+fn parse_binary_file(path: &Path) -> Result<ContainerInfo, StringyError> {
+ let data = std::fs::read(path)
+ .map_err(StringyError::IoError)?;
+
+ let format = detect_format(&data);
+ let parser = create_parser(format)?;
+ let container_info = parser.parse(&data)?;
+
+ Ok(container_info)
+}
+
+// Use map_err for error transformation
+fn extract_from_section(data: &[u8], section: &SectionInfo) -> Result<Vec<FoundString>, StringyError> {
+ let section_data = &data[section.offset as usize..(section.offset + section.size) as usize];
+
+ extract_strings(section_data)
+ .map_err(|e| StringyError::ParseError(format!(
+ "String extraction failed in section '{}': {}",
+ section.name, e
+ )))
+}
+```
+
+## Error Testing
+
+Test error conditions thoroughly:
+
+```rust
+#[test]
+fn test_unsupported_format() {
+ let data = b"NOT_A_BINARY_FORMAT";
+ let format = detect_format(data);
+ assert_eq!(format, BinaryFormat::Unknown);
+
+ let result = create_parser(format);
+ assert!(matches!(result, Err(StringyError::UnsupportedFormat)));
+}
+
+#[test]
+fn test_malformed_binary() {
+ let data = b"\x7fELF\x01\x01\x01"; // Invalid ELF header
+ let parser = ElfParser::new();
+ let result = parser.parse(data);
+
+ assert!(result.is_err());
+ if let Err(StringyError::ParseError(_)) = result {
+ // Expected
+ } else {
+ panic!("Expected ParseError");
+ }
+}
+
+#[test]
+fn test_error_recovery() {
+ // Test that errors in one section don't stop processing of others
+ let result = extract_strings_from_sections(&data, &sections);
+ // Should return partial results even if some sections fail
+ assert!(!result.is_empty());
+}
+```
+
+## Error Documentation
+
+Document error conditions in rustdoc:
+
+```rust
+/// Parses a binary file and extracts container information.
+///
+/// # Errors
+///
+/// Returns [`StringyError`] for:
+/// - Unsupported binary formats
+/// - I/O errors reading the file
+/// - Binary parsing errors (malformed headers, invalid structures)
+///
+/// # Examples
+///
+/// ```rust,no_run
+/// use stringy::container::parse_binary_file;
+///
+/// let info = parse_binary_file(std::path::Path::new("binary.exe"))?;
+/// println!("Format: {:?}", info.format);
+/// ```
+pub fn parse_binary_file(path: &Path) -> Result<ContainerInfo, StringyError> {
+ // Implementation
+}
+```
diff --git a/.cursor/rules/rust/error-handling.mdc b/.cursor/rules/rust/error-handling.mdc
new file mode 100644
index 0000000..4a5cb5e
--- /dev/null
+++ b/.cursor/rules/rust/error-handling.mdc
@@ -0,0 +1,62 @@
+---
+globs: **/*.rs
+---
+
+# Error Handling Standards for Stringy
+
+## Error Types
+
+Use `thiserror` for structured error types:
+
+```rust
+#[derive(Debug, thiserror::Error)]
+pub enum StringyError {
+ #[error("Unsupported file format")]
+ UnsupportedFormat,
+
+ #[error("File I/O error: {0}")]
+ IoError(#[from] std::io::Error),
+
+ #[error("Binary parsing error: {0}")]
+ ParseError(String),
+
+ #[error("Invalid encoding in string at offset {offset}")]
+ EncodingError { offset: u64 },
+
+ #[error("Configuration error: {0}")]
+ ConfigError(String),
+
+ #[error("Memory mapping error: {0}")]
+ MemoryMapError(String),
+}
+```
+
+## Error Context
+
+- Provide detailed error messages with actionable suggestions
+- Include relevant context information (file paths, offsets, section names, etc.)
+- Convert `goblin` errors to `StringyError` using `From` implementations
+
+## Error Propagation
+
+- Use `?` operator for error propagation
+- Convert between error types using `From` implementations
+- Avoid `unwrap()` and `expect()` in production code
+
+## Binary Parsing Errors
+
+- Handle malformed binary files gracefully
+- Provide clear error messages indicating what went wrong
+- Include file offset information when available
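+
+A minimal sketch of the bullets above, using only the standard library. `ReadError` and `read_u32_le` are hypothetical names chosen for illustration and are not part of Stringy's API; real code would convert such an error into `StringyError`:
+
+```rust
+use std::convert::TryFrom;
+use std::fmt;
+
+/// Hypothetical error for this sketch; it records where the read failed.
+#[derive(Debug, PartialEq)]
+struct ReadError {
+    offset: usize,
+    needed: usize,
+    available: usize,
+}
+
+impl fmt::Display for ReadError {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(
+            f,
+            "truncated read at offset 0x{:x}: need {} bytes, have {}",
+            self.offset, self.needed, self.available
+        )
+    }
+}
+
+/// Reads a little-endian `u32` at `offset` without indexing, so malformed
+/// input yields an error that carries the offset instead of a panic.
+fn read_u32_le(data: &[u8], offset: usize) -> Result<u32, ReadError> {
+    let truncated = ReadError {
+        offset,
+        needed: 4,
+        available: data.len().saturating_sub(offset),
+    };
+    // `checked_add` guards against overflow; `get` guards against out-of-bounds.
+    let bytes = offset
+        .checked_add(4)
+        .and_then(|end| data.get(offset..end))
+        .and_then(|slice| <[u8; 4]>::try_from(slice).ok());
+    match bytes {
+        Some(bytes) => Ok(u32::from_le_bytes(bytes)),
+        None => Err(truncated),
+    }
+}
+```
+
+Because the error carries `offset`, `needed`, and `available`, the caller can report exactly where a malformed file was cut short.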
+
+## Error Recovery
+
+- Continue processing other sections when one section fails
+- Provide partial results when possible
+- Log errors but don't crash on non-critical failures
+
+## Error Testing
+
+- Test all error conditions (invalid formats, malformed binaries, I/O errors)
+- Validate error messages and context
+- Test error propagation through the parsing pipeline
diff --git a/.cursor/rules/rust/linting-rules.mdc b/.cursor/rules/rust/linting-rules.mdc
new file mode 100644
index 0000000..a0c6cc8
--- /dev/null
+++ b/.cursor/rules/rust/linting-rules.mdc
@@ -0,0 +1,452 @@
+---
+globs: **/*.rs
+---
+
+# Rust Linting Rules for Stringy
+
+This document explains the intent behind each clippy lint rule in Stringy's Cargo.toml. These rules are carefully chosen for a binary analysis tool and should not be disabled without understanding their purpose.
+
+## Critical Security Rules (FORBIDDEN)
+
+### `panic = "forbid"`
+
+**Intent**: Panics crash the binary analysis tool, leaving users without results. In a CLI tool context, this is unacceptable.
+**Why not to disable**: A crashed tool provides no value to users analyzing binaries.
+
+### `unwrap_used = "forbid"`
+
+**Intent**: `unwrap()` panics on `None` or `Err`, crashing the tool. Production CLI tools must handle all error cases gracefully.
+**Why not to disable**: A single `unwrap()` on malformed input aborts the whole run instead of reporting the parse error and returning partial results.
+
+### `await_holding_lock = "deny"`
+
+**Intent**: Prevents deadlocks in async code (though Stringy is synchronous, this rule still applies if async is added).
+**Why not to disable**: Deadlocks freeze the tool, providing no value to users.
+
+## Memory Safety Rules
+
+### `as_conversions = "warn"`
+
+**Intent**: Flags `as` casts, which can silently truncate or change the sign of a value; prefer `From`/`TryFrom`.
+**Why not to disable**: A silently truncated offset or size points string extraction at the wrong bytes.
+
+### `as_ptr_cast_mut = "warn"`
+
+**Intent**: Prevents dangerous mutable pointer casts that could lead to memory corruption or use-after-free.
+**Why not to disable**: Memory corruption in security-critical code is a major vulnerability.
+
+### `cast_ptr_alignment = "warn"`
+
+**Intent**: Ensures pointer alignment is correct to prevent undefined behavior and potential crashes.
+**Why not to disable**: Misaligned pointers can cause crashes or security issues.
+
+### `indexing_slicing = "warn"`
+
+**Intent**: Flags indexing and slicing that can panic on out-of-bounds access when parsing binary data.
+**Why not to disable**: Section offsets and sizes come from untrusted input; an out-of-range index panics and aborts the analysis, so prefer `get()` and handle the `None` case.
+
+## Arithmetic Safety Rules
+
+### `arithmetic_side_effects = "warn"`
+
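+**Intent**: Flags arithmetic that can overflow or panic (for example integer `+`, `-`, `*`, `/`, `%`), steering code toward checked, saturating, or wrapping operations.
+**Why not to disable**: Offsets and sizes read from a binary are untrusted; an overflowing `offset + size` can panic in debug builds or wrap in release builds and yield wrong ranges.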
+
+### `integer_division = "warn"`
+
+**Intent**: Flags division of integers, which truncates toward zero and silently discards the remainder.
+**Why not to disable**: Truncated results in size or count calculations can skip trailing data without any error.
+
+### `modulo_arithmetic = "warn"`
+
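+**Intent**: Flags `%` on signed operands, where the result takes the sign of the dividend and may differ from what other languages produce.
+**Why not to disable**: Alignment and wrap-around calculations on signed values can go wrong silently; use unsigned types or `rem_euclid`.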
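+
+The arithmetic lints in this section are typically satisfied with the standard library's checked operations. A minimal sketch, assuming hypothetical helper names (`section_end`, `entry_count`, `padding_to_align`) that are not taken from Stringy's code:
+
+```rust
+/// End offset of a section, or `None` if `offset + size` overflows.
+fn section_end(offset: u64, size: u64) -> Option<u64> {
+    offset.checked_add(size)
+}
+
+/// Number of whole fixed-size entries in a table.
+/// Returns `None` when `entry_size` is zero; the remainder is dropped on purpose.
+fn entry_count(table_size: u64, entry_size: u64) -> Option<u64> {
+    table_size.checked_div(entry_size)
+}
+
+/// Padding needed to round `offset` up to a multiple of `align`.
+/// Returns `None` when `align` is zero.
+fn padding_to_align(offset: u64, align: u64) -> Option<u64> {
+    let remainder = offset.checked_rem(align)?;
+    if remainder == 0 {
+        Some(0)
+    } else {
+        align.checked_sub(remainder)
+    }
+}
+```
+
+Each helper returns `Option`, so the caller decides whether an overflow or a zero divisor is an error or a reason to skip the section.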
+
+### `float_cmp = "warn"`
+
+**Intent**: Ensures safe floating-point comparisons to prevent incorrect security decisions.
+**Why not to disable**: Incorrect float comparisons can lead to wrong threat assessments.
+
+## Performance Rules
+
+### `clone_on_ref_ptr = "warn"`
+
+**Intent**: Prevents unnecessary cloning of reference-counted types that wastes memory and CPU.
+**Why not to disable**: In a monitoring system, performance directly impacts detection capability.
+
+### `rc_buffer = "warn"`
+
+**Intent**: Flags `Rc<String>`, `Rc<Vec<T>>`, and similar reference-counted mutable buffers; `Rc<str>` or `Rc<[T]>` is usually sufficient.
+**Why not to disable**: The extra indirection costs an allocation and a pointer hop on every access.
+
+### `rc_mutex = "warn"`
+
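+**Intent**: Flags `Rc<Mutex<T>>`. `Rc` is not thread-safe, so the `Mutex` adds locking overhead without enabling sharing across threads; use `Rc<RefCell<T>>` or `Arc<Mutex<T>>`.
+**Why not to disable**: The combination usually signals a misunderstanding of the intended threading model.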
+
+### `large_stack_arrays = "warn"`
+
+**Intent**: Prevents stack overflow from large arrays that could crash the system.
+**Why not to disable**: Stack overflows can crash the monitoring system.
+
+### `str_to_string = "warn"`
+
+**Intent**: Flags `.to_string()` on a `&str`; `.to_owned()` states the intent (copy into a `String`) more directly.
+**Why not to disable**: Consistent conversions keep string-handling code easy to read and review.
+
+### `string_add = "warn"`
+
+**Intent**: Prevents inefficient string concatenation that can cause performance issues.
+**Why not to disable**: String concatenation performance matters in log processing.
+
+### `string_add_assign = "warn"`
+
+**Intent**: Optimizes string building operations for better performance.
+**Why not to disable**: String building is common in alert generation and logging.
+
+### `unused_async = "warn"`
+
+**Intent**: Removes unnecessary async overhead that wastes resources.
+**Why not to disable**: Unnecessary async can impact system responsiveness.
+
+## Correctness Rules
+
+### `correctness = { level = "deny", priority = -1 }`
+
+**Intent**: Denies all correctness issues that could lead to bugs or security vulnerabilities.
+**Why not to disable**: Correctness is fundamental to security monitoring.
+
+### `suspicious = { level = "warn", priority = -1 }`
+
+**Intent**: Warns about suspicious patterns that might indicate bugs or security issues.
+**Why not to disable**: Suspicious patterns often indicate real problems.
+
+### `perf = { level = "warn", priority = -1 }`
+
+**Intent**: Warns on patterns in clippy's `perf` group that have a faster equivalent.
+**Why not to disable**: Stringy scans entire binaries, so avoidable overhead in hot paths adds up.
+
+## Error Handling Rules
+
+### `expect_used = "warn"`
+
+**Intent**: Prefers proper error handling over expect() for better error messages and handling.
+**Why not to disable**: Proper error handling is crucial for debugging security issues.
+
+### `map_err_ignore = "warn"`
+
+**Intent**: Ensures error transformations are meaningful and not ignored.
+**Why not to disable**: Ignored errors can mask security problems.
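+
+A small sketch of what the lint asks for: name the source error in the closure and keep it, rather than writing `.map_err(|_| ...)`. `MinLenError` and `parse_min_len` are hypothetical names for illustration only:
+
+```rust
+use std::fmt;
+use std::num::ParseIntError;
+
+/// Hypothetical error for this sketch; it keeps the underlying cause.
+#[derive(Debug)]
+struct MinLenError {
+    input: String,
+    source: ParseIntError,
+}
+
+impl fmt::Display for MinLenError {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "invalid minimum length '{}': {}", self.input, self.source)
+    }
+}
+
+/// Parses a minimum-length value and preserves the parse error.
+/// Using `|_|` in the closure would discard `source` and trigger the lint.
+fn parse_min_len(input: &str) -> Result<usize, MinLenError> {
+    input.parse::<usize>().map_err(|source| MinLenError {
+        input: input.to_owned(),
+        source,
+    })
+}
+```
+
+The resulting message includes both the offending input and the reason the standard library rejected it.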
+
+### `let_underscore_must_use = "warn"`
+
+**Intent**: Prevents ignoring important return values that might indicate errors.
+**Why not to disable**: Ignored return values can hide security-relevant information.
+
+## Code Organization Rules
+
+### `missing_docs_in_private_items = "allow"`
+
+**Intent**: Private items don't need documentation to reduce noise.
+**Why this exception**: Private implementation details don't need public documentation.
+
+### `redundant_type_annotations = "warn"`
+
+**Intent**: Removes unnecessary type annotations that clutter code.
+**Why not to disable**: Clean code is easier to audit for security issues.
+
+### `ref_binding_to_reference = "warn"`
+
+**Intent**: Flags `ref` bindings that create a reference to a reference (`&&T`).
+**Why not to disable**: The extra layer of indirection obscures what is actually borrowed.
+
+### `pattern_type_mismatch = "warn"`
+
+**Intent**: Flags patterns whose shape does not exactly match the type being matched, i.e. patterns that rely on implicit default binding modes.
+**Why not to disable**: Explicit patterns make it clear whether a binding borrows or moves the matched value.
+
+## Additional Security Rules
+
+### `dbg_macro = "warn"`
+
+**Intent**: Prevents debug output from accidentally reaching production logs.
+**Why not to disable**: Debug output in production can leak sensitive information.
+
+### `todo = "warn"`
+
+**Intent**: Flags `todo!()` macro calls so placeholders are replaced before release.
+**Why not to disable**: `todo!()` panics at runtime if the code path is reached.
+
+### `unimplemented = "warn"`
+
+**Intent**: Prevents unimplemented code from reaching production.
+**Why not to disable**: Unimplemented code will panic at runtime.
+
+### `unreachable = "warn"`
+
+**Intent**: Flags `unreachable!()` macro calls, which panic if the supposedly impossible path is taken.
+**Why not to disable**: With malformed binaries, "impossible" states do occur; return an error instead.
+
+## Performance and Resource Rules
+
+### `create_dir = "warn"`
+
+**Intent**: Flags `std::fs::create_dir`, which fails when a parent directory is missing; `create_dir_all` is usually what is intended.
+**Why not to disable**: Avoids spurious failures when writing output into nested paths.
+
+### `exit = "warn"`
+
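+**Intent**: Flags `std::process::exit` calls, which terminate immediately without running destructors and bypass normal error propagation.
+**Why not to disable**: Exiting from deep inside library code hides errors from callers; return a `Result` and let `main` choose the exit code.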
+
+### `filetype_is_file = "warn"`
+
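+**Intent**: Flags `FileType::is_file()`, which does not cover special file types on Unix-like systems or symlinks on Windows; `!is_dir()` is often the intended test.
+**Why not to disable**: A too-strict check can make the tool reject inputs it could have analyzed.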
+
+### `float_equality_without_abs = "warn"`
+
+**Intent**: Prevents incorrect floating-point comparisons that could affect security calculations.
+**Why not to disable**: Incorrect comparisons can lead to wrong security decisions.
+
+### `if_then_some_else_none = "warn"`
+
+**Intent**: Suggests `bool::then` or `bool::then_some` in place of `if cond { Some(x) } else { None }`.
+**Why not to disable**: The shorter form is easier to scan and leaves less room for mistakes in the branches.
+
+### `lossy_float_literal = "warn"`
+
+**Intent**: Prevents precision loss in floating-point calculations that could affect security metrics.
+**Why not to disable**: Precision loss can lead to incorrect security assessments.
+
+### `match_same_arms = "warn"`
+
+**Intent**: Identifies duplicate match arms that might indicate copy-paste errors or logic bugs.
+**Why not to disable**: Duplicate arms can hide security logic errors.
+
+### `missing_assert_message = "warn"`
+
+**Intent**: Ensures assertions have meaningful messages for debugging security issues.
+**Why not to disable**: Good assertion messages are crucial for security debugging.
+
+### `mixed_read_write_in_expression = "warn"`
+
+**Intent**: Flags expressions that both read and write the same variable, where the result depends on evaluation order.
+**Why not to disable**: Order-dependent expressions are easy to misread and a common source of subtle bugs.
+
+### `mutex_atomic = "warn"`
+
+**Intent**: Suggests using atomic operations instead of mutexes for better performance.
+**Why not to disable**: Performance matters in high-throughput security monitoring.
+
+### `mutex_integer = "warn"`
+
+**Intent**: Suggests using atomic integers instead of mutex-protected integers.
+**Why not to disable**: Atomic operations are more efficient for simple data.
+
+### `non_ascii_literal = "warn"`
+
+**Intent**: Ensures non-ASCII literals are intentional and properly handled.
+**Why not to disable**: Improper handling of non-ASCII can lead to security bypasses.
+
+### `non_send_fields_in_send_ty = "warn"`
+
+**Intent**: Flags `unsafe impl Send` on types that contain non-`Send` fields.
+**Why not to disable**: An unsound `Send` implementation can cause data races if the type is ever shared across threads.
+
+### `partial_pub_fields = "warn"`
+
+**Intent**: Prevents partially public structs that can break encapsulation.
+**Why not to disable**: Encapsulation is important for security-critical data structures.
+
+### `same_name_method = "warn"`
+
+**Intent**: Prevents method name conflicts that could lead to confusion or bugs.
+**Why not to disable**: Confusing method names can hide security logic errors.
+
+### `self_named_module_files = "warn"`
+
+**Intent**: Ensures consistent module naming that aids in code organization and security auditing.
+**Why not to disable**: Consistent naming helps with security code reviews.
+
+### `semicolon_inside_block = "warn"`
+
+**Intent**: Prevents confusing semicolon usage that could change code behavior.
+**Why not to disable**: Incorrect semicolons can change security logic.
+
+### `semicolon_outside_block = "warn"`
+
+**Intent**: Ensures proper semicolon usage for clear code structure.
+**Why not to disable**: Clear code structure aids in security auditing.
+
+### `shadow_reuse = "warn"`
+
+**Intent**: Prevents variable shadowing that can hide bugs or security issues.
+**Why not to disable**: Variable shadowing can hide security logic errors.
+
+### `shadow_same = "warn"`
+
+**Intent**: Prevents shadowing variables with the same name.
+**Why not to disable**: Same-name shadowing can hide bugs.
+
+### `shadow_unrelated = "warn"`
+
+**Intent**: Prevents shadowing unrelated variables that can cause confusion.
+**Why not to disable**: Unrelated shadowing can hide security bugs.
+
+### `string_lit_as_bytes = "warn"`
+
+**Intent**: Prevents unnecessary string literal to bytes conversion.
+**Why not to disable**: Unnecessary conversions waste resources in high-throughput monitoring.
+
+### `string_slice = "warn"`
+
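+**Intent**: Flags slicing a `str` by byte index (`&s[a..b]`), which panics if an index is not on a UTF-8 character boundary.
+**Why not to disable**: Extracted strings routinely contain multi-byte characters, so an unchecked slice can crash the tool; use `str::get` or `char_indices`.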
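+
+A small sketch of panic-free alternatives to byte-index slicing such as `&s[..n]`, which panics when `n` does not fall on a `char` boundary. The helper names `prefix_checked` and `prefix_chars` are hypothetical:
+
+```rust
+/// Returns the first `max_bytes` bytes of `s`, or `None` if the cut is out of
+/// range or would split a multi-byte character.
+fn prefix_checked(s: &str, max_bytes: usize) -> Option<&str> {
+    s.get(..max_bytes)
+}
+
+/// Returns at most `max_chars` characters of `s`, never splitting a code point.
+fn prefix_chars(s: &str, max_chars: usize) -> &str {
+    match s.char_indices().nth(max_chars) {
+        // `byte_index` is the start of the first character to drop, so it is
+        // always a valid boundary; `unwrap_or` is only a defensive fallback.
+        Some((byte_index, _)) => s.get(..byte_index).unwrap_or(s),
+        None => s,
+    }
+}
+```
+
+`prefix_checked` suits callers that already hold a byte offset and want to detect a bad one; `prefix_chars` suits display truncation, where the limit is expressed in characters.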
+
+### `suspicious_operation_groupings = "warn"`
+
+**Intent**: Identifies suspicious operation groupings that might indicate bugs.
+**Why not to disable**: Suspicious patterns often indicate real security issues.
+
+### `trailing_empty_array = "warn"`
+
+**Intent**: Flags structs whose last field is a zero-length array but that are declared without a `repr` attribute.
+**Why not to disable**: Such structs usually exist to control memory layout, and without `#[repr(C)]` the field layout is unspecified.
+
+### `transmute_undefined_repr = "warn"`
+
+**Intent**: Prevents undefined behavior from transmute operations.
+**Why not to disable**: Undefined behavior can lead to security vulnerabilities.
+
+### `trivial_regex = "warn"`
+
+**Intent**: Identifies trivial regex patterns that could be simplified.
+**Why not to disable**: Simple patterns are easier to audit for security issues.
+
+### `undocumented_unsafe_blocks = "warn"`
+
+**Intent**: Ensures unsafe blocks are documented to explain their necessity.
+**Why not to disable**: Unsafe code must be carefully documented for security auditing.
+
+### `unnecessary_self_imports = "warn"`
+
+**Intent**: Removes unnecessary self imports that clutter code.
+**Why not to disable**: Clean code is easier to audit for security issues.
+
+### `unseparated_literal_suffix = "warn"`
+
+**Intent**: Ensures proper literal suffix formatting for readability.
+**Why not to disable**: Readable code aids in security auditing.
+
+### `unused_peekable = "warn"`
+
+**Intent**: Removes unused peekable iterators that waste resources.
+**Why not to disable**: Unused resources can impact performance in monitoring systems.
+
+### `unused_rounding = "warn"`
+
+**Intent**: Removes unused rounding operations that waste CPU cycles.
+**Why not to disable**: CPU cycles matter in high-throughput security monitoring.
+
+### `use_debug = "warn"`
+
+**Intent**: Prevents debug formatting in production code that can leak information.
+**Why not to disable**: Debug formatting can leak sensitive information in logs.
+
+### `verbose_file_reads = "warn"`
+
+**Intent**: Flags `File::open` followed by `read_to_end` or `read_to_string`; `std::fs::read` and `std::fs::read_to_string` do the same in one call.
+**Why not to disable**: The one-call form is shorter and harder to get wrong.
+
+### `wildcard_enum_match_arm = "warn"`
+
+**Intent**: Prevents wildcard enum matching that can hide security logic errors.
+**Why not to disable**: Wildcard matching can hide important security cases.
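+
+A minimal sketch of the exhaustive style this lint encourages. The `BinaryFormat` variants below are assumptions for illustration (only `Unknown` is confirmed by the examples in these rules):
+
+```rust
+/// Hypothetical stand-in for Stringy's format enum.
+#[derive(Debug, Clone, Copy, PartialEq)]
+enum BinaryFormat {
+    Elf,
+    Pe,
+    MachO,
+    Unknown,
+}
+
+/// Every variant is listed explicitly, so adding a new variant later is a
+/// compile error here instead of silently falling into a `_` arm.
+fn format_name(format: BinaryFormat) -> &'static str {
+    match format {
+        BinaryFormat::Elf => "ELF",
+        BinaryFormat::Pe => "PE",
+        BinaryFormat::MachO => "Mach-O",
+        BinaryFormat::Unknown => "unknown",
+    }
+}
+```
+
+With a `_ => "unknown"` arm instead, a newly added format would compile cleanly and be reported as unknown without anyone noticing.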
+
+### `zero_sized_map_values = "warn"`
+
+**Intent**: Identifies zero-sized map values that might indicate inefficient data structures.
+**Why not to disable**: Inefficient data structures can impact monitoring performance.
+
+## Pragmatic Exceptions
+
+These exceptions are allowed currently while the project is in early development.
+
+### `missing_errors_doc = "allow"`
+
+**Intent**: Error documentation can be verbose and obvious from context.
+**Why this exception**: Reduces noise while maintaining code clarity.
+
+### `missing_panics_doc = "allow"`
+
+**Intent**: Panic documentation is often obvious from the panic message.
+**Why this exception**: Reduces documentation overhead for obvious cases.
+
+### `must_use_candidate = "allow"`
+
+**Intent**: Some must-use candidates are too noisy for this project.
+**Why this exception**: Balances safety with developer productivity.
+
+### `cast_possible_truncation = "allow"`
+
+**Intent**: Some truncation warnings are too noisy for this project.
+**Why this exception**: Reduces noise while maintaining type safety.
+
+### `cast_precision_loss = "allow"`
+
+**Intent**: Some precision loss warnings are acceptable in this context.
+**Why this exception**: Balances precision with practical considerations.
+
+### `cast_sign_loss = "allow"`
+
+**Intent**: Some sign loss warnings are acceptable in this context.
+**Why this exception**: Reduces noise while maintaining correctness.
+
+### `module_name_repetitions = "allow"`
+
+**Intent**: Module name repetitions are sometimes necessary for clarity.
+**Why this exception**: Allows clear module organization.
+
+### `similar_names = "allow"`
+
+**Intent**: Similar names are sometimes necessary for related functionality.
+**Why this exception**: Reduces noise while maintaining clear naming.
+
+### `too_many_lines = "allow"`
+
+**Intent**: Some modules are naturally large due to their complexity.
+**Why this exception**: Allows complex modules when necessary.
+
+### `type_complexity = "allow"`
+
+**Intent**: Complex types are sometimes necessary for security-critical code.
+**Why this exception**: Balances complexity with functionality.
+
+### `async_yields_async = "allow"`
+
+**Intent**: Some async yields are necessary for proper async patterns.
+**Why this exception**: Allows necessary async patterns.
+
+### `large_futures = "allow"`
+
+**Intent**: Some futures are naturally large due to their complexity.
+**Why this exception**: Allows complex futures when necessary.
+
+### `result_large_err = "allow"`
+
+**Intent**: Some error types are naturally large due to their complexity.
+**Why this exception**: Allows complex error types when necessary.
+
+### `cargo_common_metadata = "allow"`
+
+**Intent**: Common metadata warnings are too noisy for this project.
+**Why this exception**: Reduces noise while maintaining package metadata.
+
+## Summary
+
+These linting rules are carefully chosen for a binary analysis tool. Each rule serves a specific purpose in preventing crashes, ensuring performance, and maintaining code quality. They should not be disabled without understanding their intent and the implications of doing so.
+
+## AI Assistant Restrictions
+
+**CRITICAL**: AI assistants are explicitly prohibited from removing clippy restrictions or allowing linters marked as `deny` without explicit permission. All `-D warnings` and `deny` attributes must be preserved. Any changes to linting configuration require explicit user approval.
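Review note on the cast-related exceptions above (`cast_possible_truncation`, `cast_precision_loss`, `cast_sign_loss`): with these lints allowed, an `as` cast that drops high bits compiles silently. The following is a minimal sketch of the difference. `checked_len` and `truncating_len` are hypothetical helper names used only for illustration; they are not part of Stringy.

```rust
use std::num::TryFromIntError;

/// Hypothetical helper: convert a 64-bit length to `u32`,
/// returning an error instead of silently dropping high bits.
pub fn checked_len(len: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(len)
}

/// Hypothetical helper: the same conversion with an `as` cast,
/// which keeps only the low 32 bits of the input.
pub fn truncating_len(len: u64) -> u32 {
    len as u32
}
```

Where a length or offset comes from untrusted binary input, the checked form may be worth the extra `Result` even while the lint itself stays allowed.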
diff --git a/.cursor/rules/rust/performance-optimization.mdc b/.cursor/rules/rust/performance-optimization.mdc
new file mode 100644
index 0000000..1cdddfc
--- /dev/null
+++ b/.cursor/rules/rust/performance-optimization.mdc
@@ -0,0 +1,275 @@
+---
+globs: **/benches/**/*.rs,**/*bench*.rs,**/*performance*.rs
+alwaysApply: false
+---
+
+# Performance Optimization Standards for Stringy
+
+## High-Performance Binary Processing
+
+Stringy uses idiomatic best practices for high-performance binary analysis:
+
+- **Zero-Copy Parsing**: Use `goblin` for efficient binary format parsing without unnecessary allocations
+- **Memory-Mapped Files**: Consider memory-mapped I/O for large binary files
+- **Lazy Evaluation**: Process sections on-demand rather than loading everything into memory
+- **Efficient String Extraction**: Use slice-based operations for string extraction to avoid allocations
+
+## Performance Targets
+
+Stringy must meet strict performance requirements:
+
+- **Large Binary Processing**: Handle binaries up to several GB efficiently
+- **Memory Usage**: Minimize memory footprint, especially for large binaries
+- **Processing Speed**: Process typical binaries (10-100 MB) in < 1 second
+- **String Extraction**: Extract strings from sections efficiently without excessive allocations
+- **Format Detection**: Detect binary format in < 10ms
+
+## Benchmarking with Criterion
+
+Use Criterion for performance benchmarking:
+
+```rust
+use criterion::{criterion_group, criterion_main, Criterion};
+use std::fs;
+use std::hint::black_box;
+
+fn benchmark_format_detection(c: &mut Criterion) {
+ let mut group = c.benchmark_group("format_detection");
+
+ let elf_data = fs::read("test_data/sample.elf").unwrap();
+ let pe_data = fs::read("test_data/sample.exe").unwrap();
+ let macho_data = fs::read("test_data/sample.macho").unwrap();
+
+ group.bench_function("detect_elf", |b| {
+ b.iter(|| black_box(stringy::container::detect_format(&elf_data)))
+ });
+
+ group.bench_function("detect_pe", |b| {
+ b.iter(|| black_box(stringy::container::detect_format(&pe_data)))
+ });
+
+ group.bench_function("detect_macho", |b| {
+ b.iter(|| black_box(stringy::container::detect_format(&macho_data)))
+ });
+
+ group.finish();
+}
+
+fn benchmark_string_extraction(c: &mut Criterion) {
+ let mut group = c.benchmark_group("string_extraction");
+
+ let binary_data = fs::read("test_data/large_binary").unwrap();
+ let container_info = stringy::container::parse_binary(&binary_data).unwrap();
+
+ group.bench_function("extract_strings", |b| {
+ b.iter(|| {
+ black_box(stringy::extraction::extract_strings(
+ &binary_data,
+ &container_info.sections
+ ))
+ })
+ });
+
+ group.finish();
+}
+
+criterion_group!(
+ benches,
+ benchmark_format_detection,
+ benchmark_string_extraction
+);
+criterion_main!(benches);
+```
+
+## Memory Management
+
+Implement efficient memory usage patterns for binary processing:
+
+```rust
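+use std::fs::File;
+use std::path::Path;
+use memmap2::MmapOptions;
+
+// Memory-mapped file for large binaries
+fn process_large_binary(path: &Path) -> Result<ContainerInfo> {
+ let file = File::open(path)?;
+ // SAFETY: sound only if no other process truncates or modifies the file while it is mapped
+ let mmap = unsafe { MmapOptions::new().map(&file)? };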
+
+ // Process memory-mapped data without loading entire file
+ let format = detect_format(&mmap);
+ let parser = create_parser(format)?;
+ let container_info = parser.parse(&mmap)?;
+
+ Ok(container_info)
+}
+
+// Slice-based string extraction to avoid allocations
+fn extract_strings_efficient(data: &[u8]) -> Vec<FoundString> {
+ let mut strings = Vec::new();
+ let mut i = 0;
+
+ while i < data.len() {
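+ if let Some(string) = find_string_at_offset(data, i) {
+ // Read the length before `string` is moved into the vector; always advance at least one byte
+ let advance = (string.length as usize).max(1);
+ strings.push(string);
+ i += advance;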
+ } else {
+ i += 1;
+ }
+ }
+
+ strings
+}
+```
+
+## Binary Parsing Optimization
+
+Optimize binary format parsing:
+
+```rust
+// Use goblin's zero-copy parsing
+fn parse_elf_efficient(data: &[u8]) -> Result<ContainerInfo> {
+ let elf = goblin::elf::Elf::parse(data)?;
+
+ // Process sections without cloning
+ let sections: Vec<SectionInfo> = elf.section_headers
+ .iter()
+ .enumerate()
+ .filter_map(|(idx, header)| {
+ // Only process sections that are likely to contain strings
+ if is_string_section(&elf, idx) {
+ Some(parse_section_info(&elf, header, idx))
+ } else {
+ None
+ }
+ })
+ .collect();
+
+ Ok(ContainerInfo {
+ format: BinaryFormat::Elf,
+ sections,
+ imports: extract_imports(&elf)?,
+ exports: extract_exports(&elf)?,
+ })
+}
+```
+
+## String Extraction Optimization
+
+Optimize string extraction algorithms:
+
+```rust
+// Extract NUL-terminated runs of printable ASCII (a subset of UTF-8)
+fn extract_utf8_strings(data: &[u8], min_len: usize) -> Vec<FoundString> {
+ let mut strings = Vec::new();
+ let mut start = None;
+
+ for (i, &byte) in data.iter().enumerate() {
+ if (0x20..0x7F).contains(&byte) {
+ if start.is_none() {
+ start = Some(i);
+ }
+ } else if byte == 0 {
+ if let Some(s) = start {
+ let len = i - s;
+ if len >= min_len {
+ if let Ok(text) = std::str::from_utf8(&data[s..i]) {
+ strings.push(FoundString {
+ text: text.to_string(),
+ offset: s as u64,
+ length: len as u32,
+ // ... other fields
+ });
+ }
+ }
+ }
+ start = None;
+ } else {
+ start = None;
+ }
+ }
+
+ strings
+}
+```
+
+## Section Processing Optimization
+
+Process sections efficiently:
+
+```rust
+// Process sections in priority order (highest weight first)
+fn process_sections_prioritized(
+ data: &[u8],
+ sections: &[SectionInfo]
+) -> Vec<FoundString> {
+ let mut sections = sections.to_vec();
+
+ // Sort by weight (descending) to process high-value sections first
+ sections.sort_by(|a, b| b.weight.partial_cmp(&a.weight).unwrap());
+
+ let mut all_strings = Vec::new();
+
+ for section in sections {
+ if let Ok(strings) = extract_strings_from_section(data, &section) {
+ all_strings.extend(strings);
+ }
+ }
+
+ all_strings
+}
+```
+
+## Performance Testing
+
+Include performance regression tests:
+
+```rust
+use std::fs;
+use std::time::{Duration, Instant};
+
+#[test]
+fn test_format_detection_performance() {
+ let data = include_bytes!("../test_data/sample.elf");
+ let start = Instant::now();
+
+ for _ in 0..1000 {
+ let _format = detect_format(data);
+ }
+
+ let duration = start.elapsed();
+
+ // Must complete 1000 detections in < 100ms
+ assert!(duration < Duration::from_millis(100));
+}
+
+#[test]
+fn test_large_binary_processing() {
+ let data = fs::read("test_data/large_binary").unwrap();
+ let start = Instant::now();
+
+ let format = detect_format(&data);
+ let parser = create_parser(format).unwrap();
+ let _container_info = parser.parse(&data).unwrap();
+
+ let duration = start.elapsed();
+
+ // Must process 100MB binary in < 2 seconds
+ assert!(duration < Duration::from_secs(2));
+}
+```
+
+## Memory Usage Testing
+
+Test memory efficiency:
+
+```rust
+#[test]
+fn test_memory_efficiency() {
+ let data = fs::read("test_data/large_binary").unwrap();
+
+ // Process binary multiple times
+ for _ in 0..10 {
+ let format = detect_format(&data);
+ let parser = create_parser(format).unwrap();
+ let _container_info = parser.parse(&data).unwrap();
+ }
+
+ // Memory should not grow unbounded
+ // (In a real test, you'd measure actual memory usage)
+}
+```
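Review note on the "String Extraction Optimization" section above: the `extract_utf8_strings` example copies every hit with `text.to_string()`, which sits oddly next to the guidance to use slice-based operations. Below is a minimal sketch of a borrowing variant with the same termination rule (a run of printable ASCII is kept only when a NUL byte ends it). `BorrowedString` and `scan_nul_terminated` are hypothetical names introduced for this sketch, not part of Stringy's API.

```rust
/// Hypothetical result type that borrows from the input buffer instead of owning text.
#[derive(Debug, PartialEq)]
pub struct BorrowedString<'a> {
    pub text: &'a str,
    pub offset: usize,
}

/// Collect NUL-terminated runs of printable ASCII (0x20..=0x7E) that are at
/// least `min_len` bytes long. No string data is copied; each result points
/// into `data`.
pub fn scan_nul_terminated(data: &[u8], min_len: usize) -> Vec<BorrowedString<'_>> {
    let mut found = Vec::new();
    let mut start: Option<usize> = None;

    for (i, &byte) in data.iter().enumerate() {
        if (0x20..0x7F).contains(&byte) {
            // Inside a printable run: remember where it began
            if start.is_none() {
                start = Some(i);
            }
            continue;
        }
        // A NUL byte ends a candidate run; any other byte discards it
        if byte == 0 {
            if let Some(s) = start {
                if i - s >= min_len {
                    // Printable ASCII is valid UTF-8, so this conversion succeeds
                    if let Ok(text) = std::str::from_utf8(&data[s..i]) {
                        found.push(BorrowedString { text, offset: s });
                    }
                }
            }
        }
        start = None;
    }

    found
}
```

The trade-off is that results cannot outlive the input buffer, so this shape suits a pipeline where the mapped or loaded file stays alive until output is written.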
diff --git a/.cursor/rules/rust/rust-standards.mdc b/.cursor/rules/rust/rust-standards.mdc
new file mode 100644
index 0000000..f17a407
--- /dev/null
+++ b/.cursor/rules/rust/rust-standards.mdc
@@ -0,0 +1,42 @@
+---
+globs: **/*.rs
+alwaysApply: false
+---
+# Rust Coding Standards for Stringy
+
+## Language and Edition
+
+- Always use **Rust 2024 Edition** (MSRV: 1.91+) as specified in [Cargo.toml](mdc:Cargo.toml)
+- Follow the package configuration in [Cargo.toml](mdc:Cargo.toml) with `unsafe_code = "forbid"` and `warnings = "deny"`
+
+## Code Quality Requirements
+
+- **Zero warnings policy**: All code must pass `cargo clippy -- -D warnings`
+- **No unsafe code**: `unsafe_code = "forbid"` is enforced at package level
+- **Formatting**: Use standard `rustfmt` with project-specific line length
+- **Error Handling**: Use `thiserror` for structured errors
+- **Synchronous Design**: This is a synchronous CLI tool - no async runtime needed
+- **Focused and Manageable Files**: Keep each source file focused on one concern; split files that grow beyond roughly 500-600 lines where possible.
+- **Strictness**: `warnings = "deny"` enforced at package level; any use of `allow` **MUST** be accompanied by a justification in the code and cannot be applied to entire files or modules.
+
+## Code Organization
+
+- Use trait-based interfaces for format parsers (`ContainerParser` trait)
+- Implement comprehensive error handling with `thiserror`
+- Use strongly-typed structures with `serde` for serialization
+- Organize by domain: `container/`, `extraction/`, `classification/`, `output/`, `types/`
+
+## Module Structure
+
+- **container/**: Binary format detection and parsing (ELF, PE, Mach-O)
+- **extraction/**: String extraction algorithms
+- **classification/**: Semantic analysis and tagging
+- **output/**: Result formatting (JSON, human-readable, YARA-friendly)
+- **types/**: Core data structures and error handling
+
+## Testing Requirements
+
+- Include comprehensive tests with `insta` for snapshot testing
+- Test binary format detection and parsing
+- Test string extraction from various formats
+- Use `tempfile` for temporary binary files in tests
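Review note on the "Code Organization" guidance above (trait-based parser interfaces with structured errors): the real `ContainerParser` trait is not shown in this diff, so the following is only a sketch of the general shape, inferred from the `detect`/`parse` calls in `benches/elf.rs`. `FormatParser`, `ElfMagicParser`, `ContainerSummary`, and `ParseError` are hypothetical names, and the error type implements `std::error::Error` by hand where the project would presumably derive it with `thiserror`.

```rust
use std::fmt;

/// Hypothetical structured error; the project would likely derive this with `thiserror`.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnsupportedFormat,
    Truncated { needed: usize, got: usize },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::UnsupportedFormat => write!(f, "unsupported binary format"),
            ParseError::Truncated { needed, got } => {
                write!(f, "input truncated: needed {needed} bytes, got {got}")
            }
        }
    }
}

impl std::error::Error for ParseError {}

/// Hypothetical, heavily reduced stand-in for a parsed container description.
#[derive(Debug, PartialEq)]
pub struct ContainerSummary {
    pub format: &'static str,
    pub is_64_bit: bool,
}

/// Hypothetical parser interface: cheap detection plus a fallible parse.
pub trait FormatParser {
    fn detect(data: &[u8]) -> bool
    where
        Self: Sized;
    fn parse(&self, data: &[u8]) -> Result<ContainerSummary, ParseError>;
}

/// Toy implementation that inspects only the ELF identification bytes.
pub struct ElfMagicParser;

impl FormatParser for ElfMagicParser {
    fn detect(data: &[u8]) -> bool {
        data.starts_with(b"\x7fELF")
    }

    fn parse(&self, data: &[u8]) -> Result<ContainerSummary, ParseError> {
        if !Self::detect(data) {
            return Err(ParseError::UnsupportedFormat);
        }
        // e_ident[EI_CLASS] is byte 4: 1 = ELFCLASS32, 2 = ELFCLASS64
        let class = *data.get(4).ok_or(ParseError::Truncated {
            needed: 5,
            got: data.len(),
        })?;
        Ok(ContainerSummary {
            format: "ELF",
            is_64_bit: class == 2,
        })
    }
}
```

Keeping `detect` as an associated function lets callers pick a parser before constructing one, which matches how the benchmark guards on `ElfParser::detect(&data)`.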
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index a258290..81a62ae 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -36,7 +36,7 @@ jobs:
uses: jontze/action-mdbook@v4
with:
token: ${{ secrets.GITHUB_TOKEN }}
- mdbook-version: latest
+ mdbook-version: 0.4.52
use-mermaid: true
use-toc: true
use-admonish: true
@@ -47,17 +47,17 @@ jobs:
- name: Install mdbook plugins
run: cargo binstall mdbook-tabs mdbook-i18n-helpers mdbook-alerts mdbook-yml-header mdbook-image-size --no-confirm
- - name: Build rustdoc
- run: |
- cargo doc --no-deps --document-private-items --target-dir target
- mkdir -p docs/book/api
- cp -r target/doc/* docs/book/api/
-
- name: Build mdBook
run: |
cd docs
mdbook build
+ - name: Build rustdoc
+ run: |
+ cargo doc --no-deps --document-private-items
+ mkdir -p docs/book/api
+ cp -r target/doc/* docs/book/api/
+
- name: Setup Pages
if: github.ref == 'refs/heads/main'
uses: actions/configure-pages@v5
diff --git a/Cargo.toml b/Cargo.toml
index 36f62e6..c8d5da9 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,7 +1,7 @@
[package]
name = "stringy"
version = "0.1.0"
-edition = "2021"
+edition = "2024"
authors = ["UncleSp1d3r "]
description = "A smarter alternative to the strings command that leverages format-specific knowledge"
license = "Apache-2.0"
@@ -19,18 +19,22 @@ name = "stringy"
path = "src/main.rs"
[dependencies]
-clap = { version = "4.5.48", features = ["derive"] }
-goblin = "0.10.1"
+clap = { version = "4.5.51", features = ["derive"] }
+goblin = "0.10.3"
serde = { version = "1.0.228", features = ["derive"] }
serde_json = "1.0"
thiserror = "2.0.17"
[dev-dependencies]
criterion = "0.7.0"
-insta = "1.0"
-tempfile = "3.8"
+insta = "1.43"
+tempfile = "3.23"
# The profile that 'dist' will build with
[profile.dist]
inherits = "release"
lto = "thin"
+
+[[bench]]
+name = "elf"
+harness = false
diff --git a/benches/elf.rs b/benches/elf.rs
new file mode 100644
index 0000000..c98465e
--- /dev/null
+++ b/benches/elf.rs
@@ -0,0 +1,72 @@
+use criterion::{Criterion, criterion_group, criterion_main};
+use std::hint::black_box;
+use stringy::container::{ContainerParser, ElfParser};
+
+fn bench_elf_full_parse(c: &mut Criterion) {
+ // Use the current test binary as a sample ELF file
+ let current_exe = std::env::current_exe().expect("Failed to get current executable");
+ let data = std::fs::read(&current_exe).expect("Failed to read test binary");
+
+ // Only benchmark if it's actually an ELF file
+ if !stringy::container::ElfParser::detect(&data) {
+ return;
+ }
+
+ let parser = ElfParser::new();
+ c.bench_function("elf_full_parse", |b| {
+ b.iter(|| {
+ let _ = parser.parse(black_box(&data));
+ });
+ });
+}
+
+fn bench_elf_parse_with_imports(c: &mut Criterion) {
+ let current_exe = std::env::current_exe().expect("Failed to get current executable");
+ let data = std::fs::read(&current_exe).expect("Failed to read test binary");
+
+ if !stringy::container::ElfParser::detect(&data) {
+ return;
+ }
+
+ let parser = ElfParser::new();
+ c.bench_function("elf_parse_with_imports", |b| {
+ b.iter(|| {
+ if let Ok(container_info) = parser.parse(black_box(&data)) {
+ // Access imports to ensure mapping is performed
+ let _import_count = container_info.imports.len();
+ let _imports_with_libs = container_info
+ .imports
+ .iter()
+ .filter(|imp| imp.library.is_some())
+ .count();
+ }
+ });
+ });
+}
+
+fn bench_elf_parse_with_exports(c: &mut Criterion) {
+ let current_exe = std::env::current_exe().expect("Failed to get current executable");
+ let data = std::fs::read(&current_exe).expect("Failed to read test binary");
+
+ if !stringy::container::ElfParser::detect(&data) {
+ return;
+ }
+
+ let parser = ElfParser::new();
+ c.bench_function("elf_parse_with_exports", |b| {
+ b.iter(|| {
+ if let Ok(container_info) = parser.parse(black_box(&data)) {
+ // Access exports to ensure filtering is performed
+ let _export_count = container_info.exports.len();
+ }
+ });
+ });
+}
+
+criterion_group!(
+ elf_benches,
+ bench_elf_full_parse,
+ bench_elf_parse_with_imports,
+ bench_elf_parse_with_exports
+);
+criterion_main!(elf_benches);
diff --git a/deny.toml b/deny.toml
index 56fc895..7c1ba4c 100644
--- a/deny.toml
+++ b/deny.toml
@@ -18,10 +18,12 @@ unused-allowed-license = "allow"
allow = [
"Apache-2.0",
"Apache-2.0 WITH LLVM-exception",
+ "CC0-1.0",
"MIT",
"BSD-3-Clause",
"ISC",
"Unicode-3.0",
+ "Unlicense",
"Zlib",
]
include-dev = true
@@ -38,12 +40,9 @@ deny = [
{ crate = "openssl-sys", use-instead = "rustls" },
"libssh2-sys",
{ crate = "cmake", use-instead = "cc" },
- { crate = "windows", reason = "bloated and unnecessary", use-instead = "ideally inline bindings, practically, windows-sys" },
]
skip = []
-skip-tree = [
- { crate = "windows-sys", reason = "a foundational crate for many that bumps far too frequently to ever have a shared version" },
-]
+skip-tree = []
[advisories]
@@ -55,3 +54,10 @@ ignore = []
# Allow crates from crates.io
unknown-registry = "deny"
unknown-git = "deny"
+allow-registry = ["https://github.com/rust-lang/crates.io-index"]
+
+[sources.allow-org]
+# Allow specific organizations for git sources if needed
+github = []
+gitlab = []
+bitbucket = []
diff --git a/docs/src/binary-formats.md b/docs/src/binary-formats.md
index c65d698..d06cfa5 100644
--- a/docs/src/binary-formats.md
+++ b/docs/src/binary-formats.md
@@ -22,6 +22,37 @@ Used primarily on Linux and other Unix-like systems.
- **Dynamic Strings**: Process `.dynstr` for library names and symbols
- **Section Flags**: Use `SHF_EXECINSTR` and `SHF_WRITE` for classification
- **Virtual Addresses**: Map file offsets to runtime addresses
+- **Dynamic Linking**: Parse `DT_NEEDED` entries to extract library dependencies
+- **Symbol Types**: Support for functions (STT_FUNC), objects (STT_OBJECT), TLS variables (STT_TLS), and indirect functions (STT_GNU_IFUNC)
+- **Symbol Visibility**: Filter hidden and internal symbols from exports (STV_HIDDEN, STV_INTERNAL)
+
+### Enhanced Symbol Extraction
+
+The ELF parser now provides comprehensive symbol extraction with:
+
+1. **Import Detection**: Identifies all undefined symbols (SHN_UNDEF) that need runtime resolution
+
+ - Supports multiple symbol types: functions, objects, TLS variables, and indirect functions
+ - Handles both global and weak bindings
+ - Maps symbols to their providing libraries using version information
+
+2. **Export Detection**: Extracts all globally visible defined symbols
+
+ - Filters out hidden (STV_HIDDEN) and internal (STV_INTERNAL) symbols
+ - Includes both strong and weak symbols
+ - Supports all relevant symbol types
+
+3. **Library Dependencies**: Extracts DT_NEEDED entries from the dynamic section
+
+ - Provides list of required shared libraries
+ - Used in conjunction with version information for symbol-to-library mapping
+
+4. **Symbol-to-Library Mapping**: Maps imported symbols to their providing libraries
+
+ - Uses ELF version tables (versym and verneed) for best-effort attribution
+ - Process: versym index → verneed entry → library filename
+ - Falls back to heuristics for unversioned symbols (e.g., common libc symbols)
+ - Returns `None` when version information is unavailable or ambiguous
### Implementation Details
@@ -40,9 +71,79 @@ impl ElfParser {
// ... more classifications
}
}
+
+ fn extract_imports(&self, elf: &Elf, libraries: &[String]) -> Vec<ImportInfo> {
+ // Extract undefined symbols from dynamic symbol table
+ // Supports STT_FUNC, STT_OBJECT, STT_TLS, STT_GNU_IFUNC, STT_NOTYPE
+ // Handles both STB_GLOBAL and STB_WEAK bindings
+ // Maps symbols to libraries using version information
+ }
+
+ fn extract_exports(&self, elf: &Elf) -> Vec<ExportInfo> {
+ // Extract defined symbols with global/weak binding
+ // Filters out STV_HIDDEN and STV_INTERNAL symbols
+ // Includes all relevant symbol types
+ }
+
+ fn extract_needed_libraries(&self, elf: &Elf) -> Vec<String> {
+ // Parse DT_NEEDED entries from dynamic section
+ // Returns list of required shared library names
+ }
+
+ fn get_symbol_providing_library(
+ &self,
+ elf: &Elf,
+ sym_index: usize,
+ libraries: &[String],
+ ) -> Option<String> {
+ // 1. Get version index from versym table for this symbol
+ // 2. Look up version in verneed to find library name
+ // 3. Match with DT_NEEDED entries
+ // 4. Fallback to heuristics for unversioned symbols
+ }
}
```
+### Library Dependency Mapping
+
+The ELF parser implements symbol-to-library mapping using ELF version information:
+
+1. **Version Symbol Table (versym)**: Maps each dynamic symbol to a version index
+
+ - Index 0 (VER_NDX_LOCAL): Local symbol, not available externally
+ - Index 1 (VER_NDX_GLOBAL): Global symbol, no specific version
+ - Index ≥ 2: Versioned symbol, references verneed entry
+
+2. **Version Needed Table (verneed)**: Lists library dependencies with version requirements
+
+ - Each entry contains a library filename (from DT_NEEDED)
+ - Auxiliary entries specify version names and indices
+ - Links version indices to specific libraries
+
+3. **Mapping Process**:
+
+ ```
+ Symbol → versym[sym_index] → version_index → verneed lookup → library_name
+ ```
+
+4. **Fallback Strategies**:
+
+ - For unversioned symbols: Attribute the symbol to the first libc-like entry in `DT_NEEDED` (the implementation checks library names, not symbol names, so this is a coarse heuristic)
+ - If only one library is needed: Attribute to that library (least accurate)
+ - Otherwise: Return `None` to avoid false positives
+
+### Limitations
+
+ELF's indirect linking model means symbol-to-library mapping is best-effort:
+
+- **Accuracy**: Version-based mapping is accurate when version information is present, but many binaries lack version info
+- **Unversioned Symbols**: Symbols without version information cannot be definitively mapped without relocation analysis
+- **Relocation Tables**: PLT/GOT relocations would provide definitive mapping but require complex analysis
+- **Static Linking**: Statically linked binaries have no dynamic section, so all imports have `library: None`
+- **Stripped Binaries**: Stripped binaries may lack symbol tables entirely
+
+The current implementation is sufficient for most string classification use cases where approximate library attribution is acceptable.
+
## PE (Portable Executable)
Used on Windows for executables, DLLs, and drivers.
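Review note on the "Library Dependency Mapping" section added above: the chain versym index → verneed entry → library name can be sketched independently of `goblin`. The types and functions below (`VerneedEntry`, `build_version_map`, `library_for_symbol`) are hypothetical simplifications for illustration, not the parser's actual code. The sketch also masks bit 15 of each versym entry, which GNU symbol versioning uses as a "hidden" flag; whether the raw `vs_val` used in this PR needs the same masking is worth checking.

```rust
use std::collections::HashMap;

/// Simplified stand-in for one verneed entry (hypothetical type, not goblin's).
pub struct VerneedEntry {
    /// Library file name, as referenced by `vn_file`
    pub library: &'static str,
    /// Auxiliary records: (version index from `vna_other`, version name)
    pub aux: Vec<(u16, &'static str)>,
}

/// Index 0 is VER_NDX_LOCAL and index 1 is VER_NDX_GLOBAL; neither names a library.
const VER_NDX_GLOBAL: u16 = 1;
/// The low 15 bits of a versym entry hold the version index; bit 15 is the hidden flag.
const VERSYM_VERSION_MASK: u16 = 0x7fff;

/// Build a lookup table from version index to providing library, once per binary.
pub fn build_version_map(verneed: &[VerneedEntry]) -> HashMap<u16, &'static str> {
    let mut map = HashMap::new();
    for entry in verneed {
        for &(index, _name) in &entry.aux {
            map.insert(index, entry.library);
        }
    }
    map
}

/// Resolve the library for the symbol at `sym_index`, or `None` when the
/// symbol is unversioned, out of range, or refers to an unknown version.
pub fn library_for_symbol(
    versym: &[u16],
    sym_index: usize,
    version_map: &HashMap<u16, &'static str>,
) -> Option<&'static str> {
    let index = *versym.get(sym_index)? & VERSYM_VERSION_MASK;
    if index <= VER_NDX_GLOBAL {
        return None;
    }
    version_map.get(&index).copied()
}
```

Building the map once turns each per-symbol lookup into a hash probe rather than a scan over every verneed entry, which may matter for binaries with large dynamic symbol tables.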
diff --git a/src/container/elf.rs b/src/container/elf.rs
index 6529e41..6d0a7dc 100644
--- a/src/container/elf.rs
+++ b/src/container/elf.rs
@@ -82,36 +82,37 @@ impl ElfParser {
/// Extract import information from ELF dynamic section
/// Imports are symbols that are undefined (SHN_UNDEF) and need to be resolved at runtime
- fn extract_imports(&self, elf: &Elf) -> Vec<ImportInfo> {
+ fn extract_imports(&self, elf: &Elf, libraries: &[String]) -> Vec<ImportInfo> {
let mut imports = Vec::new();
+ let mut seen_names = HashSet::new();
// Extract from dynamic symbol table
- for sym in &elf.dynsyms {
+ for (sym_index, sym) in elf.dynsyms.iter().enumerate() {
// Import symbols are:
// - Undefined (st_shndx == SHN_UNDEF)
// - Global or weak binding
- // - Functions or objects
+ // - Functions, objects, TLS variables, or IFuncs
if sym.st_shndx == (goblin::elf::section_header::SHN_UNDEF as usize)
&& (sym.st_bind() == goblin::elf::sym::STB_GLOBAL
|| sym.st_bind() == goblin::elf::sym::STB_WEAK)
&& (sym.st_type() == goblin::elf::sym::STT_FUNC
|| sym.st_type() == goblin::elf::sym::STT_OBJECT
+ || sym.st_type() == goblin::elf::sym::STT_TLS
+ || sym.st_type() == goblin::elf::sym::STT_GNU_IFUNC
|| sym.st_type() == goblin::elf::sym::STT_NOTYPE)
+ && let Some(name) = elf.dynstrtab.get_at(sym.st_name)
+ && !name.is_empty()
+ && seen_names.insert(name.to_string())
{
- if let Some(name) = elf.dynstrtab.get_at(sym.st_name) {
- // Skip empty names
- if !name.is_empty() {
- imports.push(ImportInfo {
- name: name.to_string(),
- library: self.extract_library_from_needed(elf, name),
- address: if sym.st_value != 0 {
- Some(sym.st_value)
- } else {
- None
- },
- });
- }
- }
+ imports.push(ImportInfo {
+ name: name.to_string(),
+ library: self.get_symbol_providing_library(elf, sym_index, libraries),
+ address: if sym.st_value != 0 {
+ Some(sym.st_value)
+ } else {
+ None
+ },
+ });
}
}
@@ -122,42 +123,137 @@ impl ElfParser {
|| sym.st_bind() == goblin::elf::sym::STB_WEAK)
&& (sym.st_type() == goblin::elf::sym::STT_FUNC
|| sym.st_type() == goblin::elf::sym::STT_OBJECT
+ || sym.st_type() == goblin::elf::sym::STT_TLS
+ || sym.st_type() == goblin::elf::sym::STT_GNU_IFUNC
|| sym.st_type() == goblin::elf::sym::STT_NOTYPE)
+ && let Some(name) = elf.strtab.get_at(sym.st_name)
+ && !name.is_empty()
+ && seen_names.insert(name.to_string())
{
- if let Some(name) = elf.strtab.get_at(sym.st_name) {
- if !name.is_empty() {
- // Avoid duplicates from dynamic symbol table
- if !imports.iter().any(|imp| imp.name == name) {
- imports.push(ImportInfo {
- name: name.to_string(),
- library: None, // Static symbols don't have library info
- address: if sym.st_value != 0 {
- Some(sym.st_value)
- } else {
- None
- },
- });
- }
+ imports.push(ImportInfo {
+ name: name.to_string(),
+ library: None, // Static symbols don't have library info
+ address: if sym.st_value != 0 {
+ Some(sym.st_value)
+ } else {
+ None
+ },
+ });
+ }
+ }
+
+ imports
+ }
+
+ /// Extract DT_NEEDED entries (library dependencies) from ELF dynamic section
+ ///
+ /// Returns a list of required shared library names that the binary depends on.
+ /// These are used in conjunction with version information to map symbols to their
+ /// providing libraries.
+ fn extract_needed_libraries(&self, elf: &Elf) -> Vec<String> {
+ if let Some(ref dynamic) = elf.dynamic {
+ dynamic
+ .get_libraries(&elf.dynstrtab)
+ .iter()
+ .map(|&s| s.to_string())
+ .collect()
+ } else {
+ Vec::new()
+ }
+ }
+
+ /// Get the library that provides a symbol using version information
+ /// This is a best-effort approach using versym and verneed tables
+ fn get_symbol_providing_library(
+ &self,
+ elf: &Elf,
+ sym_index: usize,
+ libraries: &[String],
+ ) -> Option<String> {
+ // If no libraries are available, return None
+ if libraries.is_empty() {
+ return None;
+ }
+
+ // Try to resolve version information for this symbol
+ if let Some(version_index) = self.resolve_versym(elf, sym_index) {
+ // Version index 0 (VER_NDX_LOCAL) and 1 (VER_NDX_GLOBAL) are special
+ // and don't correspond to specific libraries
+ if version_index >= 2
+ && let Some((library_name, _version_name)) =
+ self.parse_verneed_entry(elf, version_index)
+ {
+ // Match the library name from verneed with DT_NEEDED entries
+ for lib in libraries {
+ if lib.contains(&library_name) || library_name.contains(lib) {
+ return Some(lib.clone());
}
}
+ // If exact match not found, return the library name from verneed
+ return Some(library_name);
}
}
- imports
+ // Fallback: For common libc symbols, attribute to first libc library found
+ // This is a heuristic and may not always be accurate
+ if let Some(libc_lib) = libraries.iter().find(|lib| {
+ lib.contains("libc") || lib.contains("libSystem") || lib.contains("libc.so")
+ }) {
+ return Some(libc_lib.clone());
+ }
+
+ // Last resort: return first library (least accurate)
+ if libraries.len() == 1 {
+ return Some(libraries[0].clone());
+ }
+
+ None
}
- /// Attempt to extract library information from DT_NEEDED entries
- /// This is a best-effort approach since ELF doesn't directly link symbols to libraries
- fn extract_library_from_needed(&self, elf: &Elf, _symbol_name: &str) -> Option<String> {
- // For now, we can't reliably determine which specific library a symbol comes from
- // in ELF without additional information like version symbols or relocation data.
- // This would require more complex analysis of the dynamic linking process.
+ /// Resolve version symbol index from versym table
+ fn resolve_versym(&self, elf: &Elf, sym_index: usize) -> Option<u16> {
+ // Check if versym table exists and has entry for this symbol
+ let versym = elf.versym.as_ref()?;
+ if versym.is_empty() || sym_index >= versym.len() {
+ return None;
+ }
- // We could potentially return the first DT_NEEDED entry as a fallback,
- // but that would be misleading. Better to return None for accuracy.
+ if let Some(versym_entry) = versym.get_at(sym_index) {
+ let version_index = versym_entry.vs_val;
+ // VER_NDX_LOCAL (0) and VER_NDX_GLOBAL (1) are special values
+ // that don't correspond to versioned symbols
+ if version_index >= 2 {
+ return Some(version_index);
+ }
+ }
+
+ None
+ }
+
+ /// Parse verneed entry to extract library name and version name
+ /// Returns (library_name, version_name) if found
+ fn parse_verneed_entry(&self, elf: &Elf, version_index: u16) -> Option<(String, String)> {
+ let verneed = elf.verneed.as_ref()?;
+
+ // Iterate through verneed entries to find the one matching version_index
+ for verneed_entry in verneed.iter() {
+ // Extract library name from verneed entry
+ let library_name = elf
+ .dynstrtab
+ .get_at(verneed_entry.vn_file)
+ .unwrap_or("")
+ .to_string();
+
+ // Check auxiliary versions in this verneed entry
+ for aux in verneed_entry.iter() {
+ if aux.vna_other == version_index {
+ // Found matching version, extract version name
+ let version_name = elf.dynstrtab.get_at(aux.vna_name).unwrap_or("").to_string();
+ return Some((library_name, version_name));
+ }
+ }
+ }
- // Future enhancement: analyze PLT/GOT relocations to match symbols to libraries
- let _ = elf; // Suppress unused parameter warning
None
}
@@ -168,20 +264,26 @@ impl ElfParser {
// Extract from dynamic symbol table
for sym in &elf.dynsyms {
+ // Export symbols must be:
+ // - Defined (not SHN_UNDEF)
+ // - Global or weak binding
+ // - Visible (not hidden or internal)
+ // - Have a valid address
if (sym.st_bind() == goblin::elf::sym::STB_GLOBAL
|| sym.st_bind() == goblin::elf::sym::STB_WEAK)
&& sym.st_shndx != (goblin::elf::section_header::SHN_UNDEF as usize)
&& sym.st_value != 0
+ && sym.st_visibility() != goblin::elf::sym::STV_HIDDEN
+ && sym.st_visibility() != goblin::elf::sym::STV_INTERNAL
+ && let Some(name) = elf.dynstrtab.get_at(sym.st_name)
+ && !name.is_empty()
+ && seen_names.insert(name.to_string())
{
- if let Some(name) = elf.dynstrtab.get_at(sym.st_name) {
- if !name.is_empty() && seen_names.insert(name.to_string()) {
- exports.push(ExportInfo {
- name: name.to_string(),
- address: sym.st_value,
- ordinal: None, // ELF doesn't use ordinals
- });
- }
- }
+ exports.push(ExportInfo {
+ name: name.to_string(),
+ address: sym.st_value,
+ ordinal: None, // ELF doesn't use ordinals
+ });
}
}
@@ -191,19 +293,22 @@ impl ElfParser {
|| sym.st_bind() == goblin::elf::sym::STB_WEAK)
&& sym.st_shndx != (goblin::elf::section_header::SHN_UNDEF as usize)
&& sym.st_value != 0
+ && sym.st_visibility() != goblin::elf::sym::STV_HIDDEN
+ && sym.st_visibility() != goblin::elf::sym::STV_INTERNAL
&& (sym.st_type() == goblin::elf::sym::STT_FUNC
|| sym.st_type() == goblin::elf::sym::STT_OBJECT
+ || sym.st_type() == goblin::elf::sym::STT_TLS
+ || sym.st_type() == goblin::elf::sym::STT_GNU_IFUNC
|| sym.st_type() == goblin::elf::sym::STT_NOTYPE)
+ && let Some(name) = elf.strtab.get_at(sym.st_name)
+ && !name.is_empty()
+ && seen_names.insert(name.to_string())
{
- if let Some(name) = elf.strtab.get_at(sym.st_name) {
- if !name.is_empty() && seen_names.insert(name.to_string()) {
- exports.push(ExportInfo {
- name: name.to_string(),
- address: sym.st_value,
- ordinal: None, // ELF doesn't use ordinals
- });
- }
- }
+ exports.push(ExportInfo {
+ name: name.to_string(),
+ address: sym.st_value,
+ ordinal: None, // ELF doesn't use ordinals
+ });
}
}
@@ -256,7 +361,8 @@ impl ContainerParser for ElfParser {
});
}
- let imports = self.extract_imports(&elf);
+ let libraries = self.extract_needed_libraries(&elf);
+ let imports = self.extract_imports(&elf, &libraries);
let exports = self.extract_exports(&elf);
Ok(ContainerInfo {
@@ -446,7 +552,6 @@ mod tests {
// by checking that they compile and can be referenced
let _extract_imports = ElfParser::extract_imports;
let _extract_exports = ElfParser::extract_exports;
- let _extract_library = ElfParser::extract_library_from_needed;
// Verify parser can be created (this is a compile-time check)
let _ = parser;
@@ -461,14 +566,59 @@ mod tests {
// We can't use Elf::default() as it doesn't exist, so we'll test the behavior
// by verifying that the method signature is correct and the documented behavior
- // The extract_library_from_needed method should return None as documented
- // since ELF doesn't directly link symbols to libraries without additional analysis
+ // The get_symbol_providing_library method uses version information to map symbols
+ // to libraries, which is a best-effort approach
- // This is a compile-time test to ensure the method exists with correct signature
- let _method_ref: fn(&ElfParser, &Elf, &str) -> Option<String> =
- ElfParser::extract_library_from_needed;
+ // This is a compile-time test to ensure the methods exist with correct signatures
+ let _method_ref: fn(&ElfParser, &Elf, usize, &[String]) -> Option<String> =
+ ElfParser::get_symbol_providing_library;
// Verify the parser exists
let _ = parser;
}
+
+ #[test]
+ fn test_extract_needed_libraries_with_test_binary() {
+ // Test library extraction with the current test binary
+ // This test demonstrates the extract_needed_libraries method works with real ELF files
+ let current_exe = std::env::current_exe().expect("Failed to get current executable");
+
+ if let Ok(data) = std::fs::read(&current_exe)
+ && let Ok(goblin::Object::Elf(elf)) = goblin::Object::parse(&data)
+ {
+ let parser = ElfParser::new();
+ let libraries = parser.extract_needed_libraries(&elf);
+
+ // The test binary should have some libraries (e.g., libc) unless statically linked
+ println!("Test binary libraries: {:?}", libraries);
+
+ // Just verify the method runs without panicking
+ // Actual library content depends on the build environment
+ }
+ }
+
+ #[test]
+ fn test_symbol_type_constants() {
+ // Test additional symbol type constants we're now using
+ use goblin::elf::sym::{STT_GNU_IFUNC, STT_TLS};
+
+ // Verify the constants we're now using in import/export filtering
+ assert_eq!(STT_TLS, 6); // Thread-local storage
+ assert_eq!(STT_GNU_IFUNC, 10); // Indirect function
+
+ // These constants are used in our enhanced import/export filtering logic
+ }
+
+ #[test]
+ fn test_symbol_visibility_constants() {
+ // Test symbol visibility constants
+ use goblin::elf::sym::{STV_DEFAULT, STV_HIDDEN, STV_INTERNAL};
+
+ // Verify the visibility constants we're using for filtering
+ assert_eq!(STV_DEFAULT, 0);
+ assert_eq!(STV_HIDDEN, 2);
+ assert_eq!(STV_INTERNAL, 1);
+
+ // These constants are used to filter out hidden and internal symbols from exports
+ }
}
diff --git a/tests/fixtures/README.md b/tests/fixtures/README.md
new file mode 100644
index 0000000..69ef340
--- /dev/null
+++ b/tests/fixtures/README.md
@@ -0,0 +1,45 @@
+# Test Fixtures
+
+This directory contains pre-compiled binary test fixtures used for snapshot testing.
+
+## Fixtures
+
+- `test_binary_elf` - x86-64 ELF binary
+- `test_binary_macho` - ARM64 Mach-O binary
+- `test_binary_pe.exe` - x86-64 PE binary
+
+## Source
+
+All fixtures are compiled from `test_binary.c`, a simple C program with:
+
+- Exported functions: `exported_function`, `helper_function`
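+- Imports from libc: `printf`, `malloc`, `free` (GCC may lower constant-string `printf` calls to `puts`, so the import table can list `puts` instead)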
+- A `main` function
+
+## Rebuilding Fixtures
+
+If you need to rebuild the fixtures:
+
+### ELF (x86-64)
+
+```bash
+docker run --rm -v "$(pwd):/work" -w /work --platform linux/amd64 gcc:latest gcc -o test_binary_elf test_binary.c
+```
+
+### Mach-O (ARM64)
+
+```bash
+clang -o test_binary_macho test_binary.c
+```
+
+### PE (x86-64)
+
+```bash
+docker run --rm -v "$(pwd):/work" -w /work mcr.microsoft.com/devcontainers/cpp:latest bash -c "apt-get update -qq && apt-get install -y -qq mingw-w64 && x86_64-w64-mingw32-gcc -o test_binary_pe.exe test_binary.c"
+```
+
+## Notes
+
+- These fixtures are checked into git to ensure consistent test results
+- The fixtures should not be modified unless the test requirements change
+- If you modify `test_binary.c`, rebuild all fixtures and update snapshots
diff --git a/tests/fixtures/test_binary.c b/tests/fixtures/test_binary.c
new file mode 100644
index 0000000..6294553
--- /dev/null
+++ b/tests/fixtures/test_binary.c
@@ -0,0 +1,22 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+// Export a function
+int exported_function(int x) {
+ return x * 2;
+}
+
+// Another exported function
+void helper_function(void) {
+ printf("Helper called\n");
+}
+
+// Use some imports
+int main() {
+ printf("Hello, world!\n"); // Import from libc
+ void* ptr = malloc(100); // Import from libc
+ free(ptr); // Import from libc
+ exported_function(42);
+ return 0;
+}
+
diff --git a/tests/fixtures/test_binary_elf b/tests/fixtures/test_binary_elf
new file mode 100755
index 0000000..0fdf969
Binary files /dev/null and b/tests/fixtures/test_binary_elf differ
diff --git a/tests/fixtures/test_binary_macho b/tests/fixtures/test_binary_macho
new file mode 100755
index 0000000..939dd14
Binary files /dev/null and b/tests/fixtures/test_binary_macho differ
diff --git a/tests/fixtures/test_binary_pe.exe b/tests/fixtures/test_binary_pe.exe
new file mode 100755
index 0000000..3cf2e32
Binary files /dev/null and b/tests/fixtures/test_binary_pe.exe differ
diff --git a/tests/integration_elf.rs b/tests/integration_elf.rs
index 28dc765..6d36813 100644
--- a/tests/integration_elf.rs
+++ b/tests/integration_elf.rs
@@ -1,337 +1,448 @@
+use insta::assert_snapshot;
use std::fs;
-use std::fs::File;
-use std::io::Write;
-use std::process::Command;
use stringy::container::{ContainerParser, ElfParser};
-use tempfile::TempDir;
+
+fn get_fixture_path(name: &str) -> std::path::PathBuf {
+ std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
+ .join("tests")
+ .join("fixtures")
+ .join(name)
+}
#[test]
-#[cfg(target_family = "unix")]
fn test_elf_import_export_extraction_dynamic() {
- // Create a simple C program that we can compile to test with
- let c_code = r#"
-#include <stdio.h>
-#include <stdlib.h>
-
-// Export a function
-int exported_function(int x) {
- return x * 2;
+ // Test with the ELF fixture
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ // Verify it's an ELF file
+ assert!(ElfParser::detect(&elf_data), "ELF detection should succeed");
+
+ // Test parsing
+ let parser = ElfParser::new();
+ let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
+
+ // Verify we found some imports
+ assert!(
+ !container_info.imports.is_empty(),
+ "Should find imports like printf, malloc, free"
+ );
+
+ // Check that we found expected imports
+ let import_names: Vec<&str> = container_info
+ .imports
+ .iter()
+ .map(|imp| imp.name.as_str())
+ .collect();
+
+ // We should find at least some of these common libc functions
+ let expected_imports = ["malloc", "free", "__libc_start_main"];
+ let found_expected = expected_imports
+ .iter()
+ .any(|&expected| import_names.iter().any(|&name| name.contains(expected)));
+
+ assert!(
+ found_expected,
+ "Should find at least one expected import. Found: {:?}",
+ import_names
+ );
+
+ // Verify we found some exports (at least main and our exported function)
+ let export_names: Vec<&str> = container_info
+ .exports
+ .iter()
+ .map(|exp| exp.name.as_str())
+ .collect();
+
+ assert!(
+ export_names.contains(&"main"),
+ "Should find main export. Found: {:?}",
+ export_names
+ );
+ assert!(
+ export_names.contains(&"exported_function"),
+ "Should find exported_function export. Found: {:?}",
+ export_names
+ );
+
+ println!(
+ "Found {} imports and {} exports",
+ container_info.imports.len(),
+ container_info.exports.len()
+ );
}
-// Use some imports
-int main() {
- printf("Hello, world!\n"); // Import from libc
- void* ptr = malloc(100); // Import from libc
- free(ptr); // Import from libc
- return 0;
+#[test]
+fn test_elf_import_export_extraction_static() {
+ // Test with the ELF fixture (dynamically linked, but we can still test parsing)
+ // Note: For true static binary testing, we'd need a separate static fixture
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ let parser = ElfParser::new();
+ let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
+
+ // Our fixture is dynamically linked, so it should have imports
+ println!("Binary imports found: {}", container_info.imports.len());
+
+ // Check exports
+ let export_names: Vec<String> = container_info
+ .exports
+ .iter()
+ .map(|e| e.name.clone())
+ .collect();
+
+ println!(
+ "Binary exports found: {} exports: {:?}",
+ container_info.exports.len(),
+ export_names
+ );
+
+ // Verify expected exports exist
+ assert!(
+ export_names.contains(&"main".to_string()),
+ "Should find main export"
+ );
+ assert!(
+ export_names.contains(&"exported_function".to_string()),
+ "Should find exported_function export"
+ );
}
-"#;
-
- // Write the C code to a temporary file
- let temp_dir = std::env::temp_dir();
- let c_file = temp_dir.join("test_elf.c");
- let elf_file = temp_dir.join("test_elf");
-
- fs::write(&c_file, c_code).expect("Failed to write C file");
-
- // Try to compile it with gcc, attempting to force ELF output
- // First try with a cross-compiler for Linux if available
- // NOTE: This is for dynamic linking test, so we DON'T use -static
- let mut output = Command::new("x86_64-linux-gnu-gcc")
- .args(["-o", elf_file.to_str().unwrap(), c_file.to_str().unwrap()])
- .output();
-
- // If cross-compiler not available, try regular gcc (dynamically linked)
- if output.is_err() {
- output = Command::new("gcc")
- .args(["-o", elf_file.to_str().unwrap(), c_file.to_str().unwrap()])
- .output();
- }
-
- match output {
- Ok(result) if result.status.success() => {
- // Successfully compiled, now test our ELF parser
- let elf_data = fs::read(&elf_file).expect("Failed to read ELF file");
-
- // Check what format we actually got
- match goblin::Object::parse(&elf_data) {
- Ok(goblin::Object::Elf(_)) => {
- // Great! We have an ELF binary, test our parser
- assert!(ElfParser::detect(&elf_data), "ELF detection should succeed");
- }
- Ok(goblin::Object::Mach(_)) => {
- println!("Got Mach-O binary (expected on macOS), skipping ELF-specific test");
- // Clean up and return early
- let _ = fs::remove_file(&c_file);
- let _ = fs::remove_file(&elf_file);
- return;
- }
- Ok(other) => {
- println!(
- "Got unexpected binary format: {:?}, skipping test",
- std::mem::discriminant(&other)
- );
- let _ = fs::remove_file(&c_file);
- let _ = fs::remove_file(&elf_file);
- return;
- }
- Err(e) => {
- println!("Failed to parse binary: {}, skipping test", e);
- let _ = fs::remove_file(&c_file);
- let _ = fs::remove_file(&elf_file);
- return;
- }
- }
-
- // Test parsing
- let parser = ElfParser::new();
- let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
- // Verify we found some imports
+#[test]
+fn test_elf_section_classification_integration() {
+ // Test with the ELF fixture
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ if ElfParser::detect(&elf_data) {
+ let container_info = ElfParser::new()
+ .parse(&elf_data)
+ .expect("Failed to parse ELF fixture");
+ // Verify we found sections and classified them
+ assert!(
+ !container_info.sections.is_empty(),
+ "Should find sections in ELF binary"
+ );
+
+ // Look for common ELF sections and verify weights are assigned
+ let section_names: Vec<&str> = container_info
+ .sections
+ .iter()
+ .map(|sec| sec.name.as_str())
+ .collect();
+
+ println!("Found sections: {:?}", section_names);
+
+ // Verify that all sections have weights assigned
+ for section in &container_info.sections {
assert!(
- !container_info.imports.is_empty(),
- "Should find imports like printf, malloc, free"
+ section.weight > 0.0,
+ "Section {} should have a positive weight, got {}",
+ section.name,
+ section.weight
);
+ }
- // Check that we found expected imports
- let import_names: Vec<&str> = container_info
- .imports
+ // Check that string data sections get higher weights than code sections
+ let string_sections: Vec<_> = container_info
+ .sections
+ .iter()
+ .filter(|sec| matches!(sec.section_type, stringy::types::SectionType::StringData))
+ .collect();
+ let code_sections: Vec<_> = container_info
+ .sections
+ .iter()
+ .filter(|sec| matches!(sec.section_type, stringy::types::SectionType::Code))
+ .collect();
+
+ if !string_sections.is_empty() && !code_sections.is_empty() {
+ let max_string_weight = string_sections
.iter()
- .map(|imp| imp.name.as_str())
- .collect();
-
- // We should find at least some of these common libc functions
- let expected_imports = ["printf", "malloc", "free", "__libc_start_main"];
- let found_expected = expected_imports
+ .map(|s| s.weight)
+ .fold(0.0f32, f32::max);
+ let max_code_weight = code_sections
.iter()
- .any(|&expected| import_names.contains(&expected));
-
+ .map(|s| s.weight)
+ .fold(0.0f32, f32::max);
assert!(
- found_expected,
- "Should find at least one expected import. Found: {:?}",
- import_names
+ max_string_weight > max_code_weight,
+ "String sections should have higher weight than code sections"
);
+ }
- // Verify we found some exports (at least main and our exported function)
- // Note: exports might be stripped in some builds, so we'll be lenient
- println!(
- "Found {} imports and {} exports",
- container_info.imports.len(),
- container_info.exports.len()
- );
+ // We should find at least some standard sections
+ let has_text = section_names.iter().any(|&name| name.contains(".text"));
+ let has_rodata = section_names.iter().any(|&name| name.contains(".rodata"));
+
+ // At least one of these should be present in a typical ELF
+ assert!(
+ has_text || has_rodata,
+ "Should find .text or .rodata sections"
+ );
+ } else {
+ panic!("ELF fixture is not a valid ELF file");
+ }
+}
+
+#[test]
+fn test_elf_library_dependencies() {
+ // Test with the ELF fixture
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ // Parse with goblin to check if it's ELF
+ match goblin::Object::parse(&elf_data) {
+ Ok(goblin::Object::Elf(elf)) => {
+ // Check if we have a dynamic section
+ if let Some(ref dynamic) = elf.dynamic {
+ // Extract DT_NEEDED library names directly via goblin's `get_libraries`
+ let libraries = dynamic.get_libraries(&elf.dynstrtab);
+
+ println!("Found {} library dependencies:", libraries.len());
+ for lib in &libraries {
+ println!(" - {}", lib);
+ }
- // Clean up
- let _ = fs::remove_file(&c_file);
- let _ = fs::remove_file(&elf_file);
+ // A dynamically linked ELF binary should typically have at least one library
+ // (e.g., libc.so.6 on Linux)
+ // But we'll be lenient here in case the fixture is ever rebuilt as a static binary
+ if !libraries.is_empty() {
+ // Verify at least one common library is present
+ let has_libc = libraries.iter().any(|lib| lib.contains("libc"));
+ let has_libpthread = libraries.iter().any(|lib| lib.contains("pthread"));
+ let has_libm = libraries.iter().any(|lib| lib.contains("libm"));
+
+ // At least one common library should be present in a typical executable
+ if has_libc || has_libpthread || has_libm {
+ println!("✓ Found expected library dependencies");
+ }
+ } else {
+ println!(
+ "No library dependencies found. This might be a static binary or on a non-Linux platform."
+ );
+ }
+ } else {
+ println!("No dynamic section found. This might be a static binary.");
+ }
}
Ok(_) => {
- println!("gcc compilation failed, skipping ELF integration test");
- // This is not a test failure - just means gcc isn't available
+ panic!("Expected ELF binary from fixture");
}
- Err(_) => {
- println!("gcc not found, skipping ELF integration test");
- // This is not a test failure - just means gcc isn't available
+ Err(e) => {
+ panic!("Failed to parse ELF fixture: {}", e);
}
}
}
#[test]
-#[cfg(target_family = "unix")]
-fn test_elf_import_export_extraction_static() {
- let temp_dir = TempDir::new().expect("Failed to create temp dir");
- let c_file = temp_dir.path().join("test_static.c");
- let elf_file = temp_dir.path().join("test_static");
+fn test_elf_symbol_extraction_snapshot() {
+ // Test with a fixed ELF fixture to create a consistent snapshot
+ let fixture_path = get_fixture_path("test_binary_elf");
+
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ if ElfParser::detect(&elf_data) {
+ let container_info = ElfParser::new()
+ .parse(&elf_data)
+ .expect("Failed to parse ELF fixture");
+ // Create a formatted output for snapshot testing
+ let mut output = String::new();
+
+ // Document imports
+ output.push_str("=== IMPORTS ===\n");
+ output.push_str(&format!("Total: {}\n\n", container_info.imports.len()));
+
+ // Take first 10 imports for snapshot (to keep it manageable)
+ for (i, import) in container_info.imports.iter().take(10).enumerate() {
+ output.push_str(&format!("Import {}: {}\n", i + 1, import.name));
+ if let Some(ref lib) = import.library {
+ output.push_str(&format!(" Library: {}\n", lib));
+ }
+ if let Some(addr) = import.address {
+ output.push_str(&format!(" Address: 0x{:x}\n", addr));
+ }
+ output.push('\n');
+ }
+
+ if container_info.imports.len() > 10 {
+ output.push_str(&format!(
+ "... and {} more imports\n\n",
+ container_info.imports.len() - 10
+ ));
+ }
- let c_code = r#"
- #include <stdio.h>
- #include <stdlib.h>
+ // Document exports
+ output.push_str("=== EXPORTS ===\n");
+ output.push_str(&format!("Total: {}\n\n", container_info.exports.len()));
- void exported_function() {
- printf("Hello from exported function\n");
+ // Take first 10 exports for snapshot
+ for (i, export) in container_info.exports.iter().take(10).enumerate() {
+ output.push_str(&format!("Export {}: {}\n", i + 1, export.name));
+ output.push_str(&format!(" Address: 0x{:x}\n", export.address));
+ if let Some(ord) = export.ordinal {
+ output.push_str(&format!(" Ordinal: {}\n", ord));
+ }
+ output.push('\n');
}
- int main() {
- void *ptr = malloc(100);
- printf("Allocated memory\n");
- free(ptr);
- exported_function();
- return 0;
+ if container_info.exports.len() > 10 {
+ output.push_str(&format!(
+ "... and {} more exports\n",
+ container_info.exports.len() - 10
+ ));
}
- "#;
-
- File::create(&c_file)
- .expect("Failed to create C file")
- .write_all(c_code.as_bytes())
- .expect("Failed to write C code");
-
- // Compile statically-linked binary with -static flag
- let mut output = Command::new("x86_64-linux-gnu-gcc")
- .args([
- "-static",
- "-o",
- elf_file.to_str().unwrap(),
- c_file.to_str().unwrap(),
- ])
- .output();
-
- if output.is_err() || !output.as_ref().map(|o| o.status.success()).unwrap_or(false) {
- output = Command::new("gcc")
- .args([
- "-static",
- "-o",
- elf_file.to_str().unwrap(),
- c_file.to_str().unwrap(),
- ])
- .output();
- }
- match output {
- Ok(output) if output.status.success() => {
- let elf_data = fs::read(&elf_file).expect("Failed to read ELF file");
+ // Snapshot the output
+ assert_snapshot!("elf_symbol_extraction", output);
+ } else {
+ panic!("ELF fixture is not a valid ELF file");
+ }
+}
- let format_obj = goblin::Object::parse(&elf_data).expect("Failed to parse with goblin");
+#[test]
+fn test_elf_symbol_library_mapping() {
+ // Test symbol-to-library mapping using version information
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ match goblin::Object::parse(&elf_data) {
+ Ok(goblin::Object::Elf(_)) => {
+ let parser = ElfParser::new();
+ let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
- match format_obj {
- goblin::Object::Elf(_elf) => {
- let parser = ElfParser::new();
- let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
+ // Check that we found imports
+ assert!(!container_info.imports.is_empty(), "Should find imports");
- // Statically-linked binaries typically have no or very few dynamic imports
- // since all dependencies are embedded
- println!(
- "Static binary imports found: {} (expected: 0 or very few)",
- container_info.imports.len()
- );
+ // Check that some imports have library information populated
+ let imports_with_libs: Vec<_> = container_info
+ .imports
+ .iter()
+ .filter(|imp| imp.library.is_some())
+ .collect();
- // Check exports - note that static binaries may have symbols stripped
- // or may not expose them depending on compilation flags
- let export_names: Vec<String> = container_info
- .exports
- .iter()
- .map(|e| e.name.clone())
- .collect();
+ println!(
+ "Found {} imports with library information out of {} total imports",
+ imports_with_libs.len(),
+ container_info.imports.len()
+ );
- println!(
- "Static binary exports found: {} exports: {:?}",
- container_info.exports.len(),
- export_names
- );
+ // Common libc symbols should have library info if version info is available
+ let malloc_import = container_info
+ .imports
+ .iter()
+ .find(|imp| imp.name.contains("malloc"));
- // If exports are present, verify expected ones exist
- // Note: Exports may be stripped in static binaries, so this is not always guaranteed
- if !container_info.exports.is_empty() {
- let has_main = export_names.iter().any(|name| name == "main");
- let has_exported_function =
- export_names.iter().any(|name| name == "exported_function");
-
- if has_main || has_exported_function {
- println!(
- "Found expected exports: main={}, exported_function={}",
- has_main, has_exported_function
- );
- }
- } else {
- println!(
- "No exports found in static binary. This can happen when symbols are stripped or not exported."
- );
- }
- }
- goblin::Object::Mach(_) => {
- println!("Compiled to Mach-O, skipping ELF-specific test");
- }
- _ => panic!("Unexpected binary format"),
+ if let Some(malloc) = malloc_import {
+ println!("malloc import: {:?}", malloc);
}
+
+ // At least verify the mapping logic runs without errors
+ // Actual library attribution depends on binary's version info
}
- Ok(output) => {
- let stderr = String::from_utf8_lossy(&output.stderr);
- println!(
- "Static compilation failed, skipping test. This is expected if static libraries are not available.\nError: {}",
- stderr
- );
+ Ok(_) => {
+ panic!("Expected ELF binary from fixture");
}
Err(e) => {
- println!(
- "GCC not available, skipping test. This is expected in some CI environments. Error: {}",
- e
- );
+ panic!("Failed to parse ELF fixture: {}", e);
}
}
}
#[test]
-#[cfg(target_family = "unix")]
-fn test_elf_section_classification_integration() {
- // Test with the current binary (this test executable)
- let current_exe = std::env::current_exe().expect("Failed to get current executable path");
+fn test_elf_unversioned_symbols() {
+ // Test handling of symbols without version info
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ if ElfParser::detect(&elf_data) {
+ let container_info = ElfParser::new()
+ .parse(&elf_data)
+ .expect("Failed to parse ELF fixture");
+ // Count imports with and without library info
+ let with_lib = container_info
+ .imports
+ .iter()
+ .filter(|imp| imp.library.is_some())
+ .count();
+ let without_lib = container_info
+ .imports
+ .iter()
+ .filter(|imp| imp.library.is_none())
+ .count();
+
+ println!(
+ "Imports with library: {}, without library: {}",
+ with_lib, without_lib
+ );
+
+ // Both cases are valid - versioned symbols get libraries,
+ // unversioned symbols may not
+ assert!(
+ !container_info.imports.is_empty(),
+ "Should find at least some imports"
+ );
+ } else {
+ panic!("ELF fixture is not a valid ELF file");
+ }
+}
- if let Ok(elf_data) = fs::read(&current_exe) {
- if ElfParser::detect(&elf_data) {
+#[test]
+fn test_elf_no_dynamic_section() {
+ // Test with the ELF fixture (dynamically linked, but we can test parsing)
+ // Note: For true static binary testing, we'd need a separate static fixture
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ match goblin::Object::parse(&elf_data) {
+ Ok(goblin::Object::Elf(_)) => {
let parser = ElfParser::new();
- if let Ok(container_info) = parser.parse(&elf_data) {
- // Verify we found sections and classified them
- assert!(
- !container_info.sections.is_empty(),
- "Should find sections in ELF binary"
- );
-
- // Look for common ELF sections and verify weights are assigned
- let section_names: Vec<&str> = container_info
- .sections
- .iter()
- .map(|sec| sec.name.as_str())
- .collect();
-
- println!("Found sections: {:?}", section_names);
-
- // Verify that all sections have weights assigned
- for section in &container_info.sections {
- assert!(
- section.weight > 0.0,
- "Section {} should have a positive weight, got {}",
- section.name,
- section.weight
- );
- }
+ let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
- // Check that string data sections get higher weights than code sections
- let string_sections: Vec<_> = container_info
- .sections
- .iter()
- .filter(|sec| {
- matches!(sec.section_type, stringy::types::SectionType::StringData)
- })
- .collect();
- let code_sections: Vec<_> = container_info
- .sections
- .iter()
- .filter(|sec| matches!(sec.section_type, stringy::types::SectionType::Code))
- .collect();
-
- if !string_sections.is_empty() && !code_sections.is_empty() {
- let max_string_weight = string_sections
- .iter()
- .map(|s| s.weight)
- .fold(0.0f32, f32::max);
- let max_code_weight = code_sections
- .iter()
- .map(|s| s.weight)
- .fold(0.0f32, f32::max);
- assert!(
- max_string_weight > max_code_weight,
- "String sections should have higher weight than code sections"
- );
- }
+ // Our fixture is dynamically linked, so it should have imports
+ // Some may have library info if version info is available
+ println!("Binary: {} imports", container_info.imports.len());
- // We should find at least some standard sections
- let has_text = section_names.iter().any(|&name| name.contains(".text"));
- let has_rodata = section_names.iter().any(|&name| name.contains(".rodata"));
+ // Verify parsing works correctly
+ assert!(!container_info.sections.is_empty(), "Should have sections");
+ }
+ _ => {
+ panic!("Expected ELF binary from fixture");
+ }
+ }
+}
- // At least one of these should be present in a typical ELF
- assert!(
- has_text || has_rodata,
- "Should find .text or .rodata sections"
- );
- }
+#[test]
+fn test_elf_stripped_binary() {
+ // Test with the ELF fixture (not stripped, but we can test parsing)
+ // Note: For true stripped binary testing, we'd need a separate stripped fixture
+ let fixture_path = get_fixture_path("test_binary_elf");
+ let elf_data = fs::read(&fixture_path)
+ .expect("Failed to read ELF fixture. Run the build script to generate fixtures.");
+
+ match goblin::Object::parse(&elf_data) {
+ Ok(goblin::Object::Elf(_)) => {
+ let parser = ElfParser::new();
+ // Should handle gracefully
+ let container_info = parser.parse(&elf_data).expect("Failed to parse ELF");
+ println!(
+ "Binary: {} imports, {} exports",
+ container_info.imports.len(),
+ container_info.exports.len()
+ );
+ // Parsing should succeed
+ assert!(!container_info.sections.is_empty(), "Should have sections");
+ }
+ _ => {
+ panic!("Expected ELF binary from fixture");
}
}
}
diff --git a/tests/integration_macho.rs b/tests/integration_macho.rs
new file mode 100644
index 0000000..fac959d
--- /dev/null
+++ b/tests/integration_macho.rs
@@ -0,0 +1,111 @@
+use std::fs;
+use stringy::container::{ContainerParser, MachoParser};
+
+fn get_fixture_path(name: &str) -> std::path::PathBuf {
+ std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
+ .join("tests")
+ .join("fixtures")
+ .join(name)
+}
+
+#[test]
+fn test_macho_import_export_extraction() {
+ // Test with the Mach-O fixture
+ let fixture_path = get_fixture_path("test_binary_macho");
+ let macho_data = fs::read(&fixture_path)
+ .expect("Failed to read Mach-O fixture. Run the build script to generate fixtures.");
+
+ // Verify it's a Mach-O file
+ assert!(
+ MachoParser::detect(&macho_data),
+ "Mach-O detection should succeed"
+ );
+
+ // Test parsing
+ let parser = MachoParser::new();
+ let container_info = parser.parse(&macho_data).expect("Failed to parse Mach-O");
+
+ // Verify we found some sections
+ assert!(
+ !container_info.sections.is_empty(),
+ "Should find sections in Mach-O binary"
+ );
+
+ // Check exports
+ let export_names: Vec<&str> = container_info
+ .exports
+ .iter()
+ .map(|exp| exp.name.as_str())
+ .collect();
+
+ assert!(
+ export_names
+ .iter()
+ .any(|&name| name == "main" || name == "_main"),
+ "Should find main export. Found: {:?}",
+ export_names
+ );
+ assert!(
+ export_names
+ .iter()
+ .any(|&name| name == "exported_function" || name == "_exported_function"),
+ "Should find exported_function export. Found: {:?}",
+ export_names
+ );
+
+ println!(
+ "Found {} imports and {} exports",
+ container_info.imports.len(),
+ container_info.exports.len()
+ );
+}
+
+#[test]
+fn test_macho_section_classification() {
+ // Test with the Mach-O fixture
+ let fixture_path = get_fixture_path("test_binary_macho");
+ let macho_data = fs::read(&fixture_path)
+ .expect("Failed to read Mach-O fixture. Run the build script to generate fixtures.");
+
+ if MachoParser::detect(&macho_data) {
+ let container_info = MachoParser::new()
+ .parse(&macho_data)
+ .expect("Failed to parse Mach-O fixture");
+
+ // Verify we found sections and classified them
+ assert!(
+ !container_info.sections.is_empty(),
+ "Should find sections in Mach-O binary"
+ );
+
+ // Verify that all sections have weights assigned
+ for section in &container_info.sections {
+ assert!(
+ section.weight > 0.0,
+ "Section {} should have a positive weight, got {}",
+ section.name,
+ section.weight
+ );
+ }
+
+ // Look for common Mach-O sections
+ let section_names: Vec<&str> = container_info
+ .sections
+ .iter()
+ .map(|sec| sec.name.as_str())
+ .collect();
+
+ println!("Found sections: {:?}", section_names);
+
+ // Should find at least some standard Mach-O sections
+ let has_text = section_names.iter().any(|&name| name.contains("__TEXT"));
+ let has_data = section_names.iter().any(|&name| name.contains("__DATA"));
+
+ assert!(
+ has_text || has_data,
+ "Should find __TEXT or __DATA sections"
+ );
+ } else {
+ panic!("Mach-O fixture is not a valid Mach-O file");
+ }
+}
diff --git a/tests/integration_pe.rs b/tests/integration_pe.rs
new file mode 100644
index 0000000..8248a33
--- /dev/null
+++ b/tests/integration_pe.rs
@@ -0,0 +1,118 @@
+use std::fs;
+use stringy::container::{ContainerParser, PeParser};
+
+fn get_fixture_path(name: &str) -> std::path::PathBuf {
+ std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
+ .join("tests")
+ .join("fixtures")
+ .join(name)
+}
+
+#[test]
+fn test_pe_import_export_extraction() {
+ // Test with the PE fixture
+ let fixture_path = get_fixture_path("test_binary_pe.exe");
+ let pe_data = fs::read(&fixture_path)
+ .expect("Failed to read PE fixture. Run the build script to generate fixtures.");
+
+ // Verify it's a PE file
+ assert!(PeParser::detect(&pe_data), "PE detection should succeed");
+
+ // Test parsing
+ let parser = PeParser::new();
+ let container_info = parser.parse(&pe_data).expect("Failed to parse PE");
+
+ // Verify we found some sections
+ assert!(
+ !container_info.sections.is_empty(),
+ "Should find sections in PE binary"
+ );
+
+ // Check exports (PE executables may not have exports, only DLLs typically do)
+ let export_names: Vec<&str> = container_info
+ .exports
+ .iter()
+ .map(|exp| exp.name.as_str())
+ .collect();
+
+ println!("PE exports found: {:?}", export_names);
+
+ // PE executables typically don't export symbols (DLLs usually do),
+ // so we only verify that parsing succeeds and sections are found.
+ if !export_names.is_empty() {
+ // If exports are present, check for expected ones
+ let has_main = export_names
+ .iter()
+ .any(|&name| name.contains("main"));
+ let has_exported = export_names
+ .iter()
+ .any(|&name| name.contains("exported_function"));
+
+ if has_main || has_exported {
+ println!(
+ "Found expected exports: main={}, exported_function={}",
+ has_main, has_exported
+ );
+ }
+ } else {
+ println!("No exports found (expected for PE executables; typically only DLLs export symbols)");
+ }
+
+ println!(
+ "Found {} imports and {} exports",
+ container_info.imports.len(),
+ container_info.exports.len()
+ );
+}
+
+#[test]
+fn test_pe_section_classification() {
+ // Test with the PE fixture
+ let fixture_path = get_fixture_path("test_binary_pe.exe");
+ let pe_data = fs::read(&fixture_path)
+ .expect("Failed to read PE fixture. Run the build script to generate fixtures.");
+
+ if PeParser::detect(&pe_data) {
+ let container_info = PeParser::new()
+ .parse(&pe_data)
+ .expect("Failed to parse PE fixture");
+
+ // Verify we found sections and classified them
+ assert!(
+ !container_info.sections.is_empty(),
+ "Should find sections in PE binary"
+ );
+
+ // Verify that all sections have weights assigned
+ for section in &container_info.sections {
+ assert!(
+ section.weight > 0.0,
+ "Section {} should have a positive weight, got {}",
+ section.name,
+ section.weight
+ );
+ }
+
+ // Look for common PE sections
+ let section_names: Vec<&str> = container_info
+ .sections
+ .iter()
+ .map(|sec| sec.name.as_str())
+ .collect();
+
+ println!("Found sections: {:?}", section_names);
+
+ // Should find at least some standard PE sections
+ let has_text = section_names.iter().any(|&name| name.contains(".text"));
+ let has_data = section_names
+ .iter()
+ .any(|&name| name.contains(".data") || name.contains(".rdata"));
+
+ assert!(
+ has_text || has_data,
+ "Should find .text or .data/.rdata sections"
+ );
+ } else {
+ panic!("PE fixture is not a valid PE file");
+ }
+}
diff --git a/tests/snapshots/integration_elf__elf_symbol_extraction.snap b/tests/snapshots/integration_elf__elf_symbol_extraction.snap
new file mode 100644
index 0000000..f687f84
--- /dev/null
+++ b/tests/snapshots/integration_elf__elf_symbol_extraction.snap
@@ -0,0 +1,62 @@
+---
+source: tests/integration_elf.rs
+expression: output
+---
+=== IMPORTS ===
+Total: 9
+
+Import 1: free
+ Library: libc.so.6
+
+Import 2: __libc_start_main
+ Library: libc.so.6
+
+Import 3: puts
+ Library: libc.so.6
+
+Import 4: __gmon_start__
+ Library: libc.so.6
+
+Import 5: malloc
+ Library: libc.so.6
+
+Import 6: free@GLIBC_2.2.5
+
+Import 7: __libc_start_main@GLIBC_2.34
+
+Import 8: puts@GLIBC_2.2.5
+
+Import 9: malloc@GLIBC_2.2.5
+
+=== EXPORTS ===
+Total: 10
+
+Export 1: data_start
+ Address: 0x404018
+
+Export 2: _edata
+ Address: 0x404028
+
+Export 3: exported_function
+ Address: 0x401146
+
+Export 4: helper_function
+ Address: 0x401154
+
+Export 5: __data_start
+ Address: 0x404018
+
+Export 6: _IO_stdin_used
+ Address: 0x402000
+
+Export 7: _end
+ Address: 0x404030
+
+Export 8: _start
+ Address: 0x401060
+
+Export 9: __bss_start
+ Address: 0x404028
+
+Export 10: main
+ Address: 0x401165