Implement ASCII String Extractor with Configurable Length Filtering

## Summary

Implement the foundational ASCII string extraction module that scans binary data for printable ASCII character sequences (0x20-0x7E) and returns them as `FoundString` objects with configurable minimum length filtering.

## Context

ASCII extraction is the foundational encoding type for StringyMcStringFace's string extraction pipeline. Unlike the traditional `strings` command, which extracts every printable sequence without regard to file structure, this implementation will:

- Be section-aware (integrate with `SectionInfo` from the container parsing)
- Return properly structured `FoundString` objects with metadata (offset, RVA, section name, encoding type)
- Support configurable minimum length to reduce noise from random byte sequences
- Serve as the reference implementation for future encodings (UTF-8, UTF-16LE, UTF-16BE)

The ASCII extractor will be the first concrete implementation of the string extraction framework and will be used by all binary formats (ELF, PE, Mach-O).

## Requirements

- **Requirement 2.1**: Implement basic string extraction for ASCII encoding
- Must scan byte sequences for contiguous printable ASCII characters (0x20-0x7E)
- Must support configurable minimum length threshold (default: 4 characters)
- Must return `FoundString` objects as defined in `src/types.rs`
- Must properly populate metadata fields: offset, section name, encoding, length, source
- Must handle section boundaries correctly (don't span strings across sections)

## Proposed Solution

### File Structure

Create `src/extraction/ascii.rs` with the following components:

### Core Functions

1. **`extract_ascii_strings(data: &[u8], config: &ExtractionConfig) -> Vec<FoundString>`**
   - Main extraction function
   - Scans byte slice for printable ASCII runs
   - Filters by minimum length
   - Returns vector of FoundString objects

2. **`is_printable_ascii(byte: u8) -> bool`**
   - Helper to check if byte is in printable range (0x20-0x7E)
   - Inline for performance

3. **`extract_from_section(section: &SectionInfo, data: &[u8], config: &ExtractionConfig) -> Vec<FoundString>`**
   - Section-aware extraction wrapper
   - Calculates correct offsets and RVAs
   - Populates section metadata
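
As a rough illustration of the helper and of the offset/RVA arithmetic the section wrapper would need, here is a minimal sketch. The `SectionInfo` struct below is a hypothetical stand-in (its field names `name`, `file_offset`, and `rva` are assumptions, not the real definition from the container parsing code), and `locate` is an illustrative helper name that does not appear in this issue.

```rust
/// Returns true for bytes in the printable ASCII range 0x20..=0x7E.
#[inline]
pub fn is_printable_ascii(byte: u8) -> bool {
    matches!(byte, 0x20..=0x7E)
}

/// Hypothetical stand-in for the real `SectionInfo`; field names are assumptions.
pub struct SectionInfo {
    pub name: String,
    pub file_offset: u64,
    pub rva: Option<u64>,
}

/// Maps a position inside a section's buffer to (file offset, RVA).
/// The RVA stays `None` when the section has no virtual address.
pub fn locate(section: &SectionInfo, pos: usize) -> (u64, Option<u64>) {
    let pos = pos as u64;
    (section.file_offset + pos, section.rva.map(|base| base + pos))
}
```

Keeping the offset arithmetic in one small function means the scanner itself can work with buffer-relative positions only.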

### Configuration Structure

```rust
pub struct ExtractionConfig {
    pub min_length: usize,
    pub max_length: Option<usize>,
    // Future: encoding preferences, tag filters, etc.
}
```
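
One way to provide the default minimum length of 4 is a `Default` implementation, sketched below. The `None` default for `max_length` is an assumption; this issue does not specify it.

```rust
#[derive(Debug, Clone)]
pub struct ExtractionConfig {
    pub min_length: usize,
    pub max_length: Option<usize>,
}

impl Default for ExtractionConfig {
    fn default() -> Self {
        Self {
            min_length: 4,    // default stated in the requirements
            max_length: None, // assumption: no upper bound unless configured
        }
    }
}
```

Callers could then override a single field with struct update syntax, for example `ExtractionConfig { min_length: 8, ..Default::default() }`.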

### Algorithm

1. Iterate through byte slice
2. Track current string start position and length
3. When encountering non-printable byte:
   - If accumulated length >= min_length, create FoundString
   - Reset accumulator
4. Handle end-of-buffer edge case
5. Calculate offsets (section file offset + position of the string within the buffer)
6. Set encoding to `Encoding::Ascii`
7. Set source to `StringSource::SectionData`
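
The steps above could be sketched roughly as follows. The `FoundString`, `Encoding`, and `StringSource` types here are simplified, hypothetical stand-ins for the real definitions in `src/types.rs` (their fields are assumptions). The sketch reports buffer-relative offsets and leaves the base-offset addition of step 5 to the section wrapper, and it ignores `max_length` because this issue does not define its semantics.

```rust
// Simplified stand-ins for the types in `src/types.rs`; fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Encoding { Ascii }

#[derive(Debug, Clone, PartialEq)]
pub enum StringSource { SectionData }

#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub offset: u64, // buffer-relative in this sketch
    pub length: usize,
    pub encoding: Encoding,
    pub source: StringSource,
}

pub struct ExtractionConfig {
    pub min_length: usize,
    pub max_length: Option<usize>, // unused in this sketch
}

pub fn extract_ascii_strings(data: &[u8], config: &ExtractionConfig) -> Vec<FoundString> {
    let mut found = Vec::new();
    let mut start: Option<usize> = None;

    // Iterate one index past the end so the final run is flushed by the
    // same branch that handles a non-printable byte.
    for i in 0..=data.len() {
        let printable = i < data.len() && matches!(data[i], 0x20..=0x7E);
        match (printable, start) {
            (true, None) => start = Some(i), // a new run begins here
            (false, Some(s)) => {
                let len = i - s;
                if len >= config.min_length {
                    found.push(FoundString {
                        // The bytes are pure ASCII, so this conversion is lossless.
                        text: String::from_utf8_lossy(&data[s..i]).into_owned(),
                        offset: s as u64,
                        length: len,
                        encoding: Encoding::Ascii,
                        source: StringSource::SectionData,
                    });
                }
                start = None; // reset the accumulator
            }
            _ => {} // still inside a run, or still between runs
        }
    }
    found
}
```

Tracking only the start index avoids copying bytes into an accumulator until a run is known to be long enough.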

### Edge Cases to Handle

- Empty sections or zero-length data
- Strings at section boundaries
- Very long continuous runs (potential padding or data tables)
- Null bytes (0x00) inside otherwise printable sequences, which end the current run
- Sections smaller than minimum length
- Buffer boundaries
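
Several of these cases fall out naturally if each section's bytes are scanned as a separate slice. The sketch below illustrates that idea with two hypothetical helpers (`printable_runs` and `scan_sections` are illustrative names, and sections are modeled as plain `(file_offset, bytes)` pairs rather than the real `SectionInfo`). It does not address very long runs.

```rust
/// Returns (start, bytes) for each printable run of at least `min_len` bytes.
/// Empty input, or input shorter than `min_len`, yields an empty vector.
pub fn printable_runs(data: &[u8], min_len: usize) -> Vec<(usize, &[u8])> {
    let mut runs = Vec::new();
    let mut pos = 0;
    // Splitting on non-printable bytes means a 0x00 always ends the current run.
    for chunk in data.split(|b| !matches!(*b, 0x20..=0x7E)) {
        if !chunk.is_empty() && chunk.len() >= min_len {
            runs.push((pos, chunk));
        }
        pos += chunk.len() + 1; // advance past the chunk and its delimiter byte
    }
    runs
}

/// Scans each section slice independently, so no run can span two sections.
/// `sections` holds hypothetical (file_offset, bytes) pairs.
pub fn scan_sections<'a>(
    sections: &[(u64, &'a [u8])],
    min_len: usize,
) -> Vec<(u64, &'a [u8])> {
    sections
        .iter()
        .flat_map(|&(base, bytes)| {
            printable_runs(bytes, min_len)
                .into_iter()
                .map(move |(start, run)| (base + start as u64, run))
        })
        .collect()
}
```

With two adjacent sections containing `abcd` and `efgh`, this yields two separate strings rather than a single `abcdefgh`, even though the bytes are contiguous in the file.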

## Acceptance Criteria

- [ ] `src/extraction/ascii.rs` created with extraction logic
- [ ] Configurable minimum length parameter (default: 4)
- [ ] Correctly identifies printable ASCII range (0x20-0x7E)
- [ ] Returns `FoundString` objects with all required fields populated
- [ ] Unit tests covering:
  - [ ] Basic extraction with default minimum length
  - [ ] Custom minimum length filtering
  - [ ] Edge case: empty input
  - [ ] Edge case: no strings found
  - [ ] Edge case: string at buffer start
  - [ ] Edge case: string at buffer end
  - [ ] Edge case: single character (below minimum)
  - [ ] Edge case: exact minimum length string
  - [ ] Offset calculation correctness
  - [ ] Section metadata population
- [ ] Documentation with examples
- [ ] Integrated into `src/extraction/mod.rs`

## Implementation Notes

- Start with simple implementation; optimize later if profiling shows bottlenecks
- Consider using SIMD or vectorization in future iterations for performance
- ASCII extraction should be the reference for implementing UTF-8, UTF-16LE, UTF-16BE
- Do not implement semantic tagging yet (that's a separate issue)
- Do not implement scoring yet (that's a separate issue)

## Dependencies

- **Blocked by**: #8 String Extraction Framework (need base traits/interfaces)
- **Blocks**: UTF-8 extraction, UTF-16 extraction, main pipeline orchestrator

## Related Issues

- #39 Epic: MVP Weekend Implementation
- #23 Semantic Boost Scoring (future enhancement)
- #24 Noise Penalty Detection (future enhancement)

## Definition of Done

- Code passes `cargo test`
- Code passes `cargo clippy` with no warnings
- Unit test coverage >= 80%
- Module properly exported in `extraction/mod.rs`
- Inline documentation for public API
- Ready for integration with container parsers

---

**Task-ID**: stringy-analyzer/basic-ascii-string-extraction
