Overview
This epic tracks the implementation of the core MVP functionality for StringyMcStringFace: the complete pipeline from binary parsing to user output. It represents the minimum viable product for a weekend demonstration.
Context
StringyMcStringFace is a smarter alternative to the Unix strings command that uses binary analysis to extract meaningful strings from executables. The project foundation is complete with:
- ✅ Core infrastructure and data types
- ✅ Format detection (ELF, PE, Mach-O) via goblin
- ✅ Container parsers with section classification
This epic covers the remaining components needed for an end-to-end working demo.
Pipeline Architecture
Binary Input
↓
[goblin] Parse format & extract sections
↓
[Section List] Classify sections by string likelihood
↓
[Extraction] ASCII/UTF-8 + UTF-16LE/BE extraction
↓
[Classification] Tag strings (URL, path, GUID, etc.)
↓
[Ranking] Score by relevance & section importance
↓
[Output] JSONL format + Human-readable TTY view
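To illustrate the [Extraction] stage of the pipeline above, here is a minimal sketch of scanning a section's bytes for ASCII and UTF-16LE strings with a minimum length filter. The names `Found`, `extract_ascii`, and `extract_utf16le` are hypothetical and are not the project's actual API. The UTF-16LE scan only recognizes ASCII-range code units, which is a simplifying assumption.

```rust
/// A string found in a byte buffer, with the offset where it starts.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct Found {
    offset: usize,
    text: String,
}

/// Scan `data` for runs of printable ASCII (plus tab) at least `min_len` bytes long.
fn extract_ascii(data: &[u8], min_len: usize) -> Vec<Found> {
    let mut out = Vec::new();
    let mut start = None;
    // Iterate one past the end so a run touching the end of the buffer is flushed.
    for i in 0..=data.len() {
        let printable =
            i < data.len() && (data[i] == b'\t' || (0x20..0x7f).contains(&data[i]));
        match (printable, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                if i - s >= min_len {
                    // Printable ASCII is always valid UTF-8, so nothing is lost here.
                    let text = String::from_utf8_lossy(&data[s..i]).into_owned();
                    out.push(Found { offset: s, text });
                }
                start = None;
            }
            _ => {}
        }
    }
    out
}

/// Scan `data` for UTF-16LE runs, simplified to "printable ASCII byte followed by 0x00".
fn extract_utf16le(data: &[u8], min_len: usize) -> Vec<Found> {
    let mut out = Vec::new();
    let mut i = 0;
    while i + 1 < data.len() {
        let start = i;
        let mut text = String::new();
        while i + 1 < data.len() && (0x20..0x7f).contains(&data[i]) && data[i + 1] == 0 {
            text.push(data[i] as char);
            i += 2;
        }
        if text.len() >= min_len {
            out.push(Found { offset: start, text });
        }
        // On a miss, advance by one byte so odd-aligned strings are still found.
        if i == start {
            i += 1;
        }
    }
    out
}
```

Calling `extract_ascii(bytes, 4)` applies the default minimum length listed in the scope below. UTF-16BE handling and confidence scoring are left out of this sketch.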
Scope
In Scope for MVP
- String Extraction Engine (src/extraction/mod.rs)
- ASCII/UTF-8 extraction from byte streams
- UTF-16LE/BE extraction (critical for PE binaries)
- Minimum length filtering (default: 4 chars)
- Confidence scoring for encoding detection
- Semantic Classification (src/classification/mod.rs)
- Pattern matching for high-value strings:
- URLs (http://, https://)
- File paths (Unix: /, Windows: C:\)
- GUIDs ({xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})
- IP addresses
- Format strings (%s, %d, etc.)
- Tagging system for multiple classifications per string
- Ranking System
- Section-based scoring (e.g., .rodata > .data)
- Classification boost (URLs, GUIDs rank higher)
- Length penalties for very short/long strings
- Top-N selection for output
- Output Formats (src/output/mod.rs)
- JSONL: One JSON object per line for pipeline integration
- TTY/Human-readable: Formatted table with columns:
- Score
- Offset (hex)
- Section name
- Tags
- String content (truncated if needed)
- CLI Integration (src/main.rs)
- Accept binary path as positional argument
- Basic flags: --json, --format, --min-len
- Wire up the complete pipeline
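The classification and ranking items above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `Tag` enum, the function names, and every numeric weight and length threshold are hypothetical. Simple prefix and substring checks stand in for real pattern matching, and GUID and IP address detection are omitted.

```rust
/// Semantic tags from the MVP scope; a string may carry several at once.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Url,
    FilePath,
    FormatString,
}

/// Tag a string using prefix and substring checks.
fn classify(s: &str) -> Vec<Tag> {
    let mut tags = Vec::new();
    if s.starts_with("http://") || s.starts_with("https://") {
        tags.push(Tag::Url);
    }
    let b = s.as_bytes();
    // Windows drive path such as C:\ (drive letter, colon, backslash).
    let windows_path =
        b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\';
    if s.starts_with('/') || windows_path {
        tags.push(Tag::FilePath);
    }
    if s.contains("%s") || s.contains("%d") {
        tags.push(Tag::FormatString);
    }
    tags
}

/// Score = section base score + classification boost - length penalty.
/// All weights here are made-up placeholders.
fn score(section: &str, s: &str, tags: &[Tag]) -> i32 {
    let section_score = match section {
        ".rodata" => 30,
        ".data" => 20,
        _ => 10,
    };
    let boost: i32 = tags
        .iter()
        .map(|t| match t {
            Tag::Url => 40,
            Tag::FilePath => 25,
            Tag::FormatString => 15,
        })
        .sum();
    // Penalize very short or very long strings (thresholds are assumptions).
    let len = s.chars().count();
    let penalty = if (6..=200).contains(&len) { 0 } else { 10 };
    section_score + boost - penalty
}

/// Keep the `n` highest-scoring (score, text) pairs, best first.
fn top_n(mut scored: Vec<(i32, String)>, n: usize) -> Vec<(i32, String)> {
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.truncate(n);
    scored
}
```

Returning a `Vec<Tag>` from `classify` keeps the "multiple classifications per string" requirement simple: a Windows path that contains `%s` carries both `FilePath` and `FormatString`.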
Out of Scope (Post-MVP)
- Advanced filters (--only url,filepath)
- YARA output format
- PE-specific resource extraction
- Rust symbol demangling in output
- Configuration file support
- Progress indicators for large binaries
Acceptance Criteria
Implementation Order
- String Extraction - Foundation for everything else
- Basic Classification - URL + filepath patterns first
- Ranking System - Section scoring + classification boost
- JSONL Output - Easiest output format
- TTY Output - Human-friendly display
- CLI Wiring - Connect all components
Testing Strategy
For MVP, manual testing is acceptable:
- Test against /bin/ls (ELF, Unix)
- Test against notepad.exe (PE, Windows) if available
- Compare output quality vs. strings command
- Verify JSON is valid with jq
Success Metrics
- Noise Reduction: 50%+ fewer irrelevant strings than strings
- Signal Boost: URLs, paths, GUIDs appear in top 50 results
- Performance: Processes a typical binary (<10 MB) in under 1 second
- Demo-Ready: Can show side-by-side comparison with strings
Related Issues
Child Tasks (v0.1 Milestone)
Related Epics
Notes
This is a time-boxed weekend implementation. Focus on "working" over "perfect". Code quality and comprehensive testing can be improved post-MVP.