Summary
Create a flexible RankingEngine that assigns importance scores to extracted strings based on their semantic tags, source location, and section characteristics. This enables prioritization of potentially interesting strings in binary analysis.
Background
The string analyzer extracts and classifies strings from binaries with semantic tags (URLs, IPs, file paths, etc.), section types (code, data, resources), and source locations (imports, exports, section data). However, not all strings are equally interesting for analysis. A ranking system is needed to:
- Prioritize high-value strings (e.g., network indicators, file paths, registry keys)
- Deprioritize noise (e.g., common debug strings, version info)
- Weight by context (strings from executable sections vs. debug sections)
- Enable customizable scoring for different analysis scenarios (malware analysis, reverse engineering, compliance scanning)
Proposed Solution
Architecture
Create src/classification/ranking.rs with the following components:
- RankingEngine struct: Main scoring engine with configurable weights
- ScoreConfig struct: Configuration for tag weights, source weights, and section type multipliers
- StringScore struct: Returned score with breakdown for transparency
- Default scoring profiles: Presets for common use cases (malware analysis, general strings, etc.)
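One way the default scoring profiles could be expressed is a small enum that maps to a preset weight table. The sketch below is only an illustration: the Profile enum, the preset_tag_weights function, and the reduced Tag enum are hypothetical names, and the concrete numbers are arbitrary picks from the tag weight ranges proposed in this issue.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real Tag type in src/classification/mod.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tag {
    Url,
    FilePath,
    FormatString,
}

// Hypothetical preset names; the issue only mentions "malware analysis"
// and "general strings" as example use cases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    MalwareAnalysis,
    General,
}

/// Returns an assumed tag weight table for the given profile.
pub fn preset_tag_weights(profile: Profile) -> HashMap<Tag, f32> {
    match profile {
        // Assumed: push indicators to the top of their proposed ranges.
        Profile::MalwareAnalysis => HashMap::from([
            (Tag::Url, 10.0),
            (Tag::FilePath, 7.0),
            (Tag::FormatString, 2.0),
        ]),
        // Assumed: flatter weights for general string browsing.
        Profile::General => HashMap::from([
            (Tag::Url, 8.0),
            (Tag::FilePath, 5.0),
            (Tag::FormatString, 4.0),
        ]),
    }
}
```

Keeping presets as plain data would let with_defaults() pick one of them without special-casing the scoring logic.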
Scoring Algorithm
final_score = (tag_weight + source_weight) × section_type_multiplier
Tag Weights (base importance):
- High value (8-10): URLs, Domains, IPv4/IPv6, Email, Registry paths
- Medium value (5-7): File paths, GUIDs, Base64 (potential encoding)
- Lower value (2-4): Format strings, User agents
- Contextual (variable): Imports/Exports (depends on name), Version strings
Source Weights:
- ImportName/ExportName: +3 (API calls are interesting)
- SectionData: +2 (hardcoded strings)
- ResourceString: +1 (UI strings, less critical)
- DebugInfo: -2 (usually noise)
Section Type Multipliers:
- Code sections: ×1.5 (strings in executable code are unusual)
- StringData/ReadOnlyData: ×1.0 (expected location)
- WritableData: ×1.2 (potentially modified at runtime)
- Resources: ×0.8 (often benign UI strings)
- Debug: ×0.3 (low priority noise)
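The formula and the three weight tables above can be sketched as plain lookup functions. This is a minimal illustration, not the proposed implementation: the enums are hypothetical stand-ins for the existing Tag, StringSource, and SectionType types, only three tags are shown, and where the issue gives a range (such as 8-10) a single value inside that range was picked arbitrarily.

```rust
// Hypothetical stand-ins for the real types in src/classification/mod.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag {
    Url,
    FilePath,
    FormatString,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StringSource {
    ImportName,
    ExportName,
    SectionData,
    ResourceString,
    DebugInfo,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SectionType {
    Code,
    StringData,
    ReadOnlyData,
    WritableData,
    Resources,
    Debug,
}

/// Base importance of a tag (values assumed within the proposed ranges).
pub fn tag_weight(tag: Tag) -> f32 {
    match tag {
        Tag::Url => 9.0,          // high value band (8-10)
        Tag::FilePath => 6.0,     // medium value band (5-7)
        Tag::FormatString => 3.0, // lower value band (2-4)
    }
}

/// Additive weight for where the string was found.
pub fn source_weight(source: StringSource) -> f32 {
    match source {
        StringSource::ImportName | StringSource::ExportName => 3.0,
        StringSource::SectionData => 2.0,
        StringSource::ResourceString => 1.0,
        StringSource::DebugInfo => -2.0,
    }
}

/// Multiplier for the type of section containing the string.
pub fn section_multiplier(section: SectionType) -> f32 {
    match section {
        SectionType::Code => 1.5,
        SectionType::StringData | SectionType::ReadOnlyData => 1.0,
        SectionType::WritableData => 1.2,
        SectionType::Resources => 0.8,
        SectionType::Debug => 0.3,
    }
}

/// final_score = (tag_weight + source_weight) × section_type_multiplier
pub fn final_score(tag: Tag, source: StringSource, section: SectionType) -> f32 {
    (tag_weight(tag) + source_weight(source)) * section_multiplier(section)
}
```

Under these assumed values, a URL found as section data inside a code section scores (9 + 2) × 1.5 = 16.5, while a format string from debug info in a debug section scores (3 - 2) × 0.3 = 0.3.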
Implementation Details
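    use std::collections::HashMap;
    // Assumes these types are defined in (or re-exported from) src/classification/mod.rs.
    use super::{SectionType, StringSource, Tag};

    /// Main scoring engine with configurable weights.
    pub struct RankingEngine {
        config: ScoreConfig,
    }

    /// Tag weights, source weights, and section type multipliers.
    pub struct ScoreConfig {
        tag_weights: HashMap<Tag, f32>,
        source_weights: HashMap<StringSource, f32>,
        section_multipliers: HashMap<SectionType, f32>,
    }

    /// Final score plus the breakdown it was computed from.
    pub struct StringScore {
        pub total: f32,
        pub tag_weight: f32,
        pub source_weight: f32,
        pub section_multiplier: f32,
    }

    impl RankingEngine {
        /// Builds an engine from an explicit configuration.
        pub fn new(config: ScoreConfig) -> Self {
            Self { config }
        }
        /// Builds an engine from the default weights listed above (body to be implemented).
        pub fn with_defaults() -> Self {
            todo!()
        }
        /// Computes (tag_weight + source_weight) × section_multiplier (body to be implemented).
        pub fn score(&self, tag: &Tag, source: StringSource, section: SectionType) -> StringScore {
            todo!()
        }
    }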
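To make the intended behavior of score() concrete, here is a self-contained sketch of a HashMap-backed engine. It is an assumption-laden illustration rather than the final design: the enums are reduced, hypothetical stand-ins for the real types, example_config() is a hypothetical helper with arbitrarily chosen values, and the fallback for missing entries (weight 0.0, multiplier 1.0) is a choice the issue does not specify.

```rust
use std::collections::HashMap;

// Hypothetical, reduced stand-ins for the types in src/classification/mod.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tag {
    Url,
    FilePath,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum StringSource {
    SectionData,
    DebugInfo,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SectionType {
    Code,
    Debug,
}

pub struct ScoreConfig {
    pub tag_weights: HashMap<Tag, f32>,
    pub source_weights: HashMap<StringSource, f32>,
    pub section_multipliers: HashMap<SectionType, f32>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StringScore {
    pub total: f32,
    pub tag_weight: f32,
    pub source_weight: f32,
    pub section_multiplier: f32,
}

pub struct RankingEngine {
    config: ScoreConfig,
}

impl RankingEngine {
    pub fn new(config: ScoreConfig) -> Self {
        Self { config }
    }

    /// Looks up each component and combines them with the proposed formula.
    /// Assumption: unknown keys fall back to a neutral weight (0.0) or
    /// a neutral multiplier (1.0).
    pub fn score(&self, tag: &Tag, source: StringSource, section: SectionType) -> StringScore {
        let tag_weight = self.config.tag_weights.get(tag).copied().unwrap_or(0.0);
        let source_weight = self.config.source_weights.get(&source).copied().unwrap_or(0.0);
        let section_multiplier = self
            .config
            .section_multipliers
            .get(&section)
            .copied()
            .unwrap_or(1.0);
        StringScore {
            total: (tag_weight + source_weight) * section_multiplier,
            tag_weight,
            source_weight,
            section_multiplier,
        }
    }
}

/// Hypothetical example configuration with values taken from the proposed tables.
pub fn example_config() -> ScoreConfig {
    ScoreConfig {
        tag_weights: HashMap::from([(Tag::Url, 9.0), (Tag::FilePath, 6.0)]),
        source_weights: HashMap::from([
            (StringSource::SectionData, 2.0),
            (StringSource::DebugInfo, -2.0),
        ]),
        section_multipliers: HashMap::from([(SectionType::Code, 1.5), (SectionType::Debug, 0.3)]),
    }
}
```

Returning the three components alongside the total keeps the score explainable: a caller can show why a string ranked where it did without recomputing anything.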
Technical Notes
- Use f32 for scores to allow fractional weights
- Consider using builder pattern for ScoreConfig customization
- Scores should be normalized (0-100 range recommended)
- Future enhancement: Machine learning-based weight tuning
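Two of the notes above, the builder pattern and the 0-100 normalization, can be sketched together. Everything here is an assumption made for illustration: ScoreConfigBuilder, its methods, and normalize() are hypothetical names, the config is reduced to tag weights only, and linear scaling against a known maximum raw score is just one possible normalization.

```rust
use std::collections::HashMap;

// Hypothetical, reduced stand-in for the real Tag type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tag {
    Url,
    Base64,
}

/// Reduced config holding only tag weights, to keep the sketch short.
#[derive(Debug, Default)]
pub struct ScoreConfig {
    tag_weights: HashMap<Tag, f32>,
}

impl ScoreConfig {
    pub fn tag_weight(&self, tag: Tag) -> Option<f32> {
        self.tag_weights.get(&tag).copied()
    }
}

/// Hypothetical builder that collects overrides before producing a config.
#[derive(Debug, Default)]
pub struct ScoreConfigBuilder {
    tag_weights: HashMap<Tag, f32>,
}

impl ScoreConfigBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Sets (or replaces) the weight for one tag and returns the builder for chaining.
    pub fn tag_weight(mut self, tag: Tag, weight: f32) -> Self {
        self.tag_weights.insert(tag, weight);
        self
    }

    pub fn build(self) -> ScoreConfig {
        ScoreConfig {
            tag_weights: self.tag_weights,
        }
    }
}

/// Scales a raw score linearly into 0-100, clamping anything outside the range.
/// Assumption: `max_raw` is the largest raw score the active config can produce.
pub fn normalize(raw: f32, max_raw: f32) -> f32 {
    if max_raw <= 0.0 {
        return 0.0; // avoid dividing by zero or a negative maximum
    }
    (raw / max_raw * 100.0).clamp(0.0, 100.0)
}
```

With the fixed weights proposed in this issue, the largest raw score is (10 + 3) × 1.5 = 19.5, so a raw score of 19.5 would map to 100 and negative raw scores would clamp to 0. Contextual weights could shift that maximum, so it would likely need to be derived from the active ScoreConfig.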
Dependencies
- Requires existing types from src/classification/mod.rs: Tag, StringSource, SectionType
- No external crate dependencies expected for MVP
References
- Requirements: 5.1
- Task-ID: stringy-analyzer/ranking-system-foundation
Acceptance Criteria
- RankingEngine struct created with configurable scoring
- ScoreConfig supports custom weights for tags, sources, and sections
- score() method returns detailed StringScore with breakdown
- ... mod.rs)