Overview
This task implements a YARA rule formatter for StringyMcStringFace that outputs extracted strings in valid YARA rule syntax. YARA is a widely used pattern-matching tool in malware research and incident response, so YARA-compatible output is a critical feature for security analysts who want to create detection rules from extracted strings.
Context
The binary analyzer currently extracts strings from ELF, PE, and Mach-O binaries with rich metadata including:
- Encoding types (ASCII, UTF-8, UTF-16LE, UTF-16BE)
- Semantic tags (URLs, domains, IPs, file paths, etc.)
- Offsets, RVAs, and section information
- Relevance scores
To maximize utility for security practitioners, we need to format this data as valid YARA rules that can be immediately used for threat hunting and detection.
Technical Requirements
1. String Escaping
Implement proper C-style escaping for YARA text strings:
- Escape double quotes: \"
- Escape backslashes: \\
- Escape newlines: \n
- Escape carriage returns: \r
- Escape tabs: \t
- Non-printable bytes: \xNN (hex notation)
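The escaping rules above can be sketched as a single pass over the raw bytes. This is a minimal illustration, not the project's implementation: the function name `escape_yara_text` and the choice to take a byte slice (rather than the `&str` in the proposed signature) are assumptions made so that non-printable bytes can be shown directly.

```rust
/// Escape raw bytes for use inside a double-quoted YARA text string.
/// Hypothetical helper: the name and the byte-slice input are assumptions.
pub fn escape_yara_text(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len());
    for &b in bytes {
        match b {
            b'"' => out.push_str("\\\""),
            b'\\' => out.push_str("\\\\"),
            b'\n' => out.push_str("\\n"),
            b'\r' => out.push_str("\\r"),
            b'\t' => out.push_str("\\t"),
            // Remaining printable ASCII passes through unchanged.
            0x20..=0x7e => out.push(b as char),
            // Everything else (control bytes, bytes >= 0x80) becomes \xNN.
            _ => out.push_str(&format!("\\x{:02x}", b)),
        }
    }
    out
}
```

Because the input is treated as data rather than as an already-escaped literal, a backslash in the extracted string is always doubled; the function never tries to guess whether the content was "pre-escaped".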
2. Encoding Mapping
Map FoundString encoding types to YARA string modifiers:
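- Encoding::Ascii / Encoding::Utf8 → ascii (default)
- Encoding::Utf16Le / Encoding::Utf16Be → wide modifier. YARA's wide modifier matches the little-endian layout (each character followed by a zero byte), so Utf16Be strings may need a hex string instead.
- Consider adding ascii wide for broader matching when appropriate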
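The mapping above could look roughly like the following. The `Encoding` enum here is a stand-in that only mirrors the variant names mentioned in this task, and both the function name `yara_modifiers` and the `match_both_widths` flag are hypothetical.

```rust
/// Stand-in for the analyzer's encoding type (assumption: the real enum
/// may carry more variants or live in a different module).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Ascii,
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Choose YARA string modifiers for an encoding.
/// `match_both_widths` is a hypothetical switch for the "ascii wide" option.
pub fn yara_modifiers(encoding: Encoding, match_both_widths: bool) -> Vec<&'static str> {
    if match_both_widths {
        return vec!["ascii", "wide"];
    }
    match encoding {
        Encoding::Ascii | Encoding::Utf8 => vec!["ascii"],
        // Follows the mapping in this task; see the caveat about
        // big-endian data before relying on `wide` for Utf16Be.
        Encoding::Utf16Le | Encoding::Utf16Be => vec!["wide"],
    }
}
```

Joining the returned slice with a space (for example `yara_modifiers(e, false).join(" ")`) gives the text that follows the closing quote of the string definition.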
3. String Truncation
Apply truncation rules for excessively long strings:
- Maximum string length: 256 bytes (configurable)
- Truncate with indicator comment (e.g., // truncated from N bytes)
- Consider hex string format for binary data or very long strings
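A possible shape for the truncation step is sketched below. The names `truncate_for_yara` and `truncation_comment` are hypothetical, and returning the original length as an `Option<usize>` (instead of the `bool` in the proposed signature) is an assumption made so the comment can report N.

```rust
/// Truncate `s` to at most `max_len` bytes without splitting a UTF-8
/// character. Returns the (possibly shortened) text and, if truncation
/// happened, the original length in bytes.
pub fn truncate_for_yara(s: &str, max_len: usize) -> (String, Option<usize>) {
    if s.len() <= max_len {
        return (s.to_string(), None);
    }
    // Back up to the nearest char boundary; index 0 is always a boundary,
    // so this loop terminates.
    let mut end = max_len;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    (s[..end].to_string(), Some(s.len()))
}

/// Build the trailing comment for a truncated string, or nothing at all.
pub fn truncation_comment(original_len: Option<usize>) -> String {
    match original_len {
        Some(n) => format!(" // truncated from {} bytes", n),
        None => String::new(),
    }
}
```

Truncation should run before escaping, so the limit applies to the extracted bytes rather than to the longer escaped form.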
4. Semantic Tag Integration
Leverage semantic tags to enhance rule conditions:
- Group strings by tag type in rule conditions
- Add metadata comments indicating tag classifications
- Generate compound conditions (e.g., any of ($url*))
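One way to derive a compound condition is to name each string identifier after its tag (`$url1`, `$path1`, ...) and then group by prefix. The sketch below assumes that naming scheme; the function name `condition_from_identifiers` and the `false` fallback for an empty input are illustrative choices, not part of the specification.

```rust
use std::collections::BTreeSet;

/// Build a condition such as `any of ($path*) or any of ($url*)` from
/// identifiers like "$url1", "$url2", "$path1".
/// Hypothetical helper; assumes identifiers are "<tag prefix><number>".
pub fn condition_from_identifiers(identifiers: &[&str]) -> String {
    // Strip the trailing counter: "$url12" -> "$url". A BTreeSet removes
    // duplicates and keeps the output order deterministic.
    let prefixes: BTreeSet<&str> = identifiers
        .iter()
        .copied()
        .map(|id| id.trim_end_matches(|c: char| c.is_ascii_digit()))
        .collect();

    if prefixes.is_empty() {
        // Assumption: with no strings, emit a rule that never matches.
        return "false".to_string();
    }

    prefixes
        .iter()
        .map(|p| format!("any of ({}*)", p))
        .collect::<Vec<_>>()
        .join(" or ")
}
```

Every group that appears in the strings section ends up in the condition, which also avoids YARA's error for strings that are defined but never referenced.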
5. Rule Structure
Generate complete, valid YARA rules:
rule binary_name_strings {
    meta:
        description = "Extracted strings from binary_name"
        format = "ELF/PE/MachO"
        generated_by = "StringyMcStringFace"
    strings:
        $s1 = "extracted_string" ascii
        $s2 = "wide_string" wide
        $url1 = "https://example.com" nocase
    condition:
        any of them
}
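The rule name in the template has to be a valid YARA identifier: letters, digits, and underscores only, not starting with a digit, and at most 128 characters. A file name such as suspicious.exe therefore needs sanitizing first. The following is one possible mapping; the function name `sanitize_rule_name` and the exact replacement policy are assumptions.

```rust
/// Turn a binary's file name into a YARA rule identifier ending in
/// "_strings". Hypothetical helper illustrating one sanitizing policy.
pub fn sanitize_rule_name(binary_name: &str) -> String {
    // Replace anything outside [A-Za-z0-9_] with an underscore.
    let mut name: String = binary_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();

    // Identifiers must not start with a digit (and must not be empty).
    if name.chars().next().map_or(true, |c| c.is_ascii_digit()) {
        name.insert(0, '_');
    }

    // Leave room for the 8-character suffix within the 128-character limit.
    // The string is pure ASCII at this point, so truncate cannot split a char.
    name.truncate(120);
    name.push_str("_strings");
    name
}
```

The fixed suffix also keeps the result from colliding with a YARA keyword when the binary happens to be named, say, rule or condition.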
Proposed Implementation
File: src/output/yara.rs
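pub struct YaraFormatter {
    max_string_length: usize, // truncation limit in bytes (256 by default)
    include_metadata: bool,
    rule_name: String,
}

// Bodies are left as todo!() so the skeleton compiles before the logic exists.
impl YaraFormatter {
    pub fn format_rule(&self, strings: &[FoundString], binary_info: &ContainerInfo) -> String { todo!() }
    fn escape_string(&self, s: &str) -> String { todo!() }
    fn get_string_modifiers(&self, string: &FoundString) -> Vec<&str> { todo!() }
    // Returns the (possibly shortened) text and whether truncation occurred.
    fn truncate_if_needed(&self, s: &str) -> (String, bool) { todo!() }
    fn generate_condition(&self, strings: &[FoundString]) -> String { todo!() }
}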
Module Registration: src/output/mod.rs
pub mod yara;

pub enum OutputFormat {
    Json,
    Yara,
    // ... other formats
}

pub trait Formatter {
    fn format(&self, strings: &[FoundString], info: &ContainerInfo) -> String;
}
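To show how the enum and trait might fit together, here is a self-contained sketch that dispatches from `OutputFormat` to a boxed `Formatter`. `FoundString` and `ContainerInfo` are reduced to single-field stand-ins, and `JsonFormatter` and `formatter_for` are hypothetical names; escaping, modifiers, and metadata are deliberately left out.

```rust
// Stand-in types: the real FoundString and ContainerInfo carry more fields.
pub struct FoundString { pub text: String }
pub struct ContainerInfo { pub name: String }

pub enum OutputFormat { Json, Yara }

pub trait Formatter {
    fn format(&self, strings: &[FoundString], info: &ContainerInfo) -> String;
}

pub struct YaraFormatter;
pub struct JsonFormatter; // hypothetical sibling, present only for the dispatch

impl Formatter for YaraFormatter {
    fn format(&self, strings: &[FoundString], info: &ContainerInfo) -> String {
        // No escaping here: this sketch assumes the text is already safe.
        let mut out = format!("rule {}_strings {{\n    strings:\n", info.name);
        for (i, s) in strings.iter().enumerate() {
            out.push_str(&format!("        $s{} = \"{}\"\n", i + 1, s.text));
        }
        out.push_str("    condition:\n        any of them\n}\n");
        out
    }
}

impl Formatter for JsonFormatter {
    fn format(&self, strings: &[FoundString], _info: &ContainerInfo) -> String {
        // Placeholder output so the example stays dependency-free.
        format!("{{\"count\":{}}}", strings.len())
    }
}

/// Pick a formatter for the requested output format.
pub fn formatter_for(format: OutputFormat) -> Box<dyn Formatter> {
    match format {
        OutputFormat::Json => Box::new(JsonFormatter),
        OutputFormat::Yara => Box::new(YaraFormatter),
    }
}
```

Returning a trait object keeps the call site independent of the concrete formatter, so adding another variant to `OutputFormat` only touches the `match`.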
Example Output
Given extracted strings from a binary, the formatter should produce:
rule suspicious_binary_strings {
    meta:
        description = "Strings extracted from suspicious.exe"
        format = "PE"
        generated_by = "StringyMcStringFace v0.1"
        extracted_count = 47
    strings:
        // URLs and network indicators
        $url1 = "http://malicious.example.com/payload" nocase
        $ip1 = "192.168.1.100" ascii
        // File paths
        $path1 = "C:\\Windows\\System32\\evil.dll" nocase
        // Wide strings (UTF-16)
        $wide1 = "WideString" wide
        // Import/Export names
        $imp1 = "CreateRemoteThread" ascii
        $imp2 = "VirtualAllocEx" ascii
    condition:
        // YARA rejects strings that are defined but never referenced,
        // so every group above appears here.
        any of ($url*) or
        any of ($ip*) or
        2 of ($imp*) or
        any of ($path*) or
        any of ($wide*)
}
Test Scenarios
Unit tests should cover:
- String Escaping
  - Quotes, backslashes, newlines in strings
  - Non-ASCII characters (hex escape)
  - Already-escaped content (no double-escaping)
- Encoding Handling
  - ASCII strings → ascii modifier
  - UTF-16LE/BE → wide modifier
  - Mixed encoding in single rule
- Truncation
  - Strings under limit → no truncation
  - Strings over limit → truncated with comment
  - Extreme cases (empty strings, very long strings)
- Rule Generation
  - Valid YARA syntax (parseable by YARA)
  - Proper section formatting (meta, strings, condition)
  - Special characters in rule names
- Semantic Tags
  - URLs grouped and commented
  - Network indicators (IPs, domains)
  - Import/Export grouping
Dependencies
- OutputFormat enum and Formatter trait
Acceptance Criteria
- src/output/yara.rs implements YaraFormatter
References
Related Issues
- Output Formatting Framework (dependency)
- Requirement 6.3 implementation
Task ID: stringy-analyzer/yara-friendly-output