feat(tls): flag SNI hostnames containing ASCII control bytes (C0+DEL)#61
feat(tls): flag SNI hostnames containing ASCII control bytes (C0+DEL)#61
Conversation
Closes #54. The TLS analyzer previously silently accepted SNI hostnames containing bytes in 0x00-0x1F or 0x7F (e.g., "foo\x1b[31m.example") into the pure-ASCII Ascii(String) variant — `char::is_ascii()` returns true for every C0 byte and DEL, so no finding was emitted and the raw bytes were inserted into sni_counts unchanged. RFC 6066 §3 requires ASCII, and the DNS preferred hostname syntax (RFC 952 / RFC 1123, preserved under RFC 5890 A-label construction) restricts to LDH — no control codes. An SNI with embedded control bytes is simultaneously a protocol violation, a potential terminal-injection vector, and a plausible log-poisoning / evasion / covert-channel signal. Implementation mirrors the existing NonAsciiUtf8 pattern: * New SniValue::AsciiWithControl { hostname, hex } variant * Detection range b < 0x20 || b == 0x7f (full C0 + DEL — narrower ranges like ESC+DEL alone would miss BEL/CR/LF/tab variants) * Anomaly / Inconclusive / Low finding with hex evidence * Per ADR 0003, sni_counts and finding summary store raw bytes; terminal reporter escapes C0+DEL+C1+backslash at render time 10 new tests pin: primary ESC-injection shape, BEL/DEL/tab/CR/LF variants, regressions (printable ASCII + Punycode A-label), mutual exclusivity with NonAsciiUtf8, C0 boundary pair (0x00, 0x1F, 0x20), and single-finding-per-hostname cardinality with multi-byte hex evidence preservation.
…laim Addresses round 2 comment-analyzer nits: - "preserved under RFC 5890 A-label construction" → "inherited by ..." (RFC 952/1123 *originated* LDH; RFC 5890 inherits rather than preserves it). - "No legitimate TLS client emits..." → "No conforming TLS client should emit..." (the narrower conditional claim is robust to a single buggy stack in the wild).
There was a problem hiding this comment.
Pull request overview
Adds detection and reporting for TLS SNI hostnames that are ASCII but contain control bytes (C0 range + DEL), closing a gap where such inputs previously passed silently.
Changes:
- Introduces
SniValue::AsciiWithControl { hostname, hex }and detection viab < 0x20 || b == 0x7f. - Emits a new Anomaly/Inconclusive/Low finding for ASCII-control SNI values and preserves raw bytes per ADR 0003.
- Adds a new test suite covering ESC/BEL/DEL/tab/CR/LF variants, boundary cases, exclusivity vs
NonAsciiUtf8, and single-finding cardinality.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/analyzer/tls.rs |
Adds AsciiWithControl classification, control-byte detection helper, SNI counting key behavior, and new finding emission. |
tests/tls_analyzer_tests.rs |
Adds focused regression/pin tests for ASCII control-byte SNI handling and evidence/keying expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…condition
Addresses Copilot review round 1:
- SniValue::Ascii doc narrowed too far ("pure printable ASCII 0x20–0x7E
— RFC 6066 compliant"). Empty SNI (""), space, and other non-LDH
printable ASCII all land here too. Reworded to describe what the arm
actually does (no C0/DEL detected) and point readers at broader LDH
compliance as out-of-scope per issue #54.
- `Ascii(_) => {}` match-arm comment now says "No C0/DEL detected; no
finding emitted at this layer" instead of the misleading
"RFC-compliant".
- contains_c0_or_del now carries a debug_assert on its ASCII
precondition so future misuse (passing non-ASCII UTF-8, where
continuation bytes ≥ 0x80 would produce a false negative) panics in
debug builds.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…w ESC
The assertion "summary should cite RFC 6066" previously formatted
f.summary with `{}` (Display). That summary is constructed from a
hostname deliberately containing an ESC byte (the terminal-injection
test shape), so a failing assertion would write raw control bytes to
CI logs and potentially corrupt the log viewer. Debug formatting
(`{:?}`) escapes the control bytes via escape_default so log output
stays terminal-safe even on regression.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9 tasks, TDD-per-task, small commits. Task 9 is the full CI-equivalent check (cargo fmt + clippy + test) — encodes the lesson from PR #61 where skipping fmt-check locally caused a CI Format regression on initial push.
Closes #54.
Summary
SniValue::AsciiWithControl { hostname, hex }variant — detection range isb < 0x20 || b == 0x7f(full C0 + DEL). Narrower ranges (ESC+DEL alone) would miss BEL/CR/LF/tab evasion variants.NonAsciiUtf8shape. Per ADR 0003,sni_countsand the finding summary store raw bytes; the terminal reporter escapes C0+DEL+C1+backslash at display time.tests/tls_analyzer_tests.rspin the primary ESC-injection shape, BEL/DEL/tab/CR/LF variants, regression guards (printable ASCII + Punycode A-label), mutual exclusivity withNonAsciiUtf8, C0 boundary triple (0x00, 0x1F, 0x20), and single-finding-per-hostname cardinality.Why
RFC 6066 §3 requires ASCII; DNS preferred hostname syntax (RFC 952 / RFC 1123, inherited by RFC 5890 A-label construction) restricts to LDH — no whitespace or control codes. The pre-fix analyzer matched every C0 byte and DEL through
char::is_ascii()and silently dropped the SNI intosni_countswith no finding.Test plan
cargo test --test tls_analyzer_tests— 36 tests pass (26 existing + 10 new)cargo test— full suite greencargo clippy --all-targets -- -D warnings— clean