Skip to content

feat(tls): flag SNI hostnames containing ASCII control bytes (C0+DEL)#61

Merged
Zious11 merged 5 commits intodevelopfrom
worktree-issue-54-sni-ascii-control
Apr 13, 2026
Merged

feat(tls): flag SNI hostnames containing ASCII control bytes (C0+DEL)#61
Zious11 merged 5 commits intodevelopfrom
worktree-issue-54-sni-ascii-control

Conversation

@Zious11
Copy link
Copy Markdown
Owner

@Zious11 Zious11 commented Apr 13, 2026

Closes #54.

Summary

  • Adds SniValue::AsciiWithControl { hostname, hex } variant — detection range is b < 0x20 || b == 0x7f (full C0 + DEL). Narrower ranges (ESC+DEL alone) would miss BEL/CR/LF/tab evasion variants.
  • Emits an Anomaly / Inconclusive / Low finding mirroring the existing NonAsciiUtf8 shape. Per ADR 0003, sni_counts and the finding summary store raw bytes; the terminal reporter escapes C0+DEL+C1+backslash at display time.
  • 10 new tests in tests/tls_analyzer_tests.rs pin the primary ESC-injection shape, BEL/DEL/tab/CR/LF variants, regression guards (printable ASCII + Punycode A-label), mutual exclusivity with NonAsciiUtf8, C0 boundary triple (0x00, 0x1F, 0x20), and single-finding-per-hostname cardinality.

Why

RFC 6066 §3 requires ASCII; DNS preferred hostname syntax (RFC 952 / RFC 1123, inherited by RFC 5890 A-label construction) restricts to LDH — no whitespace or control codes. The pre-fix analyzer matched every C0 byte and DEL through char::is_ascii() and silently dropped the SNI into sni_counts with no finding.

Test plan

  • cargo test --test tls_analyzer_tests — 36 tests pass (26 existing + 10 new)
  • cargo test — full suite green
  • cargo clippy --all-targets -- -D warnings — clean
  • Pre-PR multi-agent review (code-reviewer / pr-test-analyzer / comment-analyzer), iterated 3 rounds to convergence — citation tightened to RFC 952/1123 for LDH (RFC 5890 retained for A-label inheritance), 0x00 NUL boundary added, dated "Perplexity validation" metadata removed, universal-client claim softened to conforming-client.

Zious11 added 2 commits April 12, 2026 21:34
Closes #54. The TLS analyzer previously silently accepted SNI hostnames
containing bytes in 0x00-0x1F or 0x7F (e.g., "foo\x1b[31m.example") into
the pure-ASCII Ascii(String) variant — `char::is_ascii()` returns true
for every C0 byte and DEL, so no finding was emitted and the raw bytes
were inserted into sni_counts unchanged. RFC 6066 §3 requires ASCII,
and the DNS preferred hostname syntax (RFC 952 / RFC 1123, preserved
under RFC 5890 A-label construction) restricts to LDH — no control
codes. An SNI with embedded control bytes is simultaneously a protocol
violation, a potential terminal-injection vector, and a plausible
log-poisoning / evasion / covert-channel signal.

Implementation mirrors the existing NonAsciiUtf8 pattern:

  * New SniValue::AsciiWithControl { hostname, hex } variant
  * Detection range b < 0x20 || b == 0x7f (full C0 + DEL — narrower
    ranges like ESC+DEL alone would miss BEL/CR/LF/tab variants)
  * Anomaly / Inconclusive / Low finding with hex evidence
  * Per ADR 0003, sni_counts and finding summary store raw bytes;
    terminal reporter escapes C0+DEL+C1+backslash at render time

10 new tests pin: primary ESC-injection shape, BEL/DEL/tab/CR/LF
variants, regressions (printable ASCII + Punycode A-label), mutual
exclusivity with NonAsciiUtf8, C0 boundary pair (0x00, 0x1F, 0x20),
and single-finding-per-hostname cardinality with multi-byte hex
evidence preservation.
…laim

Addresses round 2 comment-analyzer nits:
- "preserved under RFC 5890 A-label construction" → "inherited by ..."
  (RFC 952/1123 *originated* LDH; RFC 5890 inherits rather than
  preserves it).
- "No legitimate TLS client emits..." → "No conforming TLS client
  should emit..." (the narrower conditional claim is robust to a
  single buggy stack in the wild).
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds detection and reporting for TLS SNI hostnames that are ASCII but contain control bytes (C0 range + DEL), closing a gap where such inputs previously passed silently.

Changes:

  • Introduces SniValue::AsciiWithControl { hostname, hex } and detection via b < 0x20 || b == 0x7f.
  • Emits a new Anomaly/Inconclusive/Low finding for ASCII-control SNI values and preserves raw bytes per ADR 0003.
  • Adds a new test suite covering ESC/BEL/DEL/tab/CR/LF variants, boundary cases, exclusivity vs NonAsciiUtf8, and single-finding cardinality.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/analyzer/tls.rs Adds AsciiWithControl classification, control-byte detection helper, SNI counting key behavior, and new finding emission.
tests/tls_analyzer_tests.rs Adds focused regression/pin tests for ASCII control-byte SNI handling and evidence/keying expectations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/analyzer/tls.rs Outdated
Comment thread src/analyzer/tls.rs Outdated
Comment thread src/analyzer/tls.rs
…condition

Addresses Copilot review round 1:
- SniValue::Ascii doc narrowed too far ("pure printable ASCII 0x20–0x7E
  — RFC 6066 compliant"). Empty SNI (""), space, and other non-LDH
  printable ASCII all land here too. Reworded to describe what the arm
  actually does (no C0/DEL detected) and point readers at broader LDH
  compliance as out-of-scope per issue #54.
- `Ascii(_) => {}` match-arm comment now says "No C0/DEL detected; no
  finding emitted at this layer" instead of the misleading
  "RFC-compliant".
- contains_c0_or_del now carries a debug_assert on its ASCII
  precondition so future misuse (passing non-ASCII UTF-8, where
  continuation bytes ≥ 0x80 would produce a false negative) panics in
  debug builds.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/tls_analyzer_tests.rs Outdated
…w ESC

The assertion "summary should cite RFC 6066" previously formatted
f.summary with `{}` (Display). That summary is constructed from a
hostname deliberately containing an ESC byte (the terminal-injection
test shape), so a failing assertion would write raw control bytes to
CI logs and potentially corrupt the log viewer. Debug formatting
(`{:?}`) escapes the control bytes via escape_default so log output
stays terminal-safe even on regression.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Zious11 Zious11 merged commit f0795c4 into develop Apr 13, 2026
8 checks passed
@Zious11 Zious11 deleted the worktree-issue-54-sni-ascii-control branch April 13, 2026 04:58
Zious11 added a commit that referenced this pull request Apr 13, 2026
9 tasks, TDD-per-task, small commits. Task 9 is the full CI-equivalent
check (cargo fmt + clippy + test) — encodes the lesson from PR #61
where skipping fmt-check locally caused a CI Format regression on
initial push.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TLS: flag SNI hostnames containing ASCII control codes (BEL/ESC/DEL/C0)

2 participants