
refactor(dpi/quic): single-allocation connection_id_to_hex #308

Open
0xghost42 wants to merge 1 commit into domcyrus:main from 0xghost42:refactor/305-quic-connection-id-hex

@0xghost42
Contributor

Summary

`connection_id_to_hex` was rendering a QUIC connection ID through the worst-case allocation pattern: `format!("{:02x}", b)` allocated a 2-byte `String` for every input byte, the iterator collected them into a `Vec<String>` (one more allocation), and `join(":")` then walked the vector to build the final `String`. For an 8-byte short-form DCID that is 8 + 1 + 1 = 10 heap allocations per render.

Replace this with a single `String::with_capacity(id.len() * 3 - 1)` (the exact final size: 2 hex chars per byte plus N-1 colon separators) and a per-byte `write!`. Writing into a pre-sized `String` does not reallocate, so the helper now performs exactly one heap allocation per call for any non-empty input. The empty slice returns early with `String::new()`, which does not allocate and also keeps `id.len() * 3 - 1` from underflowing. The lowercase, colon-separated output is unchanged.
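A minimal sketch of the sizing argument, using only the standard library: for `n` bytes the output holds `2n` hex digits and `n - 1` colons, so `3n - 1` bytes in total. The name `connection_id_to_hex_sketch` is hypothetical, and the capacity `debug_assert_eq!` checks are illustrative additions, not part of the PR's code.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the PR's helper: lowercase, colon-separated hex.
fn connection_id_to_hex_sketch(id: &[u8]) -> String {
    // `id.len() * 3 - 1` would underflow for an empty slice, so return early.
    if id.is_empty() {
        return String::new();
    }
    // 2 hex digits per byte + (n - 1) separators = 3n - 1 bytes.
    let mut out = String::with_capacity(id.len() * 3 - 1);
    let cap = out.capacity();
    for (i, b) in id.iter().enumerate() {
        if i > 0 {
            out.push(':');
        }
        // `fmt::Write for String` never returns an error.
        write!(out, "{b:02x}").expect("writing to a String is infallible");
    }
    // The final length equals the reservation, so the buffer never grew.
    debug_assert_eq!(out.len(), id.len() * 3 - 1);
    debug_assert_eq!(out.capacity(), cap);
    out
}
```

Using `enumerate` instead of a `first` flag is only a stylistic choice; both emit the separator before every byte except the first.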

Tests

Four regression tests added:

  • An 8-byte representative DCID — locks the lowercase + colon-separated contract.
  • A mix of single-digit bytes (0x00..=0x0f) — locks the zero-padding contract that {:02x} provides.
  • A single-byte input — locks that no trailing separator is emitted.
  • The empty-slice base case.
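The four cases above could be written roughly as the following `#[cfg(test)]` module. The byte values, test names, and the compact helper body are illustrative assumptions; only the contracts themselves (lowercase, colon separators, zero padding, no trailing separator, empty input) come from the PR description.

```rust
use std::fmt::Write;

/// Illustrative copy of the helper so the tests are self-contained.
fn connection_id_to_hex(id: &[u8]) -> String {
    if id.is_empty() {
        return String::new();
    }
    let mut out = String::with_capacity(id.len() * 3 - 1);
    for (i, b) in id.iter().enumerate() {
        if i > 0 {
            out.push(':');
        }
        let _ = write!(out, "{b:02x}");
    }
    out
}

#[cfg(test)]
mod tests {
    use super::connection_id_to_hex;

    #[test]
    fn eight_byte_dcid_is_lowercase_and_colon_separated() {
        let id: [u8; 8] = [0xAB, 0xCD, 0xEF, 0x01, 0x23, 0x45, 0x67, 0x89];
        assert_eq!(connection_id_to_hex(&id), "ab:cd:ef:01:23:45:67:89");
    }

    #[test]
    fn single_digit_bytes_are_zero_padded() {
        let id: Vec<u8> = (0x00..=0x0f).collect();
        assert_eq!(
            connection_id_to_hex(&id),
            "00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f"
        );
    }

    #[test]
    fn single_byte_has_no_trailing_separator() {
        assert_eq!(connection_id_to_hex(&[0x7f]), "7f");
    }

    #[test]
    fn empty_slice_yields_empty_string() {
        assert_eq!(connection_id_to_hex(&[]), "");
    }
}
```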

Local checks:

  • cargo test --lib — 365 passed.
  • cargo clippy --all-targets -- -D warnings — clean.
  • cargo fmt --check — clean.

Closes #305

@laundmo

laundmo commented May 21, 2026

This PR, like the other, lacks any proof that this is a performance gain and/or that the compiler does not already optimize the current code to a single string write.

@0xghost42
Contributor Author

@laundmo Fair ask. Ran a release-mode micro-bench locally to check both halves of the concern (perf gain + compiler folding). Bench keeps both shapes side-by-side so the comparison is apples-to-apples on the same host:

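```rust
use std::fmt::Write; // brings `write!` into scope for `String`

// Previous shape: one String per byte, a Vec<String>, then join.
fn old(id: &[u8]) -> String {
    id.iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<String>>()
        .join(":")
}

// New shape: one exactly-sized String, written in place.
fn new(id: &[u8]) -> String {
    if id.is_empty() { return String::new(); }
    let mut out = String::with_capacity(id.len() * 3 - 1);
    let mut first = true;
    for b in id {
        if !first { out.push(':'); }
        let _ = write!(out, "{b:02x}");
        first = false;
    }
    out
}
```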

Built with `rustc -O` (1M iterations, `black_box` to defeat dead-code elimination) over a representative DCID range (8 bytes = typical short-form, 20 bytes = QUIC v1 maximum):

| len (bytes) | old (Vec + join) | new (write! + capacity) | speedup |
|---|---|---|---|
| 8 | 341.4 ns | 127.0 ns | 2.7x |
| 16 | 599.3 ns | 234.4 ns | 2.6x |
| 20 | 743.9 ns | 283.7 ns | 2.6x |

Output is byte-identical (`assert_eq!` passes on `[0x12, 0x34, ..., 0xf0]` -> `"12:34:...:f0"`).
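The timing harness itself is not shown in the thread. A minimal `std`-only sketch of a comparable loop, assuming `std::hint::black_box` and `std::time::Instant`; `ns_per_call`, `run_bench`, and the generated test bytes are hypothetical, and absolute numbers will vary by host. Build with `rustc -O` for meaningful timings.

```rust
use std::fmt::Write;
use std::hint::black_box;
use std::time::Instant;

// The two shapes from the comment above, restated so this file compiles alone.
fn old(id: &[u8]) -> String {
    id.iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<String>>()
        .join(":")
}

fn new(id: &[u8]) -> String {
    if id.is_empty() {
        return String::new();
    }
    let mut out = String::with_capacity(id.len() * 3 - 1);
    for (i, b) in id.iter().enumerate() {
        if i > 0 {
            out.push(':');
        }
        let _ = write!(out, "{b:02x}");
    }
    out
}

/// Mean wall-clock nanoseconds per call of `f` over `iters` runs.
fn ns_per_call(f: fn(&[u8]) -> String, id: &[u8], iters: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box hides the input from the optimizer and marks the output
        // as used, so the call can be neither hoisted nor deleted.
        black_box(f(black_box(id)));
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

/// Times both shapes at the DCID lengths used in the table.
fn run_bench(iters: u32) {
    for len in [8usize, 16, 20] {
        // Arbitrary but deterministic input bytes.
        let id: Vec<u8> = (0..len as u8).map(|i| i.wrapping_mul(37)).collect();
        assert_eq!(old(&id), new(&id), "outputs must be identical");
        let t_old = ns_per_call(old, &id, iters);
        let t_new = ns_per_call(new, &id, iters);
        println!(
            "len {len:>2}: old {t_old:>7.1} ns, new {t_new:>7.1} ns, {:.1}x",
            t_old / t_new
        );
    }
}
```

Calling `run_bench(1_000_000)` from `main` mirrors the 1M-iteration setup described above.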

So both halves of the concern resolve as expected:

  • the compiler does not fold `format!` -> `Vec<String>` -> `join` into a single allocation under `-O`; the per-byte `String` allocations survive
  • the new shape is ~2.6-2.7x faster across the realistic DCID range, mostly from dropping the per-byte `String`s and the intermediate `Vec`
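Timings show the allocation difference only indirectly. A more direct check, sketched here under the assumption that it runs single-threaded, is a counting `#[global_allocator]` that forwards to `std::alloc::System`; `CountingAlloc` and `count_allocs` are hypothetical names. The trait's default `realloc` and `alloc_zeroed` bodies go through `alloc`, so buffer growth is counted too.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::fmt::Write;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator: forwards to `System` and counts every allocation.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f`, returning its result and how many allocations it made.
/// Only meaningful when no other thread allocates concurrently.
fn count_allocs<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCS.load(Ordering::Relaxed);
    let out = f();
    (out, ALLOCS.load(Ordering::Relaxed) - before)
}

fn old(id: &[u8]) -> String {
    id.iter().map(|b| format!("{:02x}", b)).collect::<Vec<String>>().join(":")
}

fn new(id: &[u8]) -> String {
    if id.is_empty() {
        return String::new();
    }
    let mut out = String::with_capacity(id.len() * 3 - 1);
    for (i, b) in id.iter().enumerate() {
        if i > 0 {
            out.push(':');
        }
        let _ = write!(out, "{b:02x}");
    }
    out
}
```

The PR's own accounting predicts 8 + 1 + 1 = 10 allocations for the old shape on an 8-byte DCID; since the exact figure depends on standard-library internals, a check like this is best limited to asserting that the new helper allocates once and the old one allocates more.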

`connection_id_to_hex` runs on every QUIC Initial parse, so the per-call saving adds up at typical traffic rates. Happy to land the bench as `benches/quic_connection_id_to_hex.rs` if you'd like it tracked in tree; otherwise the source above is enough to reproduce locally with `rustc -O`.


Development

Successfully merging this pull request may close these issues.

dpi(quic): connection_id_to_hex allocates per-byte Strings plus a Vec for join

2 participants