A tiny MLP binary classifier (~11K parameters) for detecting AI provider network traffic from TLS/HTTPS metadata. Designed for sub-100μs warm inference via tract (pure Rust, no C++ deps).
Input (61 features) → Linear(96) → BatchNorm → ReLU → Dropout(0.1)
                    → Linear(48) → BatchNorm → ReLU → Dropout(0.1)
                        ├→ Linear(1) → logit → sigmoid → P(AI traffic)
                        └→ Linear(1) → sigmoid → confidence score [0, 1]
ONNX output: (1, 2) = [logit, confidence]
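
As a sanity check on the "~11K parameters" figure, the layer sizes above can be tallied by hand. This is a minimal sketch that assumes every Linear layer has a bias, each BatchNorm contributes a learnable scale and shift per feature (running statistics are buffers and not counted), and both heads branch off the 48-dim trunk. The helper names are illustrative and not taken from the training scripts.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer (bias assumed present)."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift; running mean/variance are not parameters."""
    return 2 * n_features


def count_parameters(n_in: int = 61, h1: int = 96, h2: int = 48) -> dict:
    """Per-layer and total learnable parameter counts for the MLP."""
    counts = {
        "linear_1": linear_params(n_in, h1),
        "batchnorm_1": batchnorm_params(h1),
        "linear_2": linear_params(h1, h2),
        "batchnorm_2": batchnorm_params(h2),
        "class_head": linear_params(h2, 1),
        "confidence_head": linear_params(h2, 1),
    }
    counts["total"] = sum(counts.values())
    return counts


for name, n in count_parameters().items():
    print(f"{name:>16}: {n}")
```

Under these assumptions the total is 10,994 parameters, or roughly 44 KB of FP32 weights before any ONNX graph overhead.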
The confidence head tells the pipeline how much to trust the prediction. Low confidence (< 0.4) signals "I don't know" — the pipeline falls back to conservative behavior instead of acting on an uncertain classification.
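
One way a consumer might turn the two outputs into a decision is sketched below. It assumes the first output is a raw logit that still needs a sigmoid and the second is already a confidence in [0, 1], as the architecture diagram suggests. The 0.4 floor comes from the text; the 0.5 decision threshold and the `interpret` function with its labels are assumptions for illustration only.

```python
import math
from typing import Tuple

CONFIDENCE_FLOOR = 0.4    # from the text: below this the model signals "I don't know"
DECISION_THRESHOLD = 0.5  # assumption: a conventional default, not confirmed by the source


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def interpret(logit: float, confidence: float) -> Tuple[str, float]:
    """Map the model's [logit, confidence] pair to an action label and P(AI traffic)."""
    probability = sigmoid(logit)
    if confidence < CONFIDENCE_FLOOR:
        # Uncertain prediction: let the caller apply its conservative behavior.
        return "fallback", probability
    label = "ai" if probability >= DECISION_THRESHOLD else "not_ai"
    return label, probability
```

Checking confidence before the probability keeps a confidently wrong-looking logit from triggering an action when the model itself reports low trust.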
Stage 3 of the Sonomos Desktop three-stage traffic scanning pipeline:
- Stage 1 — Deterministic rules (sub-μs): domain allowlist + user overrides + cache
- Stage 2 — Heuristic scoring (~μs): JA4 fingerprint, SNI pattern, DNS/IP correlation
- Stage 3 — This ML classifier (~10–70ms cold, <100μs warm): ONNX model via tract
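
The three stages read like a cascade in which cheap checks run first and the classifier is consulted only when they cannot decide. The text does not spell out the escalation rule, so the sketch below is an assumption: each stage returns a verdict or defers with `None`. The stage functions, the flow dictionary keys, and the 0.9/0.1/0.5 cutoffs are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence, Tuple

# A stage returns True/False when it can decide, or None to defer to the next stage.
Stage = Callable[[dict], Optional[bool]]


def run_cascade(flow: dict, stages: Sequence[Stage]) -> Tuple[int, bool]:
    """Return (1-based index of the deciding stage, verdict)."""
    for index, stage in enumerate(stages, start=1):
        verdict = stage(flow)
        if verdict is not None:
            return index, verdict
    raise RuntimeError("the final stage must always return a verdict")


def stage1_rules(flow: dict) -> Optional[bool]:
    # Hypothetical: a rule or cache hit is stored under "rule_verdict".
    return flow.get("rule_verdict")


def stage2_heuristics(flow: dict) -> Optional[bool]:
    # Hypothetical: decide only when the heuristic score is near an extreme.
    score = flow.get("heuristic_score")
    if score is None:
        return None
    if score >= 0.9:
        return True
    if score <= 0.1:
        return False
    return None


def stage3_classifier(flow: dict) -> Optional[bool]:
    # Hypothetical: the ML probability is precomputed for this sketch.
    return flow["ml_probability"] >= 0.5


PIPELINE = (stage1_rules, stage2_heuristics, stage3_classifier)
```

For example, `run_cascade({"heuristic_score": 0.5, "ml_probability": 0.8}, PIPELINE)` falls through the first two stages and is decided by the third.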
| Group | Features | Dims |
|---|---|---|
| Flow statistics | pkt size mean/std/min/max/p25/p50/p75, IAT mean/std/min/max/p50, duration, pkt count (up/down), bytes/sec | 16 |
| Directional stats | upstream pkt size mean/std/p50, downstream pkt size mean/std/p50, byte ratio (up/total), pkt count ratio (up/total) | 8 |
| First-N packet sizes | first 8 packet sizes (upstream interleaved with downstream) | 8 |
| TLS metadata | version, cipher count, ext count, ALPN, has_grpc, has_h2, cert_chain_len, has_sni, has_sct, has_status_request, tls_13_only, post_handshake_auth | 12 |
| JA4 components | version_ord, cipher_count, ext_count, alpn_ord, sorted_cipher_hash(2d) | 6 |
| SNI n-gram hash | character 2/3-gram hashing into 11-dim feature vector | 11 |
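
The last row of the table uses the hashing trick: every character 2-gram and 3-gram of the SNI is hashed into one of 11 buckets. The sketch below shows the idea under stated assumptions. The choice of CRC32, the lowercasing, and the normalization by n-gram count are illustrative and may differ from what the extraction scripts and the Rust side actually do.

```python
import zlib
from typing import List

SNI_HASH_DIMS = 11  # width given in the feature table


def sni_ngram_hash(sni: str, dims: int = SNI_HASH_DIMS) -> List[float]:
    """Hash character 2-grams and 3-grams of an SNI into a fixed-size vector."""
    text = sni.strip().lower()
    buckets = [0.0] * dims
    total = 0
    for n in (2, 3):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            # crc32 is stable across processes, unlike Python's salted hash().
            buckets[zlib.crc32(gram.encode("utf-8")) % dims] += 1.0
            total += 1
    if total == 0:
        # Missing or one-character SNI: return an all-zero vector.
        return buckets
    # Normalize so the vector does not scale with hostname length.
    return [count / total for count in buckets]


print(sni_ngram_hash("api.openai.com"))
```

Whatever hash is chosen, training and inference need to use the same one, since the model only ever sees bucket indices.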
# Install deps
pip install torch scikit-learn onnx onnxruntime numpy pandas
# Generate synthetic training data (for testing the pipeline)
python scripts/generate_synthetic_data.py --output data/synthetic_train.csv --samples 10000
# Train
python scripts/train.py --data data/synthetic_train.csv --output models/traffic_classifier.onnx
# Validate ONNX export
python scripts/validate_onnx.py --model models/traffic_classifier.onnx
# Run tests
python -m pytest tests/ -v

Extract features from real packet captures using cicflowmeter:
pip install "cicflowmeter>=0.5.0"
# Single pcap with label (1=AI traffic, 0=normal)
python scripts/extract_with_cicflowmeter.py \
--pcap data/captures/openai_traffic.pcap \
--label 1 \
--sni api.openai.com \
--output data/openai_flows.csv
# Batch: directory of pcaps with per-file labels
python scripts/extract_with_cicflowmeter.py \
--pcap-dir data/captures/ \
--label-file data/captures/labels.json \
--output data/real_train.csv
# Train on real data
python scripts/train.py --data data/real_train.csv --output models/traffic_classifier.onnx

# Cargo.toml
[dependencies]
huginn-net-tls = "1.5"
tract-onnx = "0.22"
anyhow = "1"

use crate::classifier::{TrafficClassifier, FlowStats};
// Load once at daemon startup
let classifier = TrafficClassifier::load("traffic_classifier.onnx", Some(0.5))?;
// On each intercepted flow: pass flow stats + raw ClientHello bytes
let (probability, is_ai, sni) = classifier.classify_flow(&flow_stats, &client_hello_bytes)?;
if is_ai {
    // AI traffic detected: apply Cloak interception
}

The classify_flow() method handles the full pipeline internally:
- Passes ClientHello bytes through huginn-net-tls for JA4/TLS extraction
- Builds the 61-dim feature vector (flow stats + TLS + JA4 + SNI hash)
- Runs tract ONNX inference
- Returns (probability, is_ai, sni_domain)
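
The feature-vector step has to produce the same 61-dim layout the model was trained on. The sketch below, written in Python to match the training scripts, shows one way to assemble and validate that layout from the six groups in the feature table. The group names and the concatenation order are assumptions; only the widths come from the table.

```python
from typing import Dict, List, Sequence

# Widths from the feature table; names and ordering are assumptions.
FEATURE_GROUPS = (
    ("flow_stats", 16),
    ("directional_stats", 8),
    ("first_n_packet_sizes", 8),
    ("tls_metadata", 12),
    ("ja4_components", 6),
    ("sni_ngram_hash", 11),
)
INPUT_DIMS = sum(width for _, width in FEATURE_GROUPS)


def build_feature_vector(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate feature groups in a fixed order, checking each width."""
    vector: List[float] = []
    for name, width in FEATURE_GROUPS:
        if name not in groups:
            raise KeyError(f"missing feature group: {name}")
        values = groups[name]
        if len(values) != width:
            raise ValueError(f"{name}: expected {width} values, got {len(values)}")
        vector.extend(float(v) for v in values)
    return vector
```

Failing loudly on a width mismatch is cheaper than debugging a model that silently receives shifted features.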
For maximum accuracy, train an XGBoost teacher first:
python scripts/train_xgboost_teacher.py --data data/real_train.csv --output models/xgb_teacher.json
python scripts/train.py --data data/real_train.csv --teacher models/xgb_teacher.json --output models/traffic_classifier.onnx

Target metrics (on real data):
- AUC-PR > 0.95
- F1 > 0.92
- Precision@90%Recall > 0.90
- Confidence: mean confidence > 0.7 on correct predictions, < 0.5 on incorrect ones
- Inference: <100μs warm (tract, x86_64)
- Model size: ~50KB (FP32 ONNX)
- Output: (1, 2) = [logit, confidence]
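
Of the targets above, Precision@90%Recall is the least standard, so here is a dependency-free sketch of one common reading: the best precision achievable at any score threshold whose recall is at least 0.90. The function name and this exact definition are assumptions; the validation scripts may compute it differently, for example from scikit-learn's `precision_recall_curve`.

```python
from typing import Sequence


def precision_at_recall(
    y_true: Sequence[int], y_score: Sequence[float], min_recall: float = 0.90
) -> float:
    """Best precision over all score thresholds with recall >= min_recall."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("y_true contains no positive samples")
    # Sweep thresholds from the highest score downward.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    best = 0.0
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so evaluate only after the last tie.
        is_last = rank == len(order) - 1
        if not is_last and y_score[order[rank + 1]] == y_score[i]:
            continue
        recall = tp / n_pos
        precision = tp / (tp + fp)
        if recall >= min_recall:
            best = max(best, precision)
    return best


labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
print(precision_at_recall(labels, scores))  # 0.75
```

In the toy example, reaching 90% recall requires all three positives, which also admits one negative, hence 3/4.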
Proprietary — Sonomos, Inc.