fix(server): TLS handshake error flood from plaintext gRPC clients hitting TLS listener #812

@gburachas

Agent Diagnostic

  • Traced the TLS accept handler in crates/openshell-server/src/lib.rs:210-222
  • Found is_benign_tls_handshake_failure() at lines 77-82 only classifies UnexpectedEof and ConnectionReset as benign
  • The route-refresh task in crates/openshell-sandbox/src/lib.rs:1067-1126 runs every 5 seconds (DEFAULT_ROUTE_REFRESH_INTERVAL_SECS = 5)
  • When the sandbox gRPC client uses a plaintext endpoint but the server listens on TLS, rustls emits InvalidContentType — this is not in the benign list
  • Result: ERROR-level log every 5 seconds, filling logs with noise

Description

The OpenShell server's TLS accept handler logs every InvalidContentType error at ERROR level. This error occurs when a plaintext client connects to the TLS listener — typically the sandbox's route-refresh gRPC client when there's a protocol mismatch.
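A minimal sketch of why the mismatch surfaces as InvalidContentType, assuming the plaintext gRPC client opens with the standard HTTP/2 connection preface (the issue does not show the bytes on the wire). A TLS record starts with a one-byte content type in the range 20 to 24, while the preface starts with the ASCII letter `P`. The helper names are hypothetical and this is not rustls's actual parsing code.

```rust
/// The HTTP/2 client connection preface. A client speaking plaintext
/// HTTP/2 with prior knowledge (as plaintext gRPC does) sends this first.
const HTTP2_PREFACE: &[u8] = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";

/// Hypothetical helper: true if `byte` is a defined TLS record content type
/// (20 change_cipher_spec, 21 alert, 22 handshake, 23 application_data,
/// 24 heartbeat).
fn is_known_tls_content_type(byte: u8) -> bool {
    (20..=24).contains(&byte)
}

/// Hypothetical helper: true if the bytes received so far are a non-empty
/// prefix of (or begin with) the HTTP/2 preface, i.e. the peer is most
/// likely a plaintext HTTP/2 client rather than a TLS client.
fn looks_like_plaintext_http2(first_bytes: &[u8]) -> bool {
    let n = first_bytes.len().min(HTTP2_PREFACE.len());
    n > 0 && first_bytes[..n] == HTTP2_PREFACE[..n]
}
```

The first preface byte is `0x50`, which is outside the 20 to 24 range, so a TLS stack reading it as a record header has no valid content type to dispatch on.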

Since the route-refresh task runs every 5 seconds, this creates a flood of:

ERROR openshell_server: TLS handshake failed
  error=received corrupt message of type InvalidContentType client=10.42.0.1:*

The is_benign_tls_handshake_failure() check at lib.rs:77-82 correctly classifies UnexpectedEof and ConnectionReset as benign (logged at DEBUG), but InvalidContentType is not classified, so it logs at ERROR.

This is noisy but not actionable — it's a known protocol mismatch, not a security issue.
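The classification described above can be sketched as follows. This is a reconstruction from the issue text, not the code in crates/openshell-server/src/lib.rs: `LogLevel` and `handshake_failure_level` are hypothetical names, and wrapping the rustls error in an `io::Error` of kind `InvalidData` is an assumption about how the TLS acceptor reports it.

```rust
use std::io::{Error, ErrorKind};

/// Hypothetical stand-in for the log levels chosen by the accept handler.
#[derive(Debug, PartialEq, Eq)]
enum LogLevel {
    Debug,
    Error,
}

/// Current behavior as described in this issue: only these two I/O error
/// kinds count as benign.
fn is_benign_tls_handshake_failure(error: &Error) -> bool {
    matches!(
        error.kind(),
        ErrorKind::UnexpectedEof | ErrorKind::ConnectionReset
    )
}

/// Hypothetical helper: benign failures log at DEBUG, everything else at ERROR.
fn handshake_failure_level(error: &Error) -> LogLevel {
    if is_benign_tls_handshake_failure(error) {
        LogLevel::Debug
    } else {
        LogLevel::Error
    }
}
```

Under these assumptions, an error carrying the message from the log excerpt falls through to the ERROR branch, because its kind is neither `UnexpectedEof` nor `ConnectionReset`.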

Reproduction Steps

  1. Start a gateway with TLS enabled (default)
  2. Have a sandbox running with route-refresh active
  3. Check gateway logs: openshell sandbox logs <name> or pod logs
  4. Observe ERROR-level TLS handshake failed entries every ~5-6 seconds
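To trigger a single failure without waiting for the route-refresh task, a plaintext probe along these lines may help. It is only a sketch: `send_plaintext_preface` is a hypothetical helper, the gateway address is whatever your deployment exposes, and it assumes the listener reacts to the HTTP/2 preface the same way it reacts to the sandbox's plaintext gRPC client.

```rust
use std::io::{self, Write};
use std::net::{TcpStream, ToSocketAddrs};

/// The HTTP/2 client connection preface, sent unencrypted.
const HTTP2_PREFACE: &[u8] = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";

/// Hypothetical probe: opens a TCP connection to `addr` and writes the
/// plaintext HTTP/2 preface, then closes the connection when the stream
/// is dropped at the end of the function.
fn send_plaintext_preface<A: ToSocketAddrs>(addr: A) -> io::Result<()> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(HTTP2_PREFACE)?;
    stream.flush()
}
```

Calling `send_plaintext_preface` with the gateway's TLS listener address should, if the assumptions hold, produce one TLS handshake failed entry in the gateway logs per call.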

Environment

  • All platforms where TLS is enabled (default)
  • Not platform-specific

Proposed Fix

Extend is_benign_tls_handshake_failure() to also treat errors whose message contains InvalidContentType or corrupt message as benign:

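use std::io::ErrorKind;

/// Returns true for handshake failures that should log at DEBUG instead of ERROR.
fn is_benign_tls_handshake_failure(error: &std::io::Error) -> bool {
    // The peer closed or reset the connection during the handshake.
    if matches!(
        error.kind(),
        ErrorKind::UnexpectedEof | ErrorKind::ConnectionReset
    ) {
        return true;
    }
    // A plaintext client hit the TLS listener; match on the error message text.
    let msg = error.to_string();
    msg.contains("InvalidContentType") || msg.contains("corrupt message")
}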

This demotes the log from ERROR to DEBUG without hiding genuine TLS errors: expired certificates and wrong hostnames produce different error types, so they still log at ERROR.
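The proposed classifier can be exercised against sample messages, as in the sketch below. The `tls_error` helper is hypothetical, and so are the sample strings other than the one copied from the log excerpt; they only illustrate how the substring match behaves and are not captured server output.

```rust
use std::io::{Error, ErrorKind};

/// The proposed classifier from this issue, reproduced for the example.
fn is_benign_tls_handshake_failure(error: &Error) -> bool {
    if matches!(
        error.kind(),
        ErrorKind::UnexpectedEof | ErrorKind::ConnectionReset
    ) {
        return true;
    }
    let msg = error.to_string();
    msg.contains("InvalidContentType") || msg.contains("corrupt message")
}

/// Hypothetical helper: wraps a message in an io::Error, assuming the TLS
/// acceptor surfaces rustls errors as io::Error values whose Display output
/// is the rustls message.
fn tls_error(msg: &str) -> Error {
    Error::new(ErrorKind::InvalidData, msg.to_string())
}
```

With this helper, the message from the log excerpt is classified as benign and an illustrative certificate message such as "invalid peer certificate: Expired" is not. One design point to weigh: the `corrupt message` clause also matches any other error whose text has that prefix (for example an illustrative "received corrupt message of type MessageTooLarge"), so matching on `InvalidContentType` alone would be the narrower choice if only the plaintext-client case should be demoted.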

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
