27 changes: 26 additions & 1 deletion architecture/gateway-security.md
@@ -229,7 +229,31 @@ SSH connections into sandboxes pass through the gateway's HTTP CONNECT tunnel at
| `x-sandbox-id` | Identifies the target sandbox |
| `x-sandbox-token` | Session token (created via `CreateSshSession` RPC) |

The gateway validates the token against the stored `SshSession` record, checks that it has not been revoked, and confirms the `sandbox_id` matches.
The gateway validates the token against the stored `SshSession` record and checks:

1. The token has not been revoked.
2. The `sandbox_id` matches the request header.
3. The token has not expired (`expires_at_ms` check; 0 means no expiry for backward compatibility).
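
The three checks above can be sketched as a single pure function. This is a minimal illustration, not the gateway's actual code: the trimmed-down `SshSession` struct, the `TokenError` enum, and `validate_session` are hypothetical names, and treating a token as expired once `now_ms` reaches `expires_at_ms` is an assumption about the boundary condition.

```rust
/// Hypothetical, simplified mirror of the stored `SshSession` record.
#[derive(Debug, Clone)]
pub struct SshSession {
    pub sandbox_id: String,
    pub token: String,
    pub revoked: bool,
    /// Expiry as Unix epoch milliseconds; 0 means "never expires".
    pub expires_at_ms: i64,
}

/// Reasons a tunnel request can be rejected (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum TokenError {
    Revoked,
    SandboxMismatch,
    Expired,
}

/// Apply the three checks in the order they are listed in the text.
pub fn validate_session(
    session: &SshSession,
    header_sandbox_id: &str,
    now_ms: i64,
) -> Result<(), TokenError> {
    if session.revoked {
        return Err(TokenError::Revoked);
    }
    if session.sandbox_id != header_sandbox_id {
        return Err(TokenError::SandboxMismatch);
    }
    // 0 is the legacy "no expiry" sentinel, so only positive values are compared.
    // Assumption: the token counts as expired at the exact expiry instant.
    if session.expires_at_ms > 0 && now_ms >= session.expires_at_ms {
        return Err(TokenError::Expired);
    }
    Ok(())
}
```

Keeping the check free of I/O makes the legacy `expires_at_ms == 0` case easy to cover in a unit test.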

### Session Lifecycle

SSH session tokens have a configurable TTL (`ssh_session_ttl_secs`, default 24 hours). The `expires_at_ms` field is set at creation time and checked on every tunnel request. Setting the TTL to 0 disables expiry.
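
As a rough sketch of how `expires_at_ms` could be derived from the TTL, the helper below converts seconds to milliseconds and adds them to the creation time. The function name `compute_expires_at_ms` is hypothetical, and the saturating arithmetic is an added safeguard for very large TTL values rather than something the source describes.

```rust
use std::convert::TryFrom;

/// Compute the expiry timestamp (Unix epoch milliseconds) for a new session.
/// Returns 0, meaning "no expiry", when the TTL is 0.
pub fn compute_expires_at_ms(now_ms: i64, ttl_secs: u64) -> i64 {
    if ttl_secs == 0 {
        return 0;
    }
    // Clamp oversized TTLs instead of wrapping on conversion or overflow.
    let ttl_ms = i64::try_from(ttl_secs)
        .unwrap_or(i64::MAX)
        .saturating_mul(1000);
    now_ms.saturating_add(ttl_ms)
}
```

With the default TTL of 86400 seconds, a session created at `now_ms` expires at `now_ms + 86_400_000`.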

Sessions are cleaned up automatically:

- **On sandbox deletion**: all SSH sessions for the deleted sandbox are removed from the store.
- **Background reaper**: a periodic task (hourly) deletes expired and revoked session records to prevent unbounded database growth.
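
The reaper's selection rule can be illustrated with a small predicate. This sketch assumes a simplified `SessionRecord` type and the helper names `is_reapable` and `reapable_ids`, none of which appear in the source; the real task would also need to page through the store and issue the deletes.

```rust
/// Hypothetical, trimmed-down session record used only for this example.
#[derive(Debug, Clone)]
pub struct SessionRecord {
    pub id: String,
    pub revoked: bool,
    /// Unix epoch milliseconds; 0 means the session never expires.
    pub expires_at_ms: i64,
}

/// A record is eligible for deletion when it is revoked or past its expiry.
pub fn is_reapable(session: &SessionRecord, now_ms: i64) -> bool {
    let expired = session.expires_at_ms > 0 && session.expires_at_ms <= now_ms;
    session.revoked || expired
}

/// Collect the ids the reaper would delete in one pass, preserving input order.
pub fn reapable_ids(sessions: &[SessionRecord], now_ms: i64) -> Vec<String> {
    sessions
        .iter()
        .filter(|s| is_reapable(s, now_ms))
        .map(|s| s.id.clone())
        .collect()
}
```

Legacy sessions with `expires_at_ms == 0` are never selected by the expiry rule, so under this sketch they are removed only when revoked or when their sandbox is deleted.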

### Connection Limits

The gateway enforces two concurrent connection limits to bound the impact of credential misuse:

| Limit | Value | Purpose |
|---|---|---|
| Per-token | 10 concurrent tunnels | Limits damage from a single leaked token |
| Per-sandbox | 20 concurrent tunnels | Prevents bypass via creating many tokens for one sandbox |

These limits are tracked in memory; each counter is incremented when a tunnel opens and decremented when it closes. A request that would exceed either limit is rejected with HTTP 429 (Too Many Requests).
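
One way to keep such counters accurate is an RAII guard that releases its slots when the tunnel ends. The sketch below is an assumption about how this could be structured, not the gateway's implementation: `TunnelCounters`, `TunnelGuard`, `try_acquire`, and the two constants are hypothetical names, with the limit values taken from the table above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Limit values from the table above; the constant names are illustrative.
pub const MAX_PER_TOKEN: u32 = 10;
pub const MAX_PER_SANDBOX: u32 = 20;

#[derive(Default)]
pub struct TunnelCounters {
    by_token: Mutex<HashMap<String, u32>>,
    by_sandbox: Mutex<HashMap<String, u32>>,
}

/// Held for the lifetime of a tunnel; dropping it releases both slots.
pub struct TunnelGuard {
    counters: Arc<TunnelCounters>,
    token: String,
    sandbox_id: String,
}

/// Reserve a slot, or return `None` if either limit is already reached
/// (the caller would answer with HTTP 429).
pub fn try_acquire(
    counters: &Arc<TunnelCounters>,
    token: &str,
    sandbox_id: &str,
) -> Option<TunnelGuard> {
    // Both locks are held together so the check and the increment are atomic.
    let mut by_token = counters.by_token.lock().unwrap_or_else(|p| p.into_inner());
    let mut by_sandbox = counters.by_sandbox.lock().unwrap_or_else(|p| p.into_inner());
    let t = by_token.get(token).copied().unwrap_or(0);
    let s = by_sandbox.get(sandbox_id).copied().unwrap_or(0);
    if t >= MAX_PER_TOKEN || s >= MAX_PER_SANDBOX {
        return None;
    }
    by_token.insert(token.to_string(), t + 1);
    by_sandbox.insert(sandbox_id.to_string(), s + 1);
    Some(TunnelGuard {
        counters: Arc::clone(counters),
        token: token.to_string(),
        sandbox_id: sandbox_id.to_string(),
    })
}

/// Decrement a counter and drop the entry once it reaches zero.
fn release(map: &Mutex<HashMap<String, u32>>, key: &str) {
    let mut map = map.lock().unwrap_or_else(|p| p.into_inner());
    let remaining = match map.get_mut(key) {
        Some(count) => {
            *count = count.saturating_sub(1);
            *count
        }
        None => return,
    };
    if remaining == 0 {
        map.remove(key);
    }
}

impl Drop for TunnelGuard {
    fn drop(&mut self) {
        release(&self.counters.by_token, &self.token);
        release(&self.counters.by_sandbox, &self.sandbox_id);
    }
}
```

Tying the decrement to `Drop` means the count is released on every exit path, including early returns and errors, and removing zeroed entries keeps the maps from growing with every token ever seen.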

### NSSH1 Handshake

@@ -362,6 +386,7 @@ This section defines the primary attacker profiles, what the current design prot
| Weak cryptoperiod | Certificates are effectively non-expiring by default |
| Limited fine-grained revocation | CA private key is not persisted; rotation is coarse-grained |
| Local credential theft risk | CLI mTLS key material is stored on developer filesystem |
| SSH token + mTLS = persistent access within trust boundary | SSH tokens expire after 24h (configurable) and are subject to the per-token and per-sandbox concurrency caps listed under Connection Limits, but within the mTLS trust boundary a stolen token remains usable until its TTL expires |

### Out of Scope / Not Defended By This Layer

16 changes: 16 additions & 0 deletions crates/navigator-core/src/config.rs
@@ -61,6 +61,10 @@ pub struct Config {
    #[serde(default = "default_ssh_handshake_skew_secs")]
    pub ssh_handshake_skew_secs: u64,

    /// TTL for SSH session tokens, in seconds. 0 disables expiry.
    #[serde(default = "default_ssh_session_ttl_secs")]
    pub ssh_session_ttl_secs: u64,

    /// Kubernetes secret name containing client TLS materials for sandbox pods.
    /// When set, sandbox pods get this secret mounted so they can connect to
    /// the server over mTLS.
Expand Down Expand Up @@ -103,6 +107,7 @@ impl Config {
sandbox_ssh_port: default_sandbox_ssh_port(),
ssh_handshake_secret: String::new(),
ssh_handshake_skew_secs: default_ssh_handshake_skew_secs(),
ssh_session_ttl_secs: default_ssh_session_ttl_secs(),
client_tls_secret_name: String::new(),
}
}
@@ -191,6 +196,13 @@ impl Config {
        self
    }

    /// Set the TTL for SSH session tokens, in seconds. 0 disables expiry.
    #[must_use]
    pub const fn with_ssh_session_ttl_secs(mut self, secs: u64) -> Self {
        self.ssh_session_ttl_secs = secs;
        self
    }

    /// Set the Kubernetes secret name for sandbox client TLS materials.
    #[must_use]
    pub fn with_client_tls_secret_name(mut self, name: impl Into<String>) -> Self {
@@ -230,3 +242,7 @@ const fn default_sandbox_ssh_port() -> u16 {
const fn default_ssh_handshake_skew_secs() -> u64 {
    300
}

const fn default_ssh_session_ttl_secs() -> u64 {
    86400 // 24 hours
}
39 changes: 37 additions & 2 deletions crates/navigator-server/src/grpc.rs
@@ -544,6 +544,33 @@ impl Navigator for NavigatorService {
        self.state.sandbox_index.update_from_sandbox(&sandbox);
        self.state.sandbox_watch_bus.notify(&id);

        // Clean up SSH sessions associated with this sandbox.
        // Best-effort: a failed list or decode is skipped rather than failing the delete.
        if let Ok(records) = self
            .state
            .store
            .list(SshSession::object_type(), 1000, 0)
            .await
        {
            for record in records {
                if let Ok(session) = SshSession::decode(record.payload.as_slice()) {
                    if session.sandbox_id == id {
                        if let Err(e) = self
                            .state
                            .store
                            .delete(SshSession::object_type(), &session.id)
                            .await
                        {
                            warn!(
                                session_id = %session.id,
                                error = %e,
                                "Failed to delete SSH session during sandbox cleanup"
                            );
                        }
                    }
                }
            }
        }

        let deleted = match self.state.sandbox_client.delete(&sandbox.name).await {
            Ok(deleted) => deleted,
            Err(err) => {
@@ -787,14 +814,21 @@ impl Navigator for NavigatorService {
        }

        let token = uuid::Uuid::new_v4().to_string();
        let now_ms = current_time_ms()
            .map_err(|e| Status::internal(format!("timestamp generation failed: {e}")))?;
        let expires_at_ms = if self.state.config.ssh_session_ttl_secs > 0 {
            now_ms + (self.state.config.ssh_session_ttl_secs as i64 * 1000)
        } else {
            0
        };
        let session = SshSession {
            id: token.clone(),
            sandbox_id: req.sandbox_id.clone(),
            token: token.clone(),
            created_at_ms: current_time_ms()
                .map_err(|e| Status::internal(format!("timestamp generation failed: {e}")))?,
            created_at_ms: now_ms,
            revoked: false,
            name: generate_name(),
            expires_at_ms,
        };

        self.state
@@ -814,6 +848,7 @@ impl Navigator for NavigatorService {
            gateway_scheme: scheme.to_string(),
            connect_path: self.state.config.ssh_connect_path.clone(),
            host_key_fingerprint: String::new(),
            expires_at_ms,
        }))
    }

12 changes: 11 additions & 1 deletion crates/navigator-server/src/lib.rs
@@ -22,7 +22,8 @@ mod tls;
pub mod tracing_bus;

use navigator_core::{Config, Error, Result};
use std::sync::Arc;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::net::TcpListener;
use tracing::{error, info};

@@ -56,6 +57,12 @@ pub struct ServerState {

    /// In-memory bus for server process logs.
    pub tracing_log_bus: TracingLogBus,

    /// Active SSH tunnel connection counts per session token.
    pub ssh_connections_by_token: Mutex<HashMap<String, u32>>,

    /// Active SSH tunnel connection counts per sandbox id.
    pub ssh_connections_by_sandbox: Mutex<HashMap<String, u32>>,
}

impl ServerState {
@@ -76,6 +83,8 @@ impl ServerState {
            sandbox_index,
            sandbox_watch_bus,
            tracing_log_bus,
            ssh_connections_by_token: Mutex::new(HashMap::new()),
            ssh_connections_by_sandbox: Mutex::new(HashMap::new()),
        }
    }
}
@@ -138,6 +147,7 @@ pub async fn run_server(config: Config, tracing_log_bus: TracingLogBus) -> Resul
        state.tracing_log_bus.clone(),
    );
    spawn_kube_event_tailer(state.clone());
    ssh_tunnel::spawn_session_reaper(store.clone(), std::time::Duration::from_secs(3600));

    // Create the multiplexed service
    let service = MultiplexService::new(state.clone());