feat: Implement SSH keepalive to prevent idle connection timeouts #121

@inureyes

Problem

Long-running SSH sessions become unresponsive when idle for extended periods. This occurs because:

  1. No keepalive packets: The current implementation uses russh::client::Config::default() which sets keepalive_interval: None
  2. Network device timeouts: Firewalls, NAT routers, and load balancers typically close idle TCP connections after 5-30 minutes of inactivity
  3. Half-open connections: The client cannot detect the dropped connection, resulting in frozen I/O

Root Cause Analysis

Current State

  • src/ssh/tokio_client/connection.rs:83 uses Config::default() with no keepalive
  • src/jump/chain/tunnel.rs:89,211 uses russh::client::Config::default() for jump host tunnels
  • SSH config parser already parses ServerAliveInterval, ServerAliveCountMax, TCPKeepAlive but values are never applied to russh Config
  • No CLI options for keepalive settings
  • No bssh YAML config options for keepalive

russh Config Fields (Available but Unused)

pub struct Config {
    pub keepalive_interval: Option<Duration>,  // Currently: None
    pub keepalive_max: usize,                   // Currently: 3 (default)
    pub inactivity_timeout: Option<Duration>,  // Currently: None
    // ...
}
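As a rough sketch of what overriding these fields could look like, the snippet below uses a local stand-in struct so that it compiles without the russh crate. `ClientConfig` and `with_keepalive` are hypothetical names introduced here for illustration; in bssh the same field assignments would presumably be made on `russh::client::Config` itself.

```rust
use std::time::Duration;

/// Local stand-in mirroring the three fields listed above.
/// Hypothetical: the real type is russh::client::Config, which has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub keepalive_interval: Option<Duration>,
    pub keepalive_max: usize,
    pub inactivity_timeout: Option<Duration>,
}

impl Default for ClientConfig {
    fn default() -> Self {
        // Mirrors the "Currently" values noted in the struct listing.
        Self {
            keepalive_interval: None,
            keepalive_max: 3,
            inactivity_timeout: None,
        }
    }
}

/// Enable keepalive while leaving every other field at its default.
pub fn with_keepalive(interval: Duration, max: usize) -> ClientConfig {
    ClientConfig {
        keepalive_interval: Some(interval),
        keepalive_max: max,
        ..ClientConfig::default()
    }
}
```

The struct-update syntax (`..ClientConfig::default()`) keeps the change limited to the keepalive fields, which is the same pattern that would replace a bare `Config::default()` call.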

Proposed Solution

1. Set Sensible Defaults

Based on OpenSSH best practices:

  • keepalive_interval: 60 seconds (matches common ServerAliveInterval recommendation)
  • keepalive_max: 3 (OpenSSH default for ServerAliveCountMax)
  • inactivity_timeout: None (allow indefinite idle sessions)

Total detection time: 60 s x 3 = 180 seconds (3 minutes) to detect a dead connection
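The arithmetic above can be expressed as a small helper. This is only an approximation of when a dead peer would be noticed, and `detection_window` is a hypothetical name, not an existing bssh or russh function.

```rust
use std::time::Duration;

/// Approximate time until a dead connection is noticed:
/// the keepalive interval multiplied by the number of unanswered
/// keepalives that are tolerated. Returns None when keepalive is disabled.
pub fn detection_window(interval: Option<Duration>, max: usize) -> Option<Duration> {
    // Saturate rather than wrap if `max` does not fit in u32.
    let max = max.min(u32::MAX as usize) as u32;
    // checked_mul yields None on overflow instead of panicking.
    interval?.checked_mul(max)
}
```

With the proposed defaults, `detection_window(Some(Duration::from_secs(60)), 3)` evaluates to a 180-second duration, well under the 5-minute lower bound quoted for network device timeouts.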

2. CLI Options

Add new command-line arguments:

--server-alive-interval <SECONDS>   Keepalive interval in seconds (default: 60, 0 to disable)
--server-alive-count-max <COUNT>    Max keepalive messages without response (default: 3)
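The "0 to disable" rule means the raw flag value has to be mapped onto an `Option<Duration>`. The sketch below shows one way to do that with only the standard library; `parse_interval` is a hypothetical helper, and how bssh actually wires its argument parser is not shown in this issue.

```rust
use std::time::Duration;

/// Parse the value given to a flag such as --server-alive-interval.
/// "0" disables keepalive (None); any other non-negative integer
/// is interpreted as a number of seconds.
pub fn parse_interval(raw: &str) -> Result<Option<Duration>, String> {
    let secs: u64 = raw
        .trim()
        .parse()
        .map_err(|e| format!("invalid interval {:?}: {}", raw, e))?;
    Ok(if secs == 0 {
        None
    } else {
        Some(Duration::from_secs(secs))
    })
}
```

Parsing into `u64` rejects negative and non-numeric input up front, so the rest of the code only ever sees either "disabled" or a valid duration.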

3. Apply SSH Config Values

Wire the already-parsed SSH config values to russh Config:

  • ServerAliveInterval -> keepalive_interval
  • ServerAliveCountMax -> keepalive_max

4. bssh YAML Config Support

defaults:
  server_alive_interval: 60
  server_alive_count_max: 3

clusters:
  production:
    server_alive_interval: 30  # Override for unstable networks

Implementation Checklist

Phase 1: Core Implementation

  • Create SshConnectionConfig struct to hold keepalive settings
  • Modify Client::connect() to accept keepalive configuration
  • Update src/ssh/tokio_client/connection.rs to use configurable keepalive
  • Update src/jump/chain/tunnel.rs (both locations) for jump host connections
  • Update src/jump/chain/chain_connection.rs for direct connections

Phase 2: Configuration Integration

  • Add CLI options (--server-alive-interval, --server-alive-count-max) to src/cli/bssh.rs
  • Wire SSH config parser values (server_alive_interval, server_alive_count_max) to connection
  • Add YAML config support in src/config/types.rs
  • Implement config precedence: CLI > SSH config > YAML config > defaults
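The precedence rule in the last item can be sketched as a chain of `Option::or` calls. `resolve` is a hypothetical helper and is generic so that it could serve both the interval and the count; the real resolution logic would live wherever src/config/resolver.rs places it.

```rust
/// Pick the first source that set a value, in the order
/// CLI > SSH config > YAML config, falling back to the built-in default.
pub fn resolve<T>(cli: Option<T>, ssh_config: Option<T>, yaml: Option<T>, default: T) -> T {
    cli.or(ssh_config).or(yaml).unwrap_or(default)
}
```

One consequence worth noting: because an explicit 0 counts as "set", passing `--server-alive-interval 0` on the command line would disable keepalive even if the SSH config or YAML config specifies a non-zero interval.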

Phase 3: Testing & Documentation

  • Add unit tests for keepalive configuration
  • Add integration test verifying keepalive packets are sent
  • Update README.md with new options
  • Update ARCHITECTURE.md with keepalive design decisions

Files to Modify

File                                 Changes
src/ssh/tokio_client/connection.rs   Accept and apply keepalive config
src/jump/chain/tunnel.rs             Pass keepalive config to tunnel connections
src/jump/chain/chain_connection.rs   Pass keepalive config to direct connections
src/cli/bssh.rs                      Add CLI options
src/config/types.rs                  Add YAML config fields
src/config/resolver.rs               Resolve keepalive values from config
src/executor/parallel.rs             Pass keepalive config through executor

Priority Mapping to russh Config

OpenSSH / CLI Option                             russh Config Field   Default
ServerAliveInterval / --server-alive-interval    keepalive_interval   60s
ServerAliveCountMax / --server-alive-count-max   keepalive_max        3
