Merged
30 changes: 30 additions & 0 deletions ARCHITECTURE.md
@@ -393,6 +393,36 @@ The batch flag is passed through the executor chain:
- Applied in both normal mode (`execute()`) and stream mode (`handle_stream_mode()`)
- TUI mode maintains its own quit handling and ignores this flag

**Fail-Fast Mode (Added 2025-12):**

The `--fail-fast` / `-k` option stops execution as soon as any node fails, whether from a connection error or a non-zero exit code. It is compatible with pdsh's `-k` flag and is useful for:
- Critical operations where partial execution is unacceptable
- Deployment scripts where all nodes must succeed
- Validation checks across clusters

Implementation uses:
```rust
// Cancellation signaling via tokio::sync::watch
// (the receiver must be mutable because `changed()` takes `&mut self`)
let (cancel_tx, mut cancel_rx) = watch::channel(false);

// Task selection with cancellation check
tokio::select! {
    biased; // Poll branches in order, so the cancellation check runs first
    _ = cancel_rx.changed() => {
        // Task cancelled due to fail-fast
        return Err(anyhow!("Execution cancelled due to fail-fast"));
    }
    permit = semaphore.acquire() => {
        // Execute task normally
    }
}
```

The fail-fast mode integrates with:
- `--require-all-success`: Both require all nodes to succeed, but fail-fast stops early
- `--check-all-nodes`: Fail-fast stops early, check-all-nodes affects final exit code
- `--parallel N`: Cancels pending tasks waiting in the semaphore queue
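
The interaction with `--parallel N` can be illustrated with a minimal, std-only sketch. This is not the bssh implementation: it swaps the `tokio::sync::watch` channel for an `AtomicBool` and the semaphore for a fixed pool of worker threads so that it runs without external crates. The names `Outcome` and `run_fail_fast` are hypothetical, each job is a simulated `(node, exit_code)` pair rather than a real SSH command, and only queued jobs are cancelled (a job that has already started runs to completion here).

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-node result type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Success,
    Failed(i32),
    Cancelled,
}

/// Runs simulated jobs with at most `parallel` workers and fail-fast semantics.
/// Each job is `(node_name, simulated_exit_code)`; results come back in input order.
fn run_fail_fast(jobs: Vec<(&'static str, i32)>, parallel: usize) -> Vec<(String, Outcome)> {
    // Shared cancellation flag: stands in for the watch channel.
    let cancelled = Arc::new(AtomicBool::new(false));
    // Pending jobs: stands in for tasks waiting on the semaphore.
    let queue = Arc::new(Mutex::new(
        jobs.into_iter().enumerate().collect::<VecDeque<_>>(),
    ));
    let results = Arc::new(Mutex::new(Vec::new()));

    let workers: Vec<_> = (0..parallel.max(1))
        .map(|_| {
            let cancelled = Arc::clone(&cancelled);
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Take the next pending job; the lock is released at the end of this statement.
                let next = queue.lock().unwrap().pop_front();
                let Some((idx, (node, exit_code))) = next else { break };

                let outcome = if cancelled.load(Ordering::SeqCst) {
                    // A previous failure was signalled: skip this pending job.
                    Outcome::Cancelled
                } else if exit_code == 0 {
                    Outcome::Success
                } else {
                    // First failure: signal cancellation to every other worker.
                    cancelled.store(true, Ordering::SeqCst);
                    Outcome::Failed(exit_code)
                };
                results.lock().unwrap().push((idx, node.to_string(), outcome));
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }

    // Restore input order before returning.
    let mut collected = std::mem::take(&mut *results.lock().unwrap());
    collected.sort_by_key(|(idx, _, _)| *idx);
    collected.into_iter().map(|(_, node, outcome)| (node, outcome)).collect()
}
```

Calling `run_fail_fast(vec![("node1", 0), ("node2", 1), ("node3", 0)], 1)` mirrors the `--parallel 1` case: `node1` succeeds, `node2` fails, and `node3` is cancelled before it starts. With higher parallelism, which queued jobs end up cancelled depends on thread scheduling.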

### 4. SSH Client (`ssh/client/*`, `ssh/tokio_client/*`)

**SSH Client Module Structure (Refactored 2025-10-17):**
8 changes: 8 additions & 0 deletions README.md
@@ -15,6 +15,7 @@ A high-performance SSH client with **SSH-compatible syntax** for both single-hos
- **Port Forwarding**: Full support for local (-L), remote (-R), and dynamic (-D) SSH port forwarding
- **Jump Host Support**: Connect through bastion hosts using OpenSSH ProxyJump syntax (`-J`)
- **Parallel Execution**: Execute commands across multiple nodes simultaneously
- **Fail-Fast Mode**: Stop immediately on first failure with `-k` flag (pdsh compatible)
- **Interactive Terminal UI (TUI)**: Real-time monitoring with 4 view modes (Summary/Detail/Split/Diff) for multi-node operations
- **Cluster Management**: Define and manage node clusters via configuration files
- **Progress Tracking**: Real-time progress indicators with smart detection (percentages, fractions, apt/dpkg)
@@ -219,6 +220,13 @@ bssh -C production --connect-timeout 10 "uptime"

# Different timeouts for connection and command
bssh -C production --connect-timeout 5 --timeout 600 "long-running-job"

# Fail-fast mode: stop immediately on any failure (pdsh -k compatible)
bssh -k -H "web1,web2,web3" "deploy.sh"
bssh --fail-fast -C production "critical-script.sh"

# Combine fail-fast with require-all-success for critical operations
bssh -k --require-all-success -C production "service-restart.sh"
```

### Output Modes
42 changes: 42 additions & 0 deletions docs/man/bssh.1
@@ -247,6 +247,25 @@ which is useful for programmatic parsing or cleaner display. Works
with both stream mode (--stream) and file mode (--output-dir).
Example: bssh -H host1,host2 --stream -N "uname -a"

.TP
.BR \-k ", " \-\-fail\-fast
Stop execution immediately on first failure (pdsh -k compatible).
When enabled, bssh cancels pending commands when any node fails due to
connection error or non-zero exit code. This is useful for:
.RS
.IP \[bu] 2
Critical operations where partial execution is unacceptable
.IP \[bu] 2
Deployment scripts where all nodes must succeed
.IP \[bu] 2
Validation checks across clusters
.RE
.IP
Running tasks are terminated gracefully, and the error message clearly
indicates which node caused the failure. Can be combined with
.B --require-all-success
for strict error handling.

.TP
.BR \-v ", " \-\-verbose
Increase verbosity (can be used multiple times: -v, -vv, -vvv)
@@ -1240,6 +1259,29 @@ Example output:
Useful for monitoring long-running commands or when piping output.
.RE

.SS Fail-Fast Mode Examples
.TP
Stop on first failure during critical deployment:
.B bssh -k -C production "deploy.sh"
.RS
Execution stops immediately if any node fails the deployment script
.RE

.TP
Combine fail-fast with require-all-success:
.B bssh --fail-fast --require-all-success -C production "service-restart.sh"
.RS
Stops early on failure AND ensures final exit code reflects any failures
.RE

.TP
Sequential fail-fast with limited parallelism:
.B bssh -k --parallel 1 -H "node1,node2,node3" "critical-operation"
.RS
Runs commands one at a time, stopping on first failure
.RE

.SS File Transfer Examples
.TP
Upload configuration file to all nodes:
.B bssh -H "node1,node2,node3" upload /etc/myapp.conf /etc/myapp.conf
1 change: 1 addition & 0 deletions src/app/dispatcher.rs
@@ -404,6 +404,7 @@ async fn handle_exec_command(cli: &Cli, ctx: &AppContext, command: &str) -> Resu
check_all_nodes: cli.check_all_nodes,
sudo_password,
batch: cli.batch,
fail_fast: cli.fail_fast,
};
execute_command(params).await
}
9 changes: 8 additions & 1 deletion src/cli.rs
@@ -23,7 +23,7 @@ use std::path::PathBuf;
before_help = "\n\nBroadcast SSH - Parallel command execution across cluster nodes",
about = "Broadcast SSH - SSH-compatible parallel command execution tool",
long_about = "bssh is a high-performance SSH client with parallel execution capabilities.\nIt can be used as a drop-in replacement for SSH (single host) or as a powerful cluster management tool (multiple hosts).\n\nThe tool provides secure file transfer using SFTP and supports SSH keys, SSH agent, and password authentication.\nIt automatically detects Backend.AI multi-node session environments.\n\nOutput Modes:\n- TUI Mode (default): Interactive terminal UI with real-time monitoring (auto-enabled in terminals)\n- Stream Mode (--stream): Real-time output with [node] prefixes\n- File Mode (--output-dir): Save per-node output to timestamped files\n- Normal Mode: Traditional output after all nodes complete\n\nSSH Configuration Support:\n- Reads standard SSH config files (defaulting to ~/.ssh/config)\n- Supports Host patterns, HostName, User, Port, IdentityFile, StrictHostKeyChecking\n- ProxyJump, and many other SSH configuration directives\n- CLI arguments override SSH config values following SSH precedence rules",
after_help = "EXAMPLES:\n SSH Mode:\n bssh user@host # Interactive shell\n bssh admin@server.com \"uptime\" # Execute command\n bssh -p 2222 -i ~/.ssh/key user@host # Custom port and key\n bssh -F ~/.ssh/myconfig webserver # Use custom SSH config\n\n Port Forwarding:\n bssh -L 8080:example.com:80 user@host # Local forward: localhost:8080 → example.com:80\n bssh -R 8080:localhost:80 user@host # Remote forward: remote:8080 → localhost:80\n bssh -D 1080 user@host # SOCKS5 proxy on localhost:1080\n bssh -L 3306:db:3306 -R 80:web:80 user@host # Multiple forwards\n bssh -D *:1080/4 user@host # SOCKS4 proxy on all interfaces\n\n Multi-Server Mode:\n bssh -C production \"systemctl status\" # Execute on cluster (TUI mode auto-enabled)\n bssh -H \"web1,web2,web3\" \"df -h\" # Execute on multiple hosts\n bssh -H \"web1,web2,web3\" -f \"web1\" \"df -h\" # Filter to web1 only\n bssh -C production -f \"web*\" \"uptime\" # Filter cluster nodes\n bssh --parallel 20 -H web* \"apt update\" # Increase parallelism\n\n Host Exclusion (--exclude):\n bssh -H \"node1,node2,node3\" --exclude \"node2\" \"uptime\" # Exclude single host\n bssh -C production --exclude \"web1,web2\" \"apt update\" # Exclude multiple hosts\n bssh -C production --exclude \"db*\" \"systemctl restart\" # Exclude with wildcard pattern\n bssh -C production --exclude \"*-backup\" \"df -h\" # Exclude backup nodes\n\n Output Modes:\n bssh -C prod \"apt-get update\" # TUI mode (default, interactive monitoring)\n bssh -C prod --stream \"tail -f log\" # Stream mode (real-time with [node] prefixes)\n bssh -C prod --output-dir ./logs \"ps\" # File mode (save to timestamped files)\n bssh -C prod \"uptime\" | tee log.txt # Normal mode (auto-detected when piped)\n\n Batch Mode (Ctrl+C Handling):\n bssh -C prod \"long-running-command\" # Default: first Ctrl+C shows status, second terminates\n bssh -C prod -b \"long-command\" # Batch mode: single Ctrl+C terminates immediately\n bssh -H nodes --batch --stream \"cmd\" # Useful for 
CI/CD and non-interactive scripts\n\n TUI Mode Controls (when in TUI):\n 1-9 Jump to node detail view\n s Enter split view (2-4 nodes)\n d Enter diff view (compare nodes)\n f Toggle auto-scroll\n ↑/↓ Scroll output\n ←/→ Switch nodes\n Esc Return to summary\n ? Show help\n q Quit\n\n File Operations:\n bssh -C staging upload file.txt /tmp/ # Upload to cluster\n bssh -H host1,host2 download /etc/hosts ./backups/\n\n Other Commands:\n bssh list # List configured clusters\n bssh -C production ping # Test connectivity\n bssh -H hosts interactive # Interactive mode\n\n SSH Config Example (~/.ssh/config):\n Host web*\n HostName web.example.com\n User webuser\n Port 2222\n IdentityFile ~/.ssh/web_key\n StrictHostKeyChecking yes\n\nDeveloped and maintained as part of the Backend.AI project.\nFor more information: https://github.com/lablup/bssh"
after_help = "EXAMPLES:\n SSH Mode:\n bssh user@host # Interactive shell\n bssh admin@server.com \"uptime\" # Execute command\n bssh -p 2222 -i ~/.ssh/key user@host # Custom port and key\n bssh -F ~/.ssh/myconfig webserver # Use custom SSH config\n\n Port Forwarding:\n bssh -L 8080:example.com:80 user@host # Local forward: localhost:8080 → example.com:80\n bssh -R 8080:localhost:80 user@host # Remote forward: remote:8080 → localhost:80\n bssh -D 1080 user@host # SOCKS5 proxy on localhost:1080\n bssh -L 3306:db:3306 -R 80:web:80 user@host # Multiple forwards\n bssh -D *:1080/4 user@host # SOCKS4 proxy on all interfaces\n\n Multi-Server Mode:\n bssh -C production \"systemctl status\" # Execute on cluster (TUI mode auto-enabled)\n bssh -H \"web1,web2,web3\" \"df -h\" # Execute on multiple hosts\n bssh -H \"web1,web2,web3\" -f \"web1\" \"df -h\" # Filter to web1 only\n bssh -C production -f \"web*\" \"uptime\" # Filter cluster nodes\n bssh --parallel 20 -H web* \"apt update\" # Increase parallelism\n\n Host Exclusion (--exclude):\n bssh -H \"node1,node2,node3\" --exclude \"node2\" \"uptime\" # Exclude single host\n bssh -C production --exclude \"web1,web2\" \"apt update\" # Exclude multiple hosts\n bssh -C production --exclude \"db*\" \"systemctl restart\" # Exclude with wildcard pattern\n bssh -C production --exclude \"*-backup\" \"df -h\" # Exclude backup nodes\n\n Fail-Fast Mode (pdsh -k compatible):\n bssh -k -H \"web1,web2,web3\" \"deploy.sh\" # Stop on first failure\n bssh --fail-fast -C prod \"apt upgrade\" # Critical deployment - stop if any node fails\n bssh -k --require-all-success -C prod cmd # Fail-fast + require all success\n\n Output Modes:\n bssh -C prod \"apt-get update\" # TUI mode (default, interactive monitoring)\n bssh -C prod --stream \"tail -f log\" # Stream mode (real-time with [node] prefixes)\n bssh -C prod --output-dir ./logs \"ps\" # File mode (save to timestamped files)\n bssh -C prod \"uptime\" | tee log.txt # Normal mode (auto-detected 
when piped)\n\n Batch Mode (Ctrl+C Handling):\n bssh -C prod \"long-running-command\" # Default: first Ctrl+C shows status, second terminates\n bssh -C prod -b \"long-command\" # Batch mode: single Ctrl+C terminates immediately\n bssh -H nodes --batch --stream \"cmd\" # Useful for CI/CD and non-interactive scripts\n\n TUI Mode Controls (when in TUI):\n 1-9 Jump to node detail view\n s Enter split view (2-4 nodes)\n d Enter diff view (compare nodes)\n f Toggle auto-scroll\n ↑/↓ Scroll output\n ←/→ Switch nodes\n Esc Return to summary\n ? Show help\n q Quit\n\n File Operations:\n bssh -C staging upload file.txt /tmp/ # Upload to cluster\n bssh -H host1,host2 download /etc/hosts ./backups/\n\n Other Commands:\n bssh list # List configured clusters\n bssh -C production ping # Test connectivity\n bssh -H hosts interactive # Interactive mode\n\n SSH Config Example (~/.ssh/config):\n Host web*\n HostName web.example.com\n User webuser\n Port 2222\n IdentityFile ~/.ssh/web_key\n StrictHostKeyChecking yes\n\nDeveloped and maintained as part of the Backend.AI project.\nFor more information: https://github.com/lablup/bssh"
)]
pub struct Cli {
/// SSH destination in format: [user@]hostname[:port] or ssh://[user@]hostname[:port]
@@ -196,6 +196,13 @@ pub struct Cli {
)]
pub check_all_nodes: bool,

#[arg(
short = 'k',
long = "fail-fast",
help = "Stop execution immediately on first failure (pdsh -k compatible)\nCancels pending commands when any node fails (connection error or non-zero exit)\nUseful for critical operations where partial execution is unacceptable"
)]
pub fail_fast: bool,

#[arg(
trailing_var_arg = true,
help = "Command to execute on remote hosts",
4 changes: 3 additions & 1 deletion src/commands/exec.rs
@@ -46,6 +46,7 @@ pub struct ExecuteCommandParams<'a> {
pub check_all_nodes: bool,
pub sudo_password: Option<Arc<SudoPassword>>,
pub batch: bool,
pub fail_fast: bool,
}

pub async fn execute_command(params: ExecuteCommandParams<'_>) -> Result<()> {
@@ -212,7 +213,8 @@ async fn execute_command_without_forwarding(params: ExecuteCommandParams<'_>) ->
.with_connect_timeout(params.connect_timeout)
.with_jump_hosts(params.jump_hosts.map(|s| s.to_string()))
.with_sudo_password(params.sudo_password)
- .with_batch_mode(params.batch);
+ .with_batch_mode(params.batch)
+ .with_fail_fast(params.fail_fast);

// Set keychain usage if on macOS
#[cfg(target_os = "macos")]