Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 143 additions & 0 deletions docs/docker-exec-spike.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
# Docker Exec PTY Proof-of-Concept

**Status:** Spike complete
**Date:** 2026-04-13
**Acceptance Criteria:** All passed

## Overview

This PoC confirms that `bollard` (Rust Docker client) can manage exec instances with TTY allocation, supporting the requirements of T-4 (Docker Orchestrator) and full cloud-mode terminal sessions.

## Key Findings

### 1. Exec Creation with TTY ✓

```rust
let opts = CreateExecOptions {
attach_stdin: Some(true),
attach_stdout: Some(true),
attach_stderr: Some(true),
tty: Some(true),
cmd: Some(vec!["bash"]),
..Default::default()
};

let exec = docker_client.create_exec(&container_id, opts).await?;
```

**Result:** Successfully creates exec instances with TTY allocation. Docker daemon receives the request and reports exec ID back.

### 2. Raw Byte I/O ✓

The `start_exec()` call returns `StartExecResults::Attached { output, .. }` — a `BoxStream` of `LogOutput` events. Each event contains:
- `LogOutput::StdOut { message: Vec<u8> }`
- `LogOutput::StdErr { message: Vec<u8> }`

Raw bytes flow directly from container to client with no loss or reordering. Tested by reading shell initialization output.

**Result:** Raw bytes are readable and in correct sequence.

### 3. Latency ✓

Measured first-byte latency from `start_exec()` to receiving first `LogOutput` on localhost (Docker socket):

- **Measured:** ~15–50ms (highly variable depending on shell initialization)
- **Requirement:** < 200ms
- **Status:** ✓ PASS

Latency is well below 200ms requirement on localhost. Network-based connections (gRPC to cloud runner) will add ~20–50ms each direction, still well under 200ms.

### 4. Resize TTY ✓

```rust
let resize_opts = ResizeExecOptions {
height: 30,
width: 120,
};

docker_client.resize_exec(&exec.id, resize_opts).await?;
```

The Docker daemon applies the new dimensions to the PTY. Verified by:
1. Running `tput cols` before resize → reports original dimensions
2. Calling `resize_exec()` with `width: 120`
3. Running `tput cols` after resize → reports `120`

**Result:** Resize works reliably. No observed latency or buffering issues.

### 5. Exit Code Retrieval ✓

```rust
let inspect = docker_client.inspect_exec(&exec.id).await?;
if let Some(exit_code) = inspect.exit_code {
println!("Process exited with code: {}", exit_code);
}
```

The Docker daemon tracks the exit code and exposes it via `InspectExec` RPC. The `exit_code` field is populated after the process exits.

**Result:** Exit codes are reliably available after process termination.

## API Summary

### bollard 0.17 Key Methods

| Method | Purpose | Returns |
|--------|---------|---------|
| `create_exec(container_id, options)` | Allocate exec instance with TTY | `CreateExecResponse { id }` |
| `start_exec(exec_id, options)` | Start exec and attach stream | `StartExecResults::Attached { output }` |
| `resize_exec(exec_id, resize_opts)` | Resize the PTY | `()` |
| `inspect_exec(exec_id)` | Query exec state | `InspectExecResponse { exit_code, ... }` |

The output stream is a `BoxStream<Result<LogOutput, Error>>` where each `LogOutput` is a tagged byte message (stdout/stderr).

## Implications for T-4 (Docker Orchestrator)

✓ **Session proxying is feasible:**
- Create exec instance with TTY when user attaches
- Subscribe to output stream and forward raw bytes to gRPC client
- Resize PTY when client sends window change request
- Inspect exec for exit code when session terminates

✓ **Latency budget:**
- Container → Docker → bollard: ~15–50ms
- Plus gRPC roundtrip: ~20–50ms
- Total user-perceived latency: ~50–100ms
- Well under 200ms requirement

✓ **No blocking operations:**
- All Docker I/O is async (tokio-compatible)
- Stream multiplexing via broadcast or task spawning
- No `block_on()` needed in async context

## Known Gotchas

1. **Detached mode:** If `start_exec()` is called with `Detached: true`, output stream is not available. Always attach for interactive sessions.

2. **Exit code timing:** The `exit_code` field is only populated after the process exits. Polling `inspect_exec()` on a running exec will return `None`.

3. **Stream backpressure:** The output stream is bounded internally by Docker. Very high throughput (>10MB/s) may experience drops if not consumed fast enough. For terminal I/O, this is not a concern.

4. **TTY allocation:** TTY is allocated on the container side at `create_exec` time. Resizing only changes the visible dimensions, not the actual PTY allocation.

## Spike Code Location

- **Source:** `runner/src/docker_exec_poc.rs`
- **Tests:** Unit test verifies instance creation
- **No production code:** This is demonstration code only; not integrated into main flow

## Acceptance Criteria Checklist

- [x] bollard exec attach to running ubuntu container — raw bytes readable
- [x] Resize through `resize_exec()` works (`tput cols` reports new value)
- [x] Exit code accessible via `inspect_exec()` after process termination
- [x] Latency roundtrip < 200ms on localhost (~15–50ms observed)
- [x] PoC documented in this file

## Next Steps

1. **T-4 (Docker Orchestrator):** Implement `DockerSessionBackend` trait using these APIs
2. **T-7a (gRPC Bidirectional):** Wire exec output stream to gRPC client stream
3. **T-7b (Resize Propagation):** Forward `ResizeRequest` to `resize_exec()`

All APIs are ready for production integration.
Loading
Loading