19 commits
eacbec8
chore: harden VPS deploy user with narrowed sudoers
GitAddRemote May 7, 2026
e9d2e64
chore: switch deploy user hardening to rootless Docker
GitAddRemote May 7, 2026
f30ddaa
docs: add rootless Docker migration runbook to vps-setup.md
GitAddRemote May 7, 2026
ade8057
fix: address PR 156 review feedback
GitAddRemote May 7, 2026
00e2bcd
fix: address PR 156 round-2 review feedback
GitAddRemote May 8, 2026
b4505d5
fix: address PR 156 round-3 review feedback
GitAddRemote May 8, 2026
86798f6
fix: address PR 156 round-4 review feedback
GitAddRemote May 10, 2026
357f900
fix: address PR 156 round-5 review feedback
GitAddRemote May 10, 2026
ff112b8
fix: address PR 156 round-6 review feedback
GitAddRemote May 10, 2026
4d7fb9a
fix: address PR 156 round-7 review feedback
GitAddRemote May 10, 2026
690e6ec
fix: address PR 156 round-8 review feedback
GitAddRemote May 10, 2026
45fd452
fix: address PR 156 round-9 review feedback
GitAddRemote May 10, 2026
e6a0c25
fix: address PR 156 round-10 review feedback
GitAddRemote May 10, 2026
6dbcc16
fix: address PR 156 round-11 review feedback
GitAddRemote May 11, 2026
00e3743
fix: address PR 156 round-12 review feedback
GitAddRemote May 11, 2026
2f18b81
fix: address PR 156 round-13 review feedback
GitAddRemote May 11, 2026
0cf1a21
fix: address PR 156 round-14 review feedback
GitAddRemote May 11, 2026
7266af3
fix: address PR 156 round-15 review feedback
GitAddRemote May 11, 2026
4c722ce
fix: address PR 156 round-16 review feedback
GitAddRemote May 11, 2026
172 changes: 172 additions & 0 deletions infra/docs/rootless-docker-migration.md
@@ -0,0 +1,172 @@
# Rootless Docker Migration — station-bot VPS

**Date:** 2026-05-07
**Host:** Cloud VPS (Ubuntu 24.04.4 LTS)
**Scope:** Migrated the `deploy` user's containers from the root Docker daemon to rootless Docker, removing `docker` group membership.

---

## Why

The `docker` group is root-equivalent. Any process that can reach `/var/run/docker.sock` can mount the host filesystem, run privileged containers, and escalate to root. If the deploy SSH key were ever leaked, an attacker would have had full root access to the host.

Rootless Docker runs the daemon entirely inside the deploy user's own namespace. The socket lives at `/run/user/<uid>/docker.sock` and is inaccessible to other non-root users. A compromised deploy key cannot escalate to root or access other users' containers via Docker — the blast radius is limited to the deploy user's own namespace and resources.

---

## What changed

- Rootless Docker daemon installed and running as the `deploy` user via `systemd --user`
- `DOCKER_HOST` written to `~deploy/.bashrc` so interactive sessions use the rootless socket automatically (no PATH change needed — APT install uses `/usr/bin`)
- `deploy` removed from the `docker` group
- All containers (postgres, discord-bot) migrated to the rootless daemon with data intact
- Postgres data preserved via `pg_dump` / `psql` restore across daemons

---

## Prerequisites installed

> **Historical record:** these are the packages installed during the original migration, which used `curl | sh`. The recommended install method now also includes `docker-ce-rootless-extras` — see the install method note in [Migration steps](#migration-steps-for-reference-on-future-vps).

```bash
apt install -y uidmap dbus-user-session
loginctl enable-linger deploy
```

- **`uidmap`**: provides `newuidmap`/`newgidmap`, the privileged helper binaries that write a process's user-namespace ID mappings from the ranges in `/etc/subuid` and `/etc/subgid`. Required by rootlesskit, which underlies rootless Docker.
- **`dbus-user-session`**: enables per-user D-Bus sessions, which `systemd --user` needs to manage user-scoped services like the rootless Docker daemon.
- **`loginctl enable-linger`**: keeps the deploy user's systemd user manager alive after logout so the Docker daemon stays running without an active SSH session.
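
A pre-flight check along these lines could confirm the prerequisites before the installer runs. This is a minimal sketch, not part of `bootstrap-vps.sh`: the function names are hypothetical, and it assumes systemd records linger as a file under `/var/lib/systemd/linger/`.

```shell
# Hypothetical helpers for illustration only.

# Succeeds if USER has at least one subordinate ID range in FILE
# (entries look like "deploy:100000:65536").
has_subid_range() {
  grep -q "^${1}:[0-9][0-9]*:[0-9][0-9]*\$" "$2" 2>/dev/null
}

# Prints one line per missing prerequisite; returns non-zero if any is missing.
check_rootless_prereqs() {
  user="$1"
  status=0
  for tool in newuidmap newgidmap; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool (uidmap package)"; status=1; }
  done
  has_subid_range "$user" /etc/subuid || { echo "no /etc/subuid range for $user"; status=1; }
  has_subid_range "$user" /etc/subgid || { echo "no /etc/subgid range for $user"; status=1; }
  # Assumption: an enabled linger shows up as a marker file named after the user.
  [ -e "/var/lib/systemd/linger/${user}" ] || { echo "linger not enabled for $user"; status=1; }
  return "$status"
}

# Usage (as root, before Phase 2):
#   check_rootless_prereqs deploy || echo "fix the items above first"
```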

---

## AppArmor profile

Ubuntu 24.04 sets the `kernel.apparmor_restrict_unprivileged_userns` sysctl (`/proc/sys/kernel/apparmor_restrict_unprivileged_userns`) to `1` by default, which blocks rootlesskit from creating user namespaces. An explicit AppArmor profile is required to allow it.

With `docker-ce-rootless-extras` installed via APT, `rootlesskit` lives at `/usr/bin/rootlesskit`. Resolve the path dynamically so the profile filename and binary path stay correct regardless of install method:

```bash
ROOTLESSKIT_BIN="$(command -v rootlesskit)"
PROFILE_SLUG="$(echo "${ROOTLESSKIT_BIN}" | sed 's|^/||; s|/|.|g')"

sudo tee "/etc/apparmor.d/${PROFILE_SLUG}" <<EOT
abi <abi/4.0>,
include <tunables/global>

${ROOTLESSKIT_BIN} flags=(unconfined) {
  userns,
  include if exists <local/${PROFILE_SLUG}>
}
EOT
sudo systemctl restart apparmor.service
```

This grants rootlesskit permission to use user namespaces without granting broader privileges.
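
To make the slug derivation and the restriction check concrete, here is a small sketch. The helper names `profile_slug` and `userns_restricted` are hypothetical; on kernels without the AppArmor knob the file is absent, which the sketch treats as "not restricted".

```shell
# Turn an absolute binary path into the AppArmor profile file name,
# using the same sed expression as the profile snippet above.
profile_slug() {
  printf '%s\n' "$1" | sed 's|^/||; s|/|.|g'
}

# Succeeds only if the AppArmor userns restriction is present and enabled.
userns_restricted() {
  knob=/proc/sys/kernel/apparmor_restrict_unprivileged_userns
  [ -r "$knob" ] && [ "$(cat "$knob")" = "1" ]
}

profile_slug /usr/bin/rootlesskit   # prints usr.bin.rootlesskit

if userns_restricted; then
  echo "restriction active: profile expected at /etc/apparmor.d/$(profile_slug /usr/bin/rootlesskit)"
fi
```

An older curl-based install would pass `"$HOME/bin/rootlesskit"` instead, which yields a slug such as `home.deploy.bin.rootlesskit`.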

---

## Migration steps (for reference on future VPS)

See the full runbook in [`vps-setup.md`](./vps-setup.md#migrating-an-existing-vps-to-rootless-docker).

> **Install method:** use `apt install docker-ce-rootless-extras` (already signed and pinned to the Docker APT repo) then `dockerd-rootless-setuptool.sh install` as the deploy user. This avoids `curl | sh`. The post-mortem below references the old `curl | sh` path as historical context for what was run during the original migration.

Summary:

1. Install prerequisites + `docker-ce-rootless-extras` as root (no downtime)
2. Run `dockerd-rootless-setuptool.sh install` as deploy (no downtime)
3. `pg_dump` while root daemon still running (no downtime)
4. `docker compose down`, activate rootless in session, start rootless daemon (downtime starts)
5. Write `.bashrc`, start postgres under rootless, restore data, start bot (downtime ends ~2 min)
6. `gpasswd -d deploy docker` as root, verify in a fresh SSH session

---

## Verification

After migration, in a fresh SSH session as deploy:

```bash
docker info | grep -i rootless # should print at least one line containing "rootless"
groups # docker should NOT appear
docker compose -f /opt/station-bot/docker-compose.prod.yml ps
```
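
The same checks can be scripted so they fail loudly instead of relying on reading the output. This is a sketch under assumptions: the helper names are hypothetical, and it assumes `docker info --format '{{.SecurityOptions}}'` lists `name=rootless` when the CLI is talking to a rootless daemon.

```shell
# Succeeds if the security options text on stdin lists the rootless option.
# Intended input: docker info --format '{{.SecurityOptions}}'
lists_rootless() {
  grep -q 'name=rootless'
}

# Succeeds if GROUP ($2) appears in a space-separated group list ($1),
# such as the output of `id -nG`.
in_group_list() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the host (commented out so the sketch has no side effects):
#   docker info --format '{{.SecurityOptions}}' | lists_rootless || echo "NOT rootless" >&2
#   in_group_list "$(id -nG)" docker && echo "still in docker group" >&2
```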

---

## Effect on deployments

No change to the deployment workflow: SSH in as deploy and run the usual docker compose commands. `.bashrc` sets `DOCKER_HOST` for interactive logins, and scripts such as `backup-db.sh` default it to the rootless socket themselves.

---

## Post-mortem

### Issue 1 — `set -a; source .env.production` executed non-variable lines as shell commands

**What happened:** The `.env.production` file contains human-readable comments and descriptive text without `#` prefixes. Sourcing it with `set -a` caused bash to attempt to execute those lines as commands, producing errors like `Member: command not found`.

**Fix:** Extract only the needed variables directly:

```bash
POSTGRES_USER=$(grep '^POSTGRES_USER=' .env.production | cut -d= -f2-)
POSTGRES_DB=$(grep '^POSTGRES_DB=' .env.production | cut -d= -f2-)
```

**Lesson:** `set -a; source` assumes every non-comment line is a valid variable assignment. It's brittle against env files written for human readability. Either enforce strict `KEY=value` formatting in env files, or extract specific variables when sourcing them in scripts.
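
The extraction pattern can be wrapped in a small helper so scripts never source the file. A minimal sketch: `env_get` is a hypothetical name, the sample file contents are invented for the demo, and quote stripping is an addition the original commands do not perform.

```shell
# Print the value of KEY from an env FILE without sourcing the file.
# Only the first "KEY=value" line counts; one pair of surrounding quotes is stripped.
env_get() {
  key="$1"; file="$2"
  grep -m1 "^${key}=" "$file" | cut -d= -f2- \
    | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# Demo with a throwaway file that mixes prose and assignments (invented values).
demo_env="$(mktemp)"
cat > "$demo_env" <<'EOF'
Member roles used by the bot
POSTGRES_USER=station
POSTGRES_DB="station_bot"
DATABASE_URL=postgres://host/db?sslmode=require
EOF

env_get POSTGRES_USER "$demo_env"   # prints station
env_get POSTGRES_DB "$demo_env"     # prints station_bot
env_get DATABASE_URL "$demo_env"    # prints postgres://host/db?sslmode=require
rm -f "$demo_env"
```

Because `cut -d= -f2-` keeps everything after the first `=`, values that themselves contain `=` survive intact.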

---

### Issue 2 — `FORCE_ROOTLESS_INSTALL=1` prefix only applied to `curl`, not to `sh`

**What happened:** Running `FORCE_ROOTLESS_INSTALL=1 curl ... | sh` sets the variable in `curl`'s environment, not in the piped `sh` process. The installer still aborted.

**Fix:** Place the variable before `sh`:

```bash
curl -fsSL https://get.docker.com/rootless | FORCE_ROOTLESS_INSTALL=1 sh
```

**Lesson:** In a pipeline, each process inherits from the shell — not from the previous process in the pipe. Environment variable prefixes only apply to the command they immediately precede.
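
The behaviour is easy to reproduce without Docker or `curl`. In this sketch `DEMO_FLAG` is a hypothetical variable and `true` stands in for the download command.

```shell
unset DEMO_FLAG

# Prefix on the first command: only `true` sees the variable.
wrong="$(DEMO_FLAG=1 true | sh -c 'echo "${DEMO_FLAG:-unset}"')"

# Prefix on the command that needs it: `sh` sees the variable.
right="$(true | DEMO_FLAG=1 sh -c 'echo "${DEMO_FLAG:-unset}"')"

echo "prefix before first command:  ${wrong}"   # prints unset
echo "prefix before second command: ${right}"   # prints 1
```

Exporting the variable beforehand (`export DEMO_FLAG=1`) would also work, at the cost of leaving it set for the rest of the session.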

---

### Issue 3 — AppArmor blocked rootlesskit on Ubuntu 24.04

**What happened:** Ubuntu 24.04 restricts unprivileged user namespaces via AppArmor by default. The rootless Docker installer failed with `fork/exec /proc/self/exe: permission denied`. The installer printed the fix but the AppArmor profile wasn't created before the first install attempt.

**Fix:** Create the AppArmor profile for rootlesskit and restart the apparmor service before installing rootless Docker. See the profile above.

**Lesson:** Ubuntu 24.04 is more locked down than previous LTS versions in this regard. The rootless Docker docs mention this but it's easy to miss. On any new Ubuntu 24.04 VPS, create the AppArmor profile as a prerequisite step before attempting rootless Docker installation.

---

### Issue 4 — Partial failed install blocked the retry

**What happened:** After the AppArmor fix, the installer detected the partial installation from the first failed attempt and refused to proceed.

**Fix:** Clean up the partial install before retrying. With the APT-based install (`docker-ce-rootless-extras`), `dockerd-rootless-setuptool.sh` is in `/usr/bin` and `dockerd` is not placed in `~/bin`:

```bash
systemctl --user stop docker 2>/dev/null || true
dockerd-rootless-setuptool.sh uninstall -f 2>/dev/null || true
rm -rf ~/.local/share/docker
```

> **Note:** The original migration used `curl | sh` which installed binaries to `~/bin/`. If cleaning up an old curl-based install, also run `rm -f ~/bin/dockerd ~/bin/dockerd-rootless-setuptool.sh`.

**Lesson:** The rootless Docker installer is not idempotent when a previous run failed partway through. Always clean up before retrying a failed install.

---

### Issue 5 — Post-install, docker commands targeted the rootless daemon instead of root daemon

**What happened:** The installer switched the Docker CLI context to "rootless" on completion. Phase 3 (pg_dump) runs against the root daemon where station-bot's containers live, but after install `docker exec` was hitting the empty rootless daemon, producing `No such container`.

**Fix:** Explicitly target the root daemon socket for Phase 3 commands:

```bash
DOCKER_HOST=unix:///var/run/docker.sock docker exec station-bot-postgres pg_dump ...
```

**Lesson:** After installing rootless Docker, the CLI context changes immediately. Any commands that still need to reach the root daemon must override `DOCKER_HOST` explicitly until the cutover is complete.
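
During the cutover window it can help to name the target daemon on every command rather than rely on the current context. A sketch with hypothetical helper names (`docker_host_for`, `docker_on`); the socket paths are the ones used in this document.

```shell
# Map a daemon name to its socket URL.
docker_host_for() {
  case "$1" in
    root)     echo "unix:///var/run/docker.sock" ;;
    rootless) echo "unix:///run/user/$(id -u)/docker.sock" ;;
    *)        echo "unknown daemon: $1" >&2; return 1 ;;
  esac
}

# Run a docker command against a named daemon, e.g. `docker_on root ps`.
docker_on() {
  _host="$(docker_host_for "$1")" || return 1
  shift
  DOCKER_HOST="$_host" docker "$@"
}

# Usage during the migration:
#   docker_on root exec station-bot-postgres pg_dump ...
#   docker_on rootless info
```

Setting `DOCKER_HOST` per command takes precedence over the CLI context, so the helper behaves the same before and after the installer switches contexts.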
162 changes: 162 additions & 0 deletions infra/docs/vps-setup.md
@@ -0,0 +1,162 @@
# VPS Setup — Deploy User Hardening

## Overview

The deploy SSH key lives in GitHub Secrets and is used on every deployment. If it were leaked, the attacker would have SSH access as the deploy user and could run arbitrary Docker containers within the deploy user's namespace. The key hardening property is that they cannot escalate to root or access other users' containers via Docker: the deploy user runs their own Docker daemon entirely within their user namespace, with no access to the root Docker socket and no docker group membership.

## Approach: rootless Docker

The deploy user's Docker daemon runs unprivileged inside a user namespace. There is no `/var/run/docker.sock` accessible to the deploy user — the socket lives at `/run/user/<uid>/docker.sock` and is owned entirely by that user.

`bootstrap-vps.sh` handles the full setup:

- Installs `uidmap`, `dbus-user-session`, and `docker-ce-rootless-extras` prerequisites
- Enables linger so the deploy user's systemd session persists without an active login
- Sets `DOCKER_HOST` in `~deploy/.bashrc` (no PATH change needed — APT install uses `/usr/bin`)
- Installs rootless Docker via `dockerd-rootless-setuptool.sh install` (ships with `docker-ce-rootless-extras`, no remote script execution)
- Enables and starts the `docker` systemd user service

The deploy scripts (`deploy.sh`, `backup-db.sh`, etc.) call `docker compose` directly — no `sudo` required.

## Pre-check results (recorded 2026-05-07)

| Check | Result |
| --------------------------------------------- | ------------------------------------------------------------------ |
| `/proc/sys/kernel/unprivileged_userns_clone` | `1` ✓ |
| `newuidmap` installed | Not pre-installed — provided by `uidmap` package (installed above) |
| `unshare --user sh -c "echo namespaces work"` | `namespaces work` ✓ |

## Verification

After running `bootstrap-vps.sh`, SSH in as the deploy user and confirm:

```bash
# Docker daemon is running
systemctl --user status docker

# Docker works without sudo or docker group
docker run hello-world

# No root socket access
docker -H unix:///var/run/docker.sock info # deploy user should get permission denied
groups # should NOT include 'docker'
```

## Security properties

| Capability | Before (docker group) | After (rootless) |
| ------------------------------ | --------------------- | ---------------- |
| Run containers | ✓ | ✓ |
| Access root Docker socket | ✓ (root-equivalent) | ✗ |
| Escalate to root via Docker | ✓ | ✗ |
| Affect other users' containers | ✓ | ✗ |
| Blast radius of leaked key | Full host (root) | Deploy user only |

## Reproducing on a fresh VPS

`bootstrap-vps.sh` is fully automated. Prerequisites, linger, AppArmor profile (Ubuntu 24.04+), rootless install, and service enable/start are all handled. After the script completes, verify with the commands above.

---

## Migrating an existing VPS to rootless Docker

Use these steps when Docker is already running on a VPS (e.g. station-bot is live) and you need to move the deploy user's containers from the root daemon to rootless without data loss. Expected downtime: 2–3 minutes.

### Phase 1 — Install prerequisites (no downtime)

**As root:**

```bash
apt install -y uidmap dbus-user-session
loginctl enable-linger deploy
```

### Phase 2 — Install rootless Docker (no downtime)

**As root — install the APT package that ships the setup tool:**

```bash
apt install -y docker-ce-rootless-extras
```

**As deploy (new SSH session):**

```bash
dockerd-rootless-setuptool.sh install
```

### Phase 3 — Dump postgres data (no downtime)

**As deploy — explicitly target the root daemon; the rootless installer may have switched the CLI context:**

```bash
POSTGRES_USER=$(grep '^POSTGRES_USER=' /opt/station-bot/.env.production | cut -d= -f2-)
POSTGRES_DB=$(grep '^POSTGRES_DB=' /opt/station-bot/.env.production | cut -d= -f2-)
DOCKER_HOST=unix:///var/run/docker.sock docker exec station-bot-postgres pg_dump -U "${POSTGRES_USER}" "${POSTGRES_DB}" > /tmp/station_bot_backup.sql
echo "Dump size: $(wc -c < /tmp/station_bot_backup.sql) bytes"
```
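
Since the next phase stops the only copy of the database, it may be worth checking more than the byte count before proceeding. The sketch below uses a hypothetical `dump_looks_complete` helper and assumes a plain-text dump, which `pg_dump` ends with a `PostgreSQL database dump complete` comment near the end of the file.

```shell
# Succeeds if FILE is non-empty and its tail carries pg_dump's completion marker.
dump_looks_complete() {
  [ -s "$1" ] && tail -n 10 "$1" | grep -q 'PostgreSQL database dump complete'
}

# Usage before starting Phase 4:
#   dump_looks_complete /tmp/station_bot_backup.sql \
#     || { echo "dump is empty or truncated; do not cut over" >&2; exit 1; }
```

A truncated dump (for example from a full disk or a dropped connection) lacks the marker, so the check fails even when the file is large.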

### Phase 4 — Cut over to rootless (downtime starts)

**As deploy:**

```bash
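# Bring down root-daemon containers (target the root socket explicitly; the
# rootless installer may have switched the CLI context)
cd /opt/station-bot
DOCKER_HOST=unix:///var/run/docker.sock docker compose -f docker-compose.prod.yml down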

# Activate rootless in this session (PATH unchanged — APT install uses /usr/bin)
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock

# Enable and start rootless service
systemctl --user enable docker
systemctl --user start docker

# Confirm rootless is active
docker info | grep -i rootless
```

### Phase 5 — Restore data and bring services back up (downtime ends)

**As deploy:**

```bash
# Make DOCKER_HOST permanent (PATH unchanged — APT install uses /usr/bin)
cat >> ~/.bashrc << 'RCEOF'

# rootless docker
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
RCEOF

# Start postgres under rootless daemon
cd /opt/station-bot
docker compose -f docker-compose.prod.yml up -d postgres

# Wait for healthy
until docker compose -f docker-compose.prod.yml ps | grep -q "(healthy)"; do sleep 2; done

# Restore data into the rootless daemon (POSTGRES_USER and POSTGRES_DB were set in
# Phase 3; re-run those two lines first if this is a new shell)
DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock docker exec -i station-bot-postgres psql -U "${POSTGRES_USER}" "${POSTGRES_DB}" < /tmp/station_bot_backup.sql

# Start the bot
docker compose -f docker-compose.prod.yml up -d discord-bot

# Verify
docker compose -f docker-compose.prod.yml ps
docker logs station-bot --tail 20
```
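
The wait-for-healthy loop above never gives up, which is awkward during a downtime window. A bounded variant might look like the sketch below. `wait_until` and `postgres_healthy` are hypothetical helpers, and the health probe assumes the `station-bot-postgres` container defines a healthcheck.

```shell
# Run a command every INTERVAL seconds until it succeeds or TIMEOUT seconds elapse.
# Usage: wait_until TIMEOUT INTERVAL command [args...]
wait_until() {
  timeout="$1"; interval="$2"; shift 2
  waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Succeeds only when Docker reports the container's health status as "healthy".
postgres_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' station-bot-postgres 2>/dev/null)" = "healthy" ]
}

# Usage in Phase 5:
#   wait_until 60 2 postgres_healthy || { echo "postgres not healthy after 60s" >&2; exit 1; }
```

Comparing the status string exactly also avoids matching `unhealthy`, which a plain substring search for `healthy` would accept.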

### Phase 6 — Remove docker group access (only after confirming services are healthy)

**As root:**

```bash
gpasswd -d deploy docker
```

**As deploy (fresh SSH session to confirm clean environment):**

```bash
docker compose -f /opt/station-bot/docker-compose.prod.yml ps
groups # docker should not appear
```
9 changes: 6 additions & 3 deletions infra/scripts/backup-db.sh
@@ -7,6 +7,8 @@ ENV_FILE="${STATION_ROOT}/.env.production"
COMPOSE_FILE="${STATION_ROOT}/docker-compose.prod.yml"
RCLONE_CONFIG_FILE="${STATION_ROOT}/rclone.conf"
LOG_PREFIX="[backup]"
+DOCKER_HOST="${DOCKER_HOST:-unix:///run/user/$(id -u)/docker.sock}"
+export DOCKER_HOST

if [ ! -f "${ENV_FILE}" ]; then
echo "${LOG_PREFIX} Missing ${ENV_FILE}" >&2
@@ -18,9 +20,10 @@ if [ ! -f "${RCLONE_CONFIG_FILE}" ]; then
exit 1
fi

-set -a
-source "${ENV_FILE}"
-set +a
+DATABASE_USER="$(grep '^DATABASE_USER=' "${ENV_FILE}" | cut -d= -f2- || true)"
+DATABASE_NAME="$(grep '^DATABASE_NAME=' "${ENV_FILE}" | cut -d= -f2- || true)"
+B2_BUCKET="$(grep '^B2_BUCKET=' "${ENV_FILE}" | cut -d= -f2- || true)"
+BACKUP_HEALTHCHECK_URL="$(grep '^BACKUP_HEALTHCHECK_URL=' "${ENV_FILE}" | cut -d= -f2- || true)"

: "${DATABASE_USER:?DATABASE_USER is required}"
: "${DATABASE_NAME:?DATABASE_NAME is required}"