feat(admin): Operations Dashboard — standalone app on Cloudflare Containers with Grafana OSS #2149

@jrf0110

Summary

Deploy a dedicated Kilo Operations Grafana dashboard using Cloudflare Containers. This provides a unified observability platform for Gastown, KiloClaw, and other backend services without relying on external SaaS dashboarding.

The architecture consists of a Cloudflare Worker acting as an auth proxy (validating Kilo JWTs or Cloudflare Access tokens) sitting in front of a Cloudflare Container running the official Grafana OSS Docker image. Dashboards are managed as code (JSON files) and provisioned automatically.

Architecture

Admin Browser
  ↓
Cloudflare Worker (kilo-ops-proxy)
  ├── Auth: Cloudflare Access JWT or Kilo Admin JWT
  ├── Routing: Forward all traffic to the Container
  ↓
Cloudflare Container (kilo-ops-grafana)
  ├── Base: grafana/grafana-oss:latest
  ├── Config: grafana.ini (auth proxy enabled, local sqlite db)
  ├── Provisioning: loads datasources/clickhouse.yaml and dashboards/dashboards.yaml
  ├── Dashboards: loads JSON files from /etc/grafana/provisioning/dashboards/
  └── Datasources: ClickHouse (Analytics Engine), etc.

Why this architecture?

  • Dashboards as code: We already have gastown-grafana-dash-1.json. By baking dashboards into the container image via Grafana's provisioning system, updates to dashboards are just container deployments. No manual UI clicking required to sync environments.
  • Zero-trust auth: Grafana supports auth proxying (auth.proxy). The Cloudflare Worker handles authentication (verifying the user is a Kilo Admin) and passes the identity to Grafana via headers (X-WEBAUTH-USER). Grafana trusts the worker and logs the user in automatically.
  • Low cost: Cloudflare Containers suit internal tools that don't need constant uptime. With sleepAfter set, the container stops after an idle period, so we only pay for compute while an admin is actively viewing the dashboard. The trade-off is a cold start on the first request after a sleep.
  • Data locality: Analytics Engine, which exposes a ClickHouse-compatible SQL API, is already on Cloudflare. Querying it from a container that also runs on Cloudflare should keep latency low and avoid egress fees.

Implementation Plan

1. New App Structure

Create a new app in the monorepo at apps/kilo-ops/ (or services/kilo-ops/):

apps/kilo-ops/
├── wrangler.jsonc           # Worker + Container config
├── src/
│   └── worker.ts            # Auth proxy worker
├── container/
│   ├── Dockerfile           # Grafana base + COPY configs
│   ├── grafana.ini          # Enable auth.proxy, disable login form
│   ├── provisioning/
│   │   ├── datasources/
│   │   │   └── clickhouse.yaml
│   │   └── dashboards/
│   │       ├── dashboards.yaml
│   │       └── gastown.json # Moved from cloudflare-gastown

2. Container Configuration (Dockerfile)

FROM grafana/grafana-oss:latest

# Grafana needs the vertamedia-clickhouse-datasource plugin for Analytics Engine
RUN grafana cli plugins install vertamedia-clickhouse-datasource

# Copy configuration
COPY grafana.ini /etc/grafana/grafana.ini
COPY provisioning /etc/grafana/provisioning

# The container will use its ephemeral disk for the sqlite DB.
# This means user preferences/stars are lost on container restart, 
# but dashboards/datasources are preserved because they are baked into the image.

3. Grafana Auth Proxy Config (grafana.ini)

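[auth]
# Hide the username/password form; sign-in happens only through the proxy header
disable_login_form = true

[auth.proxy]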
enabled = true
header_name = X-WEBAUTH-USER
header_property = username
auto_sign_up = true
sync_ttl = 60

[users]
allow_sign_up = false
auto_assign_org = true
auto_assign_org_role = Admin

4. Proxy Worker (worker.ts)

The worker acts as the entrypoint. It must:

  1. Verify the request comes from an authorized Kilo Admin (via CF Access JWT or Kilo Session JWT)
  2. Extract the user's email or ID
  3. Forward the request to the container with the X-WEBAUTH-USER header injected
  4. Strip any incoming X-WEBAUTH-USER headers from the client to prevent spoofing
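Steps 3 and 4 can be sketched as a small pure function. This is a minimal sketch, not the final worker: the name buildUpstreamRequest is hypothetical, and it assumes the caller has already validated the JWT and extracted the user identity (steps 1 and 2).

```typescript
// Header that Grafana's [auth.proxy] section is configured to trust (header_name).
const AUTH_HEADER = "X-WEBAUTH-USER";

/**
 * Build the request that is forwarded to the Grafana container.
 * `verifiedUser` must come from a token the worker has already validated;
 * this function performs no authentication itself.
 */
function buildUpstreamRequest(request: Request, verifiedUser: string): Request {
  // Copy the headers so the incoming request is left untouched.
  const headers = new Headers(request.headers);

  // Step 4: drop whatever the client sent. Header names are case-insensitive,
  // so this also removes "x-webauth-user". set() below would overwrite it
  // anyway; the explicit delete keeps the stripping in place even if the
  // injection is later made conditional.
  headers.delete(AUTH_HEADER);

  // Step 3: inject the identity established by the worker.
  headers.set(AUTH_HEADER, verifiedUser);

  // Same URL, method and body as the original, with the sanitized headers.
  return new Request(request, { headers });
}
```

Requests that fail validation should be rejected before this function is reached, so the container never sees a request without a worker-supplied identity.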

5. Datasource Provisioning

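The ClickHouse datasource needs CF_ACCOUNT_ID and CF_ANALYTICS_API_KEY. Both should be passed to the container as environment variables and referenced in the datasource provisioning file (provisioning/datasources/clickhouse.yaml), where Grafana expands ${VAR} references when it loads the file: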

apiVersion: 1
datasources:
  - name: Analytics Engine
    type: vertamedia-clickhouse-datasource
    access: proxy
    url: https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/analytics_engine/sql
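    jsonData:
      add_metadata: true
      defaultDatabase: default
      # The SQL API expects a bearer token, sent here as a custom header
      httpHeaderName1: Authorization
    secureJsonData:
      httpHeaderValue1: Bearer ${CF_ANALYTICS_API_KEY}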
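To check the credentials outside Grafana, the same endpoint can be queried directly. The SQL API takes the query as the POST body and an API token in the Authorization header. The sketch below only builds the request; the function name buildAnalyticsEngineQuery is hypothetical, and the account ID and token are assumed to come from the same environment variables as above.

```typescript
/**
 * Build a request against the Analytics Engine SQL API.
 * Nothing is sent until the returned Request is passed to fetch().
 */
function buildAnalyticsEngineQuery(accountId: string, apiToken: string, sql: string): Request {
  // Same URL shape as the datasource definition, with the account ID escaped.
  const url =
    "https://api.cloudflare.com/client/v4/accounts/" +
    encodeURIComponent(accountId) +
    "/analytics_engine/sql";

  return new Request(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: sql, // the SQL statement is sent as the raw request body
  });
}
```

Passing the result to fetch() with a cheap statement such as SHOW TABLES is one way to confirm the token has Analytics Engine read access before baking it into the container environment.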

Security Considerations

  • The container must NOT be exposed to the internet directly. It should only be accessible via the proxy worker.
  • The proxy worker MUST strictly validate admin identity before appending the X-WEBAUTH-USER header.
  • The proxy worker MUST drop any X-WEBAUTH-USER header provided by the external client.
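The "strictly validate" requirement can be made concrete with a fail-closed check on the token claims. This is a sketch under assumptions: signature verification is not shown and must happen first, the AdminPolicy shape and the email allowlist are hypothetical, and the claim names follow Cloudflare Access tokens (email, aud, exp); a Kilo session JWT may use different fields.

```typescript
// Subset of claims used for the decision. The signature must already be verified.
interface VerifiedClaims {
  email?: string;
  aud?: string | string[];
  exp?: number; // expiry, seconds since the Unix epoch
}

// Hypothetical policy object; how admins are actually identified is up to Kilo.
interface AdminPolicy {
  expectedAudience: string;          // audience tag of the Access application
  adminEmails: ReadonlySet<string>;  // lower-cased allowlist
}

/** Returns the identity to forward to Grafana, or null if the request must be rejected. */
function resolveAdminIdentity(
  claims: VerifiedClaims,
  policy: AdminPolicy,
  nowSeconds: number,
): string | null {
  // Normalize `aud`, which may be a single string or an array.
  const audiences = Array.isArray(claims.aud) ? claims.aud : claims.aud ? [claims.aud] : [];
  if (!audiences.includes(policy.expectedAudience)) return null;

  // Reject tokens with a missing or past expiry.
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) return null;

  // Only allowlisted identities pass; a missing email is a rejection.
  const email = claims.email?.trim().toLowerCase();
  if (!email || !policy.adminEmails.has(email)) return null;

  return email;
}
```

Every failure path returns null, so the worker can respond with 401 or 403 without ever setting X-WEBAUTH-USER.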

Acceptance Criteria

  • New apps/kilo-ops (or services/kilo-ops) workspace created
  • Dockerfile building Grafana with ClickHouse plugin
  • Grafana provisioning configured for datasources and dashboards
  • Existing gastown-grafana-dash-1.json moved to the new app and provisioned
  • Proxy worker that authenticates Kilo Admins and injects X-WEBAUTH-USER
  • wrangler.jsonc configured with the Worker and Container
  • Container sleeps when idle (sleepAfter configured)
  • Ephemeral disk wipe on sleep/restart does not break the dashboard (because definitions are in code)

Labels

  • P1: Should fix before soft launch
  • enhancement: New feature or request
  • gt:observability: Logging, metrics, Grafana, debugging