Add observability signal catalog (operational + business metrics) #17

@rororowyourboat

Context

The system design audit found that observability is a stated goal, and an OTLP trace proxy already exists, but there is no unified signal catalog defining what is measured and what is excluded.

Design Decision (from spec)

Operational + business metrics. No conversation content in traces.

Operational Metrics

  • event.persisted.count, event.persisted.latency_ms
  • projection.rebuild.duration_ms
  • rpc.request.count (by method), rpc.request.latency_ms, rpc.error.count
  • ws.connection.count, ws.reconnect.count
  • canvas.snapshot.save.latency_ms
  • pty.session.count

Business Metrics

  • turn.completed.count (by provider, model), turn.duration_ms
  • turn.interrupted.count, turn.error.count
  • thread.created.count
  • approval.requested.count, approval.decision.count (by decision type)
  • git.action.count (by type), git.action.duration_ms
  • checkpoint.revert.count, diff.viewed.count
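
As a rough illustration of what a catalog entry might look like in code, here is a minimal sketch covering a few of the signals above. The `Signal` and `SignalType` types, the `validate` helper, and the dimension key names (`method`, `provider`, `model`, `decision_type`, `type`) are assumptions made for the example; the canonical doc may settle on different names.

```python
from dataclasses import dataclass
from enum import Enum


class SignalType(Enum):
    COUNTER = "counter"
    TIMER = "timer"  # durations and latencies, in milliseconds


@dataclass(frozen=True)
class Signal:
    name: str
    type: SignalType
    category: str                      # "operational" or "business"
    dimensions: tuple[str, ...] = ()   # allowed dimension keys (assumed names)


# Partial catalog: signal names come from the lists above,
# dimension key names are illustrative guesses.
CATALOG = {
    s.name: s
    for s in (
        Signal("event.persisted.count", SignalType.COUNTER, "operational"),
        Signal("event.persisted.latency_ms", SignalType.TIMER, "operational"),
        Signal("rpc.request.count", SignalType.COUNTER, "operational", ("method",)),
        Signal("turn.completed.count", SignalType.COUNTER, "business", ("provider", "model")),
        Signal("approval.decision.count", SignalType.COUNTER, "business", ("decision_type",)),
        Signal("git.action.count", SignalType.COUNTER, "business", ("type",)),
    )
}


def validate(name: str, dimensions: dict[str, str]) -> Signal:
    """Reject signals or dimension keys that the catalog does not declare."""
    signal = CATALOG.get(name)
    if signal is None:
        raise KeyError(f"unknown signal: {name}")
    unexpected = set(dimensions) - set(signal.dimensions)
    if unexpected:
        raise ValueError(f"{name}: undeclared dimensions {sorted(unexpected)}")
    return signal
```

Checking every emitted signal against the catalog means an undeclared name such as `message.text` fails at the call site instead of appearing in exported data.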

Content Exclusions (Mandatory)

Never include: message.text, attachment content, diff blobs, terminal I/O, file contents, provider API keys.
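
One possible way to enforce this rule is to scrub attributes before they reach the exporter. The sketch below is a deny-list filter over a plain dict. Apart from `message.text`, the attribute keys are hypothetical stand-ins for the categories listed above, and `scrub_attributes` is not an existing function in the codebase.

```python
# Hypothetical attribute keys; only "message.text" is named in this issue.
EXCLUDED_KEYS = frozenset({
    "message.text",
    "attachment.content",
    "diff.blob",
    "terminal.io",
    "file.contents",
    "provider.api_key",
})


def scrub_attributes(attributes: dict[str, object]) -> dict[str, object]:
    """Return a copy without excluded keys or anything nested under them."""
    return {
        key: value
        for key, value in attributes.items()
        if not any(key == ex or key.startswith(ex + ".") for ex in EXCLUDED_KEYS)
    }
```

A deny-list only catches keys someone thought to list. An allow-list derived from the signal catalog would be stricter and may suit a mandatory rule better.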

Proposed Changes

  1. Define signal names, types, dimensions, and collection points in a canonical doc
  2. Instrument key paths: event persistence, RPC dispatch, turn lifecycle, git actions
  3. Ensure OTLP trace export respects content exclusions
  4. Add counters/timers at the collection points listed above
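
To make items 2 and 4 concrete, here is a minimal sketch of a counter and timer at one collection point, RPC dispatch. `InMemoryMetrics` is a stand-in sink and `dispatch_rpc` is a hypothetical wrapper; neither is taken from the codebase. Only `rpc.request.count` carries the `method` dimension here, which is an assumption about how "(by method)" is meant to apply.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class InMemoryMetrics:
    """Stand-in sink; a real setup would forward to the metrics exporter."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    @staticmethod
    def _key(name, dims):
        # Sort dimensions so the same set always maps to the same key.
        return (name, tuple(sorted(dims.items())))

    def increment(self, name, **dims):
        self.counters[self._key(name, dims)] += 1

    @contextmanager
    def timer(self, name, **dims):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even when the wrapped block raises.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timings[self._key(name, dims)].append(elapsed_ms)


metrics = InMemoryMetrics()


def dispatch_rpc(method, handler):
    """Hypothetical dispatch wrapper: count, time, and record errors."""
    metrics.increment("rpc.request.count", method=method)
    with metrics.timer("rpc.request.latency_ms"):
        try:
            return handler()
        except Exception:
            metrics.increment("rpc.error.count")
            raise
```

The same pattern would apply to event persistence, turn lifecycle, and git actions. Note that only the method name is recorded, never request arguments or payloads, which keeps the instrumentation within the content exclusions.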

Acceptance Criteria

  • Signal catalog documented
  • At minimum, the operational metrics are instrumented at their key collection points
  • Content exclusion rule enforced (no message text/files/diffs in traces)

References

  • System Design Spec: .plans/21-system-design-spec.md Section 8 (Quality Attributes)
  • Audit finding: "Observability is stated as a goal but has no signal catalog"
