Context
The system design audit found that observability is a stated goal, and an OTLP trace proxy already exists, but no unified signal catalog defines what is measured and what is excluded.
Design Decision (from spec)
Collect operational and business metrics. Do not include conversation content in traces.
Operational Metrics
event.persisted.count, event.persisted.latency_ms
projection.rebuild.duration_ms
rpc.request.count (by method), rpc.request.latency_ms, rpc.error.count
ws.connection.count, ws.reconnect.count
canvas.snapshot.save.latency_ms
pty.session.count
Business Metrics
turn.completed.count (by provider, model), turn.duration_ms
turn.interrupted.count, turn.error.count
thread.created.count
approval.requested.count, approval.decision.count (by decision type)
git.action.count (by type), git.action.duration_ms
checkpoint.revert.count, diff.viewed.count
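One way the catalog could be expressed in code is sketched below. It is an illustration, not the project's actual schema: the `Signal` class, the `category` values, and the dimension keys (`method`, `provider`, `model`, `decision_type`, `type`) are assumed names, and the rule that `.count` signals are counters while `_ms` signals are timers is inferred from the naming only. Dimensions are attached only to the signal that directly precedes each "(by ...)" note.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Signal:
    """One catalog entry. Field names are hypothetical."""
    name: str
    category: str  # "operational" or "business"
    dimensions: Tuple[str, ...] = ()

    @property
    def kind(self) -> str:
        # Assumption: the suffix of the signal name determines its type.
        if self.name.endswith(".count"):
            return "counter"
        if self.name.endswith("_ms"):
            return "timer"
        raise ValueError(f"cannot infer kind for {self.name!r}")


CATALOG = (
    # Operational metrics
    Signal("event.persisted.count", "operational"),
    Signal("event.persisted.latency_ms", "operational"),
    Signal("projection.rebuild.duration_ms", "operational"),
    Signal("rpc.request.count", "operational", ("method",)),
    Signal("rpc.request.latency_ms", "operational"),
    Signal("rpc.error.count", "operational"),
    Signal("ws.connection.count", "operational"),
    Signal("ws.reconnect.count", "operational"),
    Signal("canvas.snapshot.save.latency_ms", "operational"),
    Signal("pty.session.count", "operational"),
    # Business metrics
    Signal("turn.completed.count", "business", ("provider", "model")),
    Signal("turn.duration_ms", "business"),
    Signal("turn.interrupted.count", "business"),
    Signal("turn.error.count", "business"),
    Signal("thread.created.count", "business"),
    Signal("approval.requested.count", "business"),
    Signal("approval.decision.count", "business", ("decision_type",)),
    Signal("git.action.count", "business", ("type",)),
    Signal("git.action.duration_ms", "business"),
    Signal("checkpoint.revert.count", "business"),
    Signal("diff.viewed.count", "business"),
)

# Lookup table so instrumentation code can reject names not in the catalog.
BY_NAME = {signal.name: signal for signal in CATALOG}
```

Keeping the catalog as data makes it possible to check emitted signal names against `BY_NAME` in tests, so a metric that is not in the catalog is caught before it ships.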
Content Exclusions (Mandatory)
Never include: message.text, attachment content, diff blobs, terminal I/O, file contents, provider API keys.
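The exclusion rule could be enforced with a filter applied to span or metric attributes before export. The sketch below is a minimal illustration under stated assumptions: only `message.text` is an attribute key named in the list above; the other keys are hypothetical stand-ins for the excluded categories, and `scrub_attributes` is an assumed helper name, not an existing function.

```python
from typing import Any, Dict

# Keys that must never leave the process.
EXCLUDED_KEYS = frozenset({
    "message.text",        # named explicitly in the exclusion list
    "attachment.content",  # hypothetical key for attachment content
    "diff.blob",           # hypothetical key for diff blobs
    "terminal.io",         # hypothetical key for terminal I/O
    "file.contents",       # hypothetical key for file contents
    "provider.api_key",    # hypothetical key for provider API keys
})


def scrub_attributes(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `attributes` without any excluded key."""
    return {
        key: value
        for key, value in attributes.items()
        if key not in EXCLUDED_KEYS
    }
```

A deny-list like this only catches keys it knows about. An allow-list built from the signal catalog's dimensions would be stricter, at the cost of needing an update whenever a dimension is added.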
Proposed Changes
- Define signal names, types, dimensions, and collection points in a canonical doc
- Instrument key paths: event persistence, RPC dispatch, turn lifecycle, git actions
- Ensure OTLP trace export respects content exclusions
- Add counters/timers at the collection points listed above
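As a rough sketch of what adding counters and timers at a collection point might look like, the example below instruments RPC dispatch with an in-memory recorder. `Metrics`, `dispatch_rpc`, and the handler signature are assumptions for illustration; a real implementation would forward to whatever metrics backend the project uses rather than store values in dictionaries.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """In-memory recorder, keyed by (signal name, sorted dimension pairs)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings_ms = defaultdict(list)

    @staticmethod
    def _key(name, dimensions):
        return (name, tuple(sorted(dimensions.items())))

    def increment(self, name, **dimensions):
        self.counters[self._key(name, dimensions)] += 1

    @contextmanager
    def timed(self, name, **dimensions):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even when the wrapped block raises.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timings_ms[self._key(name, dimensions)].append(elapsed_ms)


def dispatch_rpc(metrics, method, handler):
    """Hypothetical RPC dispatch collection point."""
    metrics.increment("rpc.request.count", method=method)
    with metrics.timed("rpc.request.latency_ms"):
        try:
            return handler()
        except Exception:
            metrics.increment("rpc.error.count")
            raise
```

The same pattern would apply to the other key paths: wrap the persistence call, the turn lifecycle, and each git action in `timed`, and call `increment` with the dimensions the catalog lists for that signal.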
Acceptance Criteria
References
- System Design Spec: .plans/21-system-design-spec.md, Section 8 (Quality Attributes)
- Audit finding: "Observability is stated as a goal but has no signal catalog"