Problem
Zero squad.decisions.* metrics exist in the codebase. The decisions subsystem is completely dark: no gauges, no counters, no span attributes. Without them there is no way to detect when archival stops working, or to measure the token cost of a bloated decisions.md.
From
Telemetry reviews of #20 and #21. Telemetry: 'The first PR should not be the archival fix — it should be the metrics.'
Proposed Metrics
Gauges (current state)
- squad.decisions.size_bytes: decisions.md file size
- squad.decisions.entry_count: number of decision entries
- squad.decisions.age_oldest_days: age of oldest active entry
- squad.decisions.inbox_depth: unmerged inbox files
- squad.decisions.archive_size_bytes: archive file size
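A minimal sketch of how the file-based gauges could be sampled, using only the standard library. The paths, the `DecisionsSnapshot` type, and the `snapshot_decisions` helper are hypothetical, and the rule "one entry per `### ` heading" is an assumption about the decisions.md layout, not something this issue specifies. `age_oldest_days` is left out because it depends on how entries are dated.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DecisionsSnapshot:
    """Point-in-time values for the file-based gauges (hypothetical type)."""
    size_bytes: int
    entry_count: int
    inbox_depth: int
    archive_size_bytes: int


def _file_size(path: Path) -> int:
    # A missing file reports 0 instead of raising, so a fresh repo still emits a value.
    return path.stat().st_size if path.is_file() else 0


def snapshot_decisions(decisions: Path, inbox: Path, archive: Path) -> DecisionsSnapshot:
    text = decisions.read_text(encoding="utf-8") if decisions.is_file() else ""
    # Assumption: each decision entry starts with a "### " heading line.
    entry_count = sum(1 for line in text.splitlines() if line.startswith("### "))
    # Assumption: each unmerged inbox item is one regular file directly in the inbox directory.
    inbox_depth = sum(1 for p in inbox.iterdir() if p.is_file()) if inbox.is_dir() else 0
    return DecisionsSnapshot(
        size_bytes=_file_size(decisions),
        entry_count=entry_count,
        inbox_depth=inbox_depth,
        archive_size_bytes=_file_size(archive),
    )
```

Returning one snapshot object keeps the sampling logic separate from whichever metrics exporter ends up publishing the values.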
Counters (operations)
- squad.decisions.archive_runs: Scribe archival executions
- squad.decisions.entries_archived: entries moved per run
- squad.decisions.bytes_archived: bytes recovered per run
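One way to read the "per run" wording is that each Scribe run adds its own entry and byte counts to cumulative counters. The sketch below illustrates that reading with an in-memory stand-in; the `ArchiveCounters` class and `record_run` method are hypothetical and do not represent an existing Squad or exporter API.

```python
class ArchiveCounters:
    """In-memory stand-in for monotonic counters (hypothetical; a real exporter would replace it)."""

    def __init__(self):
        self.totals = {
            "squad.decisions.archive_runs": 0,
            "squad.decisions.entries_archived": 0,
            "squad.decisions.bytes_archived": 0,
        }

    def record_run(self, entries_moved, bytes_recovered):
        # Counters only ever go up, so negative increments are rejected.
        if entries_moved < 0 or bytes_recovered < 0:
            raise ValueError("counter increments must be non-negative")
        self.totals["squad.decisions.archive_runs"] += 1
        self.totals["squad.decisions.entries_archived"] += entries_moved
        self.totals["squad.decisions.bytes_archived"] += bytes_recovered


counters = ArchiveCounters()
counters.record_run(entries_moved=4, bytes_recovered=2048)
counters.record_run(entries_moved=1, bytes_recovered=512)
print(counters.totals)
```

A run that archives nothing still increments `archive_runs`, which makes "Scribe ran but moved nothing" distinguishable from "Scribe never ran".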
Span Attributes (per spawn)
- agent.decisions_size_bytes on every agent spawn span
- context_utilization_pct: context window usage
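The two span attributes could be assembled as a plain dictionary before being attached to the spawn span. This is a sketch under assumptions: the issue does not define how `context_utilization_pct` is computed, so the formula below (prompt tokens divided by the model's context window) and the `spawn_span_attributes` helper with its parameters are hypothetical.

```python
def spawn_span_attributes(decisions_size_bytes, prompt_tokens, context_window_tokens):
    """Build the per-spawn attributes as a dict (hypothetical helper)."""
    if context_window_tokens <= 0:
        raise ValueError("context_window_tokens must be positive")
    # Assumed definition: share of the context window consumed by the spawn prompt.
    pct = 100.0 * prompt_tokens / context_window_tokens
    return {
        "agent.decisions_size_bytes": decisions_size_bytes,
        "context_utilization_pct": round(pct, 1),
    }


attrs = spawn_span_attributes(
    decisions_size_bytes=30_000,
    prompt_tokens=50_000,
    context_window_tokens=200_000,
)
print(attrs)
```

Recording both values on the same span makes it possible to correlate decisions.md growth with context pressure for a given spawn.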
Collection Points
- Coordinator session start (baseline)
- Every agent spawn (span attribute)
- Scribe run (pre/post archival)
Alerting Thresholds
- size_bytes >20KB: warn | >50KB: error
- inbox_depth >10: warn | >25: error
- archive_runs not increasing while size_bytes keeps rising: error
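The thresholds above can be expressed as small pure functions, sketched here. Two points are assumptions: the issue does not say whether KB means 1000 or 1024 bytes (1024 is used), and it does not define a staleness window, so `archival_stalled` simply compares the oldest and newest sample in whatever window the caller passes in. All function names are hypothetical.

```python
KB = 1024  # Assumption: the issue does not state whether KB is 1000 or 1024 bytes.


def size_severity(size_bytes):
    """Map squad.decisions.size_bytes to a severity level."""
    if size_bytes > 50 * KB:
        return "error"
    if size_bytes > 20 * KB:
        return "warn"
    return "ok"


def inbox_severity(inbox_depth):
    """Map squad.decisions.inbox_depth to a severity level."""
    if inbox_depth > 25:
        return "error"
    if inbox_depth > 10:
        return "warn"
    return "ok"


def archival_stalled(samples):
    """Return True when archive_runs is flat while size_bytes grew.

    samples is a list of (archive_runs, size_bytes) tuples ordered oldest to newest.
    The window length is chosen by the caller (assumption).
    """
    if len(samples) < 2:
        return False
    first_runs, first_size = samples[0]
    last_runs, last_size = samples[-1]
    return last_runs == first_runs and last_size > first_size


print(size_severity(30 * KB), inbox_severity(5), archival_stalled([(3, 1000), (3, 2000)]))
```

Keeping the rules as pure functions lets them be unit tested independently of the alerting backend.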
Owner
Telemetry (Aspire & Observability)