feat(metrics): Add rework rate OTEL metrics (#265) — 5th DORA metric #415
…ORA metric

Add PR rework rate as OTEL metrics in squad-sdk, following the exact pattern of existing metrics (bradygaster#261-bradygaster#264) in otel-metrics.ts.

New modules:
- runtime/rework.ts: Pure calculation module with typed interfaces (PrInfo, PrReview, PrCommit, PrReworkResult, ReworkSummary) and functions (calculatePrRework, calculateReworkSummary)
- otel-metrics.ts bradygaster#265 section: Four OTEL instruments
  - squad.rework.rate (Gauge) — rework rate percentage
  - squad.rework.cycles (Histogram) — review cycles per PR
  - squad.rework.rejection_rate (Gauge) — rejection percentage
  - squad.rework.time_ms (Histogram) — time spent in rework
- Export functions: recordReworkMetrics(), recordReworkSummary()

Includes 19 Vitest tests (9 for calculatePrRework, 4 for calculateReworkSummary, 6 for OTEL metric recording).

Replaces bradygaster#381 (CLI command approach) with proper SDK-level OTEL metrics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
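As a rough illustration of what a pure calculation module of this kind can look like, the sketch below computes rework for a single PR and a summary across PRs. Only the interface and function names come from the commit message. Every field (`number`, `state`, `submittedAt`, `committedAt`, and so on) and the counting rule (a rework cycle is a CHANGES_REQUESTED review followed by at least one later commit) are assumptions made for this example, not the actual contents of `rework.ts`.

```typescript
// Hypothetical shapes: the names appear in the PR, the fields are assumed.
interface PrInfo { number: number; createdAt: string; mergedAt: string | null; }
interface PrReview { state: "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED"; submittedAt: string; }
interface PrCommit { sha: string; committedAt: string; }

interface PrReworkResult {
  prNumber: number;
  reworkCycles: number;  // rejections that were followed by new commits
  rejected: boolean;     // at least one CHANGES_REQUESTED review
  reworkTimeMs: number;  // first rejection to the last commit after it
}

interface ReworkSummary {
  totalPrs: number;
  reworkRate: number;    // % of PRs with at least one rework cycle
  rejectionRate: number; // % of PRs with at least one rejection
  avgCycles: number;
}

function calculatePrRework(pr: PrInfo, reviews: PrReview[], commits: PrCommit[]): PrReworkResult {
  const rejections = reviews
    .filter((r) => r.state === "CHANGES_REQUESTED")
    .map((r) => Date.parse(r.submittedAt))
    .sort((a, b) => a - b);
  const commitTimes = commits.map((c) => Date.parse(c.committedAt)).sort((a, b) => a - b);

  // Assumed rule: a rejection counts as a rework cycle only if a commit lands
  // after it and no later than the next rejection.
  let reworkCycles = 0;
  rejections.forEach((t, i) => {
    const next = rejections[i + 1] ?? Infinity;
    if (commitTimes.some((c) => c > t && c <= next)) reworkCycles++;
  });

  const first = rejections[0];
  const lastAfter = first === undefined ? undefined : commitTimes.filter((c) => c > first).pop();
  const reworkTimeMs = first !== undefined && lastAfter !== undefined ? lastAfter - first : 0;

  return { prNumber: pr.number, reworkCycles, rejected: rejections.length > 0, reworkTimeMs };
}

function calculateReworkSummary(results: PrReworkResult[]): ReworkSummary {
  const total = results.length;
  if (total === 0) return { totalPrs: 0, reworkRate: 0, rejectionRate: 0, avgCycles: 0 };
  const pct = (n: number) => (n / total) * 100;
  return {
    totalPrs: total,
    reworkRate: pct(results.filter((r) => r.reworkCycles > 0).length),
    rejectionRate: pct(results.filter((r) => r.rejected).length),
    avgCycles: results.reduce((sum, r) => sum + r.reworkCycles, 0) / total,
  };
}
```

Because both functions take plain data and return plain data, they can be unit-tested without any network access or OTEL setup, which is presumably what makes the 13 calculation tests cheap to run.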
Follow-up Ideas for Rework Rate

Three enhancements that would make this metric self-sustaining:

1. GitHub Actions Cron — Automated Assessment
A scheduled workflow (daily/weekly) that runs the calculation without any CLI session. Calls gh to get recent merged PRs, computes rework metrics, and:

2. Ralph Integration — Self-Correction Loop
Wire into Ralph's weekly retro: high rework = flag it, Lead runs retro, team adjusts. Low rework sustained over N weeks = reduce review gates, lower polling, auto-promote skill confidence. The "nap" concept: a well-calibrated squad can relax its guardrails.

3. Azure DevOps Support
Current impl uses gh CLI. For ADO repos, need az repos pr list or ADO REST API. The calculation logic in rework.ts is provider-agnostic — just needs PrInfo, PrReview, PrCommit shaped data. An ADO adapter that maps to these interfaces would make it work across both platforms.

Happy to contribute any of these as follow-up PRs.
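To make the provider-agnostic point concrete, here is a minimal sketch of the mapping step for review data. It is not part of the PR. The `PrReview` fields are assumed, and `AdoVote` is a hypothetical, already-flattened record invented for this example (the raw ADO payload would need its own extraction step). The numeric vote values follow the Azure DevOps reviewer convention: 10 approved, 5 approved with suggestions, 0 no vote, -5 waiting for author, -10 rejected.

```typescript
// Target shape: the name PrReview comes from the PR, the fields are assumed.
interface PrReview { state: "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED"; submittedAt: string; }

// Hypothetical flattened ADO vote record (not the raw REST payload).
interface AdoVote { vote: number; votedAt: string; }

// Map one ADO vote to a provider-neutral review, or null when no vote was cast.
function adoVoteToReview(v: AdoVote): PrReview | null {
  if (v.vote === 0) return null;
  // Positive votes (10, 5) are approvals; negative votes (-5, -10) ask for changes.
  const state = v.vote > 0 ? "APPROVED" : "CHANGES_REQUESTED";
  return { state, submittedAt: v.votedAt };
}

// Convert a list of votes, dropping entries that carry no decision.
function mapAdoVotes(votes: AdoVote[]): PrReview[] {
  return votes.map(adoVoteToReview).filter((r): r is PrReview => r !== null);
}
```

Whether this lives in an adapter or is done ad hoc by an agent, the calculation only ever sees `PrReview`-shaped data.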
0f44a19 to c654cf7
Note: CI failure is a pre-existing issue on dev branch, not from this PR.

The CLI build fails on missing SDK exports (listRoles, searchRoles, getCategories, getRoleById, generateCharterFromRole) in roles.ts, cast.ts, and coordinator.ts. These imports exist in the CLI but the SDK doesn't export them yet. This breaks any PR targeting dev right now.

Our changes (squad-sdk only) compile clean - the rework.ts module and otel-metrics.ts additions type-check without errors.

Re: ADO support - the calculation in rework.ts is already provider-agnostic (just needs PrInfo/PrReview/PrCommit data). Rather than shipping adapter classes, this is better handled at the agent/skill layer - the Squad coordinator already knows how to call gh vs az based on repo context. A skill template documenting both data sources is sufficient.
* chore(squad): Phase 2 launch — thinking feedback, P0 bugs, dual telemetry

  Phase 1 complete: 5 issues closed (bradygaster#325, bradygaster#326, bradygaster#327, bradygaster#328, bradygaster#329), 5 PRs merged. Phase 2 launched with Cheritto (thinking feedback), Hockney (P0 bugs), Saul (dual telemetry). Decision inbox merged and archived.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Phase 2 Wave 1 merged, Wave 2 launched

  Session: 2026-02-23T2145-phase2-wave2

  Phase 2 Wave 1 complete (PRs bradygaster#351, bradygaster#352, bradygaster#353 merged). Wave 2 launched: Cheritto on ghost response detection (bradygaster#332), Hockney on error hardening (bradygaster#334).

  Changes:
  - Session log created: 2026-02-23T2145-phase2-wave2.md
  - Merged 3 inbox decisions (Cheritto, Hockney, Saul)
  - Deleted inbox files post-merge

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Epic bradygaster#323 complete — all phases shipped 🎉

  All 3 phases delivered:
  - Phase 1 (Testing Wave): 6 issues closed
  - Phase 2 (Improvement): 6 issues closed
  - Phase 3 (Breathtaking): 7 issues closed
  - 17 PRs merged, 19 issues closed total

  Session log: 2026-02-23T2320-epic-complete.md
  Decisions merged from inbox: P2 UX Polish, first-run wow moment

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* hostile QA: end-to-end quality assessment — 10 findings, 4 HIGH severity

  Candid assessment requested by Brady. Traced every code path in cli-entry.ts, shell/index.ts, shell/commands.ts, App.tsx, coordinator.ts, spawn.ts, and the SDK adapter client.

  Key findings:
  - Dead sessions never evicted from agentSessions Map after connection drop
  - No React ErrorBoundary — any render throw kills the shell
  - Nasty-inputs corpus (95 strings) is never imported by any test
  - No SIGTERM handler in interactive shell
  - MemoryManager exported but never instantiated (dead code)
  - Single streaming content slot clobbers multi-agent output
  - User input silently dropped during processing (no type-ahead buffer)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): quality review findings — 7 issues filed

  Quality audit complete: 5 agents assessed CLI across testing, coverage, stability, accessibility, UX. Results: 4 P0 blockers (bradygaster#365–bradygaster#368), 3 P1 items (bradygaster#369–bradygaster#371). Blocking: Waingro dead sessions, ErrorBoundary, dropped input; Marquez help text consistency.

  Changes:
  - Logged session summary to .squad/log/2026-02-24T0205-quality-review-complete.md
  - Updated .squad/identity/now.md with quality review findings and new issue numbers

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): merge decision — Marquez UX audit findings

  Quality assessment merged from inbox (Grade B): 11 improvements (3 P0, 4 P1, 4 P2). help text, stub commands, vocabulary, separators, roster.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): test sprint launch

  Session: 2026-02-24T0210-test-sprint

  Changes:
  - Logged test sprint: 5 agents, 7+ issues
  - Branches: P0 fixes, stale tests, E2E, hostile/SDK, A11y

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct ThinkingIndicator assertion to match component behavior

  The ThinkingIndicator renders empty string when isThinking=false, not 'No agents active'. Fix the test assertion.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…er#429, bradygaster#424, bradygaster#417, bradygaster#415, bradygaster#412, bradygaster#411)

Documents features and changes from recent PRs that shipped without corresponding docs updates:
- bradygaster#429: Update model catalog with Sonnet 4.6, Opus 4.6, GPT-5.4 defaults
- bradygaster#424: Document --sdk switch for TypeScript config generation
- bradygaster#412: Document --roles flag for opt-in base roles
- bradygaster#411: Note Ralph in init + @copilot routing template removal
- bradygaster#442: Add Session Recovery skill documentation
- bradygaster#417: Document CastingEngine character casting
- bradygaster#415: Add rework rate OTEL metrics reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: fill content gaps from 7 recent PRs (#442, #429, #424, #417, #415, #412, #411)

  Documents features and changes from recent PRs that shipped without corresponding docs updates:
  - #429: Update model catalog with Sonnet 4.6, Opus 4.6, GPT-5.4 defaults
  - #424: Document --sdk switch for TypeScript config generation
  - #412: Document --roles flag for opt-in base roles
  - #411: Note Ralph in init + @copilot routing template removal
  - #442: Add Session Recovery skill documentation
  - #417: Document CastingEngine character casting
  - #415: Add rework rate OTEL metrics reference

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove duplicate gpt-5.1-codex-mini from Fast/Cheap tier

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Add PR rework rate as OTEL metrics in squad-sdk, following the exact pattern of the existing metrics (#261–#264) in `otel-metrics.ts`.
This replaces #381 (the standalone CLI command approach) with a proper SDK-level OTEL metric that integrates into the existing telemetry pipeline.
What's included
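`packages/squad-sdk/src/runtime/rework.ts` — Pure calculation module
`otel-metrics.ts` — `#265 — Rework Rate Metrics` section
Four OTEL instruments following the exact same lazy-init pattern:
- `squad.rework.rate` (Gauge): rework rate percentage
- `squad.rework.cycles` (Histogram): review cycles per PR
- `squad.rework.rejection_rate` (Gauge): rejection percentage
- `squad.rework.time_ms` (Histogram): time spent in rework
Export functions: `recordReworkMetrics(result)` and `recordReworkSummary(summary)`
Tests — 19 Vitest tests (all passing)
- 9 for `calculatePrRework`
- 4 for `calculateReworkSummary`
- 6 for OTEL metric recording (`recordReworkMetrics`, `recordReworkSummary`, instrument creation, reset)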
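The lazy-init pattern mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `otel-metrics.ts`: `MeterLike` and `Recorder` are local stand-ins for the OpenTelemetry meter and instrument types so the example runs without dependencies, and `setMeter`, `ensureReworkInstruments`, `resetReworkMetrics`, and the argument shapes are hypothetical. Only the instrument names and the two `record*` function names are taken from the PR.

```typescript
// Local stand-ins for the OpenTelemetry Meter API (assumed, dependency-free).
interface Recorder { record(value: number): void; }
interface InstrumentOptions { description?: string; unit?: string; }
interface MeterLike {
  createGauge(name: string, options?: InstrumentOptions): Recorder;
  createHistogram(name: string, options?: InstrumentOptions): Recorder;
}

interface ReworkInstruments {
  rate: Recorder;
  cycles: Recorder;
  rejectionRate: Recorder;
  timeMs: Recorder;
}

let meter: MeterLike | undefined;
let instruments: ReworkInstruments | undefined;

// Hypothetical hook for supplying a meter; switching meters drops the cache.
function setMeter(m: MeterLike): void {
  meter = m;
  instruments = undefined;
}

// Lazy init: the four instruments are created on first use, then cached.
function ensureReworkInstruments(): ReworkInstruments {
  if (instruments) return instruments;
  const m = meter;
  if (!m) throw new Error("meter not configured");
  instruments = {
    rate: m.createGauge("squad.rework.rate", { unit: "%" }),
    cycles: m.createHistogram("squad.rework.cycles"),
    rejectionRate: m.createGauge("squad.rework.rejection_rate", { unit: "%" }),
    timeMs: m.createHistogram("squad.rework.time_ms", { unit: "ms" }),
  };
  return instruments;
}

// Per-PR values go to the histograms.
function recordReworkMetrics(result: { reworkCycles: number; reworkTimeMs: number }): void {
  const i = ensureReworkInstruments();
  i.cycles.record(result.reworkCycles);
  i.timeMs.record(result.reworkTimeMs);
}

// Aggregate percentages go to the gauges.
function recordReworkSummary(summary: { reworkRate: number; rejectionRate: number }): void {
  const i = ensureReworkInstruments();
  i.rate.record(summary.reworkRate);
  i.rejectionRate.record(summary.rejectionRate);
}

// Clears the cache so the next record call re-creates the instruments (useful in tests).
function resetReworkMetrics(): void {
  instruments = undefined;
}
```

Creating instruments on first use means importing the module has no side effects, and a reset function gives tests a clean slate between cases.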
Changeset
`@bradygaster/squad-sdk: minor` — Add rework rate OTEL metrics (#265)
Metrics tracked (the 5th DORA metric)
Refs #265