feat(metrics): Add rework rate OTEL metrics (#265) — 5th DORA metric #415

Merged
bradygaster merged 1 commit into bradygaster:dev from tamirdresher:feat/rework-rate-otel-metric
Mar 15, 2026

Conversation

@tamirdresher
Collaborator

Summary

Add PR rework rate as OTEL metrics in squad-sdk, following the exact pattern of the existing metrics (#261-#264) in `otel-metrics.ts`.

This replaces #381 (the standalone CLI command approach) with a proper SDK-level OTEL metric that integrates into the existing telemetry pipeline.

What's included

`packages/squad-sdk/src/runtime/rework.ts`: pure calculation module

  • Typed interfaces: `PrInfo`, `PrReview`, `PrCommit`, `PrReworkResult`, `ReworkSummary`
  • `calculatePrRework()`: per-PR rework analysis (commits after first review)
  • `calculateReworkSummary()`: aggregate metrics across multiple PRs
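
To illustrate the per-PR calculation, here is a minimal sketch of "commits after first review / total commits × 100". The field names (`submittedAt`, `committedAt`, `reworkCommits`, and so on) are assumptions made for this example; the real interfaces in `rework.ts` may be shaped differently.

```typescript
// Hypothetical shapes for illustration only; not copied from rework.ts.
interface PrReview {
  state: "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED";
  submittedAt: string; // ISO 8601 timestamp
}

interface PrCommit {
  committedAt: string; // ISO 8601 timestamp
}

interface PrInfo {
  number: number;
  reviews: PrReview[];
  commits: PrCommit[];
}

interface PrReworkResult {
  prNumber: number;
  totalCommits: number;
  reworkCommits: number; // commits made after the first review
  reworkRate: number; // percentage, 0 to 100
}

function calculatePrRework(pr: PrInfo): PrReworkResult {
  const totalCommits = pr.commits.length;
  const reviewTimes = pr.reviews.map((r) => Date.parse(r.submittedAt));
  // With no reviews there is nothing to rework against, so no commit counts.
  const firstReview = reviewTimes.length > 0 ? Math.min(...reviewTimes) : Infinity;
  const reworkCommits = pr.commits.filter(
    (c) => Date.parse(c.committedAt) > firstReview,
  ).length;
  // Guard against division by zero for PRs without commit data.
  const reworkRate = totalCommits === 0 ? 0 : (reworkCommits * 100) / totalCommits;
  return { prNumber: pr.number, totalCommits, reworkCommits, reworkRate };
}

// Three commits before the first review and one after it.
const example = calculatePrRework({
  number: 1,
  reviews: [{ state: "CHANGES_REQUESTED", submittedAt: "2026-03-01T12:00:00Z" }],
  commits: [
    { committedAt: "2026-03-01T09:00:00Z" },
    { committedAt: "2026-03-01T10:00:00Z" },
    { committedAt: "2026-03-01T11:00:00Z" },
    { committedAt: "2026-03-01T15:00:00Z" },
  ],
});
console.log(example.reworkRate); // 25
```

Keeping the function free of I/O is what makes the edge cases (no reviews, empty data) easy to cover in unit tests.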

`otel-metrics.ts`: "#265 Rework Rate Metrics" section

Four OTEL instruments following the exact same lazy-init pattern:

  • `squad.rework.rate` (Gauge): current rework rate percentage
  • `squad.rework.cycles` (Histogram): review cycles per PR
  • `squad.rework.rejection_rate` (Gauge): percentage of PRs with changes requested
  • `squad.rework.time_ms` (Histogram): time spent in rework

Export functions: `recordReworkMetrics(result)` and `recordReworkSummary(summary)`
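
For readers unfamiliar with the lazy-init pattern, a rough sketch follows. To keep it runnable without dependencies, it uses a tiny in-memory `Meter` stand-in rather than the OpenTelemetry API. The `Meter` and `Recorder` types, the result field names, and `resetReworkMetrics` are all assumptions for this example, not the code in `otel-metrics.ts`.

```typescript
// Hypothetical stand-ins for an OTEL meter and its instruments.
interface Recorder {
  record(value: number): void;
}
interface Meter {
  createGauge(name: string): Recorder;
  createHistogram(name: string): Recorder;
}

const recorded: Array<{ name: string; value: number }> = [];
let instrumentsCreated = 0;

function makeRecorder(name: string): Recorder {
  instrumentsCreated += 1;
  return {
    record: (value) => {
      recorded.push({ name, value });
    },
  };
}

const meter: Meter = {
  createGauge: (name) => makeRecorder(name),
  createHistogram: (name) => makeRecorder(name),
};

interface ReworkInstruments {
  rate: Recorder;
  cycles: Recorder;
  rejectionRate: Recorder;
  timeMs: Recorder;
}

let instruments: ReworkInstruments | undefined;

// Lazy init: instruments are created on first use, then reused.
function ensureReworkInstruments(): ReworkInstruments {
  if (!instruments) {
    instruments = {
      rate: meter.createGauge("squad.rework.rate"),
      cycles: meter.createHistogram("squad.rework.cycles"),
      rejectionRate: meter.createGauge("squad.rework.rejection_rate"),
      timeMs: meter.createHistogram("squad.rework.time_ms"),
    };
  }
  return instruments;
}

// Hypothetical per-PR result shape.
interface PrReworkResult {
  reworkRate: number;
  reviewCycles: number;
  reworkTimeMs: number;
}

function recordReworkMetrics(result: PrReworkResult): void {
  const i = ensureReworkInstruments();
  i.rate.record(result.reworkRate);
  i.cycles.record(result.reviewCycles);
  i.timeMs.record(result.reworkTimeMs);
}

// Hypothetical reset hook, so tests can force re-creation.
function resetReworkMetrics(): void {
  instruments = undefined;
}

recordReworkMetrics({ reworkRate: 25, reviewCycles: 1, reworkTimeMs: 7_200_000 });
recordReworkMetrics({ reworkRate: 0, reviewCycles: 0, reworkTimeMs: 0 });
console.log(instrumentsCreated); // 4: created once, reused on the second call
console.log(recorded.length); // 6: three values per call
```

The point of the pattern is that importing the module has no side effects; nothing is registered with the meter until the first metric is actually recorded.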

Tests — 19 Vitest tests (all passing)

  • 9 tests for `calculatePrRework` (edge cases: no reviews, post-review commits, multiple cycles, nested commit objects, empty data, missing author, filtered reviews, rework time)
  • 4 tests for `calculateReworkSummary` (empty, aggregate, all-clean, single high-rework)
  • 6 tests for OTEL metric recording (`recordReworkMetrics`, `recordReworkSummary`, instrument creation, reset)

Changeset

`@bradygaster/squad-sdk: minor`: Add rework rate OTEL metrics (#265)

Metrics tracked (the 5th DORA metric)

| Metric | Description |
| --- | --- |
| Rework Rate | commits after first review / total commits × 100 |
| Review Cycles | changes-requested → approved transitions |
| Rejection Rate | PRs with ≥1 changes requested / total PRs × 100 |
| Rework Time | last approval − first changes-requested |
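
As a worked example of the three review-derived definitions above, the sketch below computes review cycles, rework time, and rejection rate from a review timeline. The `PrReview` shape and the helper names are assumptions for illustration; how `rework.ts` handles ties, comment-only reviews, or missing approvals may differ.

```typescript
// Hypothetical review shape for illustration only.
type ReviewState = "APPROVED" | "CHANGES_REQUESTED" | "COMMENTED";
interface PrReview {
  state: ReviewState;
  submittedAt: string; // ISO 8601 timestamp
}

function sortByTime(reviews: PrReview[]): PrReview[] {
  return [...reviews].sort(
    (a, b) => Date.parse(a.submittedAt) - Date.parse(b.submittedAt),
  );
}

// Review cycles: count changes-requested -> approved transitions.
function countReviewCycles(reviews: PrReview[]): number {
  let cycles = 0;
  let awaitingApproval = false;
  for (const review of sortByTime(reviews)) {
    if (review.state === "CHANGES_REQUESTED") {
      awaitingApproval = true;
    } else if (review.state === "APPROVED" && awaitingApproval) {
      cycles += 1;
      awaitingApproval = false;
    }
  }
  return cycles;
}

// Rework time: last approval minus first changes-requested, in milliseconds.
// Returns 0 when either endpoint is missing (an assumption of this sketch).
function reworkTimeMs(reviews: PrReview[]): number {
  const sorted = sortByTime(reviews);
  const firstChange = sorted.find((r) => r.state === "CHANGES_REQUESTED");
  const approvals = sorted.filter((r) => r.state === "APPROVED");
  const lastApproval = approvals[approvals.length - 1];
  if (!firstChange || !lastApproval) return 0;
  const delta = Date.parse(lastApproval.submittedAt) - Date.parse(firstChange.submittedAt);
  return Math.max(0, delta);
}

// Rejection rate: PRs with at least one changes-requested review / total PRs x 100.
function rejectionRate(prs: PrReview[][]): number {
  if (prs.length === 0) return 0;
  const rejected = prs.filter((reviews) =>
    reviews.some((r) => r.state === "CHANGES_REQUESTED"),
  ).length;
  return (rejected * 100) / prs.length;
}

const timeline: PrReview[] = [
  { state: "CHANGES_REQUESTED", submittedAt: "2026-03-01T10:00:00Z" },
  { state: "APPROVED", submittedAt: "2026-03-01T12:00:00Z" },
  { state: "CHANGES_REQUESTED", submittedAt: "2026-03-01T13:00:00Z" },
  { state: "APPROVED", submittedAt: "2026-03-01T16:00:00Z" },
];
const cleanPr: PrReview[] = [
  { state: "APPROVED", submittedAt: "2026-03-02T09:00:00Z" },
];

console.log(countReviewCycles(timeline)); // 2
console.log(reworkTimeMs(timeline)); // 21600000 (six hours)
console.log(rejectionRate([timeline, cleanPr])); // 50
```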

Refs #265

…ORA metric

Add PR rework rate as OTEL metrics in squad-sdk, following the exact
pattern of existing metrics (bradygaster#261-bradygaster#264) in otel-metrics.ts.

New modules:
- runtime/rework.ts: Pure calculation module with typed interfaces
  (PrInfo, PrReview, PrCommit, PrReworkResult, ReworkSummary) and
  functions (calculatePrRework, calculateReworkSummary)
- otel-metrics.ts bradygaster#265 section: Four OTEL instruments
  - squad.rework.rate (Gauge) — rework rate percentage
  - squad.rework.cycles (Histogram) — review cycles per PR
  - squad.rework.rejection_rate (Gauge) — rejection percentage
  - squad.rework.time_ms (Histogram) — time spent in rework
- Export functions: recordReworkMetrics(), recordReworkSummary()

Includes 19 Vitest tests (9 for calculatePrRework, 4 for
calculateReworkSummary, 6 for OTEL metric recording).

Replaces bradygaster#381 (CLI command approach) with proper SDK-level OTEL metrics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tamirdresher
Collaborator Author

Follow-up Ideas for Rework Rate

Three enhancements that would make this metric self-sustaining:

1. GitHub Actions Cron — Automated Assessment

A scheduled workflow (daily/weekly) that runs the calculation without any CLI session. Calls gh to get recent merged PRs, computes rework metrics, and:

  • Emits OTEL metrics to a configured collector
  • Writes results to .squad/monitoring/rework-metrics.json
  • If rework rate exceeds threshold, creates an issue
  • Could feed into Ralph's work-check cycle
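
A rough sketch of the threshold step in such a workflow, assuming a summary has already been computed. The `ReworkSummary` fields, the report shape, and the function names are hypothetical; fetching PRs with gh and writing the file are left out so the example stays self-contained.

```typescript
// Hypothetical summary and report shapes for illustration only.
interface ReworkSummary {
  totalPrs: number;
  reworkRate: number; // percentage
  rejectionRate: number; // percentage
}

interface ReworkReport {
  generatedAt: string;
  thresholdPercent: number;
  exceedsThreshold: boolean;
  summary: ReworkSummary;
}

function buildReworkReport(
  summary: ReworkSummary,
  thresholdPercent: number,
  now: Date = new Date(),
): ReworkReport {
  return {
    generatedAt: now.toISOString(),
    thresholdPercent,
    exceedsThreshold: summary.reworkRate > thresholdPercent,
    summary,
  };
}

// Returns an issue title only when the threshold is exceeded.
function issueTitleFor(report: ReworkReport): string | undefined {
  if (!report.exceedsThreshold) return undefined;
  return `Rework rate ${report.summary.reworkRate}% exceeds threshold ${report.thresholdPercent}%`;
}

const report = buildReworkReport(
  { totalPrs: 12, reworkRate: 35, rejectionRate: 25 },
  30,
  new Date("2026-03-15T00:00:00Z"),
);

// A workflow step could write this string to .squad/monitoring/rework-metrics.json.
const json = JSON.stringify(report, null, 2);
console.log(json);
console.log(issueTitleFor(report)); // "Rework rate 35% exceeds threshold 30%"
```

Separating the decision (does the rate exceed the threshold?) from the side effects (writing the file, opening the issue) keeps the scheduled job easy to dry-run.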

2. Ralph Integration — Self-Correction Loop

Wire into Ralph's weekly retro: high rework = flag it, Lead runs retro, team adjusts. Low rework sustained over N weeks = reduce review gates, lower polling, auto-promote skill confidence. The "nap" concept: a well-calibrated squad can relax its guardrails.

3. Azure DevOps Support

Current impl uses gh CLI. For ADO repos, need az repos pr list or ADO REST API. The calculation logic in rework.ts is provider-agnostic — just needs PrInfo, PrReview, PrCommit shaped data. An ADO adapter that maps to these interfaces would make it work across both platforms.
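
To make the adapter idea concrete, here is a small sketch that maps ADO reviewer votes onto review states. The vote values are the ones ADO documents (10 approved, 5 approved with suggestions, 0 no vote, -5 waiting for author, -10 rejected). The `AdoVoteEvent` input shape and the `PrReview` fields are assumptions; in particular, how vote timestamps would be obtained from ADO is not addressed here.

```typescript
// Hypothetical target shape; the real PrReview in rework.ts may differ.
type ReviewState = "APPROVED" | "CHANGES_REQUESTED";
interface PrReview {
  author: string;
  state: ReviewState;
  submittedAt: string; // ISO 8601 timestamp
}

// Hypothetical input: one record per reviewer vote, with a timestamp.
interface AdoVoteEvent {
  reviewer: string;
  vote: number; // 10, 5, 0, -5, or -10
  votedAt: string;
}

function mapAdoVote(vote: number): ReviewState | undefined {
  if (vote >= 5) return "APPROVED"; // approved, approved with suggestions
  if (vote <= -5) return "CHANGES_REQUESTED"; // waiting for author, rejected
  return undefined; // 0 means no vote, so there is no review to record
}

function adoVotesToReviews(events: AdoVoteEvent[]): PrReview[] {
  const reviews: PrReview[] = [];
  for (const event of events) {
    const state = mapAdoVote(event.vote);
    if (state) {
      reviews.push({ author: event.reviewer, state, submittedAt: event.votedAt });
    }
  }
  return reviews;
}

const reviews = adoVotesToReviews([
  { reviewer: "alice", vote: -5, votedAt: "2026-03-01T10:00:00Z" },
  { reviewer: "bob", vote: 0, votedAt: "2026-03-01T11:00:00Z" },
  { reviewer: "alice", vote: 10, votedAt: "2026-03-01T14:00:00Z" },
]);
console.log(reviews.map((r) => r.state)); // ["CHANGES_REQUESTED", "APPROVED"]
```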

Happy to contribute any of these as follow-up PRs.

@tamirdresher tamirdresher force-pushed the feat/rework-rate-otel-metric branch from 0f44a19 to c654cf7 on March 15, 2026 14:33
@tamirdresher
Collaborator Author

Note: CI failure is a pre-existing issue on dev branch, not from this PR.

The CLI build fails on missing SDK exports (listRoles, searchRoles, getCategories, getRoleById, generateCharterFromRole) in roles.ts, cast.ts, and coordinator.ts. These imports exist in the CLI but the SDK doesn't export them yet. This breaks any PR targeting dev right now.

Our changes (squad-sdk only) compile clean - the rework.ts module and otel-metrics.ts additions type-check without errors.

Re: ADO support - the calculation in rework.ts is already provider-agnostic (just needs PrInfo/PrReview/PrCommit data). Rather than shipping adapter classes, this is better handled at the agent/skill layer - the Squad coordinator already knows how to call gh vs az based on repo context. A skill template documenting both data sources is sufficient.

@bradygaster bradygaster merged commit a1fcdb8 into bradygaster:dev Mar 15, 2026
1 of 2 checks passed
tamirdresher pushed a commit to tamirdresher/squad that referenced this pull request Mar 16, 2026
* chore(squad): Phase 2 launch — thinking feedback, P0 bugs, dual telemetry

Phase 1 complete: 5 issues closed (bradygaster#325, bradygaster#326, bradygaster#327, bradygaster#328, bradygaster#329), 5 PRs merged.
Phase 2 launched with Cheritto (thinking feedback), Hockney (P0 bugs), Saul (dual telemetry).
Decision inbox merged and archived.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Phase 2 Wave 1 merged, Wave 2 launched

Session: 2026-02-23T2145-phase2-wave2
Phase 2 Wave 1 complete (PRs bradygaster#351, bradygaster#352, bradygaster#353 merged).
Wave 2 launched: Cheritto on ghost response detection (bradygaster#332), Hockney on error hardening (bradygaster#334).

Changes:
- Session log created: 2026-02-23T2145-phase2-wave2.md
- Merged 3 inbox decisions (Cheritto, Hockney, Saul)
- Deleted inbox files post-merge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Epic bradygaster#323 complete — all phases shipped 🎉

All 3 phases delivered:
- Phase 1 (Testing Wave): 6 issues closed
- Phase 2 (Improvement): 6 issues closed
- Phase 3 (Breathtaking): 7 issues closed
- 17 PRs merged, 19 issues closed total

Session log: 2026-02-23T2320-epic-complete.md
Decisions merged from inbox: P2 UX Polish, first-run wow moment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* hostile QA: end-to-end quality assessment — 10 findings, 4 HIGH severity

Candid assessment requested by Brady. Traced every code path in cli-entry.ts,
shell/index.ts, shell/commands.ts, App.tsx, coordinator.ts, spawn.ts, and the
SDK adapter client.

Key findings:
- Dead sessions never evicted from agentSessions Map after connection drop
- No React ErrorBoundary — any render throw kills the shell
- Nasty-inputs corpus (95 strings) is never imported by any test
- No SIGTERM handler in interactive shell
- MemoryManager exported but never instantiated (dead code)
- Single streaming content slot clobbers multi-agent output
- User input silently dropped during processing (no type-ahead buffer)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): quality review findings — 7 issues filed

Quality audit complete: 5 agents assessed CLI across testing, coverage, stability, accessibility, UX.
Results: 4 P0 blockers (bradygaster#365-bradygaster#368), 3 P1 items (bradygaster#369-bradygaster#371).
Blocking: Waingro dead sessions, ErrorBoundary, dropped input; Marquez help text consistency.

Changes:
- Logged session summary to .squad/log/2026-02-24T0205-quality-review-complete.md
- Updated .squad/identity/now.md with quality review findings and new issue numbers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): merge decision — Marquez UX audit findings

Quality assessment merged from inbox (Grade B): 11 improvements (3 P0, 4 P1, 4 P2). help text, stub commands, vocabulary, separators, roster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): test sprint launch

Session: 2026-02-24T0210-test-sprint
Changes:
- Logged test sprint: 5 agents, 7+ issues
- Branches: P0 fixes, stale tests, E2E, hostile/SDK, A11y

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: correct ThinkingIndicator assertion to match component behavior

The ThinkingIndicator renders empty string when isThinking=false,
not 'No agents active'. Fix the test assertion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diberry pushed a commit to diberry/squad that referenced this pull request Mar 19, 2026
…er#429, bradygaster#424, bradygaster#417, bradygaster#415, bradygaster#412, bradygaster#411)

Documents features and changes from recent PRs that shipped without
corresponding docs updates:

- bradygaster#429: Update model catalog with Sonnet 4.6, Opus 4.6, GPT-5.4 defaults
- bradygaster#424: Document --sdk switch for TypeScript config generation
- bradygaster#412: Document --roles flag for opt-in base roles
- bradygaster#411: Note Ralph in init + @copilot routing template removal
- bradygaster#442: Add Session Recovery skill documentation
- bradygaster#417: Document CastingEngine character casting
- bradygaster#415: Add rework rate OTEL metrics reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bradygaster pushed a commit that referenced this pull request Mar 20, 2026
* docs: fill content gaps from 7 recent PRs (#442, #429, #424, #417, #415, #412, #411)

Documents features and changes from recent PRs that shipped without
corresponding docs updates:

- #429: Update model catalog with Sonnet 4.6, Opus 4.6, GPT-5.4 defaults
- #424: Document --sdk switch for TypeScript config generation
- #412: Document --roles flag for opt-in base roles
- #411: Note Ralph in init + @copilot routing template removal
- #442: Add Session Recovery skill documentation
- #417: Document CastingEngine character casting
- #415: Add rework rate OTEL metrics reference

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: remove duplicate gpt-5.1-codex-mini from Fast/Cheap tier

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants