
test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50 #425

Merged
bradygaster merged 8 commits into bradygaster:dev from williamhallatt:williamhallatt/347-sdk-init-parity-tests
Mar 16, 2026

Conversation

@williamhallatt
Contributor

@williamhallatt williamhallatt commented Mar 16, 2026

What

Adds 64 SDK feature parity tests (batch 2) for the #347 quality gate.

These tests cover SDK features from the #341 parity matrix, focusing on features that had ⚠️ Needs Setup status or needed unit-level verification to complement the manual SdkSquad verification.

Tests (64 total)

| Feature (from #341 matrix) | Tests | Category |
|---|---|---|
| #27 Manual Ceremonies (⚠️→tested) | 12 | Contract + validation |
| #28 Ceremony Cooldown (⚠️→tested) | 4 | Contract (schedule field) |
| #36 Human Team Members (⚠️→tested) | 12 | Contract + validation |
| #49 Constraint Budget (⚠️→tested) | 30 | Behavioral (HookPipeline enforcement) |
| #50 Multi-Agent Artifact (⚠️→tested) | 6 | Contract + lockout integration |

Test quality tiers

  • 🟢 Strong behavioral (23): HookPipeline rate limiter, file-write guard, shell command restriction, PII scrubber
  • 🟡 Moderate (6): ReviewerLockoutHook data structure + pipeline integration
  • 🟠 Contract + validation (35): Builder pass-through + 17 BuilderValidationError edge cases
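
To make the strong behavioral tier more concrete, here is a minimal sketch of the kind of enforcement those tests target: an ask_user budget, a write-path guard, and a shell command blocklist. The option names `maxAskUser`, `allowedWritePaths`, and `blockedCommands` come from this PR's commit messages. Everything else (the `SketchPipeline` class, the `ToolCall` shape, the tool names, and the prefix/substring matching rules) is hypothetical and should not be read as the SDK's actual HookPipeline API.

```typescript
// Hypothetical sketch; not the SDK's HookPipeline implementation.
interface ConstraintConfig {
  maxAskUser?: number;          // budget for ask_user calls
  allowedWritePaths?: string[]; // path prefixes that writes may target
  blockedCommands?: string[];   // substrings that make a shell command illegal
}

interface ToolCall {
  tool: "ask_user" | "write_file" | "shell"; // assumed tool names
  arg: string; // question text, target path, or command line
}

type Decision = { allowed: true } | { allowed: false; reason: string };

class SketchPipeline {
  private config: ConstraintConfig;
  private askUserCount = 0;

  constructor(config: ConstraintConfig) {
    this.config = config;
  }

  check(call: ToolCall): Decision {
    switch (call.tool) {
      case "ask_user": {
        const max = this.config.maxAskUser;
        if (max !== undefined && this.askUserCount >= max) {
          return { allowed: false, reason: `ask_user budget of ${max} exhausted` };
        }
        this.askUserCount += 1; // only allowed calls consume budget
        return { allowed: true };
      }
      case "write_file": {
        const prefixes = this.config.allowedWritePaths;
        if (prefixes !== undefined && !prefixes.some((p) => call.arg.startsWith(p))) {
          return { allowed: false, reason: `write to ${call.arg} is outside allowed paths` };
        }
        return { allowed: true };
      }
      case "shell": {
        const hit = (this.config.blockedCommands ?? []).find((b) => call.arg.includes(b));
        if (hit !== undefined) {
          return { allowed: false, reason: `command contains blocked token "${hit}"` };
        }
        return { allowed: true };
      }
    }
  }
}

const pipeline = new SketchPipeline({
  maxAskUser: 2,
  allowedWritePaths: ["src/", "test/"],
  blockedCommands: ["rm -rf"],
});

console.log(pipeline.check({ tool: "write_file", arg: "src/index.ts" }).allowed); // true
console.log(pipeline.check({ tool: "shell", arg: "rm -rf /" }).allowed); // false
```

The plain prefix check is deliberately naive: a path such as `src/../secrets` would pass it, so a real guard would presumably normalise paths before comparing.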

Relationship to #341 Matrix

The #341 parity matrix tracks 50 features across MD Squad vs SDK Squad. Our tests add unit-level coverage for 5 of the 12 features that were marked ⚠️ Needs Setup. The authoritative parity status remains in #341.

Verification

  • npm run build
  • npm test — 4247 passing (baseline 4183 → +64 new)
  • tsc --noEmit ✅ (lint clean)

Contributes to #341, #347.

…er#341)

Add 39 tests covering 5 SDK features that were previously untested:

- bradygaster#27 Manual Ceremonies: trigger types, schedule, hooks, defineSquad composition
- bradygaster#28 Ceremony Cooldown: schedule-gated cadence, multi-ceremony schedules
- bradygaster#36 Human Team Members: agent status lifecycle (active/inactive/retired)
- bradygaster#49 Constraint Budget: ask_user rate limiter, file-write path guards,
  shell command restrictions, combined constraints, defineHooks builder
- bradygaster#50 Multi-Agent Artifact: per-artifact lockout, handoff, HookPipeline integration

All tests exercise real SDK implementations (builders, HookPipeline,
ReviewerLockoutHook), with no stubs.

Strengthen ceremony, agent, and hooks builder tests with 17 negative and
edge-case tests that verify intended validation behavior:

- defineCeremony: rejects empty name, non-string trigger/schedule,
  non-array participants/hooks, null config
- defineAgent: rejects invalid status enum, empty name/role,
  non-string status, non-object config
- defineHooks: rejects non-array allowedWritePaths/blockedCommands,
  non-number maxAskUser, non-boolean scrubPii/reviewerLockout, null config

These transform weak pass-through contract tests into proper behavioral
tests — verifying the builders actively reject invalid input.
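
For readers unfamiliar with the pattern, the following is a minimal sketch of a builder that actively rejects invalid input instead of passing it through. The names `defineAgent` and `BuilderValidationError` and the status values active/inactive/retired appear in this PR. The field layout, the error messages, and the `"active"` default are assumptions made for illustration, so the real builder may behave differently.

```typescript
// Hypothetical sketch of a validating builder; the real defineAgent may differ.
class BuilderValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BuilderValidationError";
  }
}

const AGENT_STATUSES = ["active", "inactive", "retired"] as const;
type AgentStatus = (typeof AGENT_STATUSES)[number];

interface AgentDefinition {
  name: string;
  role: string;
  status: AgentStatus;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function isAgentStatus(value: unknown): value is AgentStatus {
  return typeof value === "string" && (AGENT_STATUSES as readonly string[]).includes(value);
}

function defineAgent(config: unknown): AgentDefinition {
  // Reject null, arrays, and primitives before reading any fields.
  if (typeof config !== "object" || config === null || Array.isArray(config)) {
    throw new BuilderValidationError("config must be an object");
  }
  // Assumption: status defaults to "active" when omitted.
  const { name, role, status = "active" } = config as Record<string, unknown>;
  if (!isNonEmptyString(name)) {
    throw new BuilderValidationError("name must be a non-empty string");
  }
  if (!isNonEmptyString(role)) {
    throw new BuilderValidationError("role must be a non-empty string");
  }
  if (!isAgentStatus(status)) {
    throw new BuilderValidationError(`status must be one of ${AGENT_STATUSES.join(", ")}`);
  }
  return { name, role, status };
}

// A negative test then only needs to assert that the call throws.
try {
  defineAgent({ name: "test-agent-1", role: "tester", status: "on-holiday" });
} catch (err) {
  console.log((err as Error).name); // BuilderValidationError
}
```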
@williamhallatt williamhallatt changed the title from "test: SDK feature parity batch 2 — 39 tests for #27, #28, #36, #49, #50" to "test: SDK feature parity batch 2 — 56+ tests for #27, #28, #36, #49, #50" Mar 16, 2026
Add 8 PII scrubbing tests exercising PostToolUse hooks:
- Email redaction in strings (single and multiple)
- Nested object recursive scrubbing
- Array element scrubbing
- Mixed types (numbers, null, booleans preserved)
- Deep nesting with mixed types
- Disabled scrubPii flag bypass
- Clean strings pass through unchanged

Update changeset description to reflect 64 total tests.
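
The scrubbing cases listed in this commit (strings, nested objects, arrays, preserved non-string values, and a disabled flag) can be illustrated with a short recursive function. This is only a sketch under assumptions: the regular expression, the `[EMAIL_REDACTED]` placeholder, and the `scrubPii(value, enabled)` signature are hypothetical and are not taken from the SDK's PostToolUse hook.

```typescript
// Hypothetical sketch; the SDK's scrubber may use different patterns and output.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const REDACTED = "[EMAIL_REDACTED]"; // assumed placeholder text

function scrubPii(value: unknown, enabled: boolean = true): unknown {
  // Disabled flag: return the input untouched.
  if (!enabled) return value;

  // Strings: redact every email-like match.
  if (typeof value === "string") return value.replace(EMAIL_PATTERN, REDACTED);

  // Arrays: scrub each element recursively.
  if (Array.isArray(value)) return value.map((item) => scrubPii(item));

  // Plain objects: rebuild with scrubbed values, keys unchanged.
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = scrubPii(inner);
    }
    return out;
  }

  // Numbers, booleans, null, and undefined are preserved as-is.
  return value;
}

const toolResult = {
  summary: "Contact test-agent-1@example.test for access",
  attempts: 3,
  reviewers: ["test-agent-2@example.test", null],
};
console.log(JSON.stringify(scrubPii(toolResult)));
```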
@williamhallatt williamhallatt changed the title from "test: SDK feature parity batch 2 — 56+ tests for #27, #28, #36, #49, #50" to "test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50" Mar 16, 2026
…er#341)

Comprehensive audit mapping all 50 SDK features to existing test files:
- 32 verified (HIGH confidence) — 64%
- 11 partial (MEDIUM confidence) — 22%
- 7 gaps identified (LOW confidence) — 14%
- Combined verified+partial: 43/50 (86%)

Identifies specific gap areas: drop-box pattern, directive capture,
eager execution, PRD intake, lead decomposition, scribe git commits,
and MCP integration. Recommends path to reach 90%+ threshold.
@williamhallatt williamhallatt changed the title from "test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50" to "test: SDK feature parity batch 2 — 64 tests + 50-feature test matrix audit (#347/#341)" Mar 16, 2026
The audit file duplicated and partially contradicted the existing feature
parity matrix in issue bradygaster#341. The unit test coverage audit belongs
in issue comments, not in a repo file competing with the authoritative matrix.
@williamhallatt williamhallatt changed the title from "test: SDK feature parity batch 2 — 64 tests + 50-feature test matrix audit (#347/#341)" to "test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50" Mar 16, 2026
Batch 1 (sdk-feature-parity.test.ts) correctly uses test-agent-1/test-agent-2.
Batch 2 was using real cast names (edie, fenster, hockney).
Updated to use test-agent-{1,2,3} and test-specific email addresses
to follow the same convention and the Product Isolation Rule.
williamhallatt added a commit to williamhallatt/squad that referenced this pull request Mar 16, 2026
…er#341)

46 tests covering 4 features:
- bradygaster#31 Ralph Idle-Watch Mode (RalphMonitor): 11 tests
- bradygaster#47 Client Compatibility (Platform Detection): 16 tests
- bradygaster#45 Reviewer Lockout (deepened): 11 tests
- bradygaster#46 Deadlock Handling (deepened): 8 tests

Combined with batch 1 (PR bradygaster#422) and batch 2 (PR bradygaster#425), automated
tests now cover 11 of 13 ⚠️ Needs Setup features from bradygaster#341.
@bradygaster bradygaster merged commit 8456549 into bradygaster:dev Mar 16, 2026
2 checks passed
@bradygaster
Owner

Hey @williamhallatt — huge thank you for this. You picked up exactly where I left off on the SDK parity work and absolutely crushed it. 64 tests across 5 features, behavioral coverage on the constraint budget pipeline — this is real quality work, not checkbox stuff. You made my Monday. 🙏

tamirdresher pushed a commit to tamirdresher/squad that referenced this pull request Mar 16, 2026
* chore(squad): Phase 2 launch — thinking feedback, P0 bugs, dual telemetry

Phase 1 complete: 5 issues closed (bradygaster#325, bradygaster#326, bradygaster#327, bradygaster#328, bradygaster#329), 5 PRs merged.
Phase 2 launched with Cheritto (thinking feedback), Hockney (P0 bugs), Saul (dual telemetry).
Decision inbox merged and archived.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Phase 2 Wave 1 merged, Wave 2 launched

Session: 2026-02-23T2145-phase2-wave2
Phase 2 Wave 1 complete (PRs bradygaster#351, bradygaster#352, bradygaster#353 merged).
Wave 2 launched: Cheritto on ghost response detection (bradygaster#332), Hockney on error hardening (bradygaster#334).

Changes:
- Session log created: 2026-02-23T2145-phase2-wave2.md
- Merged 3 inbox decisions (Cheritto, Hockney, Saul)
- Deleted inbox files post-merge

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Epic bradygaster#323 complete — all phases shipped 🎉

All 3 phases delivered:
- Phase 1 (Testing Wave): 6 issues closed
- Phase 2 (Improvement): 6 issues closed
- Phase 3 (Breathtaking): 7 issues closed
- 17 PRs merged, 19 issues closed total

Session log: 2026-02-23T2320-epic-complete.md
Decisions merged from inbox: P2 UX Polish, first-run wow moment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* hostile QA: end-to-end quality assessment — 10 findings, 4 HIGH severity

Candid assessment requested by Brady. Traced every code path in cli-entry.ts,
shell/index.ts, shell/commands.ts, App.tsx, coordinator.ts, spawn.ts, and the
SDK adapter client.

Key findings:
- Dead sessions never evicted from agentSessions Map after connection drop
- No React ErrorBoundary — any render throw kills the shell
- Nasty-inputs corpus (95 strings) is never imported by any test
- No SIGTERM handler in interactive shell
- MemoryManager exported but never instantiated (dead code)
- Single streaming content slot clobbers multi-agent output
- User input silently dropped during processing (no type-ahead buffer)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): quality review findings — 7 issues filed

Quality audit complete: 5 agents assessed CLI across testing, coverage, stability, accessibility, UX.
Results: 4 P0 blockers (bradygaster#365 to bradygaster#368), 3 P1 items (bradygaster#369 to bradygaster#371).
Blocking: Waingro dead sessions, ErrorBoundary, dropped input; Marquez help text consistency.

Changes:
- Logged session summary to .squad/log/2026-02-24T0205-quality-review-complete.md
- Updated .squad/identity/now.md with quality review findings and new issue numbers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): merge decision — Marquez UX audit findings

Quality assessment merged from inbox (Grade B): 11 improvements (3 P0, 4 P1, 4 P2) covering help text, stub commands, vocabulary, separators, and roster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): test sprint launch

Session: 2026-02-24T0210-test-sprint
Changes:
- Logged test sprint: 5 agents, 7+ issues
- Branches: P0 fixes, stale tests, E2E, hostile/SDK, A11y

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#422: Add context to thinking spinner

Changed ThinkingIndicator default label from 'Thinking...' to
'Routing to agent...' to give users meaningful feedback during
SDK connection and initial routing phases.

When activityHint is provided (e.g., 'Keaton thinking...'), it
still takes priority. The new default eliminates the 'is it broken?'
anxiety during the 3-5 second cold connection wait.

Updated tests to reflect new default label.
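
The label rule described in this commit message (a provided activityHint wins, otherwise the new default applies) can be sketched as a small pure function. The default string is taken from the commit message. The `resolveThinkingLabel` name and the handling of blank hints are assumptions for illustration, not the component's actual code.

```typescript
// Hypothetical sketch of the label rule; not the ThinkingIndicator source.
const DEFAULT_THINKING_LABEL = "Routing to agent...";

function resolveThinkingLabel(activityHint?: string): string {
  // Assumption: a blank or whitespace-only hint falls back to the default.
  const hint = activityHint?.trim();
  return hint ? hint : DEFAULT_THINKING_LABEL;
}

console.log(resolveThinkingLabel()); // Routing to agent...
console.log(resolveThinkingLabel("Keaton thinking...")); // Keaton thinking...
```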

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: Update Marquez history with bradygaster#422 resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#420/bradygaster#425: Add immediate SDK connection feedback

Before this fix, the first message sent to the REPL had 2-7 seconds of
dead air while createSession() blocked on SDK connection. Users thought
the shell was hung.

Changes:
- Set 'Connecting to SDK...' hint BEFORE createSession() in dispatchToCoordinator
- Set 'Connecting to <agent>...' hint BEFORE createSession() in dispatchToAgent
- Use setImmediate to give React a tick to render before blocking
- Update hint to 'Routing...' or 'thinking...' after connection completes

The ThinkingIndicator now displays immediately, eliminating perceived hang.
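
The ordering described in this commit message (set the hint, yield one tick with setImmediate, then start the blocking connection) can be sketched as follows. This assumes a Node.js runtime, where `setImmediate` is a global. The `dispatchWithHint` and `yieldToEventLoop` helpers and their parameters are hypothetical stand-ins for the real dispatchToCoordinator and dispatchToAgent code, which is not shown in this PR.

```typescript
// Hypothetical sketch, assuming Node.js; not the shell's actual dispatch code.
function yieldToEventLoop(): Promise<void> {
  // Resolves on the next check phase, giving pending work (such as a UI
  // render scheduled by setHint) a chance to run before we continue.
  return new Promise<void>((resolve) => setImmediate(() => resolve()));
}

async function dispatchWithHint<T>(
  setHint: (hint: string) => void,
  createSession: () => Promise<T>,
): Promise<T> {
  setHint("Connecting to SDK..."); // shown BEFORE the slow call starts
  await yieldToEventLoop();        // let the hint render first
  const session = await createSession();
  setHint("Routing...");           // updated once the connection completes
  return session;
}

// Usage with stand-in functions that record the order of events.
const events: string[] = [];
dispatchWithHint(
  (hint) => { events.push(`hint: ${hint}`); },
  async () => { events.push("createSession"); return "session-1"; },
).then(() => console.log(events.join(" | ")));
```

Without the yield, the hint update and the connection attempt would be queued in the same turn, so a renderer that paints between turns might not show the hint until the connection had already finished.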

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tamirdresher pushed a commit to tamirdresher/squad that referenced this pull request Mar 16, 2026
…dygaster#437)

@williamhallatt
Contributor Author

@bradygaster flattery will get you everywhere! 😄 Happy to be part of this!

@williamhallatt williamhallatt deleted the williamhallatt/347-sdk-init-parity-tests branch March 17, 2026 01:33