test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50 by williamhallatt · Pull Request #425 · bradygaster/squad

williamhallatt · 2026-03-16T03:09:29Z

What

Adds 64 SDK feature parity tests (batch 2) for the #347 quality gate.

These tests cover SDK features from the #341 parity matrix, focusing on features that had ⚠️ Needs Setup status or needed unit-level verification to complement the manual SdkSquad verification.

Tests (64 total)

Feature (from #341 matrix)	Tests	Category
#27 Manual Ceremonies (⚠️→tested)	12	Contract + validation
#28 Ceremony Cooldown (⚠️→tested)	4	Contract (schedule field)
#36 Human Team Members (⚠️→tested)	12	Contract + validation
#49 Constraint Budget (⚠️→tested)	30	Behavioral (HookPipeline enforcement)
#50 Multi-Agent Artifact (⚠️→tested)	6	Contract + lockout integration

Test quality tiers

🟢 Strong behavioral (23): HookPipeline rate limiter, file-write guard, shell command restriction, PII scrubber
🟡 Moderate (6): ReviewerLockoutHook data structure + pipeline integration
🟠 Contract + validation (35): Builder pass-through + 17 BuilderValidationError edge cases

Relationship to #341 Matrix

The #341 parity matrix tracks 50 features across MD Squad vs SDK Squad. Our tests add unit-level coverage for 5 of the 12 features that were marked ⚠️ Needs Setup. The authoritative parity status remains in #341.

Verification

npm run build ✅
npm test — 4247 passing (baseline 4183 → +64 new)
tsc --noEmit ✅ (lint clean)

Contributes to #341, #347.

…er#341)

Add 39 tests covering 5 SDK features that were previously untested:

- bradygaster#27 Manual Ceremonies: trigger types, schedule, hooks, defineSquad composition
- bradygaster#28 Ceremony Cooldown: schedule-gated cadence, multi-ceremony schedules
- bradygaster#36 Human Team Members: agent status lifecycle (active/inactive/retired)
- bradygaster#49 Constraint Budget: ask_user rate limiter, file-write path guards, shell command restrictions, combined constraints, defineHooks builder
- bradygaster#50 Multi-Agent Artifact: per-artifact lockout, handoff, HookPipeline integration

All tests exercise real SDK implementations (builders, HookPipeline, ReviewerLockoutHook), with no stubs.
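To make the constraint-budget items in the commit message above concrete, here is a minimal sketch of an ask_user rate limiter combined with a file-write path guard. It does not use the SDK's HookPipeline. `ConstraintBudgetSketch`, `checkAskUser`, and `checkWrite` are hypothetical names chosen for illustration; only the option names `maxAskUser` and `allowedWritePaths` appear in this PR's commit messages, and the real enforcement logic may differ.

```typescript
// Hypothetical stand-in for the kind of enforcement the tests exercise.
// This is NOT the SDK's HookPipeline API.
interface Decision {
  allowed: boolean;
  reason?: string;
}

class ConstraintBudgetSketch {
  private askUserCalls = 0;
  private readonly maxAskUser: number;
  private readonly allowedWritePaths: string[];

  constructor(maxAskUser: number, allowedWritePaths: string[]) {
    this.maxAskUser = maxAskUser;
    this.allowedWritePaths = allowedWritePaths;
  }

  // Permit ask_user until the budget is spent, then block every further call.
  checkAskUser(): Decision {
    if (this.askUserCalls >= this.maxAskUser) {
      return { allowed: false, reason: `ask_user limit of ${this.maxAskUser} reached` };
    }
    this.askUserCalls += 1;
    return { allowed: true };
  }

  // Permit a write only when the target is an allowed directory or sits under one.
  checkWrite(path: string): Decision {
    const target = path.replace(/\\/g, "/");
    if (target.split("/").indexOf("..") !== -1) {
      return { allowed: false, reason: "path traversal is not allowed" };
    }
    for (const allowed of this.allowedWritePaths) {
      const prefix = allowed.charAt(allowed.length - 1) === "/" ? allowed : allowed + "/";
      if (target + "/" === prefix || target.indexOf(prefix) === 0) {
        return { allowed: true };
      }
    }
    return { allowed: false, reason: `write to ${path} is outside the allowed paths` };
  }
}

// Example: two ask_user calls are allowed, the third is blocked.
const budget = new ConstraintBudgetSketch(2, ["src", "docs/"]);
budget.checkAskUser();
budget.checkAskUser();
const third = budget.checkAskUser(); // { allowed: false, reason: "ask_user limit of 2 reached" }
```

Comparing against the prefix with a trailing slash keeps a sibling directory such as `srcx/` from matching an allowed `src` entry.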

Strengthen ceremony, agent, and hooks builder tests with 17 negative/edge-case tests that verify intended validation behavior:

- defineCeremony: rejects empty name, non-string trigger/schedule, non-array participants/hooks, null config
- defineAgent: rejects invalid status enum, empty name/role, non-string status, non-object config
- defineHooks: rejects non-array allowedWritePaths/blockedCommands, non-number maxAskUser, non-boolean scrubPii/reviewerLockout, null config

These transform weak pass-through contract tests into proper behavioral tests by verifying that the builders actively reject invalid input.
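The difference between a pass-through contract test and a validation test is easier to see with a builder that actually rejects bad input. The following is a rough sketch only: `defineAgentSketch` and `SketchValidationError` are hypothetical stand-ins for the SDK's `defineAgent` and `BuilderValidationError`, whose real signatures are not shown in this PR. Defaulting a missing status to "active" is also an assumption.

```typescript
// Hypothetical mirror of the validation behavior described above.
// The real defineAgent and BuilderValidationError live in the SDK and may differ.
class SketchValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "SketchValidationError";
  }
}

const STATUSES = ["active", "inactive", "retired"] as const;
type Status = (typeof STATUSES)[number];

interface AgentConfig {
  name: string;
  role: string;
  status: Status;
}

// Shared check for the "empty name/role" cases.
function requireNonEmptyString(value: unknown, field: string): string {
  if (typeof value !== "string" || value.trim() === "") {
    throw new SketchValidationError(`${field} must be a non-empty string`);
  }
  return value;
}

function defineAgentSketch(config: unknown): AgentConfig {
  // Covers the "non-object config" and "null config" cases.
  if (typeof config !== "object" || config === null || Array.isArray(config)) {
    throw new SketchValidationError("config must be an object");
  }
  const raw = config as Record<string, unknown>;
  const name = requireNonEmptyString(raw["name"], "name");
  const role = requireNonEmptyString(raw["role"], "role");

  // Assumption: a missing status defaults to "active".
  const status = raw["status"] === undefined ? "active" : raw["status"];
  if (typeof status !== "string" || STATUSES.indexOf(status as Status) === -1) {
    throw new SketchValidationError(`status must be one of ${STATUSES.join(", ")}`);
  }
  return { name, role, status: status as Status };
}

// A valid config passes through; an invalid status enum is rejected.
const agent = defineAgentSketch({ name: "test-agent-1", role: "tester" });
let rejectedInvalidStatus = false;
try {
  defineAgentSketch({ name: "test-agent-2", role: "tester", status: "paused" });
} catch (error) {
  rejectedInvalidStatus = true;
}
```

A pass-through test would only assert on `agent`. The edge-case tests described above correspond to the `try`/`catch` branch, where the builder is expected to throw.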

Add 8 PII scrubbing tests exercising PostToolUse hooks:

- Email redaction in strings (single and multiple)
- Nested object recursive scrubbing
- Array element scrubbing
- Mixed types (numbers, null, booleans preserved)
- Deep nesting with mixed types
- Disabled scrubPii flag bypass
- Clean strings pass through unchanged

Update changeset description to reflect 64 total tests.
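The scrubbing cases listed above can be illustrated with a small recursive function. This is a sketch under stated assumptions, not the SDK's PostToolUse hook: `scrubEmails` and `scrub` are hypothetical helpers, and both the email pattern and the `[EMAIL_REDACTED]` replacement token are invented for the example.

```typescript
// Hypothetical scrubber. The regex and the replacement token are assumptions,
// not the values used by the SDK's PII hook.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const REDACTED = "[EMAIL_REDACTED]";

function scrubEmails(value: unknown): unknown {
  if (typeof value === "string") {
    // Replaces every email in the string, not just the first one.
    return value.replace(EMAIL_PATTERN, REDACTED);
  }
  if (Array.isArray(value)) {
    return value.map((item) => scrubEmails(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      result[key] = scrubEmails(source[key]);
    }
    return result;
  }
  // Numbers, booleans, null, and undefined are returned untouched.
  return value;
}

// Mirrors the "disabled scrubPii flag bypass" case: no scrubbing when the flag is off.
function scrub(value: unknown, scrubPii: boolean): unknown {
  return scrubPii ? scrubEmails(value) : value;
}

const scrubbed = scrub(
  {
    note: "contact test-agent-1@example.com or test-agent-2@example.com",
    attempts: 3,
    reviewer: null,
    nested: { list: ["test-agent-3@example.com", true] },
  },
  true,
);
```

Returning new objects and arrays instead of mutating the input keeps the original tool result available to any hook that needs the unscrubbed value.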

…er#341)

Comprehensive audit mapping all 50 SDK features to existing test files:

- 32 verified (HIGH confidence): 64%
- 11 partial (MEDIUM confidence): 22%
- 7 gaps identified (LOW confidence): 14%
- Combined verified+partial: 43/50 (86%)

Identifies specific gap areas: drop-box pattern, directive capture, eager execution, PRD intake, lead decomposition, scribe git commits, and MCP integration. Recommends path to reach 90%+ threshold.
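The percentages in the audit above follow directly from the counts, and the same arithmetic shows how far the combined figure is from 90%. The snippet below is a quick check only. It assumes the 90% threshold is measured on the combined verified+partial count, which the commit message does not state explicitly.

```typescript
// Counts are taken from the audit commit message above.
const totalFeatures = 50;
const counts = { verified: 32, partial: 11, gaps: 7 };

function percent(part: number, whole: number): number {
  return Math.round((part / whole) * 100);
}

const combined = counts.verified + counts.partial;

// Assumption: the 90% threshold applies to verified + partial.
const targetPercent = 90;
const needed = Math.ceil((targetPercent * totalFeatures) / 100);
const stillMissing = Math.max(0, needed - combined);

console.log(percent(counts.verified, totalFeatures)); // 64
console.log(percent(counts.partial, totalFeatures)); // 22
console.log(percent(counts.gaps, totalFeatures)); // 14
console.log(percent(combined, totalFeatures)); // 86
console.log(stillMissing); // 2
```

Under that assumption, closing any two of the seven listed gaps would bring the combined figure to 45/50.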

Removes SDK-FEATURE-TEST-MATRIX.md, which duplicated and partially contradicted the existing feature parity matrix in issue bradygaster#341. The unit test coverage audit belongs in issue comments, not as a repo file competing with the authoritative matrix.

Batch 1 (sdk-feature-parity.test.ts) correctly uses test-agent-1/test-agent-2. Batch 2 was using real cast names (edie, fenster, hockney). Updated to use test-agent-{1,2,3} and test-specific email addresses to follow the same convention and the Product Isolation Rule.

…er#341)

46 tests covering 4 features:

- bradygaster#31 Ralph Idle-Watch Mode (RalphMonitor): 11 tests
- bradygaster#47 Client Compatibility (Platform Detection): 16 tests
- bradygaster#45 Reviewer Lockout (deepened): 11 tests
- bradygaster#46 Deadlock Handling (deepened): 8 tests

Combined with batch 1 (PR bradygaster#422) and batch 2 (PR bradygaster#425), automated tests now cover 11 of 13 ⚠️ Needs Setup features from bradygaster#341.

bradygaster · 2026-03-16T11:05:04Z

Hey @williamhallatt — huge thank you for this. You picked up exactly where I left off on the SDK parity work and absolutely crushed it. 64 tests across 5 features, behavioral coverage on the constraint budget pipeline — this is real quality work, not checkbox stuff. You made my Monday. 🙏

* chore(squad): Phase 2 launch (thinking feedback, P0 bugs, dual telemetry)

  Phase 1 complete: 5 issues closed (bradygaster#325, bradygaster#326, bradygaster#327, bradygaster#328, bradygaster#329), 5 PRs merged. Phase 2 launched with Cheritto (thinking feedback), Hockney (P0 bugs), Saul (dual telemetry). Decision inbox merged and archived.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Phase 2 Wave 1 merged, Wave 2 launched

  Session: 2026-02-23T2145-phase2-wave2

  Phase 2 Wave 1 complete (PRs bradygaster#351, bradygaster#352, bradygaster#353 merged). Wave 2 launched: Cheritto on ghost response detection (bradygaster#332), Hockney on error hardening (bradygaster#334).

  Changes:
  - Session log created: 2026-02-23T2145-phase2-wave2.md
  - Merged 3 inbox decisions (Cheritto, Hockney, Saul)
  - Deleted inbox files post-merge

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Epic bradygaster#323 complete, all phases shipped 🎉

  All 3 phases delivered:
  - Phase 1 (Testing Wave): 6 issues closed
  - Phase 2 (Improvement): 6 issues closed
  - Phase 3 (Breathtaking): 7 issues closed
  - 17 PRs merged, 19 issues closed total

  Session log: 2026-02-23T2320-epic-complete.md

  Decisions merged from inbox: P2 UX Polish, first-run wow moment

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* hostile QA: end-to-end quality assessment (10 findings, 4 HIGH severity)

  Candid assessment requested by Brady. Traced every code path in cli-entry.ts, shell/index.ts, shell/commands.ts, App.tsx, coordinator.ts, spawn.ts, and the SDK adapter client.

  Key findings:
  - Dead sessions never evicted from agentSessions Map after connection drop
  - No React ErrorBoundary: any render throw kills the shell
  - Nasty-inputs corpus (95 strings) is never imported by any test
  - No SIGTERM handler in interactive shell
  - MemoryManager exported but never instantiated (dead code)
  - Single streaming content slot clobbers multi-agent output
  - User input silently dropped during processing (no type-ahead buffer)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): quality review findings (7 issues filed)

  Quality audit complete: 5 agents assessed CLI across testing, coverage, stability, accessibility, UX. Results: 4 P0 blockers (bradygaster#365–bradygaster#368), 3 P1 items (bradygaster#369–bradygaster#371). Blocking: Waingro dead sessions, ErrorBoundary, dropped input; Marquez help text consistency.

  Changes:
  - Logged session summary to .squad/log/2026-02-24T0205-quality-review-complete.md
  - Updated .squad/identity/now.md with quality review findings and new issue numbers

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): merge decision (Marquez UX audit findings)

  Quality assessment merged from inbox (Grade B): 11 improvements (3 P0, 4 P1, 4 P2). help text, stub commands, vocabulary, separators, roster.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): test sprint launch

  Session: 2026-02-24T0210-test-sprint

  Changes:
  - Logged test sprint: 5 agents, 7+ issues
  - Branches: P0 fixes, stale tests, E2E, hostile/SDK, A11y

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#422: Add context to thinking spinner

  Changed ThinkingIndicator default label from 'Thinking...' to 'Routing to agent...' to give users meaningful feedback during SDK connection and initial routing phases. When activityHint is provided (e.g., 'Keaton thinking...'), it still takes priority. The new default eliminates the 'is it broken?' anxiety during the 3-5 second cold connection wait. Updated tests to reflect new default label.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: Update Marquez history with bradygaster#422 resolution

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#420/bradygaster#425: Add immediate SDK connection feedback

  Before this fix, the first message sent to the REPL had 2-7 seconds of dead air while createSession() blocked on SDK connection. Users thought the shell was hung.

  Changes:
  - Set 'Connecting to SDK...' hint BEFORE createSession() in dispatchToCoordinator
  - Set 'Connecting to <agent>...' hint BEFORE createSession() in dispatchToAgent
  - Use setImmediate to give React a tick to render before blocking
  - Update hint to 'Routing...' or 'thinking...' after connection completes

  The ThinkingIndicator now displays immediately, eliminating perceived hang.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

williamhallatt · 2026-03-16T21:51:59Z

@bradygaster flattery will get you everywhere! 😄 Happy to be part of this!

williamhallatt added 3 commits March 16, 2026 13:08

chore: add changeset for parity batch 2 tests

a329a2e

This was referenced Mar 16, 2026

Shore up squad init --sdk: unified SDK init quality gate #347

Open

PRD: SDK-First Feature Parity — Full Test Results (32/50 verified, 6 gaps, 12 need setup) #341

Open

SDK feature parity: 29 features need active exercise testing #340

Closed

williamhallatt changed the title ~~test: SDK feature parity batch 2 — 39 tests for #27, #28, #36, #49, #50~~ test: SDK feature parity batch 2 — 56+ tests for #27, #28, #36, #49, #50 Mar 16, 2026

williamhallatt changed the title ~~test: SDK feature parity batch 2 — 56+ tests for #27, #28, #36, #49, #50~~ test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50 Mar 16, 2026

williamhallatt changed the title ~~test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50~~ test: SDK feature parity batch 2 — 64 tests + 50-feature test matrix audit (#347/#341) Mar 16, 2026

williamhallatt added 2 commits March 16, 2026 14:32

chore: remove SDK-FEATURE-TEST-MATRIX.md

fec96bd

Duplicated and partially contradicted the existing feature parity matrix in issue bradygaster#341. The unit test coverage audit belongs in issue comments, not as a repo file competing with the authoritative matrix.

chore: fix changeset — remove matrix reference

a8020ce

williamhallatt changed the title ~~test: SDK feature parity batch 2 — 64 tests + 50-feature test matrix audit (#347/#341)~~ test: SDK feature parity batch 2 — 64 tests for #27, #28, #36, #49, #50 Mar 16, 2026

williamhallatt mentioned this pull request Mar 16, 2026

test: SDK feature parity batch 3 — 46 tests for #31, #47, #45, #46 #428

Merged

bradygaster merged commit 8456549 into bradygaster:dev Mar 16, 2026
2 checks passed

This was referenced Mar 16, 2026

feat: Knowledge library with zero spawn impact #431

Closed

feat(.squad): PAO external communications - Phase 1 infrastructure #427

Merged

williamhallatt deleted the williamhallatt/347-sdk-init-parity-tests branch March 17, 2026 01:33

bradygaster merged 8 commits into bradygaster:dev from williamhallatt:williamhallatt/347-sdk-init-parity-tests
