test: SDK feature parity batch 2 (64 tests for #27, #28, #36, #49, #50) #425
Merged
bradygaster merged 8 commits into bradygaster:dev on Mar 16, 2026
…er#341)

Add 39 tests covering 5 SDK features that were previously untested:

- bradygaster#27 Manual Ceremonies: trigger types, schedule, hooks, defineSquad composition
- bradygaster#28 Ceremony Cooldown: schedule-gated cadence, multi-ceremony schedules
- bradygaster#36 Human Team Members: agent status lifecycle (active/inactive/retired)
- bradygaster#49 Constraint Budget: ask_user rate limiter, file-write path guards, shell command restrictions, combined constraints, defineHooks builder
- bradygaster#50 Multi-Agent Artifact: per-artifact lockout, handoff, HookPipeline integration

All tests exercise real SDK implementations (builders, HookPipeline, ReviewerLockoutHook), with no stubs.
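To make the Constraint Budget item concrete, here is a minimal sketch of the three kinds of checks it names: an ask_user rate limit, file-write path guards, and shell command restrictions. It is an illustration only. `ConstraintBudgetSketch`, `BudgetSketch`, `Decision`, and the `check*` methods are hypothetical names invented for this example and are not the SDK's HookPipeline API; only the option names `maxAskUser`, `allowedWritePaths`, and `blockedCommands` come from the commit messages in this PR.

```typescript
// Hypothetical sketch: these types and methods are NOT the SDK's real API.
interface BudgetSketch {
  maxAskUser: number;
  allowedWritePaths: string[];
  blockedCommands: string[];
}

interface Decision {
  allow: boolean;
  reason?: string;
}

class ConstraintBudgetSketch {
  private readonly budget: BudgetSketch;
  private askUserCount = 0;

  constructor(budget: BudgetSketch) {
    this.budget = budget;
  }

  // Rate limiter: allow at most maxAskUser calls, then deny.
  checkAskUser(): Decision {
    if (this.askUserCount >= this.budget.maxAskUser) {
      return { allow: false, reason: "ask_user budget exhausted" };
    }
    this.askUserCount += 1;
    return { allow: true };
  }

  // Path guard: deny ".." segments, then require an allowed root prefix.
  checkWrite(path: string): Decision {
    if (path.split("/").indexOf("..") !== -1) {
      return { allow: false, reason: "path traversal is not allowed" };
    }
    const allowed = this.budget.allowedWritePaths.some(
      (root) => path === root || path.indexOf(root + "/") === 0
    );
    return allowed
      ? { allow: true }
      : { allow: false, reason: "path outside allowed roots: " + path };
  }

  // Shell restriction: deny any command containing a blocked substring.
  checkShell(command: string): Decision {
    const hit = this.budget.blockedCommands.filter(
      (blocked) => command.indexOf(blocked) !== -1
    )[0];
    return hit === undefined
      ? { allow: true }
      : { allow: false, reason: "blocked command: " + hit };
  }
}
```

A "combined constraints" test in this style would build one guard and assert on all three checks in sequence, which is roughly what the behavioral coverage described above is aiming at.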
Strengthen ceremony, agent, and hooks builder tests with 17 negative/edge-case tests that verify intended validation behavior:

- defineCeremony: rejects empty name, non-string trigger/schedule, non-array participants/hooks, null config
- defineAgent: rejects invalid status enum, empty name/role, non-string status, non-object config
- defineHooks: rejects non-array allowedWritePaths/blockedCommands, non-number maxAskUser, non-boolean scrubPii/reviewerLockout, null config

These transform weak pass-through contract tests into proper behavioral tests by verifying that the builders actively reject invalid input.
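The difference between a pass-through contract test and a behavioral test can be sketched as follows. `defineAgentSketch` is a hypothetical stand-in written for this example, not the SDK's `defineAgent`; its error types and exact rules are assumptions. Only the rejected cases (invalid status enum, empty name/role, non-string status, non-object config) and the active/inactive/retired statuses are taken from the commit messages above.

```typescript
// Hypothetical stand-in for a validating builder; NOT the SDK's defineAgent.
type AgentStatus = "active" | "inactive" | "retired";

interface AgentConfigSketch {
  name: string;
  role: string;
  status: AgentStatus | undefined;
}

const VALID_STATUSES: readonly string[] = ["active", "inactive", "retired"];

function defineAgentSketch(config: unknown): AgentConfigSketch {
  if (config === null || typeof config !== "object" || Array.isArray(config)) {
    throw new TypeError("config must be an object");
  }
  const { name, role, status } = config as Record<string, unknown>;
  if (typeof name !== "string" || name.trim() === "") {
    throw new TypeError("name must be a non-empty string");
  }
  if (typeof role !== "string" || role.trim() === "") {
    throw new TypeError("role must be a non-empty string");
  }
  if (
    status !== undefined &&
    (typeof status !== "string" || VALID_STATUSES.indexOf(status) === -1)
  ) {
    throw new TypeError("status must be one of: " + VALID_STATUSES.join(", "));
  }
  return { name, role, status: status as AgentStatus | undefined };
}

// Helper for negative tests: true when the callback throws.
function rejects(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (_err) {
    return true;
  }
}
```

A pass-through test would only check that a valid config comes back unchanged. A negative test asserts, for example, that `rejects(() => defineAgentSketch({ name: "", role: "tester" }))` is `true`, which fails if the builder silently accepts bad input.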
Add 8 PII scrubbing tests exercising PostToolUse hooks:

- Email redaction in strings (single and multiple)
- Nested object recursive scrubbing
- Array element scrubbing
- Mixed types (numbers, null, booleans preserved)
- Deep nesting with mixed types
- Disabled scrubPii flag bypass
- Clean strings pass through unchanged

Update changeset description to reflect 64 total tests.
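The cases listed above amount to a recursive walk over a tool result. The following is a minimal sketch of that idea, not the SDK's PostToolUse hook: `scrubEmails`, `scrubToolResult`, the email pattern, and the `[EMAIL_REDACTED]` placeholder are all assumptions made for illustration. Only the `scrubPii` flag name comes from this PR.

```typescript
// Hypothetical sketch of recursive email scrubbing; NOT the SDK's hook code.
// The pattern and the placeholder text are assumptions for illustration.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function scrubEmails(value: unknown): unknown {
  if (typeof value === "string") {
    // Redacts every email in the string; clean strings come back unchanged.
    return value.replace(EMAIL_PATTERN, "[EMAIL_REDACTED]");
  }
  if (Array.isArray(value)) {
    return value.map((item) => scrubEmails(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = scrubEmails(source[key]);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined are preserved as-is.
  return value;
}

// When scrubPii is false the result is returned untouched (the bypass case).
function scrubToolResult(result: unknown, scrubPii: boolean): unknown {
  return scrubPii ? scrubEmails(result) : result;
}
```

Each bullet above maps to one input shape for `scrubToolResult`: a plain string, a nested object, an array, a mixed-type structure, and the same inputs again with `scrubPii` set to `false`.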
…er#341)

Comprehensive audit mapping all 50 SDK features to existing test files:

- 32 verified (HIGH confidence): 64%
- 11 partial (MEDIUM confidence): 22%
- 7 gaps identified (LOW confidence): 14%
- Combined verified+partial: 43/50 (86%)

Identifies specific gap areas: drop-box pattern, directive capture, eager execution, PRD intake, lead decomposition, scribe git commits, and MCP integration. Recommends path to reach 90%+ threshold.
Duplicated and partially contradicted the existing feature parity matrix in issue bradygaster#341. The unit test coverage audit belongs in issue comments, not as a repo file competing with the authoritative matrix.
Batch 1 (sdk-feature-parity.test.ts) correctly uses test-agent-1/test-agent-2.
Batch 2 was using real cast names (edie, fenster, hockney).
Updated to use test-agent-{1,2,3} and test-specific email addresses
to follow the same convention and the Product Isolation Rule.
williamhallatt added a commit to williamhallatt/squad that referenced this pull request on Mar 16, 2026
…er#341)

46 tests covering 4 features:

- bradygaster#31 Ralph Idle-Watch Mode (RalphMonitor): 11 tests
- bradygaster#47 Client Compatibility (Platform Detection): 16 tests
- bradygaster#45 Reviewer Lockout (deepened): 11 tests
- bradygaster#46 Deadlock Handling (deepened): 8 tests

Combined with batch 1 (PR bradygaster#422) and batch 2 (PR bradygaster#425), automated tests now cover 11 of 13 ⚠️ Needs Setup features from bradygaster#341.
Owner

Hey @williamhallatt, huge thank you for this. You picked up exactly where I left off on the SDK parity work and absolutely crushed it. 64 tests across 5 features, behavioral coverage on the constraint budget pipeline: this is real quality work, not checkbox stuff. You made my Monday. 🙏
tamirdresher pushed a commit to tamirdresher/squad that referenced this pull request on Mar 16, 2026

* chore(squad): Phase 2 launch - thinking feedback, P0 bugs, dual telemetry

  Phase 1 complete: 5 issues closed (bradygaster#325, bradygaster#326, bradygaster#327, bradygaster#328, bradygaster#329), 5 PRs merged. Phase 2 launched with Cheritto (thinking feedback), Hockney (P0 bugs), Saul (dual telemetry). Decision inbox merged and archived.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Phase 2 Wave 1 merged, Wave 2 launched

  Session: 2026-02-23T2145-phase2-wave2

  Phase 2 Wave 1 complete (PRs bradygaster#351, bradygaster#352, bradygaster#353 merged). Wave 2 launched: Cheritto on ghost response detection (bradygaster#332), Hockney on error hardening (bradygaster#334).

  Changes:
  - Session log created: 2026-02-23T2145-phase2-wave2.md
  - Merged 3 inbox decisions (Cheritto, Hockney, Saul)
  - Deleted inbox files post-merge

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Epic bradygaster#323 complete - all phases shipped 🎉

  All 3 phases delivered:
  - Phase 1 (Testing Wave): 6 issues closed
  - Phase 2 (Improvement): 6 issues closed
  - Phase 3 (Breathtaking): 7 issues closed
  - 17 PRs merged, 19 issues closed total

  Session log: 2026-02-23T2320-epic-complete.md

  Decisions merged from inbox: P2 UX Polish, first-run wow moment

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* hostile QA: end-to-end quality assessment - 10 findings, 4 HIGH severity

  Candid assessment requested by Brady. Traced every code path in cli-entry.ts, shell/index.ts, shell/commands.ts, App.tsx, coordinator.ts, spawn.ts, and the SDK adapter client.

  Key findings:
  - Dead sessions never evicted from agentSessions Map after connection drop
  - No React ErrorBoundary: any render throw kills the shell
  - Nasty-inputs corpus (95 strings) is never imported by any test
  - No SIGTERM handler in interactive shell
  - MemoryManager exported but never instantiated (dead code)
  - Single streaming content slot clobbers multi-agent output
  - User input silently dropped during processing (no type-ahead buffer)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): quality review findings - 7 issues filed

  Quality audit complete: 5 agents assessed CLI across testing, coverage, stability, accessibility, UX. Results: 4 P0 blockers (bradygaster#365–bradygaster#368), 3 P1 items (bradygaster#369–bradygaster#371). Blocking: Waingro dead sessions, ErrorBoundary, dropped input; Marquez help text consistency.

  Changes:
  - Logged session summary to .squad/log/2026-02-24T0205-quality-review-complete.md
  - Updated .squad/identity/now.md with quality review findings and new issue numbers

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): merge decision - Marquez UX audit findings

  Quality assessment merged from inbox (Grade B): 11 improvements (3 P0, 4 P1, 4 P2). help text, stub commands, vocabulary, separators, roster.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): test sprint launch

  Session: 2026-02-24T0210-test-sprint

  Changes:
  - Logged test sprint: 5 agents, 7+ issues
  - Branches: P0 fixes, stale tests, E2E, hostile/SDK, A11y

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#422: Add context to thinking spinner

  Changed ThinkingIndicator default label from 'Thinking...' to 'Routing to agent...' to give users meaningful feedback during SDK connection and initial routing phases. When activityHint is provided (e.g., 'Keaton thinking...'), it still takes priority. The new default eliminates the 'is it broken?' anxiety during the 3-5 second cold connection wait. Updated tests to reflect new default label.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: Update Marquez history with bradygaster#422 resolution

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#420/bradygaster#425: Add immediate SDK connection feedback

  Before this fix, the first message sent to the REPL had 2-7 seconds of dead air while createSession() blocked on SDK connection. Users thought the shell was hung.

  Changes:
  - Set 'Connecting to SDK...' hint BEFORE createSession() in dispatchToCoordinator
  - Set 'Connecting to <agent>...' hint BEFORE createSession() in dispatchToAgent
  - Use setImmediate to give React a tick to render before blocking
  - Update hint to 'Routing...' or 'thinking...' after connection completes

  The ThinkingIndicator now displays immediately, eliminating perceived hang.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
tamirdresher pushed a commit to tamirdresher/squad that referenced this pull request on Mar 16, 2026

…dygaster#437)

* chore(squad): Phase 2 launch - thinking feedback, P0 bugs, dual telemetry

  Phase 1 complete: 5 issues closed (bradygaster#325, bradygaster#326, bradygaster#327, bradygaster#328, bradygaster#329), 5 PRs merged. Phase 2 launched with Cheritto (thinking feedback), Hockney (P0 bugs), Saul (dual telemetry). Decision inbox merged and archived.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Phase 2 Wave 1 merged, Wave 2 launched

  Session: 2026-02-23T2145-phase2-wave2

  Phase 2 Wave 1 complete (PRs bradygaster#351, bradygaster#352, bradygaster#353 merged). Wave 2 launched: Cheritto on ghost response detection (bradygaster#332), Hockney on error hardening (bradygaster#334).

  Changes:
  - Session log created: 2026-02-23T2145-phase2-wave2.md
  - Merged 3 inbox decisions (Cheritto, Hockney, Saul)
  - Deleted inbox files post-merge

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): Epic bradygaster#323 complete - all phases shipped 🎉

  All 3 phases delivered:
  - Phase 1 (Testing Wave): 6 issues closed
  - Phase 2 (Improvement): 6 issues closed
  - Phase 3 (Breathtaking): 7 issues closed
  - 17 PRs merged, 19 issues closed total

  Session log: 2026-02-23T2320-epic-complete.md

  Decisions merged from inbox: P2 UX Polish, first-run wow moment

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* hostile QA: end-to-end quality assessment - 10 findings, 4 HIGH severity

  Candid assessment requested by Brady. Traced every code path in cli-entry.ts, shell/index.ts, shell/commands.ts, App.tsx, coordinator.ts, spawn.ts, and the SDK adapter client.

  Key findings:
  - Dead sessions never evicted from agentSessions Map after connection drop
  - No React ErrorBoundary: any render throw kills the shell
  - Nasty-inputs corpus (95 strings) is never imported by any test
  - No SIGTERM handler in interactive shell
  - MemoryManager exported but never instantiated (dead code)
  - Single streaming content slot clobbers multi-agent output
  - User input silently dropped during processing (no type-ahead buffer)

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): quality review findings - 7 issues filed

  Quality audit complete: 5 agents assessed CLI across testing, coverage, stability, accessibility, UX. Results: 4 P0 blockers (bradygaster#365–bradygaster#368), 3 P1 items (bradygaster#369–bradygaster#371). Blocking: Waingro dead sessions, ErrorBoundary, dropped input; Marquez help text consistency.

  Changes:
  - Logged session summary to .squad/log/2026-02-24T0205-quality-review-complete.md
  - Updated .squad/identity/now.md with quality review findings and new issue numbers

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): merge decision - Marquez UX audit findings

  Quality assessment merged from inbox (Grade B): 11 improvements (3 P0, 4 P1, 4 P2). help text, stub commands, vocabulary, separators, roster.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(squad): test sprint launch

  Session: 2026-02-24T0210-test-sprint

  Changes:
  - Logged test sprint: 5 agents, 7+ issues
  - Branches: P0 fixes, stale tests, E2E, hostile/SDK, A11y

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix bradygaster#420/bradygaster#425: Add immediate SDK connection feedback

  Before this fix, the first message sent to the REPL had 2-7 seconds of dead air while createSession() blocked on SDK connection. Users thought the shell was hung.

  Changes:
  - Set 'Connecting to SDK...' hint BEFORE createSession() in dispatchToCoordinator
  - Set 'Connecting to <agent>...' hint BEFORE createSession() in dispatchToAgent
  - Use setImmediate to give React a tick to render before blocking
  - Update hint to 'Routing...' or 'thinking...' after connection completes

  The ThinkingIndicator now displays immediately, eliminating perceived hang.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This was referenced Mar 16, 2026
Contributor, Author

@bradygaster flattery will get you everywhere! 😄 Happy to be part of this!
What
Adds 64 SDK feature parity tests (batch 2) for the #347 quality gate.
These tests cover SDK features from the #341 parity matrix, focusing on features that had ⚠️ Needs Setup status or needed unit-level verification to complement the manual SdkSquad verification.
Tests (64 total)
Test quality tiers
Relationship to #341 Matrix
The #341 parity matrix tracks 50 features across MD Squad vs SDK Squad. Our tests add unit-level coverage for 5 of the 12 features that were marked ⚠️ Needs Setup. The authoritative parity status remains in #341.
Verification
- npm run build ✅
- npm test: 4247 passing (baseline 4183 → +64 new)
- tsc --noEmit ✅ (lint clean)

Contributes to #341, #347.