feat: cross-platform local QA pipeline with Playwright, Maestro, and TV YAML runner by Ur-imazing · Pull Request #795 · JesusFilm/forge

Ur-imazing · 2026-04-17T03:45:37Z

Summary

Add /qa Claude Code skill that orchestrates a 4-layer local QA pipeline: typecheck+lint, LLM diff analysis, unit tests, and automated UI flows with visual screenshot review
Add 19 Playwright e2e flows for apps/web covering search, video player, carousels, accordions, quiz modal, navigation, responsive behavior, and error states
Add 49 Maestro e2e flows for apps/mobile covering all tabs, home screen, search/discover, library, video detail, collection player, quiz, and platform-specific iOS vs Android behaviors
Add custom TV YAML runner (apps/tv/e2e/runner.ts) with tvOS (osascript/AXRaise) and Android TV (adb keyevent) adapters, plus 39 TV YAML flows covering home, experience detail, video overlay, carousel navigation, accordion, quiz modal (QR on tvOS, WebView on Android TV), and focus management
Add Layer 3 test infrastructure: @testing-library/react + jsdom for web, @testing-library/react-native for mobile and TV
Security: bundleId validation on both adapters to prevent command injection from YAML flows

Architecture

/qa skill (Claude Code)
  ├── Layer 1: turbo typecheck + lint (fast gate)
  ├── Layer 2: LLM diff analysis → affected surfaces verdict
  ├── Layer 3: turbo test (unit/component)
  ├── Layer 4a: Playwright (web) + Maestro (mobile) + TV YAML runner
  └── Layer 4b: LLM visual screenshot review (cross-platform comparison)

Test plan

pnpm --filter @forge/tv run test — 38 tests pass (including 8 new runner tests)
pnpm --filter @forge/web run test — 6 tests pass
Invoke /qa in Claude Code and verify Layer 1+2 diff analysis works
Run pnpm --filter @forge/web run e2e with CMS running to verify Playwright flows
Run maestro test apps/mobile/.maestro/tab-navigation.yaml on iOS Simulator
Run pnpm --filter @forge/tv run e2e:androidtv on Android TV emulator

🤖 Generated with Claude Code

Installs @playwright/test as a devDependency for the web app and creates the e2e/flows/ and e2e/screenshots/ directories as the foundation for browser-based end-to-end testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ysis Create the cross-platform local QA pipeline entry point as a Claude Code skill. Layer 2 performs diff analysis to identify affected packages and surfaces using the dependency graph, flags platform-specific risks, and produces a structured verdict. Layer 1 runs typecheck + lint via Turbo filtered to only affected packages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add environmentMatchGlobs to web vitest config so .test.tsx files use jsdom while .test.ts files keep using node environment - Add @testing-library/react and jsdom as web devDependencies - Add @testing-library/react-native as devDependency for mobile and tv - No test files added yet — this enables future component test authoring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Configure Playwright for the web app with 19 flow files covering all test scenarios: navigation, search overlay, search page, video player, carousel video player, navigation carousel, bible quotes, media collection, related questions, advent countdown, easter dates, quiz modal, video hero, section rendering, routes/page loading, responsive behavior, keyboard navigation, error states, and animations. Each flow captures screenshots to e2e/screenshots/browser/ for Layer 4b visual review. Adds e2e script to package.json. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Create 48 Maestro YAML flow files covering all mobile test scenarios: tab navigation, home screen (hero, sections, blur overlay, header), discover/search (debounce, skeleton, pagination, results, errors), library (list, selection, states, thumbnails), video detail (player, share, description, siblings, back), collection player (playlist, switching, states), hero renderer, carousels (video, media, bible quotes, navigation), accordion, quiz (button, WebView, validation), platform-specific (safe areas, blur, glass, interactions), AppState lifecycle, and error states. Adds e2e:ios and e2e:android scripts to package.json. Screenshots gitignored at e2e/screenshots/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Build a lightweight TypeScript runner (~150 lines) that executes TV test flows written in simple YAML format. Dispatches D-pad commands to tvOS Simulator via AppleScript osascript (key codes 126/125/123/124/36/53) and to Android TV Emulator via adb shell input keyevent (19-23, 4). Includes: - TVAdapter interface for swappable backends - tvOS adapter with AXRaise + keystroke injection - Android TV adapter with adb keyevent - Flow parser, step executor, and flow runner - Unit tests for parser, discovery, and step execution - 200ms default delay between D-pad steps Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Create 31 YAML flow files covering all TV test scenarios: home screen (loading, hero, content rail, card select, error, focus memory), experience detail (sections, error, empty, back), video player (open, controls, dismiss, focus trap, progress), carousel navigation (video, media collection, bible quotes, navigation, focus exit, auto-scroll), accordion (expand/collapse, single open), quiz modal (platform-specific QR code on tvOS, WebView on Android TV), text/static content (text, easter dates, container, section wrapper), focus management (preferred focus, focus ring, restoration, spatial), platform-specific (scroll offset, remote buttons), error states, and accessibility labels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… to /qa skill Complete the /qa pipeline with pre-flight environment checks (running apps, booted simulators, CMS health), explicit screenshot reading instructions for Layer 4b visual review, and cross-platform comparison guidance. The full pipeline now covers all 4 layers across 5 surfaces. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ions - Add bundleId validation (regex /^[a-zA-Z0-9._-]+$/) to both tvOS and Android TV adapters to prevent command injection via malicious YAML flows - Replace `activate` with `AXRaise` in tvOS adapter to avoid stealing window focus from the terminal during test runs - Increase Android TV screenshot maxBuffer from 10MB to 50MB for 4K TVs - Use exported DEFAULT_STEP_DELAY_MS constant instead of local duplicate - Improve flow name slugification to handle special characters - Guard main() with require.main check to prevent execution during tests - Add yaml to jest transformIgnorePatterns for ESM compatibility - Include brainstorm requirements, implementation plan, and comprehensive test scenarios documents Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

railway-app · 2026-04-17T03:45:47Z

🚅 Deployed to the forge-pr-795 environment in forge

Service	Status	Updated (UTC)
@forge/web	✅ Success (View Logs)	Apr 20, 2026 at 5:04 am
@forge/manager	✅ Success (View Logs)	Apr 20, 2026 at 5:02 am
@forge/cms	✅ Success (View Logs)	Apr 20, 2026 at 5:01 am

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The TV YAML runner uses Node.js APIs (child_process, fs, path) that aren't available in the React Native/Expo type environment. The runner is executed via tsx at runtime, not compiled by the app's tsc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

QA pipeline corrections discovered during first real test runs: - Mobile Maestro flows: update appId from com.jesusfilm.forge to org.jesusfilm.forgewatch (matches apps/mobile/app.json) - TV YAML flows: update launch target from com.jesusfilm.forge.tv to org.jesusfilm.forgetv (matches apps/tv/app.json) - Android TV adapter: use `am start` with LEANBACK_LAUNCHER instead of `monkey` (monkey returns exit code 251 even on success, which was causing all 38 flows to report failure despite launching correctly) - Playwright config: skip webServer spawn when PW_SKIP_WEBSERVER is set, to avoid port conflicts when dev server is already running Verified on first real run: - Playwright: 190/200 passed - tvOS YAML runner: 38/38 passed - Android TV YAML runner: 38/38 passed (after am start fix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes invalid Maestro v2.4 syntax found during first real test runs: - `- wait: { milliseconds: N }` -> `- waitForAnimationToEnd` - `- scroll: { direction: DOWN, duration: N }` -> `- scroll` - `- clearText` -> `- eraseText` These commands were never valid in Maestro; tests failed fast on YAML validation. With syntax fixed, flows execute correctly — 29/49 pass on iOS; remaining failures are missing testIDs on app components (app-side work, not QA pipeline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pure metadata additions — no behavior, styling, or structural changes. Unblocks the 20 Maestro flows that previously failed on "Element not found" because testIDs didn't exist in components. testIDs added: - Tab bar: tab-home, tab-discover, tab-library, tab-profile - Header: header-search, header-profile - Hero: hero-cta, mute-button - Search: search-input, search-result-{index} - Cards: video-card-{index}, video-carousel-card-{index}, media-collection-item-{index}, nav-carousel-item-{index}, experience-card-{index}, playlist-item-{index} - Video detail: video-thumbnail, share-button, back-button - Bible quotes: share-quote, quote-cta - Related questions: accordion-question-{index}, accordion-cta - Quiz: quiz-button, modal-close Preserves all existing accessibilityLabel props. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pure metadata additions — no behavior, styling, or structural changes. Unblocks Playwright flows that previously failed when fragile CSS selector fallbacks couldn't find elements. data-testids added: - Header: logo, search-toggle, search-close - Hero: hero, hero-heading, hero-cta, hero-mute - Carousels: carousel-thumbnail, media-collection, media-collection-item, collection-size, nav-carousel, nav-carousel-item, bible-quotes, quote-cta, resource-cta - Accordions: accordion-trigger, accordion-cta - Countdown/dates: advent-toggle, advent-days, easter-toggle - Quiz: quiz-button, modal-close - Section fallbacks: error-block, null-block (replace null returns with inert hidden spans so tests can detect error/null states) Also: - Add playwright-report/ and test-results/ to .gitignore - Remove unused eslint-disable for @next/next/no-img-element (rule not loaded in current config) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- docs/plans/2026-04-17-002-fix-qa-testid-coverage-plan.md: plan that drove this branch's testID additions - docs/solutions/platform/local-qa-pipeline-first-runs-20260417.md: captures lessons from first real end-to-end QA pipeline runs — pass rates per surface, Maestro YAML pitfalls, Metro port conflicts, Android phone vs TV emulator setup, and operational prerequisites Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI runs eslint with --max-warnings=0 which caught the raw <img> usage after the disable comment was removed. Pre-commit lint-staged rejects the disable comment with "rule not found" (different eslint context). Use next/image consistently — resolves both CI and pre-commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New doc: docs/solutions/build-errors/nextjs-img-eslint-lint-staged-ci-mismatch-20260417.md Captures the deadlock where per-file lint-staged ESLint doesn't resolve eslint-config-next plugins, making @next/next/no-img-element appear undefined — while repo-wide CI lint resolves the rule correctly and flags warnings. Fix: use next/image to side-step the rule entirely. This pattern has recurred three times (SiteHeader Apr 15, NavigationCarousel Apr 17, and the Section 5 <style jsx> variant in the search overlay doc). Adds forward cross-reference from the search-overlay Section 5 to the canonical doc so future readers see the broader pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Layer 4 previously failed when apps weren't running on their simulators — the single most common cause of e2e failures during local validation. Added a per-surface pre-flight that, for each affected surface: 1. Boots the simulator if not already running (waits for Booted status) 2. Verifies the target app is installed — emits clear WARN with build command and skips the surface if missing 3. Force-stops any stale instance 4. Launches the app and waits ~8-10s for Metro bundle + first render Surface-specific details: - iOS/tvOS: xcrun simctl boot/launch, Apple TV identified by name - Android phone: adb + ro.build.characteristics check (excludes TV) - Android TV: adb + ro.build.characteristics check, LEANBACK_LAUNCHER intent (not LAUNCHER — the monkey exit-code-251 lesson from docs/solutions/platform/local-qa-pipeline-first-runs-20260417.md) Failure policy: WARN and skip affected surface; don't block the whole pipeline. App builds take 5-10 min and are outside /qa scope. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Prior QA runs leave hundreds of MB of PNG screenshots on disk (TV alone reaches ~150MB). Also, stale screenshots confuse Layer 4b visual review when flows are added, renamed, or removed between runs. Add a pre-flight cleanup step that deletes the surface-specific screenshot subdirectory for each affected surface before flows run: - apps/web/e2e/screenshots/browser - apps/mobile/e2e/screenshots/{ios,android} - apps/tv/e2e/screenshots/{tvos,androidtv} Only clean the directories for surfaces that will actually run, so parallel runs on other surfaces aren't disrupted. Also clean transient Playwright artifacts (playwright-report/, test-results/) when web is affected. Runs recreate the directories on next run, so deletion is safe. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AppleScript via osascript was the only remaining non-headless input path in the QA pipeline — it required the Apple TV Simulator to be the frontmost macOS app, meaning developer keystrokes collided with test keystrokes during tvOS flow execution. Swap to idb (Facebook iOS Development Bridge) which routes through SimulatorBridge (XPC) directly to the simulator process — matches Android TV's already-headless adb model. Host window focus becomes irrelevant. No Accessibility permission required. Can run in parallel with mobile/browser flows or while the user types. Changes: - apps/tv/e2e/adapters/tvos.ts: replace osascript sendDpad() with idb ui button invocation; swap Accessibility check for idb presence check in checkAvailability(); screenshots and launch unchanged - .claude/commands/qa.md: remove "tvOS requires exclusive foreground" warning; document that all 4 simulator surfaces are now headless - docs/solutions/platform/local-qa-pipeline-first-runs-20260417.md: update lesson #8 to reflect the idb swap; add idb install steps to prerequisites Prerequisite (one-time install): brew tap facebook/fb brew install idb-companion pipx install fb-idb TV flow YAMLs, runner.ts, types.ts, and unit tests unchanged — the swappable-backend design from the original plan pays off here. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

`idb ui button` accepts only iOS hardware buttons (Home, Lock, Siri) — it does NOT expose tvOS Siri Remote D-pad names. The correct API is `idb ui key <HID-usage-code>`, which routes through SimulatorBridge XPC to the simulator's UIFocusEngine (same pathway tvOS uses for external USB/Bluetooth keyboard input). Verified on Apple TV 4K tvOS 26.1 simulator: sending key codes 81 (down) and 40 (return/select) correctly navigates focus and triggers the focused element. Commands execute fully headless — no effect on host keyboard input during testing. HID usage IDs (USB HID Usage Tables, Keyboard/Keypad page): - 82 UpArrow, 81 DownArrow, 80 LeftArrow, 79 RightArrow - 40 Return (Select), 41 Escape (Menu/Back) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ur-imazing and others added 9 commits April 17, 2026 15:02

railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 03:45 Destroyed

chore: merge main and resolve pnpm-lock.yaml conflict

732a7d7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 03:52 Destroyed

Ur-imazing and others added 2 commits April 17, 2026 15:57

railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 04:42 Destroyed

Ur-imazing mentioned this pull request Apr 17, 2026

fix(qa): add testID coverage to unblock QA pipeline e2e flows #796

Closed

5 tasks

Ur-imazing and others added 4 commits April 17, 2026 19:41

railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 07:41 Destroyed

Ur-imazing and others added 2 commits April 18, 2026 15:01

Merge branch 'main' into feat/cross-platform-qa-pipeline

4a22bf3

Ur-imazing force-pushed the feat/cross-platform-qa-pipeline branch from 1e3d236 to 4a22bf3 Compare April 19, 2026 22:24

Ur-imazing force-pushed the feat/cross-platform-qa-pipeline branch from adc0cc0 to 8b71cae Compare April 20, 2026 02:47

Ur-imazing mentioned this pull request Apr 20, 2026

feat(tv): focus-driven hero swaps to the selected experience's video #803

Merged

4 tasks

Ur-imazing and others added 3 commits April 20, 2026 16:38

Merge branch 'main' into feat/cross-platform-qa-pipeline

87c7e8b

railway-app Bot deployed to forge / forge-pr-795 April 20, 2026 04:57 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: cross-platform local QA pipeline with Playwright, Maestro, and TV YAML runner#795

feat: cross-platform local QA pipeline with Playwright, Maestro, and TV YAML runner#795
Ur-imazing wants to merge 24 commits intomainfrom
feat/cross-platform-qa-pipeline

Ur-imazing commented Apr 17, 2026

Uh oh!

railway-app Bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Ur-imazing commented Apr 17, 2026

Summary

Architecture

Test plan

Uh oh!

railway-app Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

railway-app Bot commented Apr 17, 2026 •

edited

Loading