feat: cross-platform local QA pipeline with Playwright, Maestro, and TV YAML runner #795

Open
Ur-imazing wants to merge 24 commits into main from feat/cross-platform-qa-pipeline

@Ur-imazing
Contributor

Summary

  • Add /qa Claude Code skill that orchestrates a 4-layer local QA pipeline: typecheck+lint, LLM diff analysis, unit tests, and automated UI flows with visual screenshot review
  • Add 19 Playwright e2e flows for apps/web covering search, video player, carousels, accordions, quiz modal, navigation, responsive behavior, and error states
  • Add 49 Maestro e2e flows for apps/mobile covering all tabs, home screen, search/discover, library, video detail, collection player, quiz, and platform-specific iOS vs Android behaviors
  • Add custom TV YAML runner (apps/tv/e2e/runner.ts) with tvOS (osascript/AXRaise) and Android TV (adb keyevent) adapters, plus 39 TV YAML flows covering home, experience detail, video overlay, carousel navigation, accordion, quiz modal (QR on tvOS, WebView on Android TV), and focus management
  • Add Layer 3 test infrastructure: @testing-library/react + jsdom for web, @testing-library/react-native for mobile and TV
  • Security: bundleId validation on both adapters to prevent command injection from YAML flows

Architecture

/qa skill (Claude Code)
  ├── Layer 1: turbo typecheck + lint (fast gate)
  ├── Layer 2: LLM diff analysis → affected surfaces verdict
  ├── Layer 3: turbo test (unit/component)
  ├── Layer 4a: Playwright (web) + Maestro (mobile) + TV YAML runner
  └── Layer 4b: LLM visual screenshot review (cross-platform comparison)

Test plan

  • pnpm --filter @forge/tv run test — 38 tests pass (including 8 new runner tests)
  • pnpm --filter @forge/web run test — 6 tests pass
  • Invoke /qa in Claude Code and verify Layer 1+2 diff analysis works
  • Run pnpm --filter @forge/web run e2e with CMS running to verify Playwright flows
  • Run maestro test apps/mobile/.maestro/tab-navigation.yaml on iOS Simulator
  • Run pnpm --filter @forge/tv run e2e:androidtv on Android TV emulator

🤖 Generated with Claude Code

Ur-imazing and others added 9 commits April 17, 2026 15:02
Installs @playwright/test as a devDependency for the web app and
creates the e2e/flows/ and e2e/screenshots/ directories as the
foundation for browser-based end-to-end testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ysis

Create the cross-platform local QA pipeline entry point as a Claude Code
skill. Layer 2 performs diff analysis to identify affected packages and
surfaces using the dependency graph, flags platform-specific risks, and
produces a structured verdict. Layer 1 runs typecheck + lint via Turbo
filtered to only affected packages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
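
The "affected packages" step in the commit above is performed by the LLM, but the dependency-graph part of it can be illustrated deterministically. The following is a minimal sketch, assuming a graph shaped as "package -> packages it depends on"; the `DepGraph` type, the `affectedPackages` helper, and the `@forge/ui` package name are hypothetical and do not come from the PR.

```typescript
// Hypothetical shape: each package maps to the packages it depends on.
type DepGraph = Record<string, string[]>;

// Returns the changed packages plus everything that transitively depends on them.
function affectedPackages(graph: DepGraph, changed: string[]): string[] {
  // Invert the graph so we can walk from a dependency to its dependents.
  const dependents: Record<string, string[]> = {};
  for (const pkg of Object.keys(graph)) {
    for (const dep of graph[pkg]) {
      (dependents[dep] = dependents[dep] || []).push(pkg);
    }
  }

  // Breadth-first walk over dependents, starting from the changed packages.
  const seen = new Set<string>(changed);
  const queue = changed.slice();
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents[current] || []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return Array.from(seen).sort();
}
```

Each resulting name could then be passed to Turbo as a `--filter=<package>` argument so that Layer 1 only typechecks and lints what the diff can reach.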
- Add environmentMatchGlobs to web vitest config so .test.tsx files use
  jsdom while .test.ts files keep using node environment
- Add @testing-library/react and jsdom as web devDependencies
- Add @testing-library/react-native as devDependency for mobile and tv
- No test files added yet — this enables future component test authoring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Configure Playwright for the web app with 19 flow files covering all
test scenarios: navigation, search overlay, search page, video player,
carousel video player, navigation carousel, bible quotes, media
collection, related questions, advent countdown, easter dates, quiz
modal, video hero, section rendering, routes/page loading, responsive
behavior, keyboard navigation, error states, and animations.

Each flow captures screenshots to e2e/screenshots/browser/ for Layer 4b
visual review. Adds e2e script to package.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create 48 Maestro YAML flow files covering all mobile test scenarios:
tab navigation, home screen (hero, sections, blur overlay, header),
discover/search (debounce, skeleton, pagination, results, errors),
library (list, selection, states, thumbnails), video detail (player,
share, description, siblings, back), collection player (playlist,
switching, states), hero renderer, carousels (video, media, bible
quotes, navigation), accordion, quiz (button, WebView, validation),
platform-specific (safe areas, blur, glass, interactions), AppState
lifecycle, and error states.

Adds e2e:ios and e2e:android scripts to package.json. Screenshots
gitignored at e2e/screenshots/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build a lightweight TypeScript runner (~150 lines) that executes TV test
flows written in simple YAML format. Dispatches D-pad commands to tvOS
Simulator via AppleScript osascript (key codes 126/125/123/124/36/53)
and to Android TV Emulator via adb shell input keyevent (19-23, 4).

Includes:
- TVAdapter interface for swappable backends
- tvOS adapter with AXRaise + keystroke injection
- Android TV adapter with adb keyevent
- Flow parser, step executor, and flow runner
- Unit tests for parser, discovery, and step execution
- 200ms default delay between D-pad steps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
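
The swappable-backend idea in the commit above can be sketched roughly as follows. `TVAdapter` and the key codes are taken from the commit message; the `DpadKey` type, the `dpadCommand` method, `runSteps`, and the injected `exec` function are hypothetical names chosen for illustration, and the real adapter also performs the AXRaise step, which is omitted here.

```typescript
type DpadKey = "up" | "down" | "left" | "right" | "select" | "back";

// macOS virtual key codes for AppleScript `key code` (up/down/left/right/return/escape).
const TVOS_KEY_CODES: Record<DpadKey, number> = {
  up: 126, down: 125, left: 123, right: 124, select: 36, back: 53,
};

// Android KeyEvent codes for `adb shell input keyevent` (DPAD_UP..DPAD_CENTER, BACK).
const ANDROID_KEY_EVENTS: Record<DpadKey, number> = {
  up: 19, down: 20, left: 21, right: 22, select: 23, back: 4,
};

// Assumed interface shape: each backend only knows how to build one key press.
interface TVAdapter {
  name: string;
  dpadCommand(key: DpadKey): { file: string; args: string[] };
}

const tvosAdapter: TVAdapter = {
  name: "tvos",
  dpadCommand: (key) => ({
    file: "osascript",
    args: ["-e", `tell application "System Events" to key code ${TVOS_KEY_CODES[key]}`],
  }),
};

const androidTvAdapter: TVAdapter = {
  name: "androidtv",
  dpadCommand: (key) => ({
    file: "adb",
    args: ["shell", "input", "keyevent", String(ANDROID_KEY_EVENTS[key])],
  }),
};

const DEFAULT_STEP_DELAY_MS = 200;

// Sends each key in order, pausing between steps so focus animations can settle.
async function runSteps(
  adapter: TVAdapter,
  keys: DpadKey[],
  exec: (file: string, args: string[]) => Promise<void>,
  delayMs: number = DEFAULT_STEP_DELAY_MS,
): Promise<void> {
  for (const key of keys) {
    const command = adapter.dpadCommand(key);
    await exec(command.file, command.args);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Because the step executor only sees the interface, the same YAML flow can drive either platform by passing a different adapter.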
Create 31 YAML flow files covering all TV test scenarios: home screen
(loading, hero, content rail, card select, error, focus memory),
experience detail (sections, error, empty, back), video player (open,
controls, dismiss, focus trap, progress), carousel navigation (video,
media collection, bible quotes, navigation, focus exit, auto-scroll),
accordion (expand/collapse, single open), quiz modal (platform-specific
QR code on tvOS, WebView on Android TV), text/static content (text,
easter dates, container, section wrapper), focus management (preferred
focus, focus ring, restoration, spatial), platform-specific (scroll
offset, remote buttons), error states, and accessibility labels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… to /qa skill

Complete the /qa pipeline with pre-flight environment checks (running
apps, booted simulators, CMS health), explicit screenshot reading
instructions for Layer 4b visual review, and cross-platform comparison
guidance. The full pipeline now covers all 4 layers across 5 surfaces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ions

- Add bundleId validation (regex /^[a-zA-Z0-9._-]+$/) to both tvOS and
  Android TV adapters to prevent command injection via malicious YAML flows
- Replace `activate` with `AXRaise` in tvOS adapter to avoid stealing
  window focus from the terminal during test runs
- Increase Android TV screenshot maxBuffer from 10MB to 50MB for 4K TVs
- Use exported DEFAULT_STEP_DELAY_MS constant instead of local duplicate
- Improve flow name slugification to handle special characters
- Guard main() with require.main check to prevent execution during tests
- Add yaml to jest transformIgnorePatterns for ESM compatibility
- Include brainstorm requirements, implementation plan, and comprehensive
  test scenarios documents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
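
As an illustration of the bundleId check and the slug handling described in the commit above, here is a minimal sketch. The regex is the one quoted in the commit message; the function names `assertValidBundleId` and `slugifyFlowName`, and the exact slug rules, are assumptions rather than the runner's actual code.

```typescript
// Allow-list from the commit message: letters, digits, dot, underscore, hyphen.
const BUNDLE_ID_PATTERN = /^[a-zA-Z0-9._-]+$/;

// Rejects anything that could smuggle shell metacharacters into a command line.
function assertValidBundleId(bundleId: string): string {
  if (!BUNDLE_ID_PATTERN.test(bundleId)) {
    throw new Error(`Invalid bundleId: ${JSON.stringify(bundleId)}`);
  }
  return bundleId;
}

// Hypothetical slug rule: lowercase, non-alphanumeric runs become one hyphen.
function slugifyFlowName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces, punctuation, etc.
    .replace(/^-+|-+$/g, ""); // trim hyphens left at either end
}
```

Validating against an allow-list before the value reaches `osascript` or `adb` means a YAML flow cannot inject characters such as `;` or `$(`, whatever quoting the adapter uses afterwards.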
@railway-app

railway-app Bot commented Apr 17, 2026

🚅 Deployed to the forge-pr-795 environment in forge

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| @forge/web | ✅ Success | | Apr 20, 2026 at 5:04 am |
| @forge/manager | ✅ Success | | Apr 20, 2026 at 5:02 am |
| @forge/cms | ✅ Success | | Apr 20, 2026 at 5:01 am |

@railway-app railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 03:45 Destroyed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 03:52 Destroyed
Ur-imazing and others added 2 commits April 17, 2026 15:57
The TV YAML runner uses Node.js APIs (child_process, fs, path) that
aren't available in the React Native/Expo type environment. The runner
is executed via tsx at runtime, not compiled by the app's tsc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
QA pipeline corrections discovered during first real test runs:

- Mobile Maestro flows: update appId from com.jesusfilm.forge to
  org.jesusfilm.forgewatch (matches apps/mobile/app.json)
- TV YAML flows: update launch target from com.jesusfilm.forge.tv to
  org.jesusfilm.forgetv (matches apps/tv/app.json)
- Android TV adapter: use `am start` with LEANBACK_LAUNCHER instead of
  `monkey` (monkey returns exit code 251 even on success, which was
  causing all 38 flows to report failure despite launching correctly)
- Playwright config: skip webServer spawn when PW_SKIP_WEBSERVER is
  set, to avoid port conflicts when dev server is already running

Verified on first real run:
- Playwright: 190/200 passed
- tvOS YAML runner: 38/38 passed
- Android TV YAML runner: 38/38 passed (after am start fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
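
The `PW_SKIP_WEBSERVER` switch mentioned in the commit above could be expressed along these lines. This is a sketch only: `webServerFor` is a hypothetical helper, and the dev command and URL are placeholder assumptions, not values from the PR. The option names (`command`, `url`, `reuseExistingServer`) are the ones Playwright's `webServer` setting accepts.

```typescript
interface WebServerOptions {
  command: string;
  url: string;
  reuseExistingServer: boolean;
}

// Returns undefined (no server spawn) when PW_SKIP_WEBSERVER has a non-empty value.
function webServerFor(env: Record<string, string | undefined>): WebServerOptions | undefined {
  if (env.PW_SKIP_WEBSERVER) {
    return undefined;
  }
  return {
    command: "pnpm --filter @forge/web run dev", // assumed dev command
    url: "http://localhost:3000", // assumed dev URL
    reuseExistingServer: true,
  };
}
```

In a `playwright.config.ts` this would be wired in as `webServer: webServerFor(process.env)`. Note that in this sketch an empty string counts as unset.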
@railway-app railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 04:42 Destroyed
Fixes invalid Maestro v2.4 syntax found during first real test runs:

- `- wait: { milliseconds: N }` -> `- waitForAnimationToEnd`
- `- scroll: { direction: DOWN, duration: N }` -> `- scroll`
- `- clearText` -> `- eraseText`

These commands were never valid in Maestro; tests failed fast on
YAML validation. With syntax fixed, flows execute correctly — 29/49
pass on iOS; remaining failures are missing testIDs on app components
(app-side work, not QA pipeline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ur-imazing and others added 4 commits April 17, 2026 19:41
Pure metadata additions — no behavior, styling, or structural changes.
Unblocks the 20 Maestro flows that previously failed on "Element not
found" because testIDs didn't exist in components.

testIDs added:
- Tab bar: tab-home, tab-discover, tab-library, tab-profile
- Header: header-search, header-profile
- Hero: hero-cta, mute-button
- Search: search-input, search-result-{index}
- Cards: video-card-{index}, video-carousel-card-{index},
  media-collection-item-{index}, nav-carousel-item-{index},
  experience-card-{index}, playlist-item-{index}
- Video detail: video-thumbnail, share-button, back-button
- Bible quotes: share-quote, quote-cta
- Related questions: accordion-question-{index}, accordion-cta
- Quiz: quiz-button, modal-close

Preserves all existing accessibilityLabel props.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pure metadata additions — no behavior, styling, or structural changes.
Unblocks Playwright flows that previously failed when fragile CSS
selector fallbacks couldn't find elements.

data-testids added:
- Header: logo, search-toggle, search-close
- Hero: hero, hero-heading, hero-cta, hero-mute
- Carousels: carousel-thumbnail, media-collection,
  media-collection-item, collection-size, nav-carousel,
  nav-carousel-item, bible-quotes, quote-cta, resource-cta
- Accordions: accordion-trigger, accordion-cta
- Countdown/dates: advent-toggle, advent-days, easter-toggle
- Quiz: quiz-button, modal-close
- Section fallbacks: error-block, null-block (replace null returns
  with inert hidden spans so tests can detect error/null states)

Also:
- Add playwright-report/ and test-results/ to .gitignore
- Remove unused eslint-disable for @next/next/no-img-element
  (rule not loaded in current config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- docs/plans/2026-04-17-002-fix-qa-testid-coverage-plan.md: plan that
  drove this branch's testID additions
- docs/solutions/platform/local-qa-pipeline-first-runs-20260417.md:
  captures lessons from first real end-to-end QA pipeline runs —
  pass rates per surface, Maestro YAML pitfalls, Metro port conflicts,
  Android phone vs TV emulator setup, and operational prerequisites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI runs eslint with --max-warnings=0 which caught the raw <img> usage
after the disable comment was removed. Pre-commit lint-staged rejects
the disable comment with "rule not found" (different eslint context).
Use next/image consistently — resolves both CI and pre-commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to forge / forge-pr-795 April 17, 2026 07:41 Destroyed
Ur-imazing and others added 2 commits April 18, 2026 15:01
New doc: docs/solutions/build-errors/nextjs-img-eslint-lint-staged-ci-mismatch-20260417.md
Captures the deadlock where per-file lint-staged ESLint doesn't resolve
eslint-config-next plugins, making @next/next/no-img-element appear
undefined — while repo-wide CI lint resolves the rule correctly and
flags warnings. Fix: use next/image to side-step the rule entirely.

This pattern has recurred three times (SiteHeader Apr 15, NavigationCarousel
Apr 17, and the Section 5 <style jsx> variant in the search overlay doc).
Adds forward cross-reference from the search-overlay Section 5 to the
canonical doc so future readers see the broader pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Ur-imazing Ur-imazing force-pushed the feat/cross-platform-qa-pipeline branch from 1e3d236 to 4a22bf3 Compare April 19, 2026 22:24
Layer 4 previously failed when apps weren't running on their simulators
— the single most common cause of e2e failures during local validation.

Added a per-surface pre-flight that, for each affected surface:
1. Boots the simulator if not already running (waits for Booted status)
2. Verifies the target app is installed — emits clear WARN with build
   command and skips the surface if missing
3. Force-stops any stale instance
4. Launches the app and waits ~8-10s for Metro bundle + first render

Surface-specific details:
- iOS/tvOS: xcrun simctl boot/launch, Apple TV identified by name
- Android phone: adb + ro.build.characteristics check (excludes TV)
- Android TV: adb + ro.build.characteristics check, LEANBACK_LAUNCHER
  intent (not LAUNCHER — the monkey exit-code-251 lesson from
  docs/solutions/platform/local-qa-pipeline-first-runs-20260417.md)

Failure policy: WARN and skip affected surface; don't block the
whole pipeline. App builds take 5-10 min and are outside /qa scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
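
For the iOS and tvOS surfaces, the four pre-flight steps and the WARN-and-skip policy from the commit above can be sketched as below. The `simctl` subcommands are real, but the `preflightSimulator` function, the `Run` callback, and the result shape are hypothetical; the wait for the Metro bundle is left out, and the actual pre-flight lives in the `/qa` skill rather than in TypeScript.

```typescript
// Injected command runner so the sketch stays testable;
// a real one might wrap child_process.spawnSync and report exit status.
type Run = (file: string, args: string[]) => { ok: boolean };

interface PreflightResult {
  surface: string;
  status: "ready" | "skipped";
  warning?: string;
}

function preflightSimulator(
  surface: string,
  device: string,
  bundleId: string,
  run: Run,
): PreflightResult {
  const skip = (warning: string): PreflightResult => ({ surface, status: "skipped", warning });

  // 1. `bootstatus -b` boots the device if needed and blocks until boot completes.
  if (!run("xcrun", ["simctl", "bootstatus", device, "-b"]).ok) {
    return skip(`WARN: could not boot ${device}`);
  }
  // 2. `get_app_container` fails when the app is not installed.
  if (!run("xcrun", ["simctl", "get_app_container", device, bundleId]).ok) {
    return skip(`WARN: ${bundleId} is not installed on ${device}; build and install it first`);
  }
  // 3. Force-stop any stale instance; a failure here just means nothing was running.
  run("xcrun", ["simctl", "terminate", device, bundleId]);
  // 4. Launch the app.
  if (!run("xcrun", ["simctl", "launch", device, bundleId]).ok) {
    return skip(`WARN: could not launch ${bundleId}`);
  }
  return { surface, status: "ready" };
}
```

Returning a "skipped" result instead of throwing keeps one missing build from blocking the other surfaces, which matches the failure policy stated in the commit.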
@Ur-imazing Ur-imazing force-pushed the feat/cross-platform-qa-pipeline branch from adc0cc0 to 8b71cae Compare April 20, 2026 02:47
Prior QA runs leave hundreds of MB of PNG screenshots on disk (TV
alone reaches ~150MB). Also, stale screenshots confuse Layer 4b
visual review when flows are added, renamed, or removed between runs.

Add a pre-flight cleanup step that deletes the surface-specific
screenshot subdirectory for each affected surface before flows run:
- apps/web/e2e/screenshots/browser
- apps/mobile/e2e/screenshots/{ios,android}
- apps/tv/e2e/screenshots/{tvos,androidtv}

Only clean the directories for surfaces that will actually run, so
parallel runs on other surfaces aren't disrupted. Also clean
transient Playwright artifacts (playwright-report/, test-results/)
when web is affected.

The flows recreate these directories on the next run, so deleting them is safe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
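
The per-surface cleanup described in the commit above might look like the following sketch. The screenshot paths are the ones listed in the commit message; the `Surface` names, the `cleanScreenshots` helper, and the assumption that the Playwright artifacts sit under `apps/web/` are illustrative only.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Surface = "browser" | "ios" | "android" | "tvos" | "androidtv";

const SCREENSHOT_DIRS: Record<Surface, string> = {
  browser: "apps/web/e2e/screenshots/browser",
  ios: "apps/mobile/e2e/screenshots/ios",
  android: "apps/mobile/e2e/screenshots/android",
  tvos: "apps/tv/e2e/screenshots/tvos",
  androidtv: "apps/tv/e2e/screenshots/androidtv",
};

// Assumed location of Playwright's transient output.
const PLAYWRIGHT_ARTIFACTS = ["apps/web/playwright-report", "apps/web/test-results"];

// Deletes only the directories for surfaces that will run; returns what it targeted.
function cleanScreenshots(repoRoot: string, affected: Surface[]): string[] {
  const targets = affected.map((surface) => SCREENSHOT_DIRS[surface]);
  if (affected.indexOf("browser") !== -1) {
    targets.push(...PLAYWRIGHT_ARTIFACTS);
  }
  const removed: string[] = [];
  for (const relative of targets) {
    const dir = path.join(repoRoot, relative);
    // force: true makes a missing directory a no-op instead of an error.
    fs.rmSync(dir, { recursive: true, force: true });
    removed.push(dir);
  }
  return removed;
}
```

Keying the deletion on the affected surfaces is what leaves screenshots from a parallel run on another surface untouched.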
Ur-imazing and others added 3 commits April 20, 2026 16:38
AppleScript via osascript was the only remaining non-headless input
path in the QA pipeline — it required the Apple TV Simulator to be
the frontmost macOS app, meaning developer keystrokes collided with
test keystrokes during tvOS flow execution.

Swap to idb (Facebook iOS Development Bridge) which routes through
SimulatorBridge (XPC) directly to the simulator process — matches
Android TV's already-headless adb model. Host window focus becomes
irrelevant. No Accessibility permission required. Can run in
parallel with mobile/browser flows or while the user types.

Changes:
- apps/tv/e2e/adapters/tvos.ts: replace osascript sendDpad() with
  idb ui button invocation; swap Accessibility check for idb presence
  check in checkAvailability(); screenshots and launch unchanged
- .claude/commands/qa.md: remove "tvOS requires exclusive foreground"
  warning; document that all 4 simulator surfaces are now headless
- docs/solutions/platform/local-qa-pipeline-first-runs-20260417.md:
  update lesson #8 to reflect the idb swap; add idb install steps
  to prerequisites

Prerequisite (one-time install):
  brew tap facebook/fb
  brew install idb-companion
  pipx install fb-idb

TV flow YAMLs, runner.ts, types.ts, and unit tests unchanged — the
swappable-backend design from the original plan pays off here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`idb ui button` accepts only iOS hardware buttons (Home, Lock, Siri)
— it does NOT expose tvOS Siri Remote D-pad names. The correct API is
`idb ui key <HID-usage-code>`, which routes through SimulatorBridge
XPC to the simulator's UIFocusEngine (same pathway tvOS uses for
external USB/Bluetooth keyboard input).

Verified on Apple TV 4K tvOS 26.1 simulator: sending key codes 81
(down) and 40 (return/select) correctly navigates focus and triggers
the focused element. Commands execute fully headless — no effect on
host keyboard input during testing.

HID usage IDs (USB HID Usage Tables, Keyboard/Keypad page):
- 82 UpArrow, 81 DownArrow, 80 LeftArrow, 79 RightArrow
- 40 Return (Select), 41 Escape (Menu/Back)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
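
The HID mapping from the commit above can be captured in a small lookup that builds the `idb ui key` argument list. This is a sketch under assumptions: the usage IDs are the ones listed in the commit message, while the `DpadKey` type, the `idbKeyArgs` helper, and passing the target through `--udid` are illustrative choices rather than the adapter's confirmed code.

```typescript
type DpadKey = "up" | "down" | "left" | "right" | "select" | "back";

// USB HID Keyboard/Keypad usage IDs (decimal), as listed in the commit message.
const HID_USAGE: Record<DpadKey, number> = {
  up: 82, // UpArrow
  down: 81, // DownArrow
  left: 80, // LeftArrow
  right: 79, // RightArrow
  select: 40, // Return
  back: 41, // Escape (Menu/Back)
};

// Builds the arguments for `idb ui key <code>` against one simulator.
function idbKeyArgs(udid: string, key: DpadKey): string[] {
  return ["ui", "key", String(HID_USAGE[key]), "--udid", udid];
}
```

For example, `idbKeyArgs(udid, "down")` yields the arguments for sending usage ID 81, which the commit reports as moving focus down on the Apple TV simulator.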