diff --git a/docs/solutions/skill-design/ce-prefix-required-for-skills-and-agents-2026-05-01.md b/docs/solutions/skill-design/ce-prefix-required-for-skills-and-agents-2026-05-01.md new file mode 100644 index 000000000..402c962b7 --- /dev/null +++ b/docs/solutions/skill-design/ce-prefix-required-for-skills-and-agents-2026-05-01.md @@ -0,0 +1,110 @@ +--- +title: New skills and agents must use the ce- prefix; enforce it in tests, not just prose +date: 2026-05-01 +category: skill-design +module: compound-engineering +problem_type: convention +component: plugins/compound-engineering +severity: low +applies_when: + - Adding a new skill directory under plugins/compound-engineering/skills/ + - Adding a new agent file under plugins/compound-engineering/agents/ + - Authoring or reviewing a PR that introduces a new component to the plugin +tags: + - naming-convention + - ce-prefix + - skill-authoring + - test-enforcement + - plugin-conventions +related: + - docs/solutions/skill-design/beta-skills-framework.md +related_pr: https://github.com/EveryInc/compound-engineering-plugin/pull/747 +--- + +## Problem + +`plugins/compound-engineering/AGENTS.md` already stated that "all skills and agents use the `ce-` prefix to unambiguously identify them as compound-engineering components." But the rule was prose-only, and three legacy skills (`every-style-editor`, `file-todos`, `lfg`) sit unprefixed in the same directory as their `ce-`-prefixed siblings. The combination — a soft rule plus visible exceptions — let a new skill (`riffrec-feedback-analysis`) ship in PR #747 without the prefix. The user caught it post-merge of the first commit, requiring a rename commit on the same PR. + +A prose convention that has visible counterexamples and no machine check is, in practice, an *advisory* convention. Any author skim-reading the directory listing sees `every-style-editor` next to `ce-brainstorm` and reasonably concludes the prefix is optional. + +## Root cause + +Two layered problems: + +1. 
**The rule was unenforced.** Nothing in CI or the test suite would fail when a non-`ce-` skill was added. The frontmatter test asserts that the skill's `name:` matches its directory and that the directory uses `[a-z0-9-]+`, but does not check for the `ce-` prefix. +2. **The exception list was implicit.** Three legacy skills predate the rule. Without an explicit allowlist, "predates the rule" looks identical to "the rule doesn't apply" when reading the filesystem. + +## Solution + +Make the rule mechanically enforced and pin the exceptions explicitly. + +### 1. Test enforcement + +Added to `tests/frontmatter.test.ts` inside the existing `frontmatter YAML validity` block: + +```ts +if (pluginRoot === "plugins/compound-engineering") { + const SKILL_PREFIX_ALLOWLIST = new Set([ + "every-style-editor", + "file-todos", + "lfg", + ]) + test(`${pluginRoot}/${rel} skill name uses ce- prefix`, () => { + const dirName = path.basename(path.dirname(rel)) + if (SKILL_PREFIX_ALLOWLIST.has(dirName)) return + expect( + dirName.startsWith("ce-"), + `Skill "${dirName}" must use the ce- prefix. ` + + `If this is a legacy skill that predates the rule, add it to ` + + `SKILL_PREFIX_ALLOWLIST in tests/frontmatter.test.ts.`, + ).toBe(true) + }) +} +``` + +A parallel test at the agent level (no allowlist, since every existing agent already conforms): + +```ts +if ( + pluginRoot === "plugins/compound-engineering" && + /^agents\/[^/]+\.agent\.md$/.test(rel) +) { + test(`${pluginRoot}/${rel} agent name uses ce- prefix`, () => { + const fileName = path.basename(rel, ".agent.md") + expect( + fileName.startsWith("ce-"), + `Agent "${fileName}" must use the ce- prefix.`, + ).toBe(true) + }) +} +``` + +The test failure message tells the author exactly what to do — either rename the skill or, if it is genuinely legacy, edit the allowlist (which a reviewer can then push back on). + +### 2. 
Strengthened prose + +Updated `plugins/compound-engineering/AGENTS.md` to call the prefix mandatory, name the legacy exceptions, point at the test, and forbid extending the allowlist. The prose now says "no exceptions" and tells authors that the test will fail. Prose alone wouldn't have prevented the original mistake, but pairing it with the test gives a single internally consistent story. + +### 3. Persistent author memory + +Saved a feedback memory at `~/.claude/projects/-Users-kieranklaassen-compound-engineering-plugin/memory/feedback_ce_prefix_required.md` so future sessions on this repo load the rule automatically and apply it before the test fires. + +## Prevention + +For any plugin convention that is currently prose-only, ask: + +- Is there at least one visible counterexample in the codebase that an author could mistake for permission? +- Is there a mechanical check that would fail on violation? + +If the answer to the first is yes and the second is no, the convention will eventually be violated. The fix is one of: + +- Add a test that asserts the convention with a hard-coded allowlist for legacy exceptions. +- Migrate the legacy exceptions so the rule is universal and no allowlist is needed. + +The allowlist pattern is preferred when migration is risky (renaming an installed skill breaks user invocations) but the rule applies cleanly going forward. + +## Related + +- `plugins/compound-engineering/AGENTS.md` — Naming Convention section now documents the rule and the allowlist. +- `tests/frontmatter.test.ts` — implements the enforcement. +- PR #747 — the original mistake and the rename + enforcement that came with it. 
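
The enforcement logic is small enough to port to any stack. A minimal Python sketch of the same allowlist pattern described in Prevention — the function and variable names here are illustrative, not code from the repo's TypeScript test suite:

```python
# Hypothetical port of the allowlist-enforced prefix check: a name passes
# only if it carries the prefix or is explicitly pinned as a legacy
# exception. Anything else is reported as a violation.
LEGACY_ALLOWLIST = {"every-style-editor", "file-todos", "lfg"}

def check_prefix(names, prefix="ce-", allowlist=LEGACY_ALLOWLIST):
    """Return the names that violate the prefix convention."""
    return [n for n in names if not n.startswith(prefix) and n not in allowlist]

# "riffrec-feedback-analysis" is the only violator below: it is neither
# prefixed nor pinned in the allowlist.
violations = check_prefix(
    ["ce-brainstorm", "every-style-editor", "riffrec-feedback-analysis"]
)
```

The same shape works as a pre-commit hook or a CI step; the essential properties are a hard-coded allowlist (so adding an exception is a visible diff a reviewer can push back on) and a failure message that names the allowlist location.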
diff --git a/plugins/compound-engineering/AGENTS.md b/plugins/compound-engineering/AGENTS.md index 0fa90414d..1401d3c1c 100644 --- a/plugins/compound-engineering/AGENTS.md +++ b/plugins/compound-engineering/AGENTS.md @@ -88,6 +88,8 @@ Important: Just because the developer's installed plugin may be out of date, it' **Agents** follow the same convention: `ce-adversarial-reviewer`, `ce-learnings-researcher`, etc. When referencing agents from skills, use the bare `ce-` form (e.g., `ce-adversarial-reviewer`) — the `ce-` prefix is sufficient for uniqueness across plugins. +**The `ce-` prefix is required for every new skill and agent — no exceptions.** Three legacy skills (`every-style-editor`, `file-todos`, `lfg`) predate the rule and remain unprefixed; they are pinned in `tests/frontmatter.test.ts` as the only allowed exceptions. Do not add to that allowlist. When adding a new skill, the directory name, the SKILL.md `name:` frontmatter, and any README references must all start with `ce-`. The frontmatter test enforces this and will fail on a missing prefix. + ## Known External Limitations **Proof HITL surfaces a ghost "AI collaborator" agent** (noted 2026-04-16, may change): The Proof API auto-joins any header-less `/state` read under a synthetic `ai:auto-` identity, so docs created by the `skills/proof/` HITL workflow show a phantom participant alongside `Compound Engineering`. The only way to suppress it is to set `ownerId: "agent:ai:compound-engineering"` on create — but that transfers document ownership to the agent and prevents the user from claiming it into their Proof library, so we don't use it. Treat as cosmetic noise; don't reintroduce the `ownerId` workaround. Tracked upstream: https://github.com/EveryInc/proof/issues/951. 
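
The naming rule above is mechanical enough to check with a couple of lines. A hedged Python sketch mirroring the agent-file pattern the test suite uses (the regex is taken from the test; the helper name is illustrative):

```python
# An agent file must match agents/<name>.agent.md and <name> must start
# with "ce-". There is no allowlist at the agent level because every
# existing agent already conforms.
import re

AGENT_RE = re.compile(r"^agents/([^/]+)\.agent\.md$")

def agent_name_ok(rel_path: str) -> bool:
    m = AGENT_RE.match(rel_path)
    return bool(m) and m.group(1).startswith("ce-")

assert agent_name_ok("agents/ce-adversarial-reviewer.agent.md")
assert not agent_name_ok("agents/adversarial-reviewer.agent.md")
```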
diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md index bc5a073e9..666052139 100644 --- a/plugins/compound-engineering/README.md +++ b/plugins/compound-engineering/README.md @@ -41,6 +41,7 @@ For `/ce-optimize`, see [`skills/ce-optimize/README.md`](./skills/ce-optimize/RE |-------|-------------| | `/ce-sessions` | Ask questions about session history across Claude Code, Codex, and Cursor | | `/ce-slack-research` | Search Slack for interpreted organizational context -- decisions, constraints, and discussion arcs | +| `ce-riffrec-feedback-analysis` | Convert [Riffrec](https://github.com/kieranklaassen/riffrec) recordings, videos, audio, or notes into structured feedback. Routes between setup, quick bug report, and extensive analysis that hands off to `ce-brainstorm` | ### Git Workflow diff --git a/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/SKILL.md b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/SKILL.md new file mode 100644 index 000000000..0f589f533 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/SKILL.md @@ -0,0 +1,36 @@ +--- +name: ce-riffrec-feedback-analysis +description: Riffrec product-feedback workflow. ALWAYS load when the user posts a `riffrec-*.zip`, a bundle with `session.json` + `events.json` + `recording.webm` + `voice.webm`, a video/audio recording for product feedback, or asks how to capture and share Riffrec sessions. Routes between setup, quick bug report, and extensive analysis. +--- + +# Riffrec Feedback Analysis + +Turn raw product feedback into structured evidence for downstream agents. This skill is the consumption side of [Riffrec](https://github.com/kieranklaassen/riffrec), a capture tool that records synchronized screen + voice + event sessions and emits a `riffrec-*.zip` bundle. + +## Choose the path + +Route to the matching reference based on the input. Read only that reference; do not load the others. 
+ +- **Setup** — user has no recording yet and asks how to install Riffrec, capture a session, or share feedback. Read `references/install-riffrec.md`. +- **Quick bug report** — input is a short recording (under ~60 seconds), the user describes a single specific issue, or asks for "quick", "small", or "just transcribe". Read `references/quick-bug-report.md`. Emit one concise bug report; skip the full artifact set and brainstorm handoff. +- **Extensive analysis** — input is a longer recording, contains multiple issues / requirements / workflow walkthroughs, or the user wants requirements or brainstorm material. Read `references/extensive-analysis.md`. Always continue into the `ce-brainstorm` skill. + +When the input is ambiguous (e.g., a zip arrived without context), inspect the recording length and event count before choosing. If still unclear, ask the user which path applies before running anything heavy. + +## Common rules + +- Keep raw recordings, audio chunks, zip contents, session dumps, and extracted screenshots local-only by default. Do not commit `raw/` or `frames/` directories unless the user explicitly asks and privacy is acceptable. +- Text/metadata artifacts (requirements docs, analysis summaries, problem analyses, source manifests) may be committed when they are needed for traceability and contain no sensitive data. +- Use repo-relative screenshot paths in any committed doc so later agents can open the evidence without absolute local paths. + +## Analyzer entrypoint + +All non-setup paths share the same analyzer: + +```bash +python scripts/analyze_riffrec_zip.py /path/to/input +``` + +Accepted inputs: a Riffrec `.zip`, an `.mp4` / `.mov` / `.webm` video, an `.m4a` / `.mp3` / `.wav` audio file, or a meeting-notes `.md`. Use `--output-dir ` to control where artifacts land. In repos with `docs/brainstorms/`, the default is `docs/brainstorms/riffrec-feedback/`. The quick path overrides the output dir to a temp location so nothing pollutes the repo. 
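
The routing described in "Choose the path" reduces to a small decision function. A minimal sketch under the stated ~60-second assumption — the thresholds and names are illustrative, not part of the analyzer script, and the real routing also weighs ambiguous cases by asking the user:

```python
# Pick a path from recording length, apparent issue count, and explicit
# user intent. Setup is handled separately (no recording exists yet).
def choose_path(duration_seconds: float, issue_count: int, user_said_quick: bool) -> str:
    if user_said_quick or (duration_seconds < 60 and issue_count <= 1):
        return "quick-bug-report"
    return "extensive-analysis"

assert choose_path(45, 1, user_said_quick=False) == "quick-bug-report"
assert choose_path(180, 3, user_said_quick=False) == "extensive-analysis"
```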
+ +The Compound Engineering output format used by the extensive path is documented in `references/compound-engineering-feedback-format.md`. diff --git a/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/compound-engineering-feedback-format.md b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/compound-engineering-feedback-format.md new file mode 100644 index 000000000..3700d1277 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/compound-engineering-feedback-format.md @@ -0,0 +1,116 @@ +# Compound Engineering Feedback Format + +Use this shape when converting Riffrec evidence into a durable brainstorm or planning input. + +## Finding + +```markdown +### F1. + +- **Severity:** P0/P1/P2/P3 +- **Observed:** +- **Expected:** +- **Evidence:** +- **Confidence:** High/Medium/Low, with reason +- **Requirement candidates:** R1, R2 +``` + +## Requirements Kickoff + +```markdown +--- +date: YYYY-MM-DD +topic: +--- + +# + +## Problem Frame + + + +--- + +## Actors + +- A1. User: +- A2. Product surface: +- A3. Agent/assistant, if relevant: + +--- + +## Key Flows + +- F1. Recorded feedback triage + - **Trigger:** A Riffrec zip is available for review. + - **Actors:** A1, A2 + - **Steps:** <3-7 product steps seen in the recording> + - **Outcome:** + - **Covered by:** R1, R2 + +--- + +## Requirements + +**Observed product behavior** +- R1. + +**Feedback evidence and reviewability** +- R2. + +--- + +## Acceptance Examples + +- AE1. **Covers R1.** Given , when , . + +--- + +## Success Criteria + +- +- + +--- + +## Scope Boundaries + +- + +--- + +## Key Decisions + +- : + +--- + +## Dependencies / Assumptions + +- + +--- + +## Outstanding Questions + +### Resolve Before Planning + +- + +### Deferred to Planning + +- [Technical] + +--- + +## Next Steps + +-> /ce-brainstorm to confirm, correct, and regroup the captured requirements before any planning. 
+``` + +## Evidence Rules + +- Prefer moment IDs and screenshot links over prose-only claims. +- Mark visual interpretation as an inference when the screenshot does not prove intent. +- Requirements should describe product behavior, not implementation details. +- Do not include absolute local paths in CE docs; use repo-relative paths when possible. diff --git a/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/extensive-analysis.md b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/extensive-analysis.md new file mode 100644 index 000000000..16aa38ee3 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/extensive-analysis.md @@ -0,0 +1,119 @@ +# Extensive analysis path + +Use this path when the input is a longer recording (over ~60 seconds), contains multiple issues, requirements, or workflow walkthroughs, or the user explicitly wants requirements material. The goal is a full Compound Engineering-compatible artifact set that feeds `ce-brainstorm`. + +## Workflow + +1. Run the analyzer: + + ```bash + python scripts/analyze_riffrec_zip.py /path/to/input + ``` + + Use `--output-dir ` when the artifact should live somewhere specific. In a repo with `docs/brainstorms/`, the default output goes under `docs/brainstorms/riffrec-feedback/`. + +2. Read the generated `analysis.md`, `problem-analysis.md`, `review-prompt.md`, and `requirements-kickoff.md`. + +3. Read `source-materials.md` before brainstorm. It is the source-of-truth manifest for the original raw feedback location, transcript, local-only frames, chunks, analysis artifacts, and screenshot paths. Use it to keep brainstorm and planning traceable to the original feedback evidence. + +4. Inspect the extracted screenshots for high-signal moments using the platform's image-view tool. 
Prioritize screenshots selected because of click events near verbal complaints, failed network requests, console errors, or repeated interaction. + +5. Fill or refine `problem-analysis.md` using the frame review structure from `review-prompt.md`. The final problem analysis must have exactly these top-level categories: + + - **Visual/UI Problems** + - **Functional Problems** + - **Requirements** + - **Usability/UX Problems** + + Each numbered item should describe the problem, location, UI element, frame reference, and relevant transcript context when available. Focus on WHAT is wrong, not HOW to fix it. + +6. Convert evidence into requirements. Keep these categories distinct: + + - **Observed facts:** transcript quotes, click targets, request statuses, screenshot contents. + - **Inferences:** likely user intent, likely broken control, suspected missing state. + - **Requirements:** product behavior needed to resolve the problem. + +7. When the current workspace contains the product source code, run a source-mapping pass before or during brainstorm. Use the transcript language, visible UI labels, screenshot paths, route names, and generated requirements to search the codebase for likely components, controllers, services, models, tests, and state stores. For larger sessions, split this mapping by product area and use sub-agents when available so independent areas can be inspected in parallel. + +8. Add source mapping to the brainstorm material as suspected implementation surfaces, not as proven root cause unless the code clearly proves it. Include confidence levels and short evidence notes explaining why each file or component is relevant. + +9. Always continue into brainstorm. Once `analysis.md`, `problem-analysis.md`, `source-materials.md`, and `requirements-kickoff.md` exist, say "Analysis complete. Ready to brainstorm the findings." 
Then immediately load the `ce-brainstorm` skill with the generated `requirements-kickoff.md`, unless the user explicitly asked only to extract or analyze artifacts. + +10. In brainstorm, first ask the user to confirm the captured requirements: "Did this capture the requirements correctly, and what is missing, wrong, or grouped badly?" Do not move to planning until brainstorm has confirmed or corrected the requirements. + +## Automatic handoff + +Do not end the workflow after extraction in normal use. The intended sequence is: + +1. Run the analyzer. +2. Read `source-materials.md` so brainstorm has direct links to raw feedback, transcript, frames, and analysis artifacts. +3. Inspect or refine `problem-analysis.md` when the evidence needs human-visible interpretation. +4. Load the `ce-brainstorm` skill with `requirements-kickoff.md`. +5. Ask the user to confirm, correct, or regroup the captured requirements. +6. Let `ce-brainstorm` produce the durable requirements doc under `docs/brainstorms/`. + +Only stop after step 1 or 2 when the user asks specifically for raw artifacts, transcript, screenshots, or analysis without brainstorming. + +## Capture scale + +Prefer over-capture to under-capture. The purpose of this path is to preserve product feedback as structured data for later AI work, not to decide what is worth implementing during extraction. + +When analyzing a feedback source: + +- Capture every distinct problem, bug, request, expectation, confusion point, and "note to self" that appears in the transcript or frames. +- Include concrete examples from the source material for each issue when possible: timestamp, transcript phrase, screenshot path, clicked UI element, email/thread ID, or observed state. +- Include concrete source-code mapping when possible: likely component/service/controller/model/test files, route or API endpoint names, relevant state variables, and confidence level. 
This mapping should make it obvious where a later implementation agent should start looking. +- If only video is available, infer likely screens and components from visible UI labels, layout, URLs, route names, copied text, screenshots, and transcript references. Mark uncertain mappings explicitly instead of omitting them. +- If only audio or notes are available, map from product terminology and workflow descriptions to likely code areas when the repo is present, and label the mapping as transcript-derived. +- Do not drop lower-priority items during analysis. Mark them as lower priority or secondary if needed, but keep them represented. +- Separate capture from prioritization. Brainstorm may regroup, split, defer, or reject items later, but the first requirements pass should preserve the full signal. +- If a feedback session contains many issues, create a comprehensive capture document and state that planning should split it into smaller plans. +- Treat source mapping as supporting material, not a filter. If a problem cannot yet be mapped to code, keep the problem and mark the source mapping as unknown. + +## Source mapping grounding + +When mapping feedback to source code, classify each mapping as one of: + +- **Likely buggy surface:** the code path exists and directly handles the observed behavior. +- **Missing or incomplete surface:** the feedback names a behavior, but the repo has no clear UI, route, controller action, or component implementing it yet. +- **Indirect surface:** the code is adjacent to the behavior, but the exact interaction may happen through rendered email content, third-party UI, generated HTML, or another layer. +- **Unknown:** no grounded source mapping found yet. + +Every source mapping should include: + +- Requirement/example ids, such as `R14`, `AE4`, or `EX17`. +- File paths with line numbers when practical. +- A short evidence note from code, not just a file guess. +- Confidence: `High`, `Medium`, `Low`, or `Unknown`. 
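
The bullets above imply a consistent record shape. An illustrative Python sketch of one source-mapping entry — the field names are a sketch, not a schema the analyzer emits:

```python
# One source-mapping entry: requirement/example IDs, a classification,
# an explicit confidence level, file evidence, and a short grounded note.
from dataclasses import dataclass, field

@dataclass
class SourceMapping:
    requirement_ids: list  # e.g. ["R14", "AE4"]
    kind: str              # "likely-buggy" | "missing" | "indirect" | "unknown"
    confidence: str        # "High" | "Medium" | "Low" | "Unknown"
    files: list = field(default_factory=list)  # e.g. "app/models/inbox.rb:42"
    evidence: str = ""     # short note grounded in code, not a file guess

# A "missing surface" stays in the brainstorm even with no files attached.
m = SourceMapping(["R14"], "missing", "Medium",
                  evidence="no inbox controller or route found for this behavior")
```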
+ +Prefer saying "I did not find a current inbox implementation for this surface" over forcing a speculative mapping. Missing surfaces are useful product findings and should stay in the brainstorm. + +## Output shape + +The analyzer writes: + +- `analysis.md`: session summary, transcript, selected moments, screenshot links, candidate findings, and review checklist. +- `problem-analysis.md`: a categorized problem statement scaffold for visual, functional, requirement, and UX findings. +- `review-prompt.md`: a filled prompt containing screenshot paths and transcript for a deeper visual analysis pass. +- `source-materials.md`: a manifest linking the original source location, local-only raw files, transcript locations, chunks, local-only frames, and generated artifacts. +- `requirements-kickoff.md`: a CE-friendly requirements starter with Problem Frame, Actors, Key Flows, R-IDs, Acceptance Examples, Success Criteria, Scope Boundaries, Questions, and Next Steps. +- `analysis.json`: structured session, event, transcript, moment, and artifact metadata. +- `frames/`: extracted PNG screenshots for selected moments. Local-only by default. +- `raw/`: extracted zip contents and copied source media. Local-only by default. + +Long media is transcribed in chunks when a single transcription request is too large. Chunk transcripts include timestamp prefixes so the review pass can still connect discussion points to approximate video regions. + +For audio-only or notes-only sources, the visual sections intentionally say that no frames are available. In those cases, extract functional problems, requirements, and UX friction from transcript or notes only. + +## Review heuristics + +Select moments when they contain: + +- Verbal complaint cues: "weird", "doesn't work", "can't", "broken", "bug", "problem", "confusing", "should". +- Clicks on controls shortly before or after a complaint. +- Repeated clicks on the same control. +- Failed requests outside known development noise. 
+- Console errors, uncaught exceptions, or failed form submissions. +- Visible toasts, validation errors, disabled controls, empty states, or surprising navigation. + +The script's findings are deliberately conservative. Look at screenshots and transcript together before turning a candidate finding into a requirement. diff --git a/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/install-riffrec.md b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/install-riffrec.md new file mode 100644 index 000000000..564e9b3a0 --- /dev/null +++ b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/install-riffrec.md @@ -0,0 +1,27 @@ +# Setup: Add Riffrec to a project + +Use this path when the user has no recording yet and wants to start capturing product feedback with [Riffrec](https://github.com/kieranklaassen/riffrec). + +Riffrec is a browser-based capture tool that records the screen, microphone audio, console output, network requests, and DOM events into a single `riffrec-*.zip` bundle. The bundle is what this skill consumes downstream. + +## What to tell the user + +1. Riffrec lives at . Refer them to the README for the current install command — it is the source of truth and may change. +2. The general shape of integration: + - Add the Riffrec capture script or package to the project's web app. + - Wire a "Record feedback" affordance somewhere accessible during real use (a bug report button, a dev-only floating recorder, or a keyboard shortcut). + - Confirm a sample session ends with a downloadable `riffrec-*.zip`. +3. Once a zip exists, the user runs this skill again with the zip path. The skill will pick the **quick bug report** or **extensive analysis** path automatically based on length and content. + +## Recommended capture habits + +Surface these to the user during setup so the recordings they share later are easy to analyze: + +- Speak the issue out loud while reproducing it. 
The transcript is the single highest-signal artifact. +- Click the affected UI even when it does nothing — failed clicks are the strongest signal in event extraction. +- Keep recordings focused. Many short clips beat one long one when issues are unrelated. +- Note when a step is intentional vs. accidental ("oops, that wasn't what I meant"). The analyzer cannot infer intent. + +## After install + +When the user returns with their first zip, route to `references/quick-bug-report.md` or `references/extensive-analysis.md` per the SKILL.md routing rules. Do not run the analyzer in the setup path — there is nothing to analyze yet. diff --git a/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/quick-bug-report.md b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/quick-bug-report.md new file mode 100644 index 000000000..00b292a5e --- /dev/null +++ b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/references/quick-bug-report.md @@ -0,0 +1,44 @@ +# Quick bug report path + +Use this path when the input is a short recording (under ~60 seconds), the user describes a single specific issue, or the user explicitly asks for "quick", "small", "simple", or "just transcribe". The goal is one concise bug report, not a multi-artifact requirements package. + +## Workflow + +1. Run the analyzer to a temp directory so nothing pollutes the repo: + + ```bash + python scripts/analyze_riffrec_zip.py /path/to/input --output-dir "$(mktemp -d -t riffrec-quick-XXXXXX)" + ``` + + Capture the printed output directory; later steps read from it. + +2. Read only `analysis.md` from the temp output. Skip `problem-analysis.md`, `review-prompt.md`, `requirements-kickoff.md`, and `source-materials.md` — they are designed for the extensive path. + +3. Pick at most one or two screenshots from `frames/` that directly show the reported issue. Prefer frames near a verbal complaint, a failed click, a console error, or a failed network request. 
+ +4. Emit a single concise bug report. Default to printing it inline in the chat so the user can confirm before anything is written to disk. Only write a file if the user asks for one — and even then, prefer a single `bug-report.md` next to the source recording or in a path the user names. Do not auto-create `docs/brainstorms/...` for this path. + +## Bug report shape + +Keep it focused and short. Include only what the recording supports: + +- **Title** — one short sentence naming the broken behavior. +- **Steps to reproduce** — bullet list reconstructed from clicks and transcript. +- **Expected vs. actual** — what the user said should happen vs. what happened. +- **Evidence** — transcript quote(s) with timestamps, plus 0–2 screenshot references. +- **Suggested next step** — single sentence: file an issue, open `ce-debug`, or escalate to extensive analysis if more issues surfaced. + +## Source mapping (optional, only if obvious) + +If the workspace is the product source code AND the broken surface is named clearly in the transcript or visible UI, add one short "Likely surface" line with file path and confidence (`High` / `Medium` / `Low`). Skip this section entirely when the mapping is speculative — speculative mappings belong in the extensive path, not a quick bug report. + +## What to skip + +- No `problem-analysis.md`, no `requirements-kickoff.md`, no Visual / Functional / Requirement / UX category split. +- No automatic handoff to `ce-brainstorm`. The quick path ends with the bug report. +- No commit of `raw/` or `frames/` — they live only in the temp dir and are discarded by the OS. +- No source-mapping pass across the codebase. + +## Escalation + +If, while reading the transcript, the recording turns out to contain multiple distinct issues, requirements, or a workflow walkthrough, stop and tell the user: "This recording has more than one issue — switching to the extensive path." 
Then load `references/extensive-analysis.md` and re-run the analyzer with a non-temp output directory. diff --git a/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/scripts/analyze_riffrec_zip.py b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/scripts/analyze_riffrec_zip.py new file mode 100755 index 000000000..818cb93af --- /dev/null +++ b/plugins/compound-engineering/skills/ce-riffrec-feedback-analysis/scripts/analyze_riffrec_zip.py @@ -0,0 +1,1121 @@ +#!/usr/bin/env python3 +""" +Analyze a product feedback source. + +Supported sources: Riffrec zip, standalone video, standalone audio, and +meeting notes text/markdown. The script extracts transcript, high-signal +video frames when available, and CE-friendly markdown artifacts. +""" + +from __future__ import annotations + +import argparse +import json +import os +import re +import shutil +import subprocess +import sys +import zipfile +from collections import Counter, defaultdict +from datetime import datetime, timezone +from pathlib import Path +from typing import Any + + +COMPLAINT_CUES = ( + "weird", + "doesn't work", + "does not work", + "dont work", + "don't work", + "can't", + "cannot", + "broken", + "bug", + "problem", + "confusing", + "should", + "wrong", + "stuck", + "failed", +) + +NOISY_NETWORK_PATTERNS = ( + "/mini-profiler-resources/", + "__vite_ping", + "/rails/action_cable", +) + +VIDEO_EXTENSIONS = {".webm", ".mp4", ".mov", ".m4v", ".mkv", ".avi"} +AUDIO_EXTENSIONS = {".webm", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".ogg", ".flac"} +NOTES_EXTENSIONS = {".txt", ".md", ".markdown", ".text"} + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Analyze a product feedback source") + parser.add_argument("source_path", type=Path, help="Path to a Riffrec zip, video, audio, or meeting notes file") + parser.add_argument( + "--output-dir", + type=Path, + help="Directory for extracted artifacts. 
Defaults to docs/brainstorms/riffrec-feedback/ when available.", + ) + parser.add_argument("--topic", help="Kebab-case topic for requirements-kickoff frontmatter") + parser.add_argument( + "--model", + default=os.environ.get("RIFFREC_TRANSCRIBE_MODEL", "gpt-4o-mini-transcribe"), + help="OpenAI transcription model to use when OPENAI_API_KEY is set", + ) + parser.add_argument("--no-transcribe", action="store_true", help="Skip media transcription") + parser.add_argument("--max-moments", type=int, default=12, help="Maximum screenshots to extract") + return parser.parse_args() + + +def slugify(value: str) -> str: + slug = re.sub(r"[^a-zA-Z0-9]+", "-", value.strip().lower()).strip("-") + return re.sub(r"-{2,}", "-", slug) or "riffrec-feedback" + + +def read_json(path: Path, default: Any) -> Any: + if not path.exists(): + return default + try: + return json.loads(path.read_text()) + except json.JSONDecodeError: + return default + + +def safe_extract(zip_path: Path, dest: Path) -> None: + dest.mkdir(parents=True, exist_ok=True) + with zipfile.ZipFile(zip_path) as archive: + for member in archive.infolist(): + member_path = dest / member.filename + resolved = member_path.resolve() + if not str(resolved).startswith(str(dest.resolve())): + raise RuntimeError(f"Unsafe zip member path: {member.filename}") + if member.is_dir(): + resolved.mkdir(parents=True, exist_ok=True) + else: + resolved.parent.mkdir(parents=True, exist_ok=True) + with archive.open(member) as source, resolved.open("wb") as target: + shutil.copyfileobj(source, target) + + +def default_output_dir(zip_path: Path) -> Path: + cwd = Path.cwd() + stem = slugify(zip_path.stem) + if (cwd / "docs" / "brainstorms").is_dir(): + return cwd / "docs" / "brainstorms" / "riffrec-feedback" / stem + return cwd / "riffrec-feedback" / stem + + +def classify_source(source_path: Path) -> str: + if zipfile.is_zipfile(source_path): + return "riffrec_zip" + suffix = source_path.suffix.lower() + if suffix in NOTES_EXTENSIONS: + return 
"meeting_notes" + if suffix in VIDEO_EXTENSIONS and suffix in AUDIO_EXTENSIONS: + return "video" if has_video_stream(source_path) else "audio" + if suffix in VIDEO_EXTENSIONS: + return "video" + if suffix in AUDIO_EXTENSIONS: + return "audio" + return "unknown" + + +def ffprobe_duration(path: Path) -> float: + if not path.exists() or not shutil.which("ffprobe"): + return 0.0 + command = [ + "ffprobe", + "-v", + "error", + "-show_entries", + "format=duration", + "-of", + "default=noprint_wrappers=1:nokey=1", + str(path), + ] + result = subprocess.run(command, capture_output=True, text=True, timeout=30) + if result.returncode != 0: + return 0.0 + try: + return float(result.stdout.strip()) + except ValueError: + return 0.0 + + +def has_video_stream(path: Path) -> bool: + if not path.exists() or not shutil.which("ffprobe"): + return path.suffix.lower() in VIDEO_EXTENSIONS + command = [ + "ffprobe", + "-v", + "error", + "-select_streams", + "v:0", + "-show_entries", + "stream=codec_type", + "-of", + "csv=p=0", + str(path), + ] + result = subprocess.run(command, capture_output=True, text=True, timeout=30) + return result.returncode == 0 and "video" in result.stdout + + +def read_notes(path: Path) -> dict[str, Any]: + try: + text = path.read_text() + except UnicodeDecodeError: + text = path.read_text(encoding="utf-8", errors="replace") + return {"status": "ok", "text": text.strip(), "source": "meeting_notes"} + + +def prepare_source(source_path: Path, raw_dir: Path) -> dict[str, Any]: + raw_dir.mkdir(parents=True, exist_ok=True) + source_kind = classify_source(source_path) + + if source_kind == "riffrec_zip": + safe_extract(source_path, raw_dir) + session = read_json(raw_dir / "session.json", {}) + events_payload = read_json(raw_dir / "events.json", {}) + events = events_payload.get("events", events_payload if isinstance(events_payload, list) else []) + if not isinstance(events, list): + events = [] + try: + duration = float(session.get("duration_seconds") or 
events_payload.get("duration_seconds") or 0) + except (TypeError, ValueError): + duration = 0.0 + return { + "source_kind": source_kind, + "session": session, + "events": events, + "duration": duration, + "recording_path": raw_dir / "recording.webm", + "transcription_path": raw_dir / "voice.webm", + "notes_transcript": None, + } + + copied_path = raw_dir / source_path.name + if source_path.resolve() != copied_path.resolve(): + shutil.copy2(source_path, copied_path) + + session = { + "url": "unknown", + "started_at": "unknown", + "duration_seconds": 0, + "source_file": str(source_path), + "source_kind": source_kind, + } + + if source_kind == "meeting_notes": + notes_transcript = read_notes(copied_path) + return { + "source_kind": source_kind, + "session": session, + "events": [], + "duration": 0.0, + "recording_path": None, + "transcription_path": None, + "notes_transcript": notes_transcript, + } + + duration = ffprobe_duration(copied_path) + session["duration_seconds"] = round(duration, 3) if duration else 0 + recording_path = copied_path if has_video_stream(copied_path) else None + transcription_path = copied_path if source_kind in {"video", "audio", "unknown"} else None + return { + "source_kind": source_kind, + "session": session, + "events": [], + "duration": duration, + "recording_path": recording_path, + "transcription_path": transcription_path, + "notes_transcript": None, + } + + +def repo_relative(path: Path, base: Path) -> str: + try: + return str(path.resolve().relative_to(base.resolve())) + except ValueError: + return str(path) + + +def display_path(path: Path, repo_root: Path) -> str: + relative = repo_relative(path, repo_root) + return relative if not relative.startswith("/") else str(path) + + +def format_time(seconds: float | int | None) -> str: + if seconds is None: + return "n/a" + seconds_float = float(seconds) + minutes = int(seconds_float // 60) + rest = seconds_float - minutes * 60 + return f"{minutes:02d}:{rest:05.2f}" + + +def 
event_time(event: dict[str, Any]) -> float: + try: + return float(event.get("t", 0)) + except (TypeError, ValueError): + return 0.0 + + +def event_label(event: dict[str, Any]) -> str: + event_type = event.get("type", "event") + if event_type == "click": + element = event.get("element") or {} + text = compact_text(element.get("text") or "") + element_id = element.get("id") + tag = element.get("tag") or "element" + if element_id: + return f"click {tag}#{element_id} {text}".strip() + return f"click {tag} {text}".strip() + if event_type == "network_request": + return f"{event.get('method', 'GET')} {event.get('url', '')} -> {event.get('status')}" + return compact_text(json.dumps(event, sort_keys=True)) + + +def compact_text(text: str, limit: int = 120) -> str: + compacted = re.sub(r"\s+", " ", str(text)).strip() + if len(compacted) <= limit: + return compacted + return compacted[: limit - 1].rstrip() + "..." + + +def network_is_noise(event: dict[str, Any]) -> bool: + url = str(event.get("url") or "") + return any(pattern in url for pattern in NOISY_NETWORK_PATTERNS) + + +def transcript_has_complaint(transcript: str) -> bool: + lowered = transcript.lower() + return any(cue in lowered for cue in COMPLAINT_CUES) + + +def transcribe_media(media_path: Path | None, model: str) -> dict[str, Any]: + if not media_path or not media_path.exists(): + return {"status": "missing", "text": ""} + api_key = os.environ.get("OPENAI_API_KEY") + if not api_key: + return { + "status": "skipped", + "text": "", + "reason": "OPENAI_API_KEY is not set. 
Re-run with the key available to transcribe the media file.", + } + if not shutil.which("curl"): + return {"status": "skipped", "text": "", "reason": "curl is not installed"} + + command = [ + "curl", + "-sS", + "https://api.openai.com/v1/audio/transcriptions", + "-H", + f"Authorization: Bearer {api_key}", + "-F", + f"file=@{media_path}", + "-F", + f"model={model}", + "-F", + "response_format=json", + ] + try: + result = subprocess.run(command, capture_output=True, text=True, timeout=180) + except subprocess.TimeoutExpired: + return {"status": "failed", "text": "", "reason": "transcription request timed out"} + + if result.returncode != 0: + return { + "status": "failed", + "text": "", + "reason": compact_text(result.stderr or result.stdout, 500), + } + + try: + payload = json.loads(result.stdout) + except json.JSONDecodeError: + return {"status": "failed", "text": "", "reason": compact_text(result.stdout, 500)} + + if "error" in payload: + return {"status": "failed", "text": "", "reason": compact_text(json.dumps(payload["error"]), 500)} + + text = payload.get("text", "") + return {"status": "ok", "text": text, "raw": payload} + + +def should_retry_transcription_in_chunks(transcript: dict[str, Any]) -> bool: + reason = str(transcript.get("reason") or "") + return transcript.get("status") == "failed" and ( + "input_too_large" in reason or "too large" in reason.lower() or "maximum context" in reason.lower() + ) + + +def transcribe_media_chunks( + media_path: Path | None, + model: str, + chunks_dir: Path, + duration: float, + chunk_seconds: int = 420, +) -> dict[str, Any]: + if not media_path or not media_path.exists(): + return {"status": "missing", "text": ""} + if not shutil.which("ffmpeg"): + return {"status": "failed", "text": "", "reason": "ffmpeg is not installed; cannot chunk media"} + + chunks_dir.mkdir(parents=True, exist_ok=True) + chunk_count = max(1, int((duration or chunk_seconds) // chunk_seconds) + (1 if duration % chunk_seconds else 0)) + transcripts: 
list[str] = [] + chunk_results: list[dict[str, Any]] = [] + + for index in range(chunk_count): + start = index * chunk_seconds + if duration and start >= duration: + break + chunk_path = chunks_dir / f"audio-chunk-{index + 1:03d}-{start}s.mp3" + extract_command = [ + "ffmpeg", + "-y", + "-ss", + str(start), + "-t", + str(chunk_seconds), + "-i", + str(media_path), + "-vn", + "-ac", + "1", + "-ar", + "16000", + "-b:a", + "64k", + str(chunk_path), + ] + extract = subprocess.run(extract_command, capture_output=True, text=True, timeout=180) + if extract.returncode != 0 or not chunk_path.exists(): + chunk_results.append( + { + "chunk": index + 1, + "start_seconds": start, + "status": "failed", + "reason": compact_text(extract.stderr or extract.stdout, 500), + } + ) + continue + + chunk_transcript = transcribe_media(chunk_path, model) + chunk_results.append( + { + "chunk": index + 1, + "start_seconds": start, + "path": str(chunk_path), + "status": chunk_transcript.get("status"), + "reason": chunk_transcript.get("reason"), + } + ) + if chunk_transcript.get("text"): + transcripts.append(f"[{format_time(start)}]\n{chunk_transcript['text'].strip()}") + + if transcripts: + return { + "status": "ok", + "text": "\n\n".join(transcripts), + "source": "chunked_media", + "chunk_seconds": chunk_seconds, + "chunks": chunk_results, + } + + return { + "status": "failed", + "text": "", + "reason": "No chunks transcribed successfully", + "chunks": chunk_results, + } + + +def select_moments( + events: list[dict[str, Any]], + transcript: str, + duration: float, + max_moments: int, +) -> list[dict[str, Any]]: + candidates: list[dict[str, Any]] = [] + clicks_by_target: defaultdict[str, list[dict[str, Any]]] = defaultdict(list) + has_complaint = transcript_has_complaint(transcript) + + for event in events: + event_type = event.get("type") + t = event_time(event) + if event_type == "click": + element = event.get("element") or {} + key = element.get("id") or element.get("selector") or 
element.get("text") or "unknown" + clicks_by_target[str(key)].append(event) + reason = "click event" + if has_complaint and duration and t >= duration * 0.55: + reason = "late-session click near complaint transcript" + candidates.append({"t": t, "reason": reason, "events": [event]}) + elif event_type == "network_request": + status = event.get("status") + try: + failed = int(status) >= 400 + except (TypeError, ValueError): + failed = False + if failed and not network_is_noise(event): + candidates.append({"t": t, "reason": f"failed network request ({status})", "events": [event]}) + elif event_type in {"console_error", "error", "exception"}: + candidates.append({"t": t, "reason": f"{event_type} event", "events": [event]}) + + for grouped in clicks_by_target.values(): + if len(grouped) >= 2: + first = grouped[0] + last = grouped[-1] + if event_time(last) - event_time(first) <= 8: + candidates.append( + { + "t": event_time(last), + "reason": "repeated clicks on the same target", + "events": grouped[:4], + } + ) + + if has_complaint and duration and not candidates: + for fraction in (0.35, 0.55, 0.75, 0.9): + candidates.append({"t": max(0.0, duration * fraction), "reason": "representative frame for complaint transcript", "events": []}) + + if duration and not candidates: + fractions = (0.1, 0.3, 0.5, 0.7, 0.9) + for fraction in fractions[:max(1, min(max_moments, len(fractions)))]: + candidates.append({"t": max(0.0, duration * fraction), "reason": "representative video frame", "events": []}) + + candidates.sort(key=lambda item: (item["t"], item["reason"])) + deduped: list[dict[str, Any]] = [] + for candidate in candidates: + if candidate["t"] < 0: + continue + if any(abs(candidate["t"] - existing["t"]) < 0.45 and candidate["reason"] == existing["reason"] for existing in deduped): + continue + deduped.append(candidate) + if len(deduped) >= max_moments: + break + + for index, moment in enumerate(deduped, start=1): + moment["id"] = f"M{index}" + return deduped + + +def 
extract_frames(recording_path: Path | None, frames_dir: Path, moments: list[dict[str, Any]]) -> None: + frames_dir.mkdir(parents=True, exist_ok=True) + if not recording_path or not recording_path.exists(): + for moment in moments: + moment["screenshot"] = None + moment["screenshot_status"] = "no video source" + return + if not shutil.which("ffmpeg"): + for moment in moments: + moment["screenshot"] = None + moment["screenshot_status"] = "ffmpeg not installed" + return + + for moment in moments: + safe_reason = slugify(moment["reason"])[:48] + frame_path = frames_dir / f"{moment['id'].lower()}-{moment['t']:.2f}s-{safe_reason}.png" + command = [ + "ffmpeg", + "-y", + "-ss", + f"{max(0.0, float(moment['t'])):.3f}", + "-i", + str(recording_path), + "-frames:v", + "1", + "-q:v", + "2", + str(frame_path), + ] + result = subprocess.run(command, capture_output=True, text=True, timeout=60) + if result.returncode == 0 and frame_path.exists(): + moment["screenshot"] = str(frame_path) + moment["screenshot_status"] = "ok" + else: + moment["screenshot"] = None + moment["screenshot_status"] = compact_text(result.stderr or result.stdout, 300) + + +def event_counts(events: list[dict[str, Any]]) -> dict[str, int]: + return dict(Counter(str(event.get("type", "unknown")) for event in events)) + + +def summarize_candidate_findings(moments: list[dict[str, Any]], transcript: str) -> list[dict[str, Any]]: + findings: list[dict[str, Any]] = [] + complaint_moments = [moment for moment in moments if "complaint" in moment.get("reason", "")] + repeated_clicks = [moment for moment in moments if "repeated clicks" in moment.get("reason", "")] + failed_requests = [moment for moment in moments if "failed network" in moment.get("reason", "")] + + if transcript_has_complaint(transcript): + evidence_ids = [moment["id"] for moment in complaint_moments] or [moment["id"] for moment in moments[-3:]] + findings.append( + { + "id": "F1", + "title": "User reported a control that felt weird or unclickable", + 
"severity": "P2", + "observed": "Transcript or notes contain a complaint cue. Review linked moments when available and use the text to identify the affected product behavior.", + "expected": "The affected product behavior should either work as presented or clearly explain why it is unavailable.", + "evidence": evidence_ids, + "confidence": "Medium until screenshots are reviewed", + } + ) + + if repeated_clicks: + findings.append( + { + "id": f"F{len(findings) + 1}", + "title": "Repeated interaction may indicate missing feedback or a dead control", + "severity": "P2", + "observed": "The same target was clicked more than once within a short interval.", + "expected": "Repeated clicks should not be needed; the UI should respond once or show a clear disabled/error state.", + "evidence": [moment["id"] for moment in repeated_clicks], + "confidence": "Medium", + } + ) + + if failed_requests: + findings.append( + { + "id": f"F{len(findings) + 1}", + "title": "User-visible flow coincided with failed network requests", + "severity": "P2", + "observed": "One or more non-noisy network requests returned a failure status.", + "expected": "Failures should be handled with durable user feedback and recoverable behavior.", + "evidence": [moment["id"] for moment in failed_requests], + "confidence": "High for request failure, medium for user impact until screenshots are reviewed", + } + ) + + if not findings: + findings.append( + { + "id": "F1", + "title": "No obvious failure detected automatically", + "severity": "P3", + "observed": "The analyzer did not find complaint cues, repeated clicks, console errors, or non-noisy failed requests.", + "expected": "A human should still inspect the source evidence before closing the feedback.", + "evidence": [moment["id"] for moment in moments[:3]], + "confidence": "Low", + } + ) + return findings + + +def markdown_link(path: str | None, output_dir: Path, repo_root: Path) -> str: + if not path: + return "n/a" + path_obj = Path(path) + if 
path_obj.exists(): + return repo_relative(path_obj, repo_root) + return path + + +def write_analysis_md( + output_path: Path, + source_path: Path, + source_kind: str, + session: dict[str, Any], + events: list[dict[str, Any]], + transcript: dict[str, Any], + moments: list[dict[str, Any]], + findings: list[dict[str, Any]], + repo_root: Path, +) -> None: + lines: list[str] = [] + lines.append("# Product Feedback Analysis") + lines.append("") + lines.append("## Source") + lines.append("") + lines.append(f"- Source: `{source_path}`") + lines.append(f"- Source kind: `{source_kind}`") + lines.append(f"- URL: `{session.get('url', 'unknown')}`") + lines.append(f"- Started: `{session.get('started_at', 'unknown')}`") + lines.append(f"- Duration: `{session.get('duration_seconds', 'unknown')}` seconds") + lines.append(f"- Browser: `{session.get('browser', 'unknown')}`") + lines.append(f"- Event counts: `{event_counts(events)}`") + lines.append("") + lines.append("## Transcript") + lines.append("") + if transcript.get("text"): + lines.append(transcript["text"].strip()) + else: + lines.append(f"_Transcript unavailable: {transcript.get('reason') or transcript.get('status', 'unknown')}._") + lines.append("") + lines.append("## Selected Moments") + lines.append("") + if moments: + lines.append("| ID | Time | Why selected | Screenshot | Event evidence |") + lines.append("|---|---:|---|---|---|") + for moment in moments: + screenshot = markdown_link(moment.get("screenshot"), output_path.parent, repo_root) + evidence = "
".join(compact_text(event_label(event), 140) for event in moment.get("events", [])) or "n/a" + lines.append( + f"| {moment['id']} | {format_time(moment['t'])} | {moment['reason']} | `{screenshot}` | {evidence} |" + ) + else: + lines.append("_No video moments available for this source._") + lines.append("") + lines.append("## Candidate Findings") + lines.append("") + for finding in findings: + lines.append(f"### {finding['id']}. {finding['title']}") + lines.append("") + lines.append(f"- **Severity:** {finding['severity']}") + lines.append(f"- **Observed:** {finding['observed']}") + lines.append(f"- **Expected:** {finding['expected']}") + lines.append(f"- **Evidence:** {', '.join(finding['evidence'])}") + lines.append(f"- **Confidence:** {finding['confidence']}") + lines.append("") + lines.append("## Human Review Checklist") + lines.append("") + lines.append("- Open each selected screenshot and name the exact visible control or state.") + lines.append("- Tie transcript language to the closest click or visible UI state.") + lines.append("- Promote only confirmed product problems into requirements.") + lines.append("- Use repo-relative screenshot paths when moving evidence into a CE requirements document.") + output_path.write_text("\n".join(lines) + "\n") + + +def write_requirements_kickoff( + output_path: Path, + topic: str, + session: dict[str, Any], + findings: list[dict[str, Any]], + moments: list[dict[str, Any]], + repo_root: Path, +) -> None: + title = topic.replace("-", " ").title() + date = datetime.now(timezone.utc).date().isoformat() + primary_evidence = ", ".join(finding["id"] for finding in findings) + screenshot_refs = [] + for moment in moments: + if moment.get("screenshot"): + screenshot_refs.append(f"{moment['id']}: `{markdown_link(moment['screenshot'], output_path.parent, repo_root)}`") + evidence_text = "; ".join(screenshot_refs[:6]) or "See analysis.md selected moments." 
+ source_materials = markdown_link(str(output_path.parent / "source-materials.md"), output_path.parent, repo_root) + analysis_path = markdown_link(str(output_path.parent / "analysis.md"), output_path.parent, repo_root) + problem_analysis_path = markdown_link(str(output_path.parent / "problem-analysis.md"), output_path.parent, repo_root) + review_prompt_path = markdown_link(str(output_path.parent / "review-prompt.md"), output_path.parent, repo_root) + + lines = [ + "---", + f"date: {date}", + f"topic: {topic}", + "---", + "", + f"# {title}", + "", + "## Problem Frame", + "", + f"A product feedback source for `{session.get('url', 'the product surface')}` produced evidence of product friction. The raw source has been converted into transcript, selected moments when video is available, screenshots when frames can be extracted, and candidate findings so the team can decide what product behavior should change before planning implementation.", + "", + "Source materials for brainstorm:", + f"- Source materials manifest: `{source_materials}`", + f"- Analysis: `{analysis_path}`", + f"- Problem analysis: `{problem_analysis_path}`", + f"- Review prompt with transcript and frames: `{review_prompt_path}`", + "", + "---", + "", + "## Actors", + "", + "- A1. User: Operates the product in the recorded session and verbalizes friction.", + "- A2. Product surface: The UI and backend behavior visible in the recording.", + "- A3. Brainstorm agent: Uses the evidence bundle to confirm, correct, and group requirements before planning.", + "", + "---", + "", + "## Key Flows", + "", + "- F1. 
Evidence-backed feedback triage", + " - **Trigger:** A feedback zip, video, audio file, or meeting notes file is available.", + " - **Actors:** A1, A2, A3", + " - **Steps:** Extract or copy the source, transcribe media or read notes, select high-signal moments when video exists, inspect screenshots when available, confirm problems, and write requirements with supporting evidence.", + " - **Outcome:** Confirmed product problems are represented as requirements with transcript support and screenshot support when visual evidence exists.", + " - **Covered by:** R1, R2, R3", + "", + "---", + "", + "## Requirements", + "", + "**Evidence handling**", + "- R1. Each confirmed product problem must cite supporting transcript, notes, or moment evidence from the source, including timestamp and screenshot when video is available.", + "- R2. Transcript claims must be tied to the closest visible interaction or explicitly marked as untimed verbal context.", + "", + "**Product requirements from this session**", + ] + + for index, finding in enumerate(findings, start=3): + lines.append(f"- R{index}. Resolve or intentionally scope the issue described by {finding['id']}: {finding['title']}.") + + lines.extend( + [ + "", + "---", + "", + "## Acceptance Examples", + "", + "- AE1. **Covers R1, R2.** Given a feedback source with voice, video, or notes, when the analysis is complete, each promoted issue includes source evidence rather than prose-only claims.", + "- AE2. 
**Covers R3.** Given the user reports that a button is weird or unclickable, when requirements are finalized, the requirement identifies the specific control and the expected available/unavailable behavior.", + "", + "---", + "", + "## Success Criteria", + "", + "- A human reviewer can understand what went wrong without rewatching the entire recording.", + "- `ce-brainstorm` can confirm requirements from linked source evidence before any planning begins.", + "", + "---", + "", + "## Scope Boundaries", + "", + "- The analyzer output is evidence and requirements kickoff material, not final implementation design.", + "- Automatically detected findings remain candidates until screenshots are inspected.", + "- Development-only noise, such as profiler requests, should not become product requirements unless it affects the user experience.", + "", + "---", + "", + "## Key Decisions", + "", + "- Evidence first: Requirements should cite moments and screenshots before moving to planning.", + "- Brainstorm before plan: Use `ce-brainstorm` to refine product behavior when the recording reveals ambiguity.", + "", + "---", + "", + "## Dependencies / Assumptions", + "", + f"- Source session URL: `{session.get('url', 'unknown')}`.", + f"- Source materials manifest: `{source_materials}`.", + f"- Candidate findings: {primary_evidence}.", + f"- Screenshot evidence: {evidence_text}.", + "", + "---", + "", + "## Outstanding Questions", + "", + "### Resolve Before Planning", + "", + "- Which candidate findings are real product problems after screenshot review?", + "- For each promoted finding, what should the user experience be instead?", + "", + "### Deferred to Planning", + "", + "- [Technical] Which code paths own the confirmed product behavior?", + "- [Technical] What regression tests should lock the behavior once fixed?", + "", + "---", + "", + "## Next Steps", + "", + "-> Resume `/ce-brainstorm` to confirm candidate findings and replace generic R-items with product-specific 
requirements.", + ] + ) + output_path.write_text("\n".join(lines) + "\n") + + +def write_source_materials( + output_path: Path, + source_path: Path, + source_kind: str, + session: dict[str, Any], + transcript: dict[str, Any], + moments: list[dict[str, Any]], + raw_dir: Path, + frames_dir: Path, + repo_root: Path, +) -> None: + def link(path: Path) -> str: + return markdown_link(str(path), output_path.parent, repo_root) + + raw_files = sorted(path for path in raw_dir.rglob("*") if path.is_file()) + frame_files = sorted(path for path in frames_dir.rglob("*.png") if path.is_file()) + chunk_files = sorted((raw_dir / "transcription_chunks").glob("*")) if (raw_dir / "transcription_chunks").is_dir() else [] + + copied_source = next((path for path in raw_files if path.name == source_path.name), None) + if not copied_source: + copied_source = raw_dir / "recording.webm" if (raw_dir / "recording.webm").exists() else None + + lines = [ + "# Source Materials", + "", + "Use this manifest during brainstorm so requirements can be traced back to the raw feedback evidence.", + "", + "## Original Source", + "", + f"- Source kind: `{source_kind}`", + f"- Original path: `{source_path}`", + f"- Local raw copy: `{link(copied_source) if copied_source else 'n/a'}`", + "- Commit policy: raw media, audio chunks, zip contents, session dumps, and extracted screenshots are local-only by default; commit generated Markdown/JSON/manifests when useful for brainstorm/planning traceability.", + f"- Session URL: `{session.get('url', 'unknown')}`", + f"- Duration: `{session.get('duration_seconds', 'unknown')}` seconds", + "", + "## Analysis Artifacts", + "", + f"- Analysis summary: `{link(output_path.parent / 'analysis.md')}`", + f"- Problem statements: `{link(output_path.parent / 'problem-analysis.md')}`", + f"- Review prompt: `{link(output_path.parent / 'review-prompt.md')}`", + f"- Requirements kickoff: `{link(output_path.parent / 'requirements-kickoff.md')}`", + f"- Structured JSON: 
`{link(output_path.parent / 'analysis.json')}`", + "", + "## Transcript", + "", + f"- Transcript status: `{transcript.get('status', 'unknown')}`", + f"- Transcript source: `{transcript.get('source', source_kind)}`", + f"- Transcript text lives in: `{link(output_path.parent / 'analysis.md')}` and `{link(output_path.parent / 'review-prompt.md')}`", + ] + + if chunk_files: + lines.append("- Transcription chunks:") + lines.append(f" - retained locally in `{link(raw_dir / 'transcription_chunks')}`; not commit-safe by default.") + + lines.extend(["", "## Local-Only Frames", ""]) + lines.append("Extracted screenshots are retained locally for agent inspection and should not be committed by default.") + lines.append("") + if moments: + lines.append("| Moment | Time | Screenshot | Why selected |") + lines.append("|---|---:|---|---|") + for moment in moments: + screenshot = moment.get("screenshot") + lines.append( + f"| {moment['id']} | {format_time(moment['t'])} | `{markdown_link(screenshot, output_path.parent, repo_root)}` | {moment['reason']} |" + ) + else: + lines.append("_No video frames were available for this source._") + + if frame_files: + lines.extend(["", "All frame files:"]) + for frame in frame_files: + lines.append(f"- `{link(frame)}`") + + lines.extend(["", "## Local Raw Files", ""]) + lines.append("Raw files are intentionally local-only by default. Do not commit these unless the user explicitly asks and privacy/security is acceptable.") + lines.append("") + for raw_file in raw_files[:50]: + lines.append(f"- `{link(raw_file)}`") + if len(raw_files) > 50: + lines.append(f"- ... {len(raw_files) - 50} more files") + + output_path.write_text("\n".join(lines) + "\n") + + +def write_problem_analysis( + output_path: Path, + transcript: dict[str, Any], + moments: list[dict[str, Any]], + findings: list[dict[str, Any]], + repo_root: Path, +) -> None: + complaint_text = transcript.get("text") or "" + lines = [ + "", + "## 1. 
Visual/UI Problems", + "", + ] + if moments: + lines.extend( + [ + "1. Review required: inspect the extracted frames and replace this scaffold with precise visual observations. Include location, UI element type, issue description, and frame reference.", + "", + ] + ) + else: + lines.extend(["1. No video frames were available for this source.", ""]) + + lines.extend(["## 2. Functional Problems", ""]) + + for index, finding in enumerate(findings, start=1): + evidence = ", ".join(finding.get("evidence", [])) or "n/a" + lines.append( + f"{index}. {finding['title']}: {finding['observed']} Evidence: {evidence}. Context from discussion: {compact_text(complaint_text, 220) or 'n/a'}" + ) + if not findings: + lines.append("1. No functional problems were detected automatically; inspect transcript and frames manually.") + + lines.extend( + [ + "", + "## 3. Requirements", + "", + "1. Convert confirmed problems into requirements after evidence review. State what capability or behavior is needed and why, without prescribing implementation.", + "", + "## 4. Usability/UX Problems", + "", + ] + ) + + for index, moment in enumerate(moments, start=1): + screenshot = markdown_link(moment.get("screenshot"), output_path.parent, repo_root) + lines.append( + f"{index}. Moment {moment['id']} at {format_time(moment['t'])}: Review `{screenshot}` for UX friction related to `{moment['reason']}`." + ) + if not moments: + lines.append("1. 
Review the transcript or notes for workflow friction, confusion, and unmet expectations.") + + lines.append("") + output_path.write_text("\n".join(lines) + "\n") + + +def write_review_prompt( + output_path: Path, + transcript: dict[str, Any], + moments: list[dict[str, Any]], + repo_root: Path, +) -> None: + frame_lines: list[str] = [] + for moment in moments: + screenshot = markdown_link(moment.get("screenshot"), output_path.parent, repo_root) + event_summary = "; ".join(event_label(event) for event in moment.get("events", [])) or "no event metadata" + frame_lines.append( + f"- {moment['id']} ({format_time(moment['t'])}, {moment['reason']}): `{screenshot}`. Events: {event_summary}" + ) + if not frame_lines: + frame_lines.append("- No video frames are available for this source. Analyze transcript or meeting notes only.") + + lines = [ + "You will be analyzing a product feedback session by examining video frames and a discussion transcript. Your goal is to identify problems, requirements, and feedback points that need to be addressed - focusing on clear problem statements rather than solutions.", + "", + "Here are the frames extracted from the video:", + "", + "", + *frame_lines, + "", + "", + "Here is the transcript of the discussion that occurred during the feedback session:", + "", + "", + transcript.get("text") or f"[Transcript unavailable: {transcript.get('reason') or transcript.get('status', 'unknown')}]", + "", + "", + "Your task is to carefully analyze both the visual content and the discussion to extract actionable problem statements. 
Follow these guidelines:", + "", + "**Visual Analysis Requirements:**", + "- Examine each frame carefully for UI/UX issues, bugs, design inconsistencies, or usability problems", + "- Be extremely precise about what you observe: specify exact locations (e.g., \"top-right corner,\" \"navigation bar,\" \"third item in the list\")", + "- Identify specific UI elements by type (button, input field, dropdown, modal, etc.)", + "- Note visual problems like misalignment, poor contrast, truncated text, overlapping elements, broken layouts, etc.", + "", + "**Discussion Analysis Requirements:**", + "- Extract feedback points, feature requests, and problems mentioned in the conversation", + "- Identify requirements that are stated or implied", + "- Note any pain points or frustrations expressed by participants", + "- Connect visual observations with relevant discussion points when applicable", + "", + "**Problem Statement Guidelines:**", + "- Focus on describing WHAT the problem is, not HOW to fix it", + "- Be specific and actionable - avoid vague statements", + "- Each problem should be clear enough that a developer or designer can understand what needs to be addressed", + "- Include context about where the problem occurs and why it matters", + "", + "Structure your final output as follows:", + "", + "1. **Visual/UI Problems**: Issues observed directly in the interface", + "2. **Functional Problems**: Issues related to behavior, workflow, or functionality mentioned in discussion", + "3. **Requirements**: New features or capabilities requested", + "4. **Usability/UX Problems**: Issues related to user experience, confusion, or workflow friction", + "", + "Format each problem as a clear, numbered item within its category.", + "", + "Your final output should contain only the analysis section with clearly categorized, numbered problem statements. 
Do not include scratchpad notes.", + ] + output_path.write_text("\n".join(lines) + "\n") + + +def main() -> int: + args = parse_args() + source_path = args.source_path.expanduser().resolve() + if not source_path.exists(): + print(f"Source file not found: {source_path}", file=sys.stderr) + return 1 + + output_dir = (args.output_dir or default_output_dir(source_path)).expanduser().resolve() + output_dir.mkdir(parents=True, exist_ok=True) + raw_dir = output_dir / "raw" + frames_dir = output_dir / "frames" + source = prepare_source(source_path, raw_dir) + source_kind = source["source_kind"] + session = source["session"] + events = source["events"] + duration = source["duration"] + + if source["notes_transcript"]: + transcript = source["notes_transcript"] + elif args.no_transcribe: + transcript = {"status": "skipped", "text": "", "reason": "--no-transcribe was passed"} + else: + transcript = transcribe_media(source["transcription_path"], args.model) + if should_retry_transcription_in_chunks(transcript): + transcript = transcribe_media_chunks( + source["transcription_path"], + args.model, + raw_dir / "transcription_chunks", + duration, + ) + + moments = select_moments(events, transcript.get("text", ""), duration, args.max_moments) + if not moments and source["recording_path"]: + fallback_times = [0.5, 2.0, 5.0, 10.0, 15.0] + moments = [ + {"id": f"M{index}", "t": timestamp, "reason": "representative video frame", "events": []} + for index, timestamp in enumerate(fallback_times[: args.max_moments], start=1) + ] + extract_frames(source["recording_path"], frames_dir, moments) + findings = summarize_candidate_findings(moments, transcript.get("text", "")) + + topic = slugify(args.topic or source_path.stem) + repo_root = Path.cwd() + analysis_md = output_dir / "analysis.md" + problem_analysis_md = output_dir / "problem-analysis.md" + review_prompt_md = output_dir / "review-prompt.md" + source_materials_md = output_dir / "source-materials.md" + kickoff_md = output_dir / 
"requirements-kickoff.md" + write_analysis_md(analysis_md, source_path, source_kind, session, events, transcript, moments, findings, repo_root) + write_problem_analysis(problem_analysis_md, transcript, moments, findings, repo_root) + write_review_prompt(review_prompt_md, transcript, moments, repo_root) + write_source_materials(source_materials_md, source_path, source_kind, session, transcript, moments, raw_dir, frames_dir, repo_root) + write_requirements_kickoff(kickoff_md, topic, session, findings, moments, repo_root) + + structured = { + "source": str(source_path), + "source_kind": source_kind, + "output_dir": str(output_dir), + "session": session, + "event_counts": event_counts(events), + "transcript": transcript, + "moments": moments, + "candidate_findings": findings, + "artifacts": { + "analysis_md": str(analysis_md), + "problem_analysis_md": str(problem_analysis_md), + "review_prompt_md": str(review_prompt_md), + "source_materials_md": str(source_materials_md), + "requirements_kickoff_md": str(kickoff_md), + "frames_dir": str(frames_dir), + "raw_dir": str(raw_dir), + }, + } + (output_dir / "analysis.json").write_text(json.dumps(structured, indent=2, sort_keys=True) + "\n") + + print(f"Analysis written to: {analysis_md}") + print(f"Problem analysis scaffold written to: {problem_analysis_md}") + print(f"Review prompt written to: {review_prompt_md}") + print(f"Source materials manifest written to: {source_materials_md}") + print(f"Requirements kickoff written to: {kickoff_md}") + print(f"Frames written to: {frames_dir}") + print("") + print("Analysis complete. 
Ready to brainstorm the findings.") + print(f"Source materials: {display_path(source_materials_md, repo_root)}") + print(f"Problem statements: {display_path(problem_analysis_md, repo_root)}") + print(f"Brainstorm handoff: $compound-engineering:ce-brainstorm {display_path(kickoff_md, repo_root)}") + print("Brainstorm should first confirm whether the captured requirements are complete and correctly grouped.") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tests/frontmatter.test.ts b/tests/frontmatter.test.ts index f1a99b6cc..e7851dcf7 100644 --- a/tests/frontmatter.test.ts +++ b/tests/frontmatter.test.ts @@ -107,6 +107,42 @@ describe("frontmatter YAML validity", () => { expect(name, `frontmatter name "${name}" must match parent directory "${dirName}"`).toBe(dirName) expect(name, `frontmatter name "${name}" must be lowercase a-z, 0-9, and hyphens`).toMatch(/^[a-z0-9-]+$/) }) + + // All compound-engineering skills (and agents) must use the `ce-` prefix + // so they are unambiguously identifiable as compound-engineering + // components. See plugins/compound-engineering/AGENTS.md "Naming + // Convention". A small allowlist preserves three pre-existing skills + // that predate the rule -- no new entries should be added. + if (pluginRoot === "plugins/compound-engineering") { + const SKILL_PREFIX_ALLOWLIST = new Set([ + "every-style-editor", + "file-todos", + "lfg", + ]) + test(`${pluginRoot}/${rel} skill name uses ce- prefix`, () => { + const dirName = path.basename(path.dirname(rel)) + if (SKILL_PREFIX_ALLOWLIST.has(dirName)) return + expect( + dirName.startsWith("ce-"), + `Skill "${dirName}" must use the ce- prefix. 
` + + `If this is a legacy skill that predates the rule, add it to ` + + `SKILL_PREFIX_ALLOWLIST in tests/frontmatter.test.ts.`, + ).toBe(true) + }) + } + } + + if ( + pluginRoot === "plugins/compound-engineering" && + /^agents\/[^/]+\.agent\.md$/.test(rel) + ) { + test(`${pluginRoot}/${rel} agent name uses ce- prefix`, () => { + const fileName = path.basename(rel, ".agent.md") + expect( + fileName.startsWith("ce-"), + `Agent "${fileName}" must use the ce- prefix.`, + ).toBe(true) + }) } } }
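
The prefix rule in the diff above can also be factored into a pure function, so the policy itself is unit-testable apart from the frontmatter walk. The sketch below is illustrative, not code from the PR — `ceNameError` and its signature are hypothetical, and the allowlist simply mirrors the three legacy skill names pinned in the test:

```typescript
// Sketch only: the ce- prefix policy as a pure function.
// `ceNameError` is a hypothetical name, not part of the actual test suite.
const SKILL_PREFIX_ALLOWLIST = new Set([
  "every-style-editor",
  "file-todos",
  "lfg",
]);

function ceNameError(kind: "skill" | "agent", name: string): string | null {
  // Legacy skills that predate the rule are exempt; agents get no
  // allowlist because every existing agent already conforms.
  if (kind === "skill" && SKILL_PREFIX_ALLOWLIST.has(name)) return null;
  if (name.startsWith("ce-")) return null;
  const label = kind === "skill" ? "Skill" : "Agent";
  return `${label} "${name}" must use the ce- prefix.`;
}
```

Keeping the check pure like this has a side benefit: once the legacy skills are eventually renamed, a single assertion that the allowlist is empty closes the loophole for good.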