diff --git a/docs/explanation/index.md b/docs/explanation/index.md index 65e4590..595cde1 100644 --- a/docs/explanation/index.md +++ b/docs/explanation/index.md @@ -21,6 +21,7 @@ Create world-class AI agents and workflows with the BMad Builder. | **[Progressive Disclosure](/explanation/progressive-disclosure.md)** | Four layers of context loading — from frontmatter through step files | | **[Subagent Patterns](/explanation/subagent-patterns.md)** | Six orchestration patterns for parallel and hierarchical work | | **[Skill Authoring Best Practices](/explanation/skill-authoring-best-practices.md)** | Core principles, common patterns, quality dimensions, and anti-patterns | +| **[Scripts in Skills](/explanation/scripts-in-skills.md)** | Why deterministic scripts make skills faster, cheaper, and more reliable | ## Reference diff --git a/docs/explanation/scripts-in-skills.md b/docs/explanation/scripts-in-skills.md new file mode 100644 index 0000000..3b16604 --- /dev/null +++ b/docs/explanation/scripts-in-skills.md @@ -0,0 +1,127 @@ +--- +title: "Scripts in Skills" +description: Why deterministic scripts make skills faster, cheaper, and more reliable — and the technical choices behind portable script design +--- + +Scripts are the reliability backbone of a well-built skill. They handle work that has clear right-and-wrong answers — validation, transformation, extraction, counting — so the LLM can focus on what it does best: judgment, synthesis, and creative reasoning. + +## The Problem: LLMs Do Too Much + +Without scripts, every operation in a skill runs through the LLM. That means: + +- **Non-deterministic results.** Ask an LLM to count tokens in a file three times and you may get three different numbers. Ask a script and you get the same answer every time. +- **Wasted tokens and time.** Parsing a JSON file, checking if a directory exists, or comparing two strings are mechanical operations. Running them through the LLM burns context window and adds latency for no gain. 
+- **Harder to test.** You can write unit tests for a script. You cannot write unit tests for an LLM prompt. + +The pattern shows up everywhere: skills that try to LLM their way through structural validation are slower, less reliable, and more expensive than skills that offload those checks to scripts. + +## The Determinism Boundary + +The core design principle is **intelligence placement** — put each operation where it belongs. + +| Scripts Handle | LLM Handles | +| -------------- | ----------- | +| Validate structure, format, schema | Interpret meaning, evaluate quality | +| Count, parse, extract, transform | Classify ambiguous input, make judgment calls | +| Compare, diff, check consistency | Synthesize insights, generate creative output | +| Pre-process data into compact form | Analyze pre-processed data with domain reasoning | + +**The test:** Given identical input, will this operation always produce identical output? If yes, it belongs in a script. Could you write a unit test with expected output? Definitely a script. Requires interpreting meaning, tone, or context? Keep it as an LLM prompt. + +:::tip[The Pre-Processing Pattern] +One of the highest-value script uses is pre-processing. A script extracts compact metrics from large files into a small JSON summary. The LLM then reasons over the summary instead of reading raw files — dramatically reducing token usage while improving analysis quality because the data is clean and structured. +::: + +## Why Python, Not Bash + +Skills must work across macOS, Linux, and Windows. Bash is not portable. 
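One concrete illustration before the comparison: in-place file edits. BSD `sed -i ''` and GNU `sed -i` take different flags, while a stdlib version behaves identically everywhere. This is a minimal sketch; `replace_in_file` is a hypothetical helper, not part of any builder:

```python
# Hypothetical helper: portable in-place replacement, the job that
# `sed -i 's/old/new/' file` does with incompatible flags on BSD vs GNU.
from pathlib import Path

def replace_in_file(path: Path, old: str, new: str) -> int:
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    path.write_text(text.replace(old, new), encoding="utf-8")
    return count  # replacement count, useful for a JSON report
```

The same code runs unchanged on macOS, Linux, and Windows.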
+ +| Factor | Bash | Python | +| ------ | ---- | ------ | +| **macOS / Linux** | Works | Works | +| **Windows (native)** | Fails or behaves inconsistently | Works identically | +| **Windows (WSL)** | Works, but can conflict with Git Bash on PATH | Works identically | +| **Error handling** | Limited, fragile | Rich exception handling | +| **Testing** | Difficult | Standard unittest/pytest | +| **Complex logic** | Quickly becomes unreadable | Clean, maintainable | + +Even basic commands like `sed -i` behave differently on macOS vs Linux. Piping, `jq`, `grep`, `awk` — all of these have cross-platform pitfalls that Python's standard library avoids entirely. + +**Safe bash commands** that work everywhere and remain fine to use directly: + +| Command | Purpose | +| ------- | ------- | +| `git`, `gh` | Version control and GitHub CLI | +| `uv run` | Python script execution | +| `npm`, `npx`, `pnpm` | Node.js ecosystem | +| `mkdir -p` | Directory creation | + +Everything beyond that list should be a Python script. + +## Standard Library First + +Python's standard library covers most script needs without any external dependencies. Stdlib-only scripts run with plain `python3`, need no special tooling, and have zero supply-chain risk. + +| Need | Standard Library | +| ---- | ---------------- | +| JSON parsing | `json` | +| Path handling | `pathlib` | +| Pattern matching | `re` | +| CLI interface | `argparse` | +| Text comparison | `difflib` | +| Counting, grouping | `collections` | +| Source analysis | `ast` | +| Data formats | `csv`, `xml.etree` | + +Only reach for external dependencies when the stdlib genuinely cannot do the job — `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation. Each external dependency adds install-time cost, requires `uv` to be available, and expands the supply-chain surface. The BMad builders require explicit user approval for any external dependency during the build process. 
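As a sketch of both ideas at once (stdlib-first and the pre-processing pattern described earlier), here is a hypothetical script body that condenses a folder of markdown files into a compact JSON summary for the LLM to reason over. The metric choices are illustrative, not prescribed:

```python
# Hypothetical pre-processor: reduce a directory of markdown files to a
# compact summary the LLM can analyze, using only the standard library.
import json
import re
from collections import Counter
from pathlib import Path

def summarize(root: Path) -> str:
    stats: Counter = Counter()
    headings: list[str] = []
    for md in sorted(root.glob("*.md")):
        text = md.read_text(encoding="utf-8")
        stats["files"] += 1
        stats["words"] += len(text.split())
        headings.extend(re.findall(r"^#+\s+(.+)$", text, flags=re.MULTILINE))
    # Small JSON out: the LLM reads this summary instead of the raw files
    return json.dumps({"stats": dict(stats), "headings": headings})
```

No `uv`, no dependencies: this runs with plain `python3` in any environment that can execute scripts.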
+ +## Zero-Friction Dependencies with PEP 723 + +Python scripts in skills use [PEP 723](https://peps.python.org/pep-0723/) inline metadata to declare their dependencies directly in the file. Combined with `uv run`, this gives you `npx`-like behavior — dependencies are silently cached in an isolated environment, no global installs, no user prompts. + +```python +#!/usr/bin/env -S uv run --script +# /// script +# requires-python = ">=3.10" +# dependencies = ["pyyaml>=6.0"] +# /// + +import yaml +# script logic here +``` + +When a skill invokes this script with `uv run scripts/analyze.py`, the dependency (`pyyaml` in this example) is automatically resolved. The user never sees an install prompt, never needs to manage a virtual environment, and never pollutes their global Python installation. + +**Why this matters for skill authoring:** Without PEP 723, skills that needed libraries like `pyyaml` or `tiktoken` would force users to run `pip install` — a jarring, trust-breaking experience that makes users hesitate to adopt the skill. + +## Graceful Degradation + +Skills run in multiple environments: CLI terminals, desktop apps, IDE extensions, and web interfaces like claude.ai. Not all environments can execute Python scripts. + +The principle: **scripts are the fast, reliable path — but the skill must still deliver its outcome when execution is unavailable.** + +When a script cannot run, the LLM performs the equivalent work directly. This is slower and less deterministic, but the user still gets a result. The script's `--help` output documents what it checks, making the fallback natural — the LLM reads the help to understand the script's purpose and replicates the logic. 
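As an example of how `--help` can carry the fallback, the `argparse` description and epilog can enumerate every check the script performs. The script name echoes the `scan-paths.py` example used elsewhere in this page; the listed checks are invented for illustration:

```python
import argparse

# Hypothetical validator: the epilog spells out each check, so an agent
# that cannot execute the script can read --help and replicate the logic.
parser = argparse.ArgumentParser(
    prog="scan-paths.py",
    description="Validate path conventions in a skill directory.",
    epilog=(
        "Checks performed:\n"
        "  1. relative references start with './'\n"
        "  2. every referenced file exists on disk\n"
        "  3. no absolute paths appear in SKILL.md\n"
    ),
    formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("target", help="skill directory to scan")

help_text = parser.format_help()  # what the LLM reads on the fallback path
```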
+ +Frame script steps as outcomes in the SKILL.md, not just commands: + +| Approach | Example | +| -------- | ------- | +| **Good** | "Validate path conventions (run `scripts/scan-paths.py --help` for details)" | +| **Fragile** | "Execute `python3 scripts/scan-paths.py`" with no context | + +The good version tells the LLM both what to accomplish and where to find the details — enabling graceful degradation without additional instructions. + +## When to Reach for a Script + +Look for these signal verbs in a skill's requirements — they indicate script opportunities: + +| Signal | Script Type | +| ------ | ----------- | +| "validate", "check", "verify" | Validation | +| "count", "tally", "aggregate" | Metrics | +| "extract", "parse", "pull from" | Data extraction | +| "convert", "transform", "format" | Transformation | +| "compare", "diff", "match against" | Comparison | +| "scan for", "find all", "list all" | Pattern scanning | + +The builders guide you through script opportunity discovery during the build process. The key insight: if you find yourself writing detailed validation logic in a prompt, it almost certainly belongs in a script instead. diff --git a/skills/bmad-agent-builder/build-process.md b/skills/bmad-agent-builder/build-process.md index 4b1ff25..aae7d8d 100644 --- a/skills/bmad-agent-builder/build-process.md +++ b/skills/bmad-agent-builder/build-process.md @@ -46,7 +46,7 @@ Early check: internal capabilities only, external skills, both, or unclear? **Script Opportunity Discovery** (active probing — do not skip): -Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding. +Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding. 
If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval — dependencies add install-time cost and require `uv` to be available. ## Phase 3: Gather Requirements @@ -125,6 +125,8 @@ Activation is a single flow regardless of mode. It should: - If headless, route to `./references/autonomous-wake.md` - If interactive, greet the user and continue from memory context or offer capabilities +**If the built agent includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start. + **Lint gate** — after building, validate and auto-fix: If subagents available, delegate lint-fix to a subagent. Otherwise run inline. diff --git a/skills/bmad-agent-builder/references/script-opportunities-reference.md b/skills/bmad-agent-builder/references/script-opportunities-reference.md index 1f24ee7..b7b3322 100644 --- a/skills/bmad-agent-builder/references/script-opportunities-reference.md +++ b/skills/bmad-agent-builder/references/script-opportunities-reference.md @@ -48,10 +48,12 @@ Beyond obvious validation, consider: - Could metric collection feed into LLM decision-making without the LLM doing the counting? ### Your Toolbox -Scripts have access to full capabilities — think broadly: -- **Bash**: Full shell — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition -- **Python**: Standard library (`json`, `yaml`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.) -- **System tools**: `git` commands for history/diff/blame, filesystem operations, process execution + +**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands. 
+ +- **Python:** Standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.) +- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p` +- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead. If you can express the logic as deterministic code, it's a script candidate. diff --git a/skills/bmad-agent-builder/references/script-standards.md b/skills/bmad-agent-builder/references/script-standards.md new file mode 100644 index 0000000..62028f3 --- /dev/null +++ b/skills/bmad-agent-builder/references/script-standards.md @@ -0,0 +1,89 @@ +# Script Creation Standards + +When building scripts for a skill, follow these standards to ensure portability and zero-friction execution. Skills must work across macOS, Linux, and Windows (native, Git Bash, and WSL). + +## Python Over Bash + +**Always favor Python for script logic.** Bash is not portable — it fails or behaves inconsistently on Windows (Git Bash is MSYS2-based, not a full Linux shell; WSL bash can conflict with Git Bash on PATH; PowerShell is a different language entirely). Python with `uv run` works identically on all platforms. + +**Safe bash commands** — these work reliably across all environments and are fine to use directly: +- `git`, `gh` — version control and GitHub CLI +- `uv run` — Python script execution with automatic dependency handling +- `npm`, `npx`, `pnpm` — Node.js ecosystem +- `mkdir -p` — directory creation + +**Everything else should be Python** — piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc`, and any non-trivial logic. Even `sed -i` behaves differently on macOS vs Linux. If it's more than a single safe command, write a Python script. + +## Favor the Standard Library + +Always prefer Python's standard library over external dependencies. 
The stdlib is pre-installed everywhere, requires no `uv run`, and has zero supply-chain risk. Common stdlib modules that cover most script needs: + +- `json` — JSON parsing and output +- `pathlib` — cross-platform path handling +- `re` — pattern matching +- `argparse` — CLI interface +- `collections` — counters, defaultdicts +- `difflib` — text comparison +- `ast` — Python source analysis +- `csv`, `xml.etree` — data formats + +Only pull in external dependencies when the stdlib genuinely cannot do the job (e.g., `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation). **External dependencies must be confirmed with the user during the build process** — they add install-time cost, supply-chain surface, and require `uv` to be available. + +## PEP 723 Inline Metadata (Required) + +Every Python script MUST include a PEP 723 metadata block. For scripts with external dependencies, use the `uv run` shebang: + +```python +#!/usr/bin/env -S uv run --script +# /// script +# requires-python = ">=3.10" +# dependencies = ["pyyaml>=6.0", "jsonschema>=4.0"] +# /// +``` + +For scripts using only the standard library, use a plain Python shebang but still include the metadata block: + +```python +#!/usr/bin/env python3 +# /// script +# requires-python = ">=3.10" +# /// +``` + +**Key rules:** +- The shebang MUST be line 1 — before the metadata block +- Always include `requires-python` +- List all external dependencies with version constraints +- Never use `requirements.txt`, `pip install`, or expect global package installs +- The shebang is a Unix convenience — cross-platform invocation relies on `uv run scripts/foo.py`, not `./scripts/foo.py` + +## Invocation in SKILL.md + +How a built skill's SKILL.md should reference its scripts: + +- **Scripts with external dependencies:** `uv run scripts/analyze.py {args}` +- **Stdlib-only scripts:** `python3 scripts/scan.py {args}` (also fine to use `uv run` for consistency) + +`uv run` reads the 
PEP 723 metadata, silently caches dependencies in an isolated environment, and runs the script — no user prompt, no global install. Like `npx` for Python. + +## Graceful Degradation + +Skills may run in environments where Python or `uv` is unavailable (e.g., claude.ai web). Scripts should be the fast, reliable path — but the skill must still deliver its outcome when execution is not possible. + +**Pattern:** When a script cannot execute, the LLM performs the equivalent work directly. The script's `--help` documents what it checks, making this fallback natural. Design scripts so their logic is understandable from their help output and the skill's context. + +In SKILL.md, frame script steps as outcomes, not just commands: +- Good: "Validate path conventions (run `scripts/scan-paths.py --help` for details)" +- Avoid: "Execute `python3 scripts/scan-paths.py`" with no context about what it does + +## Script Interface Standards + +- Implement `--help` via `argparse` (single source of truth for the script's API) +- Accept target path as a positional argument +- `-o` flag for output file (default to stdout) +- Diagnostics and progress to stderr +- Exit codes: 0=pass, 1=fail, 2=error +- `--verbose` flag for debugging +- Output valid JSON to stdout +- No interactive prompts, no network dependencies +- Tests in `scripts/tests/` diff --git a/skills/bmad-workflow-builder/build-process.md b/skills/bmad-workflow-builder/build-process.md index 3568be8..33fcab0 100644 --- a/skills/bmad-workflow-builder/build-process.md +++ b/skills/bmad-workflow-builder/build-process.md @@ -67,7 +67,7 @@ Work through conversationally, adapted per skill type. Glean from what the user - **Role guidance:** Brief "Act as a [role/expert]" primer - **Design rationale:** Non-obvious choices the executing agent should understand - **External skills used:** Which skills does this invoke? -- **Script Opportunity Discovery** — Walk through planned steps with the user. 
Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan. +- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval before proceeding — dependencies add install-time cost and require `uv` to be available. - **Creates output documents?** If yes, will use `{document_output_language}` **Simple Utility additional:** @@ -130,6 +130,8 @@ Load the template from `./assets/SKILL-template.md` and `./references/template-s | **`./assets/`** | Templates, starter files | Copied/transformed into output | | **`./scripts/`** | Python, shell scripts with tests | Invoked for deterministic operations | +**If the built skill includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start. + **Lint gate** — after building, validate and auto-fix: If subagents available, delegate lint-fix to a subagent. Otherwise run inline. diff --git a/skills/bmad-workflow-builder/references/script-opportunities-reference.md b/skills/bmad-workflow-builder/references/script-opportunities-reference.md index d64efe0..70bf743 100644 --- a/skills/bmad-workflow-builder/references/script-opportunities-reference.md +++ b/skills/bmad-workflow-builder/references/script-opportunities-reference.md @@ -1,5 +1,7 @@ # Script Opportunities Reference — Workflow Builder +**Reference: `references/script-standards.md` for script creation guidelines.** + ## Core Principle Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). 
If a check has clear pass/fail criteria, it belongs in a script. @@ -42,10 +44,11 @@ When you see these in a workflow's requirements, think scripts first: "validate" ### Your Toolbox -Scripts have access to the full execution environment: -- **Bash:** `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, piping and composition -- **Python:** Full standard library plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.) -- **System tools:** `git` for history/diff/blame, filesystem operations +**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands. + +- **Python:** Full standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.) +- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p` +- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead. ### The --help Pattern diff --git a/skills/bmad-workflow-builder/references/script-standards.md b/skills/bmad-workflow-builder/references/script-standards.md new file mode 100644 index 0000000..62028f3 --- /dev/null +++ b/skills/bmad-workflow-builder/references/script-standards.md @@ -0,0 +1,89 @@ +# Script Creation Standards + +When building scripts for a skill, follow these standards to ensure portability and zero-friction execution. Skills must work across macOS, Linux, and Windows (native, Git Bash, and WSL). + +## Python Over Bash + +**Always favor Python for script logic.** Bash is not portable — it fails or behaves inconsistently on Windows (Git Bash is MSYS2-based, not a full Linux shell; WSL bash can conflict with Git Bash on PATH; PowerShell is a different language entirely). Python with `uv run` works identically on all platforms. 
+ +**Safe bash commands** — these work reliably across all environments and are fine to use directly: +- `git`, `gh` — version control and GitHub CLI +- `uv run` — Python script execution with automatic dependency handling +- `npm`, `npx`, `pnpm` — Node.js ecosystem +- `mkdir -p` — directory creation + +**Everything else should be Python** — piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc`, and any non-trivial logic. Even `sed -i` behaves differently on macOS vs Linux. If it's more than a single safe command, write a Python script. + +## Favor the Standard Library + +Always prefer Python's standard library over external dependencies. The stdlib is pre-installed everywhere, requires no `uv run`, and has zero supply-chain risk. Common stdlib modules that cover most script needs: + +- `json` — JSON parsing and output +- `pathlib` — cross-platform path handling +- `re` — pattern matching +- `argparse` — CLI interface +- `collections` — counters, defaultdicts +- `difflib` — text comparison +- `ast` — Python source analysis +- `csv`, `xml.etree` — data formats + +Only pull in external dependencies when the stdlib genuinely cannot do the job (e.g., `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation). **External dependencies must be confirmed with the user during the build process** — they add install-time cost, supply-chain surface, and require `uv` to be available. + +## PEP 723 Inline Metadata (Required) + +Every Python script MUST include a PEP 723 metadata block. 
For scripts with external dependencies, use the `uv run` shebang: + +```python +#!/usr/bin/env -S uv run --script +# /// script +# requires-python = ">=3.10" +# dependencies = ["pyyaml>=6.0", "jsonschema>=4.0"] +# /// +``` + +For scripts using only the standard library, use a plain Python shebang but still include the metadata block: + +```python +#!/usr/bin/env python3 +# /// script +# requires-python = ">=3.10" +# /// +``` + +**Key rules:** +- The shebang MUST be line 1 — before the metadata block +- Always include `requires-python` +- List all external dependencies with version constraints +- Never use `requirements.txt`, `pip install`, or expect global package installs +- The shebang is a Unix convenience — cross-platform invocation relies on `uv run scripts/foo.py`, not `./scripts/foo.py` + +## Invocation in SKILL.md + +How a built skill's SKILL.md should reference its scripts: + +- **Scripts with external dependencies:** `uv run scripts/analyze.py {args}` +- **Stdlib-only scripts:** `python3 scripts/scan.py {args}` (also fine to use `uv run` for consistency) + +`uv run` reads the PEP 723 metadata, silently caches dependencies in an isolated environment, and runs the script — no user prompt, no global install. Like `npx` for Python. + +## Graceful Degradation + +Skills may run in environments where Python or `uv` is unavailable (e.g., claude.ai web). Scripts should be the fast, reliable path — but the skill must still deliver its outcome when execution is not possible. + +**Pattern:** When a script cannot execute, the LLM performs the equivalent work directly. The script's `--help` documents what it checks, making this fallback natural. Design scripts so their logic is understandable from their help output and the skill's context. 
+ +In SKILL.md, frame script steps as outcomes, not just commands: +- Good: "Validate path conventions (run `scripts/scan-paths.py --help` for details)" +- Avoid: "Execute `python3 scripts/scan-paths.py`" with no context about what it does + +## Script Interface Standards + +- Implement `--help` via `argparse` (single source of truth for the script's API) +- Accept target path as a positional argument +- `-o` flag for output file (default to stdout) +- Diagnostics and progress to stderr +- Exit codes: 0=pass, 1=fail, 2=error +- `--verbose` flag for debugging +- Output valid JSON to stdout +- No interactive prompts, no network dependencies +- Tests in `scripts/tests/`
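
The interface standards above can be rolled into a skeleton. This is a sketch under the stated conventions, not a canonical implementation; the heading check is a stand-in for real validation logic:

```python
import argparse
import json
import sys
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    # --help via argparse: the single source of truth for the script's API
    parser = argparse.ArgumentParser(
        description="Check that every markdown file under TARGET starts with a heading.",
    )
    parser.add_argument("target", type=Path, help="directory to scan")
    parser.add_argument("-o", "--output", type=Path,
                        help="write JSON report here (default: stdout)")
    parser.add_argument("--verbose", action="store_true",
                        help="extra diagnostics on stderr")
    return parser

def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    files = sorted(args.target.glob("**/*.md"))
    if args.verbose:
        print(f"scanning {len(files)} file(s)", file=sys.stderr)  # stderr only
    failures = [str(f) for f in files
                if not f.read_text(encoding="utf-8").lstrip().startswith("#")]
    report = json.dumps({"checked": len(files), "failures": failures}, indent=2)
    if args.output:
        args.output.write_text(report + "\n", encoding="utf-8")
    else:
        print(report)  # valid JSON on stdout
    return 1 if failures else 0  # 0=pass, 1=fail; argparse itself exits 2 on usage errors

# A real script adds the shebang and PEP 723 block from above, plus:
#   if __name__ == "__main__":
#       sys.exit(main())
```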