bmad-code-org · bmadcode · Mar 28, 2026 · Mar 28, 2026 · augmentcode · Mar 28, 2026
diff --git a/docs/explanation/index.md b/docs/explanation/index.md
@@ -21,6 +21,7 @@ Create world-class AI agents and workflows with the BMad Builder.
 | **[Progressive Disclosure](/explanation/progressive-disclosure.md)** | Four layers of context loading — from frontmatter through step files |
 | **[Subagent Patterns](/explanation/subagent-patterns.md)** | Six orchestration patterns for parallel and hierarchical work |
 | **[Skill Authoring Best Practices](/explanation/skill-authoring-best-practices.md)** | Core principles, common patterns, quality dimensions, and anti-patterns |
+| **[Scripts in Skills](/explanation/scripts-in-skills.md)** | Why deterministic scripts make skills faster, cheaper, and more reliable |
 
 ## Reference
 

diff --git a/docs/explanation/scripts-in-skills.md b/docs/explanation/scripts-in-skills.md
@@ -0,0 +1,127 @@
+---
+title: "Scripts in Skills"
+description: Why deterministic scripts make skills faster, cheaper, and more reliable — and the technical choices behind portable script design
+---
+
+Scripts are the reliability backbone of a well-built skill. They handle work that has clear right-and-wrong answers — validation, transformation, extraction, counting — so the LLM can focus on what it does best: judgment, synthesis, and creative reasoning.
+
+## The Problem: LLMs Do Too Much
+
+Without scripts, every operation in a skill runs through the LLM. That means:
+
+- **Non-deterministic results.** Ask an LLM to count tokens in a file three times and you may get three different numbers. Ask a script and you get the same answer every time.
+- **Wasted tokens and time.** Parsing a JSON file, checking if a directory exists, or comparing two strings are mechanical operations. Running them through the LLM burns context window and adds latency for no gain.
+- **Harder to test.** A script can be unit-tested against expected output. A prompt's output varies between runs, so it can be evaluated but not asserted against in the same way.
+
+The pattern shows up everywhere: skills that try to LLM their way through structural validation are slower, less reliable, and more expensive than skills that offload those checks to scripts.
+
+## The Determinism Boundary
+
+The core design principle is **intelligence placement** — put each operation where it belongs.
+
+| Scripts Handle | LLM Handles |
+| -------------- | ----------- |
+| Validate structure, format, schema | Interpret meaning, evaluate quality |
+| Count, parse, extract, transform | Classify ambiguous input, make judgment calls |
+| Compare, diff, check consistency | Synthesize insights, generate creative output |
+| Pre-process data into compact form | Analyze pre-processed data with domain reasoning |
+
+**The test:** Given identical input, will this operation always produce identical output? If yes, it belongs in a script. Could you write a unit test with expected output? Definitely a script. Requires interpreting meaning, tone, or context? Keep it as an LLM prompt.
+
+:::tip[The Pre-Processing Pattern]
+One of the highest-value script uses is pre-processing. A script extracts compact metrics from large files into a small JSON summary. The LLM then reasons over the summary instead of reading raw files — dramatically reducing token usage while improving analysis quality because the data is clean and structured.
+:::
+
+## Why Python, Not Bash
+
+Skills must work across macOS, Linux, and Windows. Bash is not portable.
+
+| Factor | Bash | Python |
+| ------ | ---- | ------ |
+| **macOS / Linux** | Works | Works |
+| **Windows (native)** | Fails or behaves inconsistently | Works identically |
+| **Windows (WSL)** | Works, but can conflict with Git Bash on PATH | Works identically |
+| **Error handling** | Limited, fragile | Rich exception handling |
+| **Testing** | Difficult | Standard unittest/pytest |
+| **Complex logic** | Quickly becomes unreadable | Clean, maintainable |
+
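+Even basic commands behave differently across platforms: BSD `sed` on macOS requires a backup-suffix argument after `-i` (`sed -i '' ...`), while GNU `sed` on Linux does not. Pipelines built on `jq`, `grep`, and `awk` carry similar cross-platform pitfalls (missing tools, differing flags) that a Python standard-library script sidesteps.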
+
+**Safe bash commands** that work everywhere and remain fine to use directly:
+
+| Command | Purpose |
+| ------- | ------- |
+| `git`, `gh` | Version control and GitHub CLI |
+| `uv run` | Python script execution |
+| `npm`, `npx`, `pnpm` | Node.js ecosystem |
+| `mkdir -p` | Directory creation |
+
+Everything beyond that list should be a Python script.
+
+## Standard Library First
+
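+Python's standard library covers most script needs without any external dependencies. Stdlib-only scripts run with a plain Python interpreter (`python3`, or `python` / `py` on Windows installs that lack a `python3` alias), need no special tooling, and add no third-party supply-chain risk.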
+
+| Need | Standard Library |
+| ---- | ---------------- |
+| JSON parsing | `json` |
+| Path handling | `pathlib` |
+| Pattern matching | `re` |
+| CLI interface | `argparse` |
+| Text comparison | `difflib` |
+| Counting, grouping | `collections` |
+| Source analysis | `ast` |
+| Data formats | `csv`, `xml.etree` |
+
+Only reach for external dependencies when the stdlib genuinely cannot do the job — `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation. Each external dependency adds install-time cost, requires `uv` to be available, and expands the supply-chain surface. The BMad builders require explicit user approval for any external dependency during the build process.
+
+## Zero-Friction Dependencies with PEP 723
+
+Python scripts in skills use [PEP 723](https://peps.python.org/pep-0723/) inline metadata to declare their dependencies directly in the file. Combined with `uv run`, this gives you `npx`-like behavior — dependencies are silently cached in an isolated environment, no global installs, no user prompts.
+
+```python
+#!/usr/bin/env -S uv run --script
+# /// script
+# requires-python = ">=3.10"
+# dependencies = ["pyyaml>=6.0"]
+# ///
+
+import yaml
+# script logic here
+```
+
+When a skill invokes this script with `uv run scripts/analyze.py`, the dependency (`pyyaml` in this example) is automatically resolved. The user never sees an install prompt, never needs to manage a virtual environment, and never pollutes their global Python installation.
+
+**Why this matters for skill authoring:** Without PEP 723, skills that needed libraries like `pyyaml` or `tiktoken` would force users to run `pip install` — a jarring, trust-breaking experience that makes users hesitate to adopt the skill.
+
+## Graceful Degradation
+
+Skills run in multiple environments: CLI terminals, desktop apps, IDE extensions, and web interfaces like claude.ai. Not all environments can execute Python scripts.
+
+The principle: **scripts are the fast, reliable path — but the skill must still deliver its outcome when execution is unavailable.**
+
+When a script cannot run, the LLM performs the equivalent work directly. This is slower and less deterministic, but the user still gets a result. The script's `--help` text documents what it checks, which makes the fallback natural: the LLM reads that text to understand the script's purpose and replicates the logic. Where the interpreter itself is unavailable, the same text can typically still be read from the `argparse` definitions in the script's source.
+
+Frame script steps as outcomes in the SKILL.md, not just commands:
+
+| Approach | Example |
+| -------- | ------- |
+| **Good** | "Validate path conventions (see `uv run scripts/scan-paths.py --help` for details)" |
+| **Fragile** | "Execute `python3 scripts/scan-paths.py`" with no context |
+
+The good version tells the LLM both what to accomplish and where to find the details — enabling graceful degradation without additional instructions.
+
+## When to Reach for a Script
+
+Look for these signal verbs in a skill's requirements — they indicate script opportunities:
+
+| Signal | Script Type |
+| ------ | ----------- |
+| "validate", "check", "verify" | Validation |
+| "count", "tally", "aggregate" | Metrics |
+| "extract", "parse", "pull from" | Data extraction |
+| "convert", "transform", "format" | Transformation |
+| "compare", "diff", "match against" | Comparison |
+| "scan for", "find all", "list all" | Pattern scanning |
+
+The builders guide you through script opportunity discovery during the build process. The key insight: if you find yourself writing detailed validation logic in a prompt, it almost certainly belongs in a script instead.
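
A minimal sketch of the pre-processing pattern described in `scripts-in-skills.md`, assuming the "large files" are Markdown documents and the useful metrics are line, word, and heading counts. The function names (`summarize_text`, `summarize_tree`) and the JSON field names are hypothetical, chosen for illustration and not taken from the BMad builders. It uses only the standard library, so it needs no `dependencies` entry.

```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# ///
"""Hypothetical pre-processing sketch: reduce Markdown files to compact metrics."""
import json
import re
from collections import Counter
from pathlib import Path

# ATX headings only ("# Title" .. "###### Title"). This simple pattern does not
# skip fenced code blocks, so a "# comment" line inside a fence is also counted.
HEADING = re.compile(r"^(#{1,6})[ \t]+\S", re.MULTILINE)


def summarize_text(text: str) -> dict:
    """Return deterministic metrics for one document (field names are illustrative)."""
    levels = Counter(len(m.group(1)) for m in HEADING.finditer(text))
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "headings": {f"h{level}": n for level, n in sorted(levels.items())},
    }


def summarize_tree(root: Path) -> dict:
    """Summarize every *.md file under root, keyed by POSIX-style relative path."""
    return {
        p.relative_to(root).as_posix(): summarize_text(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.md"))
    }
```

A caller would print `json.dumps(summarize_tree(Path("docs")), indent=2)` and hand that summary to the LLM in place of the raw files. Sorting the paths and heading levels keeps the output identical across runs and platforms, which is the property that makes the step unit-testable.
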
diff --git a/skills/bmad-agent-builder/build-process.md b/skills/bmad-agent-builder/build-process.md
@@ -46,7 +46,7 @@ Early check: internal capabilities only, external skills, both, or unclear?
 
 **Script Opportunity Discovery** (active probing — do not skip):
 
-Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding.
+Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval — dependencies add install-time cost and require `uv` to be available.
 
 ## Phase 3: Gather Requirements
 
@@ -125,6 +125,8 @@ Activation is a single flow regardless of mode. It should:
 - If headless, route to `./references/autonomous-wake.md`
 - If interactive, greet the user and continue from memory context or offer capabilities
 
+**If the built agent includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.
+
 **Lint gate** — after building, validate and auto-fix:
 
 If subagents available, delegate lint-fix to a subagent. Otherwise run inline.

diff --git a/skills/bmad-agent-builder/references/script-opportunities-reference.md b/skills/bmad-agent-builder/references/script-opportunities-reference.md
@@ -48,10 +48,12 @@ Beyond obvious validation, consider:
 - Could metric collection feed into LLM decision-making without the LLM doing the counting?
 
 ### Your Toolbox
-Scripts have access to full capabilities — think broadly:
-- **Bash**: Full shell — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition
-- **Python**: Standard library (`json`, `yaml`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
-- **System tools**: `git` commands for history/diff/blame, filesystem operations, process execution
+
+**Python is the default** for all script logic (cross-platform: macOS, Linux, and Windows, including WSL). See `references/script-standards.md` for full rationale and safe bash commands.
+
+- **Python:** Standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
+- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
+- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.
 
 If you can express the logic as deterministic code, it's a script candidate.
 

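As a rough illustration of "use Python equivalents instead", the sketch below shows standard-library stand-ins for a few of the shell tools listed above (`find`, `grep`, `wc`, and a `sort | uniq -c` pipeline). The helper names are hypothetical and the behavior is approximate, not a flag-for-flag reimplementation of the shell commands.

```python
import re
from collections import Counter
from pathlib import Path


def find_files(root: Path, pattern: str = "*.md") -> list[Path]:
    """Roughly `find root -name pattern -type f`: recursive glob, sorted for stable output."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


def grep(path: Path, regex: str) -> list[tuple[int, str]]:
    """Roughly `grep -n regex path`: (1-based line number, line) for each matching line."""
    compiled = re.compile(regex)
    lines = path.read_text(encoding="utf-8").splitlines()
    return [(n, line) for n, line in enumerate(lines, start=1) if compiled.search(line)]


def count_lines(path: Path) -> int:
    """Roughly `wc -l`, except a final line without a trailing newline is still counted."""
    return len(path.read_text(encoding="utf-8").splitlines())


def sort_uniq_count(items: list[str]) -> list[tuple[str, int]]:
    """Roughly `sort | uniq -c`: each distinct item with its count, in sorted order."""
    return sorted(Counter(items).items())
```

Because each helper returns plain Python data, the results can be combined and serialized with `json` directly, with no text re-parsing between pipeline stages.
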
diff --git a/skills/bmad-agent-builder/references/script-standards.md b/skills/bmad-agent-builder/references/script-standards.md
@@ -0,0 +1,89 @@
+# Script Creation Standards
+
+When building scripts for a skill, follow these standards to ensure portability and zero-friction execution. Skills must work across macOS, Linux, and Windows (native, Git Bash, and WSL).
+
+## Python Over Bash
+
+**Always favor Python for script logic.** Bash is not portable — it fails or behaves inconsistently on Windows (Git Bash is MSYS2-based, not a full Linux shell; WSL bash can conflict with Git Bash on PATH; PowerShell is a different language entirely). Python with `uv run` works identically on all platforms.
+
+**Safe bash commands** — these work reliably across all environments and are fine to use directly:
+- `git`, `gh` — version control and GitHub CLI
+- `uv run` — Python script execution with automatic dependency handling
+- `npm`, `npx`, `pnpm` — Node.js ecosystem
+- `mkdir -p` — directory creation
+
+**Everything else should be Python** — piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc`, and any non-trivial logic. Even `sed -i` behaves differently on macOS vs Linux. If it's more than a single safe command, write a Python script.
+
+## Favor the Standard Library
+
+Always prefer Python's standard library over external dependencies. The stdlib ships with every Python installation, needs no `uv run` to resolve packages, and adds no third-party supply-chain risk. Common stdlib modules that cover most script needs:
+
+- `json` — JSON parsing and output
+- `pathlib` — cross-platform path handling
+- `re` — pattern matching
+- `argparse` — CLI interface
+- `collections` — counters, defaultdicts
+- `difflib` — text comparison
+- `ast` — Python source analysis
+- `csv`, `xml.etree` — data formats
+
+Only pull in external dependencies when the stdlib genuinely cannot do the job (e.g., `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation). **External dependencies must be confirmed with the user during the build process** — they add install-time cost, supply-chain surface, and require `uv` to be available.
+
+## PEP 723 Inline Metadata (Required)
+
+Every Python script MUST include a PEP 723 metadata block. For scripts with external dependencies, use the `uv run` shebang:
+
+```python
+#!/usr/bin/env -S uv run --script
+# /// script
+# requires-python = ">=3.10"
+# dependencies = ["pyyaml>=6.0", "jsonschema>=4.0"]
+# ///
+```
+
+For scripts using only the standard library, use a plain Python shebang but still include the metadata block:
+
+```python
+#!/usr/bin/env python3
+# /// script
+# requires-python = ">=3.10"
+# ///
+```
+
+**Key rules:**
+- The shebang MUST be line 1 — before the metadata block
+- Always include `requires-python`
+- List all external dependencies with version constraints
+- Never use `requirements.txt`, `pip install`, or expect global package installs
+- The shebang is a Unix convenience — cross-platform invocation relies on `uv run scripts/foo.py`, not `./scripts/foo.py`
+
+## Invocation in SKILL.md
+
+How a built skill's SKILL.md should reference its scripts:
+
+- **Scripts with external dependencies:** `uv run scripts/analyze.py {args}`
+- **Stdlib-only scripts:** `python3 scripts/scan.py {args}` (also fine to use `uv run` for consistency)
+
+`uv run` reads the PEP 723 metadata, silently caches dependencies in an isolated environment, and runs the script — no user prompt, no global install. Like `npx` for Python.
+
+## Graceful Degradation
+
+Skills may run in environments where Python or `uv` is unavailable (e.g., claude.ai web). Scripts should be the fast, reliable path — but the skill must still deliver its outcome when execution is not possible.
+
+**Pattern:** When a script cannot execute, the LLM performs the equivalent work directly. The script's `--help` documents what it checks, making this fallback natural. Design scripts so their logic is understandable from their help output and the skill's context.
+
+In SKILL.md, frame script steps as outcomes, not just commands:
+- Good: "Validate path conventions (see `uv run scripts/scan-paths.py --help` for details)"
+- Avoid: "Execute `python3 scripts/scan-paths.py`" with no context about what it does
+
+## Script Interface Standards
+
+- Implement `--help` via `argparse` (single source of truth for the script's API)
+- Accept target path as a positional argument
+- `-o` flag for output file (default to stdout)
+- Diagnostics and progress to stderr
+- Exit codes: 0=pass, 1=fail, 2=error
+- `--verbose` flag for debugging
+- Output valid JSON to stdout
+- No interactive prompts, no network dependencies
+- Tests in `scripts/tests/`
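
The interface standards above can be sketched as a small stdlib-only script. This is one possible shape under stated assumptions, not the builders' template: the check itself (flagging file names that contain uppercase letters) is invented for illustration, and the `scan` helper and the JSON field names are hypothetical.

```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# ///
"""Check that file names under a target directory are lowercase (illustrative rule)."""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

EXIT_PASS, EXIT_FAIL, EXIT_ERROR = 0, 1, 2


def scan(target: Path) -> list[str]:
    """Return POSIX-style relative paths whose file name contains uppercase letters."""
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file() and p.name != p.name.lower()
    )


def main(argv: list[str] | None = None) -> int:
    # argparse generates --help, so the help text stays in sync with the real interface.
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("target", type=Path, help="directory to scan")
    parser.add_argument("-o", "--output", type=Path, help="write JSON here instead of stdout")
    parser.add_argument("--verbose", action="store_true", help="print diagnostics to stderr")
    args = parser.parse_args(argv)

    if not args.target.is_dir():
        # Errors and diagnostics go to stderr so stdout stays valid JSON.
        print(f"error: {args.target} is not a directory", file=sys.stderr)
        return EXIT_ERROR

    findings = scan(args.target)
    if args.verbose:
        print(f"scanned {args.target}: {len(findings)} finding(s)", file=sys.stderr)

    report = json.dumps({"status": "fail" if findings else "pass", "findings": findings}, indent=2)
    if args.output:
        args.output.write_text(report + "\n", encoding="utf-8")
    else:
        print(report)
    return EXIT_FAIL if findings else EXIT_PASS
```

A real script would end with `if __name__ == "__main__": raise SystemExit(main())` so the return value becomes the process exit code. Keeping `main` as a function that takes `argv` and returns an integer lets tests in `scripts/tests/` call it directly without spawning a subprocess.
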
diff --git a/skills/bmad-workflow-builder/build-process.md b/skills/bmad-workflow-builder/build-process.md
@@ -67,7 +67,7 @@ Work through conversationally, adapted per skill type. Glean from what the user
 - **Role guidance:** Brief "Act as a [role/expert]" primer
 - **Design rationale:** Non-obvious choices the executing agent should understand
 - **External skills used:** Which skills does this invoke?
-- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan.
+- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval before proceeding — dependencies add install-time cost and require `uv` to be available.
 - **Creates output documents?** If yes, will use `{document_output_language}`
 
 **Simple Utility additional:**
@@ -130,6 +130,8 @@ Load the template from `./assets/SKILL-template.md` and `./references/template-s
 | **`./assets/`** | Templates, starter files | Copied/transformed into output |
 | **`./scripts/`** | Python, shell scripts with tests | Invoked for deterministic operations |
 
+**If the built skill includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.
+
 **Lint gate** — after building, validate and auto-fix:
 
 If subagents available, delegate lint-fix to a subagent. Otherwise run inline.

diff --git a/skills/bmad-workflow-builder/references/script-opportunities-reference.md b/skills/bmad-workflow-builder/references/script-opportunities-reference.md
@@ -1,5 +1,7 @@
 # Script Opportunities Reference — Workflow Builder
 
+**See `references/script-standards.md` for script creation guidelines.**
+
 ## Core Principle
 
 Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.
@@ -42,10 +44,11 @@ When you see these in a workflow's requirements, think scripts first: "validate"
 
 ### Your Toolbox
 
-Scripts have access to the full execution environment:
-- **Bash:** `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, piping and composition
-- **Python:** Full standard library plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
-- **System tools:** `git` for history/diff/blame, filesystem operations
+**Python is the default** for all script logic (cross-platform: macOS, Linux, and Windows, including WSL). See `references/script-standards.md` for full rationale and safe bash commands.
+
+- **Python:** Full standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
+- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
+- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.
 
 ### The --help Pattern