Merged
1 change: 1 addition & 0 deletions docs/explanation/index.md
@@ -21,6 +21,7 @@ Create world-class AI agents and workflows with the BMad Builder.
| **[Progressive Disclosure](/explanation/progressive-disclosure.md)** | Four layers of context loading — from frontmatter through step files |
| **[Subagent Patterns](/explanation/subagent-patterns.md)** | Six orchestration patterns for parallel and hierarchical work |
| **[Skill Authoring Best Practices](/explanation/skill-authoring-best-practices.md)** | Core principles, common patterns, quality dimensions, and anti-patterns |
| **[Scripts in Skills](/explanation/scripts-in-skills.md)** | Why deterministic scripts make skills faster, cheaper, and more reliable |

## Reference

127 changes: 127 additions & 0 deletions docs/explanation/scripts-in-skills.md
@@ -0,0 +1,127 @@
---
title: "Scripts in Skills"
description: Why deterministic scripts make skills faster, cheaper, and more reliable — and the technical choices behind portable script design
---

Scripts are the reliability backbone of a well-built skill. They handle work that has clear right-and-wrong answers — validation, transformation, extraction, counting — so the LLM can focus on what it does best: judgment, synthesis, and creative reasoning.

## The Problem: LLMs Do Too Much

Without scripts, every operation in a skill runs through the LLM. That means:

- **Non-deterministic results.** Ask an LLM to count tokens in a file three times and you may get three different numbers. Ask a script and you get the same answer every time.
- **Wasted tokens and time.** Parsing a JSON file, checking if a directory exists, or comparing two strings are mechanical operations. Running them through the LLM burns context window and adds latency for no gain.
- **Harder to test.** A script can be covered by unit tests with fixed expected outputs. An LLM prompt has no fixed output to assert against, so at best it can be spot-checked across repeated runs.

The pattern shows up everywhere: skills that try to LLM their way through structural validation are slower, less reliable, and more expensive than skills that offload those checks to scripts.

## The Determinism Boundary

The core design principle is **intelligence placement** — put each operation where it belongs.

| Scripts Handle | LLM Handles |
| -------------- | ----------- |
| Validate structure, format, schema | Interpret meaning, evaluate quality |
| Count, parse, extract, transform | Classify ambiguous input, make judgment calls |
| Compare, diff, check consistency | Synthesize insights, generate creative output |
| Pre-process data into compact form | Analyze pre-processed data with domain reasoning |

**The test:** Given identical input, will this operation always produce identical output? If yes, it belongs in a script. Could you write a unit test with expected output? Definitely a script. Requires interpreting meaning, tone, or context? Keep it as an LLM prompt.
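
As a minimal sketch of that test, here is a deterministic check paired with a unit test. The function name `has_frontmatter_title` and the rule it enforces are hypothetical, chosen only to illustrate an operation with a fixed expected output.

```python
import re
import unittest

# Hypothetical rule for illustration: a document must open with a
# frontmatter block (delimited by '---' lines) that contains a title.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def has_frontmatter_title(text: str) -> bool:
    """Return True if the text opens with frontmatter containing a 'title:' key."""
    match = FRONTMATTER.match(text)
    if match is None:
        return False
    return any(line.startswith("title:") for line in match.group(1).splitlines())


class HasFrontmatterTitleTest(unittest.TestCase):
    """Same input, same output: exactly what a unit test can pin down."""

    def test_accepts_titled_document(self):
        self.assertTrue(has_frontmatter_title('---\ntitle: "Demo"\n---\nBody\n'))

    def test_rejects_missing_title(self):
        self.assertFalse(has_frontmatter_title("---\ndescription: x\n---\nBody\n"))

    def test_rejects_missing_frontmatter(self):
        self.assertFalse(has_frontmatter_title("Body only\n"))
```

Saved as a module, the tests run with `python -m unittest`. Judging whether the title is a *good* title would stay with the LLM.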

:::tip[The Pre-Processing Pattern]
One of the highest-value script uses is pre-processing. A script extracts compact metrics from large files into a small JSON summary. The LLM then reasons over the summary instead of reading raw files — dramatically reducing token usage while improving analysis quality because the data is clean and structured.
:::
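
The pre-processing pattern could look roughly like the sketch below, which reduces a tree of Markdown files to a small JSON summary. The metric set and the function names `summarize_markdown` and `summarize_tree` are assumptions made for illustration, not part of the BMad Builder.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Naive heading matcher: it also counts '#' lines inside code fences.
HEADING = re.compile(r"^(#{1,6})[ \t]+\S", re.MULTILINE)


def summarize_markdown(text: str) -> dict:
    """Reduce one document to a handful of metrics (hypothetical metric set)."""
    levels = Counter(len(m.group(1)) for m in HEADING.finditer(text))
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "headings_by_level": {str(k): levels[k] for k in sorted(levels)},
    }


def summarize_tree(root: Path) -> str:
    """Summarize every .md file under root as one compact JSON document."""
    summary = {}
    for path in sorted(root.rglob("*.md")):
        key = path.relative_to(root).as_posix()
        summary[key] = summarize_markdown(path.read_text(encoding="utf-8"))
    return json.dumps(summary, indent=2, sort_keys=True)
```

The LLM would then be handed the output of `summarize_tree(Path("docs"))` instead of the raw files.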

## Why Python, Not Bash

Skills must work across macOS, Linux, and Windows. Bash is not portable.

| Factor | Bash | Python |
| ------ | ---- | ------ |
| **macOS / Linux** | Works | Works |
| **Windows (native)** | Fails or behaves inconsistently | Works identically |
| **Windows (WSL)** | Works, but can conflict with Git Bash on PATH | Works identically |
| **Error handling** | Limited, fragile | Rich exception handling |
| **Testing** | Difficult | Standard unittest/pytest |
| **Complex logic** | Quickly becomes unreadable | Clean, maintainable |

Even basic commands like `sed -i` behave differently on macOS vs Linux. Piping, `jq`, `grep`, `awk` — all of these have cross-platform pitfalls that Python's standard library avoids entirely.
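
For comparison, an in-place substitution of the kind `sed -i` is often used for can be sketched with `pathlib` alone. The helper name `replace_in_file` is hypothetical. It works on bytes so that existing line endings are left untouched on every platform.

```python
from pathlib import Path


def replace_in_file(path: Path, old: str, new: str) -> int:
    """Replace every occurrence of old with new in place; return the count."""
    data = path.read_bytes()
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    count = data.count(old_b)
    if count:
        # Only rewrite the file when something actually changed.
        path.write_bytes(data.replace(old_b, new_b))
    return count
```

This handles literal strings only. A pattern-based variant would use `re.sub` on the decoded text.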

**Safe bash commands** that work everywhere and remain fine to use directly:

| Command | Purpose |
| ------- | ------- |
| `git`, `gh` | Version control and GitHub CLI |
| `uv run` | Python script execution |
| `npm`, `npx`, `pnpm` | Node.js ecosystem |
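| `mkdir -p` | Directory creation (the `-p` flag is a POSIX shell convention; inside Python use `pathlib.Path.mkdir(parents=True, exist_ok=True)`) |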

Everything beyond that list should be a Python script.

## Standard Library First

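Python's standard library covers most script needs without any external dependencies. Stdlib-only scripts run with a plain Python interpreter (`python3` on macOS and Linux; native Windows installs often expose it as `python` or `py`), need no special tooling, and add no third-party supply-chain risk.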

| Need | Standard Library |
| ---- | ---------------- |
| JSON parsing | `json` |
| Path handling | `pathlib` |
| Pattern matching | `re` |
| CLI interface | `argparse` |
| Text comparison | `difflib` |
| Counting, grouping | `collections` |
| Source analysis | `ast` |
| Data formats | `csv`, `xml.etree` |

Only reach for external dependencies when the stdlib genuinely cannot do the job — `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation. Each external dependency adds install-time cost, requires `uv` to be available, and expands the supply-chain surface. The BMad builders require explicit user approval for any external dependency during the build process.
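
To give a sense of how far the standard library goes, the sketch below uses `ast` for the "source analysis" row of the table. The function name `definitions_missing_docstrings` and the check itself are illustrative assumptions.

```python
import ast


def definitions_missing_docstrings(source: str) -> list[str]:
    """Return sorted names of functions and classes in source that lack a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)
```

No third-party parser is involved, so a script built around this function would need neither `uv` nor an install step.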

## Zero-Friction Dependencies with PEP 723

Python scripts in skills use [PEP 723](https://peps.python.org/pep-0723/) inline metadata to declare their dependencies directly in the file. Combined with `uv run`, this gives you `npx`-like behavior — dependencies are silently cached in an isolated environment, no global installs, no user prompts.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0"]
# ///

import yaml
# script logic here
```

When a skill invokes this script with `uv run scripts/analyze.py`, the dependency (`pyyaml` in this example) is automatically resolved. The user never sees an install prompt, never needs to manage a virtual environment, and never pollutes their global Python installation.

**Why this matters for skill authoring:** Without PEP 723, skills that needed libraries like `pyyaml` or `tiktoken` would force users to run `pip install` — a jarring, trust-breaking experience that makes users hesitate to adopt the skill.

## Graceful Degradation

Skills run in multiple environments: CLI terminals, desktop apps, IDE extensions, and web interfaces like claude.ai. Not all environments can execute Python scripts.

The principle: **scripts are the fast, reliable path — but the skill must still deliver its outcome when execution is unavailable.**

When a script cannot run, the LLM performs the equivalent work directly. This is slower and less deterministic, but the user still gets a result. The script's `--help` output documents what it checks, making the fallback natural — the LLM reads the help to understand the script's purpose and replicates the logic.

Frame script steps as outcomes in the SKILL.md, not just commands:

| Approach | Example |
| -------- | ------- |
| **Good** | "Validate path conventions (run `scripts/scan-paths.py --help` for details)" |
| **Fragile** | "Execute `python3 scripts/scan-paths.py`" with no context |

The good version tells the LLM both what to accomplish and where to find the details — enabling graceful degradation without additional instructions.
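
One way to make `--help` usable as a fallback specification is to list the checks in the parser's epilog. The sketch below assumes a hypothetical set of checks for `scan-paths.py`; the real script's checks are not shown in this document.

```python
import argparse

# Hypothetical checks for illustration only.
CHECKS = (
    "paths are relative to the skill root",
    "no path contains a backslash separator",
    "every referenced file exists",
)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --help output doubles as the fallback spec."""
    epilog = "Checks performed:\n" + "\n".join(f"  - {check}" for check in CHECKS)
    parser = argparse.ArgumentParser(
        prog="scan-paths.py",
        description="Validate path conventions in a skill directory.",
        epilog=epilog,
        # Keep the epilog's line breaks instead of re-wrapping them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("target", help="skill directory to scan")
    return parser
```

Because the checks live in one tuple, the help text and the script logic can share a single source of truth.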

## When to Reach for a Script

Look for these signal verbs in a skill's requirements — they indicate script opportunities:

| Signal | Script Type |
| ------ | ----------- |
| "validate", "check", "verify" | Validation |
| "count", "tally", "aggregate" | Metrics |
| "extract", "parse", "pull from" | Data extraction |
| "convert", "transform", "format" | Transformation |
| "compare", "diff", "match against" | Comparison |
| "scan for", "find all", "list all" | Pattern scanning |

The builders guide you through script opportunity discovery during the build process. The key insight: if you find yourself writing detailed validation logic in a prompt, it almost certainly belongs in a script instead.
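
The signal table can itself be applied mechanically. The sketch below copies the phrases and script types from the table. The function name `script_opportunities` is hypothetical, and whole-word matching is a simplifying assumption (inflected forms such as "validates" are not matched).

```python
import re

# Signal phrases and script types copied from the table above.
SIGNALS = {
    "Validation": ("validate", "check", "verify"),
    "Metrics": ("count", "tally", "aggregate"),
    "Data extraction": ("extract", "parse", "pull from"),
    "Transformation": ("convert", "transform", "format"),
    "Comparison": ("compare", "diff", "match against"),
    "Pattern scanning": ("scan for", "find all", "list all"),
}


def script_opportunities(requirement: str) -> list[str]:
    """Return the script types whose signal phrases appear in the requirement."""
    text = requirement.lower()
    found = []
    for script_type, phrases in SIGNALS.items():
        # Whole-word match so that "check" does not fire on "checkout".
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases):
            found.append(script_type)
    return found
```

A hit is only a prompt to look closer. Whether the operation is truly deterministic still needs the test from the determinism boundary section.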
4 changes: 3 additions & 1 deletion skills/bmad-agent-builder/build-process.md
@@ -46,7 +46,7 @@ Early check: internal capabilities only, external skills, both, or unclear?

**Script Opportunity Discovery** (active probing — do not skip):

-Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding.
+Identify deterministic operations that should be scripts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan with the user before proceeding. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval — dependencies add install-time cost and require `uv` to be available.

## Phase 3: Gather Requirements

@@ -125,6 +125,8 @@ Activation is a single flow regardless of mode. It should:
- If headless, route to `./references/autonomous-wake.md`
- If interactive, greet the user and continue from memory context or offer capabilities

**If the built agent includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.

**Lint gate** — after building, validate and auto-fix:

If subagents available, delegate lint-fix to a subagent. Otherwise run inline.
@@ -48,10 +48,12 @@ Beyond obvious validation, consider:
- Could metric collection feed into LLM decision-making without the LLM doing the counting?

### Your Toolbox
-Scripts have access to full capabilities — think broadly:
-- **Bash**: Full shell — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition
-- **Python**: Standard library (`json`, `yaml`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
-- **System tools**: `git` commands for history/diff/blame, filesystem operations, process execution
+
+**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands.
+
+- **Python:** Standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
+- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
+- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.

If you can express the logic as deterministic code, it's a script candidate.

89 changes: 89 additions & 0 deletions skills/bmad-agent-builder/references/script-standards.md
@@ -0,0 +1,89 @@
# Script Creation Standards

When building scripts for a skill, follow these standards to ensure portability and zero-friction execution. Skills must work across macOS, Linux, and Windows (native, Git Bash, and WSL).

## Python Over Bash

**Always favor Python for script logic.** Bash is not portable — it fails or behaves inconsistently on Windows (Git Bash is MSYS2-based, not a full Linux shell; WSL bash can conflict with Git Bash on PATH; PowerShell is a different language entirely). Python with `uv run` works identically on all platforms.

**Safe bash commands** — these work reliably across all environments and are fine to use directly:
- `git`, `gh` — version control and GitHub CLI
- `uv run` — Python script execution with automatic dependency handling
- `npm`, `npx`, `pnpm` — Node.js ecosystem
- `mkdir -p` — directory creation

In skills/bmad-agent-builder/references/script-standards.md line 13, listing mkdir -p as “works reliably across all environments” isn’t correct for Windows-native shells (PowerShell/cmd don’t support -p). This could cause generated skills to fail if they follow the “safe bash commands” list on Windows.

Severity: medium

Other Locations
  • skills/bmad-workflow-builder/references/script-standards.md:13
  • docs/explanation/scripts-in-skills.md:57


**Everything else should be Python** — piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc`, and any non-trivial logic. Even `sed -i` behaves differently on macOS vs Linux. If it's more than a single safe command, write a Python script.

## Favor the Standard Library

Always prefer Python's standard library over external dependencies. The stdlib ships with every Python installation, requires no `uv run`, and adds no third-party supply-chain risk. Common stdlib modules that cover most script needs:

- `json` — JSON parsing and output
- `pathlib` — cross-platform path handling
- `re` — pattern matching
- `argparse` — CLI interface
- `collections` — counters, defaultdicts
- `difflib` — text comparison
- `ast` — Python source analysis
- `csv`, `xml.etree` — data formats

Only pull in external dependencies when the stdlib genuinely cannot do the job (e.g., `tiktoken` for accurate token counting, `pyyaml` for YAML parsing, `jsonschema` for schema validation). **External dependencies must be confirmed with the user during the build process** — they add install-time cost, supply-chain surface, and require `uv` to be available.

## PEP 723 Inline Metadata (Required)

Every Python script MUST include a PEP 723 metadata block. For scripts with external dependencies, use the `uv run` shebang:

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0", "jsonschema>=4.0"]
# ///
```

For scripts using only the standard library, use a plain Python shebang but still include the metadata block:

```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# ///
```

**Key rules:**
- The shebang MUST be line 1 — before the metadata block
- Always include `requires-python`
- List all external dependencies with version constraints
- Never use `requirements.txt`, `pip install`, or expect global package installs
- The shebang is a Unix convenience — cross-platform invocation relies on `uv run scripts/foo.py`, not `./scripts/foo.py`
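
Some of the key rules above are mechanical enough to check with a script. The sketch below is a line-based approximation, not a full PEP 723 parser: it does not parse the TOML inside the block, and the function name `check_script_header` is hypothetical.

```python
def check_script_header(source: str) -> list[str]:
    """Return a list of problems with a script's shebang and PEP 723 block."""
    problems = []
    lines = source.splitlines()

    # Rule: the shebang must be line 1, before the metadata block.
    if not lines or not lines[0].startswith("#!"):
        problems.append("shebang must be line 1")

    # Locate the '# /// script' ... '# ///' block by exact line match.
    try:
        start = lines.index("# /// script")
        end = lines.index("# ///", start + 1)
    except ValueError:
        problems.append("missing PEP 723 '# /// script' block")
        return problems

    # Rule: always include requires-python.
    block = lines[start + 1 : end]
    if not any(line.startswith("# requires-python") for line in block):
        problems.append("metadata block must declare requires-python")
    return problems
```

An empty list means the header passed these checks. Version constraints on dependencies would need real TOML parsing and are left out of this sketch.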

## Invocation in SKILL.md

How a built skill's SKILL.md should reference its scripts:

- **Scripts with external dependencies:** `uv run scripts/analyze.py {args}`
- **Stdlib-only scripts:** `python3 scripts/scan.py {args}` (also fine to use `uv run` for consistency)

In skills/bmad-agent-builder/references/script-standards.md line 65, the guidance uses python3 ... for stdlib-only scripts, but python3 isn’t a reliable command on Windows-native installs (often python or py). Since the doc claims native Windows support, these examples may break when followed verbatim.

Severity: medium

Other Locations
  • skills/bmad-workflow-builder/references/script-standards.md:65
  • docs/explanation/scripts-in-skills.md:63
  • docs/explanation/scripts-in-skills.md:110
  • skills/bmad-workflow-builder/references/script-standards.md:47
  • skills/bmad-agent-builder/references/script-standards.md:47
  • skills/bmad-workflow-builder/references/script-standards.md:77


`uv run` reads the PEP 723 metadata, silently caches dependencies in an isolated environment, and runs the script — no user prompt, no global install. Like `npx` for Python.

## Graceful Degradation

Skills may run in environments where Python or `uv` is unavailable (e.g., claude.ai web). Scripts should be the fast, reliable path — but the skill must still deliver its outcome when execution is not possible.

**Pattern:** When a script cannot execute, the LLM performs the equivalent work directly. The script's `--help` documents what it checks, making this fallback natural. Design scripts so their logic is understandable from their help output and the skill's context.

In SKILL.md, frame script steps as outcomes, not just commands:
- Good: "Validate path conventions (run `scripts/scan-paths.py --help` for details)"
- Avoid: "Execute `python3 scripts/scan-paths.py`" with no context about what it does

## Script Interface Standards

- Implement `--help` via `argparse` (single source of truth for the script's API)
- Accept target path as a positional argument
- `-o` flag for output file (default to stdout)
- Diagnostics and progress to stderr
- Exit codes: 0=pass, 1=fail, 2=error
- `--verbose` flag for debugging
- Output valid JSON to stdout
- No interactive prompts, no network dependencies
- Tests in `scripts/tests/`
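
A skeleton that follows these interface standards might look like the sketch below. The check itself (`find_issues`, which flags file names containing spaces) is a hypothetical stand-in for real validation logic.

```python
import argparse
import json
import sys
from pathlib import Path

EXIT_PASS, EXIT_FAIL, EXIT_ERROR = 0, 1, 2


def find_issues(target: Path) -> list[dict]:
    """Hypothetical check: flag file names that contain spaces."""
    return [
        {"path": p.relative_to(target).as_posix(), "issue": "space in file name"}
        for p in sorted(target.rglob("*"))
        if p.is_file() and " " in p.name
    ]


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Flag file names containing spaces.")
    parser.add_argument("target", type=Path, help="directory to scan")
    parser.add_argument("-o", dest="output", type=Path, help="write JSON to this file")
    parser.add_argument("--verbose", action="store_true", help="print progress to stderr")
    args = parser.parse_args(argv)

    # Diagnostics and progress go to stderr so stdout stays valid JSON.
    if not args.target.is_dir():
        print(f"error: {args.target} is not a directory", file=sys.stderr)
        return EXIT_ERROR
    if args.verbose:
        print(f"scanning {args.target}", file=sys.stderr)

    issues = find_issues(args.target)
    report = json.dumps({"passed": not issues, "issues": issues}, indent=2)
    if args.output:
        args.output.write_text(report + "\n", encoding="utf-8")
    else:
        print(report)
    return EXIT_FAIL if issues else EXIT_PASS
```

A real script would finish with `raise SystemExit(main())` so the return value becomes the process exit code. `argparse` already exits with status 2 on a usage error, which lines up with the 2=error convention.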
4 changes: 3 additions & 1 deletion skills/bmad-workflow-builder/build-process.md
@@ -67,7 +67,7 @@ Work through conversationally, adapted per skill type. Glean from what the user
- **Role guidance:** Brief "Act as a [role/expert]" primer
- **Design rationale:** Non-obvious choices the executing agent should understand
- **External skills used:** Which skills does this invoke?
-- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan.
+- **Script Opportunity Discovery** — Walk through planned steps with the user. Identify deterministic operations that should be scripts not prompts. Load `./references/script-opportunities-reference.md` for guidance. Confirm the script-vs-prompt plan. If any scripts require external dependencies (anything beyond Python's standard library), explicitly list each dependency and get user approval before proceeding — dependencies add install-time cost and require `uv` to be available.
- **Creates output documents?** If yes, will use `{document_output_language}`

**Simple Utility additional:**
@@ -130,6 +130,8 @@ Load the template from `./assets/SKILL-template.md` and `./references/template-s
| **`./assets/`** | Templates, starter files | Copied/transformed into output |
| **`./scripts/`** | Python, shell scripts with tests | Invoked for deterministic operations |

**If the built skill includes scripts**, also load `./references/script-standards.md` — ensures PEP 723 metadata, correct shebangs, and `uv run` invocation from the start.

**Lint gate** — after building, validate and auto-fix:

If subagents available, delegate lint-fix to a subagent. Otherwise run inline.
@@ -1,5 +1,7 @@
# Script Opportunities Reference — Workflow Builder

**Reference: `references/script-standards.md` for script creation guidelines.**

## Core Principle

Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.
@@ -42,10 +44,11 @@ When you see these in a workflow's requirements, think scripts first: "validate"

### Your Toolbox

-Scripts have access to the full execution environment:
-- **Bash:** `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, piping and composition
-- **Python:** Full standard library plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
-- **System tools:** `git` for history/diff/blame, filesystem operations
+**Python is the default** for all script logic (cross-platform: macOS, Linux, Windows/WSL). See `references/script-standards.md` for full rationale and safe bash commands.
+
+- **Python:** Full standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
+- **Safe shell commands:** `git`, `gh`, `uv run`, `npm`/`npx`/`pnpm`, `mkdir -p`
+- **Avoid bash for logic** — no piping, `jq`, `grep`, `sed`, `awk`, `find`, `diff`, `wc` in scripts. Use Python equivalents instead.

### The --help Pattern
