pytest-codingagents

Test-driven prompt engineering for GitHub Copilot. Combating cargo cult programming in Agent Instructions, Skills, and Custom Agents since 2026.

Everyone copies instruction files from blog posts, adds "you are a senior engineer" to agent configs, and includes skills found on Reddit. But does any of it work? Are your instructions making your agent better — or just longer?

You don't know, because you're not testing it.

pytest-codingagents gives you a complete test→optimize→test loop for GitHub Copilot configurations:

  1. Write a test — define what the agent should do
  2. Run it — see it fail (or pass)
  3. Optimize — call optimize_instruction() to get a concrete suggestion
  4. A/B confirm — use ab_run to prove the change actually helps
  5. Ship it — you now have evidence, not vibes

Currently supports GitHub Copilot via copilot-sdk with IDE personas for VS Code, Claude Code, and Copilot CLI environments.

import pytest

from pytest_codingagents import CopilotAgent, optimize_instruction


async def test_docstring_instruction_works(ab_run):
    """Prove the docstring instruction actually changes output, and get a fix if it doesn't."""
    baseline = CopilotAgent(instructions="Write Python code.")
    treatment = CopilotAgent(
        instructions="Write Python code. Add Google-style docstrings to every function."
    )

    # Run the same task against both configurations.
    b, t = await ab_run(baseline, treatment, "Create math.py with add(a, b) and subtract(a, b).")

    assert b.success and t.success

    # If the treatment ignored the instruction, ask for a concrete rewrite
    # and surface it in the failure message.
    if '"""' not in t.file("math.py"):
        suggestion = await optimize_instruction(
            treatment.instructions or "",
            t,
            "Agent should add docstrings to every function.",
        )
        pytest.fail(f"Docstring instruction was ignored.\n\n{suggestion}")

    # The instruction only proves its value if the baseline behaves differently.
    assert '"""' not in b.file("math.py"), "Baseline should not have docstrings"

Install

uv add pytest-codingagents

Authenticate with the GITHUB_TOKEN environment variable (CI) or an existing GitHub CLI session (local; verify with gh auth status).

What You Can Test

  • A/B comparison: Config B actually produces different (and better) output than Config A. Guide: Getting Started
  • Instruction optimization: Turn a failing test into a ready-to-use instruction fix. Guide: Optimize Instructions
  • Instructions: Your custom instructions change agent behavior, not just vibes. Guide: Getting Started
  • Skills: That domain knowledge file is helping, not being ignored. Guide: Skill Testing
  • Models: Which model works best for your use case and budget (sketched after this list). Guide: Model Comparison
  • Custom Agents: Your custom agent configurations actually work as intended. Guide: Getting Started
  • MCP Servers: The agent discovers and uses your custom tools. Guide: MCP Server Testing
  • CLI Tools: The agent operates command-line interfaces correctly. Guide: CLI Tool Testing
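Model comparison fits the same ab_run pattern. A minimal sketch, assuming a hypothetical model keyword argument on CopilotAgent and illustrative model names (see the Model Comparison guide for the real API):

from pytest_codingagents import CopilotAgent


async def test_model_choice_matters(ab_run):
    """Compare two models on the same instruction and task."""
    # NOTE: `model` and the model names are hypothetical stand-ins for illustration.
    cheap = CopilotAgent(instructions="Write Python code.", model="gpt-5-mini")
    strong = CopilotAgent(instructions="Write Python code.", model="gpt-5.2")

    a, b = await ab_run(cheap, strong, "Create math.py with add(a, b) and subtract(a, b).")

    assert a.success and b.success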

AI Analysis

See it in action: Basic Report · Model Comparison · Instruction Testing

Every test run produces an HTML report with AI-powered insights:

  • Diagnoses failures — root cause analysis with suggested fixes
  • Compares models — leaderboards ranked by pass rate and cost
  • Evaluates instructions — which instructions produce better results
  • Recommends improvements — actionable changes to tools, instructions, and skills
Generate the report from the command line:

uv run pytest tests/ --aitest-html=report.html --aitest-summary-model=azure/gpt-5.2-chat

Documentation

Full docs at sbroenne.github.io/pytest-codingagents — API reference, how-to guides, and demo reports.

License

MIT
