pytest-codingagents

Test-driven prompt engineering for GitHub Copilot. Combating cargo cult programming in Agent Instructions, Skills, and Custom Agents since 2026.

Everyone copies instruction files from blog posts, adds "you are a senior engineer" to agent configs, and includes skills found on Reddit. But does any of it work? Are your instructions making your agent better — or just longer?

You don't know, because you're not testing it.

pytest-codingagents gives you a complete test→optimize→test loop for GitHub Copilot configurations:

  1. Write a test — define what the agent should do
  2. Run it — see it fail (or pass)
  3. Optimize — call optimize_instruction() to get a concrete suggestion
  4. A/B confirm — use ab_run to prove the change actually helps
  5. Ship it — you now have evidence, not vibes

Currently supports GitHub Copilot via copilot-sdk with IDE personas for VS Code, Claude Code, and Copilot CLI environments.

import pytest

from pytest_codingagents import CopilotAgent, optimize_instruction


async def test_docstring_instruction_works(ab_run):
    """Prove the docstring instruction actually changes output, and get a fix if it doesn't."""
    baseline = CopilotAgent(instructions="Write Python code.")
    treatment = CopilotAgent(
        instructions="Write Python code. Add Google-style docstrings to every function."
    )

    # Run the same task against both configurations.
    b, t = await ab_run(baseline, treatment, "Create math.py with add(a, b) and subtract(a, b).")

    assert b.success and t.success

    # If the treatment ignored the instruction, ask for a concrete rewrite
    # and surface it in the failure message.
    if '"""' not in t.file("math.py"):
        suggestion = await optimize_instruction(
            treatment.instructions or "",
            t,
            "Agent should add docstrings to every function.",
        )
        pytest.fail(f"Docstring instruction was ignored.\n\n{suggestion}")

    # The instruction only proves its value if the baseline behaves differently.
    assert '"""' not in b.file("math.py"), "Baseline should not have docstrings"

Install

uv add pytest-codingagents

Authenticate with the GITHUB_TOKEN environment variable (CI) or an existing GitHub CLI session (local; verify with gh auth status).

What You Can Test

  • A/B comparison: Config B actually produces different (and better) output than Config A. Guide: Getting Started
  • Instruction optimization: Turn a failing test into a ready-to-use instruction fix. Guide: Optimize Instructions
  • Instructions: Your custom instructions change agent behavior, not just vibes. Guide: Getting Started
  • Skills: That domain knowledge file is helping, not being ignored. Guide: Skill Testing
  • Models: Which model works best for your use case and budget (sketched after this list). Guide: Model Comparison
  • Custom Agents: Your custom agent configurations actually work as intended. Guide: Getting Started
  • MCP Servers: The agent discovers and uses your custom tools. Guide: MCP Server Testing
  • CLI Tools: The agent operates command-line interfaces correctly. Guide: CLI Tool Testing
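Model comparison fits the same ab_run pattern. A minimal sketch, assuming a hypothetical model keyword argument on CopilotAgent and illustrative model names (see the Model Comparison guide for the real API):

from pytest_codingagents import CopilotAgent


async def test_model_choice_matters(ab_run):
    """Compare two models on the same instruction and task."""
    # NOTE: `model` and the model names are hypothetical stand-ins for illustration.
    cheap = CopilotAgent(instructions="Write Python code.", model="gpt-5-mini")
    strong = CopilotAgent(instructions="Write Python code.", model="gpt-5.2")

    a, b = await ab_run(cheap, strong, "Create math.py with add(a, b) and subtract(a, b).")

    assert a.success and b.success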

AI Analysis

See it in action: Basic Report · Model Comparison · Instruction Testing

Every test run produces an HTML report with AI-powered insights:

  • Diagnoses failures — root cause analysis with suggested fixes
  • Compares models — leaderboards ranked by pass rate and cost
  • Evaluates instructions — which instructions produce better results
  • Recommends improvements — actionable changes to tools, instructions, and skills
Generate the report from the command line:

uv run pytest tests/ --aitest-html=report.html --aitest-summary-model=azure/gpt-5.2-chat

Documentation

Full docs at sbroenne.github.io/pytest-codingagents — API reference, how-to guides, and demo reports.

License

MIT
