feat: v0.2.0 — test-driven prompt engineering#8

Merged
sbroenne merged 1 commit into main from feature/v0.2.0-ab-run-and-optimizer
Feb 19, 2026

Conversation

@sbroenne
Owner

Summary

Repositions pytest-codingagents as a test-driven prompt engineering platform with two new core features:

New: `ab_run` fixture

  • Takes `(baseline, treatment, task)` → `(CopilotResult, CopilotResult)`
  • Auto-creates isolated `baseline/` and `treatment/` subdirs under `tmp_path`
  • Overrides `working_directory` on both agents (prevents shared-workspace bugs)
  • Runs the two agents sequentially, not in parallel, because agents may have side effects
  • Stashes treatment result for pytest-aitest HTML reporting
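
The isolation and ordering behavior listed above can be illustrated with a small, self-contained sketch. This is not the plugin's implementation: `StubAgent`, `isolate_pair`, `run_sequentially`, and `fake_runner` are hypothetical names invented for illustration, and a plain directory stands in for pytest's `tmp_path`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class StubAgent:
    """Hypothetical stand-in for an agent configuration (not the plugin's class)."""

    name: str
    instructions: str
    working_directory: Path | None = None


def isolate_pair(
    baseline: StubAgent, treatment: StubAgent, root: Path
) -> tuple[StubAgent, StubAgent]:
    """Return copies of both agents, each pointed at its own subdirectory of root."""
    isolated = []
    for subdir, agent in (("baseline", baseline), ("treatment", treatment)):
        workdir = root / subdir
        workdir.mkdir(parents=True, exist_ok=True)
        # Override whatever working_directory the caller set, so the two
        # runs can never read or write each other's files.
        isolated.append(replace(agent, working_directory=workdir))
    return isolated[0], isolated[1]


def fake_runner(agent: StubAgent, task: str) -> str:
    """Stand-in for a real agent run: writes one file into the agent's directory."""
    out = agent.working_directory / "result.txt"
    out.write_text(f"{agent.name}: {task}")
    return out.read_text()


def run_sequentially(agents, task, runner):
    """Run agents one after another; each run finishes before the next starts."""
    return tuple(runner(agent, task) for agent in agents)
```

In a test, `root` would typically be the `tmp_path` that pytest provides, so both subdirectories are discarded with the rest of the test's temporary files.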

New: `optimize_instruction()` + `InstructionSuggestion`

  • Analyzes gap between current instruction and observed agent behavior
  • Uses pydantic-ai structured output (default: `openai:gpt-4o-mini`)
  • Returns `InstructionSuggestion(instruction, reasoning, changes)`
  • Designed to drop into `pytest.fail()` for actionable failure messages
  • Exported from top-level `__init__.py`
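
As a rough illustration of the "actionable failure message" idea, the sketch below formats a suggestion into text that could be passed to `pytest.fail()`. It does not call the real `optimize_instruction()`. `SuggestionStub` and `build_failure_message` are hypothetical; only the three field names come from the description above, and the type of `changes` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SuggestionStub:
    """Hypothetical stand-in; only the three field names come from the PR text."""

    instruction: str
    reasoning: str
    changes: str  # assumed to be free text; the real type is not stated here


def build_failure_message(criterion: str, suggestion: SuggestionStub) -> str:
    """Combine the failed criterion with the suggested fix into one message."""
    return "\n".join(
        [
            f"Failed: {criterion}",
            "",
            f"Suggested instruction: {suggestion.instruction}",
            f"Changes: {suggestion.changes}",
            f"Reasoning: {suggestion.reasoning}",
        ]
    )


suggestion = SuggestionStub(
    instruction="Always add a docstring to every function you write.",
    reasoning="The current instruction never mentions docstrings.",
    changes="Added an explicit docstring requirement.",
)
message = build_failure_message("agent skipped docstrings", suggestion)
print(message)
```

Inside a test, the resulting string would be handed to `pytest.fail(message)` when an assertion about the agent's output does not hold, so the failure report carries the proposed instruction alongside the reason for the failure.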

Dependency

  • Added `pydantic-ai>=1.0` as a required dependency

Docs

  • README + `docs/index.md` repositioned as 'test-driven prompt engineering'
  • New `docs/how-to/optimize.md` how-to guide
  • `ab_run` section added to `docs/how-to/ab-testing.md`
  • `optimize_instruction` + `InstructionSuggestion` added to reference API

Tests

  • Added 23 new unit tests (`test_fixtures.py`, `test_optimizer.py`)
  • All 129 unit tests pass; ruff + pyright clean

Version

0.1.2 → 0.2.0

- Add ab_run fixture: isolated baseline/treatment dirs, sequential execution,
  auto-stash treatment result for aitest reporting
- Add optimize_instruction() + InstructionSuggestion: LLM-powered instruction
  gap analysis via pydantic-ai, designed to drop into pytest.fail()
- Add pydantic-ai>=1.0 as required dependency
- Export optimize_instruction + InstructionSuggestion from top-level __init__
- Export ab_run fixture via plugin entry point
- Reposition README + docs as 'test-driven prompt engineering' headline
- Add docs/how-to/optimize.md how-to guide
- Add ab_run section to docs/how-to/ab-testing.md
- Add optimize_instruction + InstructionSuggestion to reference/api.md
- Add 23 new unit tests (test_fixtures.py, test_optimizer.py); 129 total pass
- Version bump: 0.1.2 → 0.2.0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

pyproject.toml

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| pydantic-ai | >= 1.0 | Null | Unknown License |

OpenSSF Scorecard

| Package | Version | Score | Details |
| --- | --- | --- | --- |
| pip/pydantic-ai | >= 1.0 | Unknown | Unknown |

Scanned Files

  • pyproject.toml

@sbroenne merged commit 2697096 into main on Feb 19, 2026
15 checks passed
@sbroenne deleted the feature/v0.2.0-ab-run-and-optimizer branch on February 19, 2026 09:56