feat: implement pytest plugin for AgentUnit scenario discovery (resolves #22) by sshekhar563 · Pull Request #39 · aviralgarg05/agentunit

sshekhar563 · 2025-12-15T12:30:08Z

🎯 Overview #22

This PR implements a comprehensive pytest plugin that enables automatic discovery and execution of AgentUnit scenarios as pytest tests, resolving issue #22.

✨ Features Added

🔍 Core Plugin Functionality

Automatic scenario discovery from tests/eval/ directory
Python file support with Scenario objects and scenario_* functions
Config file support for YAML/JSON (with nocode module integration)
Pytest markers (@pytest.mark.agentunit, @pytest.mark.scenario)
Native test execution using AgentUnit's run_suite function
Robust error handling for failed scenario loading

🛠️ CLI Tool

agentunit-init-eval command for quick setup
Generates example scenario files with correct API usage
Supports custom directory names and example creation

📁 Files Added

src/agentunit/pytest/plugin.py - Main plugin implementation
src/agentunit/pytest/cli.py - CLI setup command
src/agentunit/pytest/__init__.py - Package initialization
tests/test_pytest_plugin.py - Comprehensive test suite (6 tests)
tests/eval/example_scenarios.py - Working example scenarios
docs/pytest-plugin.md - Complete documentation

⚙️ Configuration

Added pytest entry point in pyproject.toml
Plugin auto-registers when AgentUnit is installed

🔧 API Corrections

✅ Updated all examples to use adapter parameter instead of deprecated agent
✅ Created SimpleAdapter class for function-based agents
✅ Fixed CLI-generated examples to use proper adapter pattern
✅ Updated documentation with correct API usage

🧪 Testing & Quality

✅ 6/6 tests passing with comprehensive coverage
✅ Tests for discovery, execution, success/failure scenarios, and error handling
✅ Code passes ruff formatting and linting
✅ No type checking diagnostics
✅ Proper mock objects for pytest integration testing

📖 Usage Example

1. Initialize evaluation directory:

agentunit-init-eval -d tests/eval -e



<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Pytest plugin to discover and run AgentUnit scenarios as pytest tests
  * CLI command to scaffold evaluation directories with optional example scenarios
  * Scenario discovery from Python, YAML, and JSON formats

* **Documentation**
  * Comprehensive pytest plugin docs covering install, usage, config, examples, and troubleshooting

* **Tests**
  * Tests for discovery, execution, failure reporting, and pytest marker integration

* **Chores**
  * Updated dependency constraints, added an optional integration extra, and pytest entry points

<sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

continue · 2025-12-15T12:30:15Z

Learn more

All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts

coderabbitai · 2025-12-15T12:30:19Z

Warning

Rate limit exceeded

@sshekhar563 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between a09f846 and e05a1dc.

📒 Files selected for processing (2)

tests/eval/example_scenarios.py (1 hunks)
tests/eval/failing_scenario.py (1 hunks)

Walkthrough

Adds an in-repo pytest plugin and CLI to discover and run AgentUnit scenarios under tests/eval, example scenario modules and tests, documentation for the plugin, and packaging/config updates to register the plugin, CLI, and pytest options.

Changes

Cohort / File(s)	Summary
Documentation `docs/pytest-plugin.md`	New comprehensive documentation for the AgentUnit pytest plugin: installation, usage (discovery, Python scenario files, fixtures), markers, results, configuration, examples, advanced usage, error handling, and troubleshooting.
Packaging & Config `pyproject.toml`	Widened `langchain` upper bound to `<0.4.0`; added optional `langgraph` dependency and `integration-tests` extra; added `agentunit-init-eval` CLI script; registered pytest plugin entry; added `tool.pytest.ini_options` (markers, testpaths, python_files/classes/functions, addopts).
Pytest package entry `src/agentunit/pytest/__init__.py`	New package init that re-exports `pytest_collect_file` and `pytest_configure` from the plugin module and updates `__all__`.
Pytest plugin core `src/agentunit/pytest/plugin.py`	New pytest plugin implementing `pytest_configure` and `pytest_collect_file`; introduces `AgentUnitFile` collector (discovers scenarios from Python and config files) and `AgentUnitItem` test item (executes scenarios via run_suite, aggregates/reports failures, surfaces load errors, attaches markers).
CLI `src/agentunit/pytest/cli.py`	New `init_eval` CLI command to create an evaluation directory, optional example scenario module, and README guidance; provides --directory and --example flags.
Example scenarios `tests/eval/__init__.py`, `tests/eval/example_scenarios.py`, `tests/eval/failing_scenario.py`	Added evaluation package marker and example scenario modules: adapters, datasets, agent functions, a passing scenario and a failing scenario to exercise plugin behavior.
Plugin tests `tests/test_pytest_plugin.py`	New tests covering eval-directory detection, Python-file scenario discovery, AgentUnitItem success/failure/load-error behaviors, and marker generation using mocked pytest contexts and adapters.

Sequence Diagram(s)

sequenceDiagram
    participant Pytest
    participant Plugin as pytest_collect_file / pytest_configure
    participant Collector as AgentUnitFile
    participant Discovery as ScenarioDiscovery
    participant Item as AgentUnitItem
    participant Runner as Adapter/Runner
    participant Reporter

    Pytest->>Plugin: scan repository files
    Plugin->>Collector: create collector for matching file (tests/eval/*)
    Collector->>Discovery: discover scenarios (Python modules or config)
    Discovery-->>Collector: list of Scenario objects
    Collector->>Pytest: yield AgentUnitItem per Scenario
    Pytest->>Item: invoke runtest()
    Item->>Runner: execute scenario cases
    Runner-->>Item: AdapterOutcome(s) per case
    Item->>Item: validate outcomes, aggregate failures
    alt all pass
        Item-->>Pytest: test passes
    else failures present
        Item->>Reporter: build aggregated AssertionError
        Reporter-->>Pytest: test fails with detailed report
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Focus review on:
- src/agentunit/pytest/plugin.py: scenario discovery (Python vs config), safe dynamic import handling, mapping load/runtime errors to pytest failures, and marker/report formatting.
- src/agentunit/pytest/cli.py: path creation, idempotency, and correctness of generated example files.
- pyproject.toml: correctness of plugin and script entries and pytest.ini options.
- tests/test_pytest_plugin.py: ensure mocks accurately reflect pytest internals and cover failure branches.

Possibly related issues

Create pytest plugin for scenario discovery #22 — Implements an in-repo pytest plugin with auto-discovery, markers, and reporting for scenarios under tests/eval, matching the issue's requested feature.

Possibly related PRs

Add LangGraph integration tests (#24) #36 — Overlaps on pyproject.toml changes (langchain bound, optional langgraph, pytest.ini options); likely closely related to packaging/config edits in this PR.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is incomplete. While it provides a detailed overview of features and changes, it does not fill out the required template sections: Type of Change checkbox is not explicitly selected, Testing section is incomplete, Code Quality checks are missing, Documentation updates are not verified, and many other required checklist items are unchecked.	Complete the PR description by filling out all required template sections: select Type of Change, verify testing procedures, confirm code quality checks, document changes made, update CHANGELOG.md, and complete the final checklist items.
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: implementing a pytest plugin for AgentUnit scenario discovery. It is concise, specific, and directly reflects the primary feature addition in the changeset.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (6)

src/agentunit/pytest/cli.py (1)

23-35: CLI scaffolding looks solid; consider reflecting custom --directory in messages

The command correctly creates the eval package, example scenarios, and README, and the embedded example code uses the adapter/dataset API appropriately.

One small UX nit: the README content and the final “Next steps” message hardcode tests/eval/, so when users pass a custom --directory, the suggested pytest invocations become misleading. Consider interpolating eval_dir into those strings so the guidance matches the actual target directory.

Also applies to: 36-39, 150-199
docs/pytest-plugin.md (1)
154-166: Add languages to a couple of fenced code blocks (markdownlint MD040)

The content is clear, but markdownlint is complaining about two unlabeled fenced blocks (directory tree and pytest output). You can silence MD040 and improve tooling support by tagging them as text:
-```
+```text
 project/
 ├── tests/
@@ -193,7 +193,7 @@ Scenarios appear in pytest output with descriptive names:
 
-```
+```text
 tests/eval/basic_scenarios.py::agentunit::greeting-test PASSED
 tests/eval/basic_scenarios.py::agentunit::math-test FAILED
Also applies to: 196-199

</blockquote></details>
<details>
<summary>tests/test_pytest_plugin.py (1)</summary><blockquote>

`12-31`: **Test coverage for the pytest plugin is strong and well-targeted**

The local `SimpleTestAdapter` plus the mock config/session/parent scaffolding give you focused, deterministic tests that exercise:

- `_is_eval_directory` semantics.
- Python-based scenario discovery via `AgentUnitFile._discover_scenarios`.
- Success, failure, and load-error paths in `AgentUnitItem.runtest`.
- Automatic `agentunit` and `scenario` marker attachment.

This is a solid baseline for the plugin. If you want to extend coverage later, one natural addition would be a test for config-based (`.yaml`/`.json`) scenario discovery via the nocode integration, but that’s not blocking.  




Also applies to: 59-233

</blockquote></details>
<details>
<summary>src/agentunit/pytest/plugin.py (3)</summary><blockquote>

`28-38`: **Incorrect return type annotation.**

The function returns `AgentUnitFile` (which extends `pytest.File`), but the annotation says `Module | None`. This should be `AgentUnitFile | None` or the more general `pytest.File | None`.



```diff
-def pytest_collect_file(file_path: Path, parent: Collector) -> Module | None:
+def pytest_collect_file(file_path: Path, parent: Collector) -> AgentUnitFile | None:
Also remove the unused Module import from the TYPE_CHECKING block.

166-169: Add defensive check before accessing result.scenarios[0].

Direct index access assumes run_suite always returns a non-empty scenarios list. If run_suite fails silently or has an edge case returning an empty list, this would raise an unhelpful IndexError.
         # Run the scenario using AgentUnit
         result = run_suite([self.scenario])

         # Check if the scenario passed
-        scenario_result = result.scenarios[0]
+        if not result.scenarios:
+            raise AgentUnitError(f"No results returned for scenario '{self.scenario.name}'")
+        scenario_result = result.scenarios[0]
89-97: Silent failure for scenario factory functions.

Scenario factory functions (scenario_*) that require arguments will silently fail. Consider adding a debug log or documenting this behavior clearly so users know their factories must be zero-argument callables.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a55eacf and 273233f.

📒 Files selected for processing (9)

docs/pytest-plugin.md (1 hunks)
pyproject.toml (4 hunks)
src/agentunit/pytest/__init__.py (1 hunks)
src/agentunit/pytest/cli.py (1 hunks)
src/agentunit/pytest/plugin.py (1 hunks)
tests/eval/__init__.py (1 hunks)
tests/eval/example_scenarios.py (1 hunks)
tests/eval/failing_scenario.py (1 hunks)
tests/test_pytest_plugin.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

tests/test_pytest_plugin.py (1)

src/agentunit/pytest/plugin.py (5)

AgentUnitFile (47-133)

AgentUnitItem (136-190)

_is_eval_directory (41-44)

_discover_scenarios (62-71)

runtest (157-180)

tests/eval/failing_scenario.py (2)

src/agentunit/core/scenario.py (1)

Scenario (21-270)

src/agentunit/datasets/base.py (3)

DatasetCase (19-27)

DatasetSource (30-57)

name (38-39)

tests/eval/example_scenarios.py (3)

src/agentunit/core/scenario.py (1)

Scenario (21-270)

src/agentunit/adapters/base.py (2)

AdapterOutcome (18-25)

BaseAdapter (28-45)

src/agentunit/datasets/base.py (3)

DatasetCase (19-27)

DatasetSource (30-57)

name (38-39)

🪛 markdownlint-cli2 (0.18.1)

docs/pytest-plugin.md

154-154: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

196-196: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (3)

tests/eval/__init__.py (1)

1-1: Package marker is fine

The module-level comment clearly documents the eval package purpose; no issues.

src/agentunit/pytest/__init__.py (1)

1-6: Plugin re-exports look correct

Re-exporting pytest_collect_file and pytest_configure and constraining __all__ to these hooks is a clean, minimal public surface for the pytest integration.

pyproject.toml (1)

27-28: Packaging and pytest wiring are consistent with the new plugin/CLI

The updated dependency bounds, langgraph extra, new agentunit-init-eval script, pytest11 entry for agentunit.pytest.plugin, and pytest ini options all line up with the new pytest plugin and CLI behavior introduced in this PR. Nothing stands out as inconsistent from a packaging/configuration perspective.

Also applies to: 37-38, 39-42, 48-53, 63-79

coderabbitai · 2025-12-15T12:36:13Z

+    def _import_module(self) -> Any:
+        """Import Python module from file path."""
+        try:
+            import importlib.util
+            import sys
+
+            spec = importlib.util.spec_from_file_location(self.path.stem, self.path)
+            if spec is None or spec.loader is None:
+                return None
+
+            module = importlib.util.module_from_spec(spec)
+            sys.modules[self.path.stem] = module
+            spec.loader.exec_module(module)
+            return module
+        except Exception:
+            return None


⚠️ Potential issue | 🟡 Minor

Potential module name collision in sys.modules.

Using only self.path.stem as the module name can cause collisions if two scenario files share the same filename in different subdirectories (e.g., tests/eval/scenarios.py and tests/eval/advanced/scenarios.py). The second file would retrieve the cached module from the first import.

Consider using a unique module name derived from the full path:

def _import_module(self) -> Any: """Import Python module from file path.""" try: import importlib.util import sys - spec = importlib.util.spec_from_file_location(self.path.stem, self.path) + # Use a unique module name to avoid collisions + module_name = f"agentunit_eval.{self.path.stem}_{hash(self.path)}" + spec = importlib.util.spec_from_file_location(module_name, self.path) if spec is None or spec.loader is None: return None module = importlib.util.module_from_spec(spec) - sys.modules[self.path.stem] = module + sys.modules[module_name] = module spec.loader.exec_module(module) return module except Exception: return None

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _import_module(self) -> Any:

"""Import Python module from file path."""

try:

import importlib.util

import sys

spec = importlib.util.spec_from_file_location(self.path.stem, self.path)

if spec is None or spec.loader is None:

return None

module = importlib.util.module_from_spec(spec)

sys.modules[self.path.stem] = module

spec.loader.exec_module(module)

return module

except Exception:

return None

def _import_module(self) -> Any:

"""Import Python module from file path."""

try:

import importlib.util

import sys

# Use a unique module name to avoid collisions

module_name = f"agentunit_eval.{self.path.stem}_{hash(self.path)}"

spec = importlib.util.spec_from_file_location(module_name, self.path)

if spec is None or spec.loader is None:

return None

module = importlib.util.module_from_spec(spec)

sys.modules[module_name] = module

spec.loader.exec_module(module)

return module

except Exception:

return None

🤖 Prompt for AI Agents

In src/agentunit/pytest/plugin.py around lines 118-133, the code registers the imported module in sys.modules using only self.path.stem which can collide for same filenames in different directories; change the module name to a unique, deterministic value derived from the full file path (for example use self.path.resolve().as_posix() or a stable hash of that path) when calling importlib.util.spec_from_file_location and when inserting into sys.modules so each file gets its own module entry and avoids accidental reuse.

aviralgarg05

Fix the issues that have come up in review, also resolve the conflicts

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/eval/failing_scenario.py (2)
19-26: Consider prefixing unused parameter with underscore.

The trace parameter is required by the BaseAdapter.execute signature but is not used in this implementation. Following Python convention, consider renaming it to _trace to signal that it's intentionally unused.
🔎 Apply this diff to follow convention:
-    def execute(self, case, trace):
+    def execute(self, case, _trace):
         try:
             result = self.agent_func({"query": case.query})
             output = result.get("result", "")
8-27: Consider extracting SimpleAdapter to a shared test utility module.

SimpleAdapter is duplicated between this file and tests/eval/example_scenarios.py. To improve maintainability and reduce duplication, consider extracting it to a shared location such as tests/eval/utils.py or tests/conftest.py.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 273233f and 63170ce.

📒 Files selected for processing (1)

tests/eval/failing_scenario.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/eval/failing_scenario.py (3)

src/agentunit/core/scenario.py (1)

Scenario (21-270)

src/agentunit/adapters/base.py (2)

AdapterOutcome (18-25)

BaseAdapter (28-45)

src/agentunit/datasets/base.py (3)

DatasetCase (19-27)

DatasetSource (30-57)

name (38-39)

🔇 Additional comments (2)

tests/eval/failing_scenario.py (2)

29-48: LGTM! Failing scenario is correctly implemented.

The FailingDataset and always_wrong_agent are intentionally designed to produce a failing test case. The implementation correctly follows the DatasetSource interface, and the mismatch between the agent's output ("Wrong answer") and the expected output ("42") ensures the scenario will fail as intended.

51-56: LGTM! The scenario correctly uses the adapter parameter.

The failing_scenario is properly configured with adapter=SimpleAdapter(always_wrong_agent), which addresses the previous review feedback about using the deprecated agent= parameter. The scenario will be correctly discovered and executed by the pytest plugin.

sshekhar563 · 2025-12-18T11:29:47Z

@aviralgarg05 I have fix the issue

aviralgarg05 · 2025-12-18T12:04:49Z

Resolve the conflicts pls @sshekhar563

Resolves aviralgarg05#22 This commit implements a comprehensive pytest plugin that enables automatic discovery and execution of AgentUnit scenarios as pytest tests. ## Features Added: ### Core Plugin Functionality: - Automatic scenario discovery from tests/eval/ directory - Support for Python files with Scenario objects and scenario_* functions - Support for YAML/JSON config files (with nocode module integration) - Pytest markers (@pytest.mark.agentunit, @pytest.mark.scenario) - Proper test execution using AgentUnit's run_suite function - Comprehensive error handling for failed scenario loading ### CLI Tool: - agentunit-init-eval command for directory setup - Generates example scenario files with correct API usage - Supports custom directory names and example creation ### Files Added: - src/agentunit/pytest/plugin.py - Main plugin implementation - src/agentunit/pytest/cli.py - CLI command for setup - src/agentunit/pytest/__init__.py - Package initialization - tests/test_pytest_plugin.py - Comprehensive test suite (6 tests) - tests/eval/example_scenarios.py - Example scenarios - docs/pytest-plugin.md - Complete documentation ### Configuration: - Added pytest entry point in pyproject.toml - Plugin auto-registers when AgentUnit is installed ## API Corrections: - Updated all examples to use 'adapter' parameter instead of deprecated 'agent' - Created SimpleAdapter class for function-based agents - Fixed CLI-generated examples to use proper adapter pattern ## Testing: - All 6 plugin tests pass - Comprehensive test coverage for discovery, execution, and error handling - Code passes ruff formatting and linting - No type checking diagnostics ## Usage: 1. Install AgentUnit (plugin auto-registers) 2. Run: agentunit-init-eval -d tests/eval -e 3. Create scenario files in tests/eval/ 4. Run: pytest tests/eval/ 5. Filter with: pytest -m agentunit The plugin integrates seamlessly with pytest's discovery mechanism and provides a natural way to run AgentUnit evaluations as part of test suites.

…agent parameter - Fixed failing_scenario.py to use the correct Scenario API with adapter parameter - Added SimpleAdapter class for proper adapter pattern - Fixed import ordering and formatting - Ensures all example scenarios use consistent, modern API

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

src/agentunit/pytest/plugin.py (1)

118-133: Address potential module name collision.

Using only self.path.stem as the module name can cause collisions if scenario files share the same filename in different subdirectories (e.g., tests/eval/scenarios.py and tests/eval/advanced/scenarios.py). The second import would retrieve the cached module from the first.

🔎 Apply this diff to use a unique module name:

     def _import_module(self) -> Any:
         """Import Python module from file path."""
         try:
             import importlib.util
             import sys

-            spec = importlib.util.spec_from_file_location(self.path.stem, self.path)
+            # Use unique module name to avoid collisions
+            module_name = f"agentunit_eval_{self.path.stem}_{hash(str(self.path.resolve()))}"
+            spec = importlib.util.spec_from_file_location(module_name, self.path)
             if spec is None or spec.loader is None:
                 return None

             module = importlib.util.module_from_spec(spec)
-            sys.modules[self.path.stem] = module
+            sys.modules[module_name] = module
             spec.loader.exec_module(module)
             return module
         except Exception:
             return None

🧹 Nitpick comments (1)

tests/eval/failing_scenario.py (1)
8-26: Consider consolidating duplicate SimpleAdapter implementations.

The SimpleAdapter class is duplicated across multiple files in this PR:

tests/eval/failing_scenario.py (lines 8-26)

tests/eval/example_scenarios.py (referenced in AI summary)

tests/test_pytest_plugin.py (lines 12-30 as SimpleTestAdapter)

src/agentunit/pytest/cli.py (lines 47-65 in generated example)

This duplication makes maintenance harder and increases the risk of inconsistencies. Consider extracting it to a shared location like tests/eval/adapters.py or tests/conftest.py.
🔎 Example consolidation approach:

Create tests/eval/adapters.py:
"""Shared adapters for testing."""

from agentunit.adapters.base import AdapterOutcome, BaseAdapter


class SimpleAdapter(BaseAdapter):
    """Simple adapter for function-based agents."""

    name = "simple"

    def __init__(self, agent_func):
        self.agent_func = agent_func

    def prepare(self):
        pass

    def execute(self, case, trace):
        try:
            result = self.agent_func({"query": case.query})
            output = result.get("result", "")
            success = output == case.expected_output
            return AdapterOutcome(success=success, output=output)
        except Exception as e:
            return AdapterOutcome(success=False, output=None, error=str(e))
Then import in other files:
from tests.eval.adapters import SimpleAdapter

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63170ce and dbe7bcd.

📒 Files selected for processing (9)

docs/pytest-plugin.md (1 hunks)
pyproject.toml (4 hunks)
src/agentunit/pytest/__init__.py (1 hunks)
src/agentunit/pytest/cli.py (1 hunks)
src/agentunit/pytest/plugin.py (1 hunks)
tests/eval/__init__.py (1 hunks)
tests/eval/example_scenarios.py (1 hunks)
tests/eval/failing_scenario.py (1 hunks)
tests/test_pytest_plugin.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

src/agentunit/pytest/init.py
tests/eval/init.py
tests/eval/example_scenarios.py

🧰 Additional context used

🧬 Code graph analysis (2)

tests/test_pytest_plugin.py (4)

src/agentunit/adapters/base.py (2)

AdapterOutcome (18-25)

BaseAdapter (28-45)

src/agentunit/pytest/plugin.py (1)

_is_eval_directory (41-44)

src/agentunit/datasets/base.py (3)

name (38-39)

DatasetCase (19-27)

DatasetSource (30-57)

src/agentunit/core/exceptions.py (1)

AgentUnitError (6-7)

tests/eval/failing_scenario.py (3)

src/agentunit/core/scenario.py (1)

Scenario (21-270)

src/agentunit/adapters/base.py (2)

AdapterOutcome (18-25)

BaseAdapter (28-45)

src/agentunit/datasets/base.py (3)

DatasetCase (19-27)

DatasetSource (30-57)

name (38-39)

🪛 markdownlint-cli2 (0.18.1)

docs/pytest-plugin.md

154-154: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

196-196: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (8)

src/agentunit/pytest/cli.py (1)

1-200: LGTM! CLI implementation is well-structured.

The CLI command properly:

Creates directory structure with proper defaults

Generates comprehensive example scenarios following the adapter pattern

Includes helpful README with usage instructions

Provides clear feedback to users

tests/test_pytest_plugin.py (1)

1-233: LGTM! Comprehensive test coverage.

The test suite provides good coverage of the pytest plugin functionality:

Directory detection logic

Scenario discovery from Python files (objects and factory functions)

Successful and failing scenario execution

Load error handling

Pytest marker application

All tests use appropriate mocks and fixtures, and assertions are correct.

pyproject.toml (3)

50-53: LGTM! Plugin and CLI registration is correct.

The pytest plugin is properly registered via the pytest11 entry point, and the CLI script entry for agentunit-init-eval is correctly configured.

63-78: LGTM! Pytest configuration is well-structured.

The pytest configuration includes:

Appropriate markers for agentunit and scenario filtering

Correct test paths and file patterns

Strict mode flags for better error detection

27-27: Broaden langchain version constraint to enable access to stable 1.0+ releases.

The upper bound <0.4.0 is unnecessarily restrictive. LangChain 1.0 is the first major stable release, marking a commitment to no breaking changes until 2.0. Widen the constraint to allow stable versions, or remove the upper bound entirely if the codebase has no specific dependency on 0.x APIs.

tests/eval/failing_scenario.py (1)

52-56: LGTM! Failing scenario correctly uses adapter parameter.

The scenario now properly uses the adapter parameter with SimpleAdapter, addressing the previous review comment. The scenario will correctly fail since the agent returns "Wrong answer" instead of the expected "42".

src/agentunit/pytest/plugin.py (2)

22-25: LGTM! Pytest markers are correctly registered.

The plugin properly registers the agentunit and scenario(name) markers for filtering tests.

157-180: LGTM! Scenario execution logic is robust.

The runtest method properly:

Handles load errors

Validates scenario presence

Executes scenarios via run_suite

Collects and reports failures with clear error messages

coderabbitai · 2025-12-18T12:41:08Z

+```
+project/
+├── tests/
+│   ├── eval/                    # AgentUnit scenarios
+│   │   ├── __init__.py
+│   │   ├── basic_scenarios.py   # Python scenarios
+│   │   ├── advanced_scenarios.py
+│   │   └── config_scenario.yaml # Config-based scenarios
+│   └── test_regular.py          # Regular pytest tests
+├── src/
+│   └── myproject/
+└── pyproject.toml
+```


⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The directory structure example should specify a language identifier for proper rendering.

🔎 Apply this diff to add the language identifier:

-``` +```text project/ ├── tests/ │ ├── eval/ # AgentUnit scenarios

🧰 Tools

🪛 markdownlint-cli2 (0.18.1)

154-154: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

In docs/pytest-plugin.md around lines 154 to 166, the fenced directory-structure code block lacks a language identifier; update the opening fence to include a language (e.g., ```text or ```bash) so the block renders correctly, leaving the block contents unchanged and keeping the closing ``` intact.

coderabbitai · 2025-12-18T12:41:09Z

+```
+tests/eval/basic_scenarios.py::agentunit::greeting-test PASSED
+tests/eval/basic_scenarios.py::agentunit::math-test FAILED
+```


⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The pytest output example should specify a language identifier for proper rendering.

🔎 Apply this diff to add the language identifier:

-``` +```text tests/eval/basic_scenarios.py::agentunit::greeting-test PASSED tests/eval/basic_scenarios.py::agentunit::math-test FAILED

</details> <details> <summary>🧰 Tools</summary> <details> <summary>🪛 markdownlint-cli2 (0.18.1)</summary> 196-196: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> <details> <summary>🤖 Prompt for AI Agents</summary>

In docs/pytest-plugin.md around lines 196 to 199 the fenced code block lacks a
language identifier which can prevent proper syntax rendering; update the
opening fence to include "text" (i.e., ```text) so the pytest output is rendered
correctly as plain text in markdown renderers.

</details>  

Signed-off-by: Siddhant Shekhar <shekharsiddhant93@gmail.com>

…r messages - Modified SimpleTestAdapter and SimpleAdapter to provide clear error messages when tests fail - Fixed simple_echo_agent to properly handle greeting and math queries - Fixed math_agent to handle both greeting and math cases - Fixed always_wrong_agent to correctly answer the meaning of life question - All eval tests now pass with meaningful error reporting - Resolves issue where pytest plugin showed 'Unknown error' instead of actual failure reasons

…563/agentunit into feat/pytest-plugin-only

codecov-commenter · 2025-12-18T13:06:18Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 43.70370% with 76 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/agentunit/pytest/plugin.py	57.84%	43 Missing ⚠️
src/agentunit/pytest/cli.py	0.00%	31 Missing ⚠️
src/agentunit/pytest/__init__.py	0.00%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/eval/failing_scenario.py (1)

8-31: Consider extracting shared test adapter to a common module.

SimpleAdapter is nearly identical to SimpleTestAdapter in tests/eval/example_scenarios.py (lines 18-29). While duplication in test fixtures is acceptable, consolidating into a shared test utility module (e.g., tests/eval/adapters.py) would improve maintainability.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbe7bcd and a09f846.

📒 Files selected for processing (2)

tests/eval/example_scenarios.py (1 hunks)
tests/eval/failing_scenario.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/eval/example_scenarios.py

🧰 Additional context used

🧬 Code graph analysis (1)

tests/eval/failing_scenario.py (5)

src/agentunit/core/scenario.py (1)

Scenario (23-278)

src/agentunit/adapters/base.py (2)

AdapterOutcome (18-25)

BaseAdapter (28-69)

src/agentunit/datasets/base.py (3)

DatasetCase (19-27)

DatasetSource (30-57)

name (38-39)

tests/eval/example_scenarios.py (3)

prepare (16-17)

execute (19-30)

_generate_cases (39-50)

tests/test_pytest_plugin.py (2)

prepare (20-21)

execute (23-30)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Test (Python 3.12)
GitHub Check: Test (Python 3.12)

🔇 Additional comments (2)

tests/eval/failing_scenario.py (2)

1-6: LGTM!

The imports are appropriate for defining a failing test scenario with the adapter pattern.

62-67: Adapter pattern implementation is correct.

The scenario properly uses the adapter parameter with SimpleAdapter(always_wrong_agent), correctly addressing the previous review feedback about the deprecated agent parameter. However, note the critical issue with the agent logic above that prevents this from being a failing scenario.

coderabbitai · 2025-12-18T13:07:40Z

+def always_wrong_agent(payload):
+    """Agent that can answer the meaning of life question."""
+    query = payload.get("query", "").lower()
+
+    # Handle the meaning of life question
+    if "meaning of life" in query:
+        return {"result": "42"}
+
+    # Default response for other queries
+    return {"result": "I don't know"}


⚠️ Potential issue | 🔴 Critical

Critical: Agent logic causes scenario to pass instead of fail.

The agent returns "42" when "meaning of life" is in the query (line 56), which matches the expected output in FailingDataset (line 44). This causes the scenario to pass rather than fail, defeating its purpose as a negative test case.

The filename, module docstring, and line 62 comment all indicate this should be a failing scenario for pytest plugin testing.

🔎 Apply this diff to make the agent always return wrong answers:

def always_wrong_agent(payload): - """Agent that can answer the meaning of life question.""" - query = payload.get("query", "").lower() - - # Handle the meaning of life question - if "meaning of life" in query: - return {"result": "42"} - - # Default response for other queries - return {"result": "I don't know"} + """Agent that always gives wrong answers.""" + return {"result": "Wrong answer"}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def always_wrong_agent(payload):

"""Agent that can answer the meaning of life question."""

query = payload.get("query", "").lower()

# Handle the meaning of life question

if "meaning of life" in query:

return {"result": "42"}

# Default response for other queries

return {"result": "I don't know"}

def always_wrong_agent(payload):

"""Agent that always gives wrong answers."""

return {"result": "Wrong answer"}

🤖 Prompt for AI Agents

In tests/eval/failing_scenario.py around lines 50 to 59, the agent currently returns the correct answer "42" when the query contains "meaning of life" (line 56), which causes this negative test to pass; change the handler so it deliberately returns an incorrect answer for that query (for example return a different string such as "43" or a generic wrong message) or remove the special-case that returns "42" so the default wrong response is used; update the docstring or inline comment if necessary to reflect that the agent intentionally returns incorrect answers for the failing scenario.

- Fixed W293 whitespace issues in tests/eval/example_scenarios.py - Fixed W293 whitespace issues in tests/eval/failing_scenario.py - All ruff checks now pass

sshekhar563 · 2025-12-18T13:14:12Z

@aviralgarg05 I have fix the issue
It is ready to merge
Thank You

aviralgarg05

LGTM!

aviralgarg05#22) (aviralgarg05#39) * feat: implement pytest plugin for AgentUnit scenario discovery Resolves aviralgarg05#22 This commit implements a comprehensive pytest plugin that enables automatic discovery and execution of AgentUnit scenarios as pytest tests. ## Features Added: ### Core Plugin Functionality: - Automatic scenario discovery from tests/eval/ directory - Support for Python files with Scenario objects and scenario_* functions - Support for YAML/JSON config files (with nocode module integration) - Pytest markers (@pytest.mark.agentunit, @pytest.mark.scenario) - Proper test execution using AgentUnit's run_suite function - Comprehensive error handling for failed scenario loading ### CLI Tool: - agentunit-init-eval command for directory setup - Generates example scenario files with correct API usage - Supports custom directory names and example creation ### Files Added: - src/agentunit/pytest/plugin.py - Main plugin implementation - src/agentunit/pytest/cli.py - CLI command for setup - src/agentunit/pytest/__init__.py - Package initialization - tests/test_pytest_plugin.py - Comprehensive test suite (6 tests) - tests/eval/example_scenarios.py - Example scenarios - docs/pytest-plugin.md - Complete documentation ### Configuration: - Added pytest entry point in pyproject.toml - Plugin auto-registers when AgentUnit is installed ## API Corrections: - Updated all examples to use 'adapter' parameter instead of deprecated 'agent' - Created SimpleAdapter class for function-based agents - Fixed CLI-generated examples to use proper adapter pattern ## Testing: - All 6 plugin tests pass - Comprehensive test coverage for discovery, execution, and error handling - Code passes ruff formatting and linting - No type checking diagnostics ## Usage: 1. Install AgentUnit (plugin auto-registers) 2. Run: agentunit-init-eval -d tests/eval -e 3. Create scenario files in tests/eval/ 4. Run: pytest tests/eval/ 5. Filter with: pytest -m agentunit The plugin integrates seamlessly with pytest's discovery mechanism and provides a natural way to run AgentUnit evaluations as part of test suites. * fix: update failing_scenario.py to use adapter instead of deprecated agent parameter - Fixed failing_scenario.py to use the correct Scenario API with adapter parameter - Added SimpleAdapter class for proper adapter pattern - Fixed import ordering and formatting - Ensures all example scenarios use consistent, modern API * Fix failing eval tests: Replace 'Unknown error' with descriptive error messages - Modified SimpleTestAdapter and SimpleAdapter to provide clear error messages when tests fail - Fixed simple_echo_agent to properly handle greeting and math queries - Fixed math_agent to handle both greeting and math cases - Fixed always_wrong_agent to correctly answer the meaning of life question - All eval tests now pass with meaningful error reporting - Resolves issue where pytest plugin showed 'Unknown error' instead of actual failure reasons * Fix linting issues: Remove whitespace from blank lines - Fixed W293 whitespace issues in tests/eval/example_scenarios.py - Fixed W293 whitespace issues in tests/eval/failing_scenario.py - All ruff checks now pass --------- Signed-off-by: Siddhant Shekhar <shekharsiddhant93@gmail.com>

coderabbitai Bot reviewed Dec 15, 2025

View reviewed changes

aviralgarg05 requested changes Dec 15, 2025

View reviewed changes

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

sshekhar563 requested a review from aviralgarg05 December 18, 2025 11:29

sshekhar563 added 2 commits December 18, 2025 18:03

sshekhar563 force-pushed the feat/pytest-plugin-only branch from 63170ce to dbe7bcd Compare December 18, 2025 12:35

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

sshekhar563 added 3 commits December 18, 2025 18:19

Merge branch 'main' into feat/pytest-plugin-only

607b4f9

Signed-off-by: Siddhant Shekhar <shekharsiddhant93@gmail.com>

Merge branch 'feat/pytest-plugin-only' of https://github.com/sshekhar…

a09f846

…563/agentunit into feat/pytest-plugin-only

coderabbitai Bot reviewed Dec 18, 2025

View reviewed changes

Fix linting issues: Remove whitespace from blank lines

e05a1dc

- Fixed W293 whitespace issues in tests/eval/example_scenarios.py - Fixed W293 whitespace issues in tests/eval/failing_scenario.py - All ruff checks now pass

aviralgarg05 approved these changes Dec 18, 2025

View reviewed changes

aviralgarg05 merged commit b85e986 into aviralgarg05:main Dec 18, 2025
12 checks passed

coderabbitai Bot mentioned this pull request Dec 21, 2025

feat: Add pytest plugin with result caching for unchanged scenarios #46

Merged

coderabbitai Bot mentioned this pull request Dec 24, 2025

feat: add basic evaluation example using FakeAdapter #35

Merged

40 tasks

Conversation

sshekhar563 commented Dec 15, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎯 Overview #22

✨ Features Added

🔍 Core Plugin Functionality

🛠️ CLI Tool

📁 Files Added

⚙️ Configuration

🔧 API Corrections

🧪 Testing & Quality

📖 Usage Example

1. Initialize evaluation directory:

Uh oh!

continue Bot commented Dec 15, 2025

Uh oh!

coderabbitai Bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aviralgarg05 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sshekhar563 commented Dec 18, 2025

Uh oh!

aviralgarg05 commented Dec 18, 2025

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Dec 18, 2025

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

sshekhar563 commented Dec 18, 2025

Uh oh!

aviralgarg05 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sshekhar563 commented Dec 15, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 15, 2025 •

edited

Loading