Merged
11 changes: 10 additions & 1 deletion README.md
@@ -193,14 +193,23 @@ Use the table above as the canonical navigation surface; every document cross-li
## Development workflow

1. Install dependencies (Poetry or pip).
2. Run the unit and integration suite:
2. Run the test suite:

```bash
# Run all tests (unit + integration)
poetry run python3 -m pytest tests -v

# Run only unit tests (skip integration tests)
poetry run python3 -m pytest -m "not integration" -v

# Run only integration tests (requires framework dependencies)
poetry run python3 -m pytest tests/integration/ -v
```

3. Execute targeted suites during active development, then run the full matrix before opening a pull request.

**Integration Tests**: The `tests/integration/` directory contains tests that verify AgentUnit works with real framework implementations (LangGraph, etc.). These tests are automatically skipped if the required dependencies are not installed. See [tests/integration/README.md](tests/integration/README.md) for details.
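A sketch of how such a test can guard itself (marker names match the project's pytest config; the test name and body are illustrative):

```python
import importlib.util

import pytest

# Detect the optional dependency without importing it.
HAS_LANGGRAPH = importlib.util.find_spec("langgraph") is not None


@pytest.mark.integration
@pytest.mark.langgraph
@pytest.mark.skipif(not HAS_LANGGRAPH, reason="LangGraph not installed")
def test_langgraph_smoke():
    # A real test would drive the adapter; this only proves the guard works.
    assert HAS_LANGGRAPH
```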

Latest verification (2025-10-24): 144 passed, 10 skipped, 32 warnings. Warnings originate from third-party dependencies (`langchain` pydantic shim deprecations and `datetime.utcnow` usage). Track upstream fixes or pin patched releases as needed.

## Contributing
253 changes: 187 additions & 66 deletions poetry.lock

Large diffs are not rendered by default.

19 changes: 18 additions & 1 deletion pyproject.toml
@@ -24,7 +24,7 @@ classifiers = [
python = "^3.10"
pyyaml = "^6.0"
crewai = { version = "^0.201.1", python = "<3.14" }
langchain = ">=0.0.353,<0.2.0" # security: stay on patched 0.0.x line compatible with ecosystem
langchain = ">=0.0.353,<0.4.0" # security: stay on patched line compatible with ecosystem
opentelemetry-api = "^1.25.0"
opentelemetry-sdk = "^1.25.0"
opentelemetry-exporter-otlp = "^1.25.0"
@@ -34,9 +34,11 @@ httpx = "^0.27.0"
numpy = "^1.24.0"
scipy = "^1.11.0"
ragas = { version = ">=0.1.9", optional = true }
langgraph = { version = "^0.2.0", optional = true }

[tool.poetry.extras]
ragas = ["ragas"]
integration-tests = ["langgraph"]

[tool.poetry.group.dev.dependencies]
pytest = "^8.2.0"
@@ -54,3 +56,18 @@ agentunit = "agentunit.cli:entrypoint"
requires = ["poetry-core>=1.8.2"]
build-backend = "poetry.core.masonry.api"

[tool.pytest.ini_options]
markers = [
    "integration: marks tests as integration tests (deselect with '-m \"not integration\"')",
    "langgraph: marks tests as requiring LangGraph (skipped if not installed)",
]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = [
    "--strict-markers",
    "--strict-config",
    "-ra",
]
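With `--strict-markers`, only markers declared above are accepted; an undeclared (for example, misspelled) marker becomes a collection error rather than a silent no-op. A sketch of the distinction (hypothetical test file; the error surfaces when pytest collects it, not when Python imports it):

```python
import pytest


@pytest.mark.integration   # declared in [tool.pytest.ini_options] -> accepted
def test_declared_marker():
    assert True


@pytest.mark.integratoin   # typo, not declared -> error under --strict-markers
def test_misspelled_marker():
    assert True
```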

Comment on lines +59 to +73

⚠️ Potential issue | 🟡 Minor

Ensure --strict-config compatibility before merging.
The --strict-config flag will error on unknown INI keys from incompatible plugins or legacy configs. Before enabling, audit third-party plugins to confirm they properly register custom INI options, and add a CI check to catch config issues early. Document the stricter requirements for contributors if your project uses external plugins.

🤖 Prompt for AI Agents
In pyproject.toml around lines 57 to 71, adding pytest's --strict-config may
cause test runs to fail if third-party or legacy plugins define unknown INI
keys; audit installed pytest plugins and project config for any non-registered
INI options, update or remove offending plugins/config keys, or register custom
options via pytest_addoption in conftest.py as appropriate, add a CI job that
runs pytest with --strict-config to catch regressions, and update
CONTRIBUTING/README to document the stricter INI key requirements for
contributors.
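One way to satisfy `--strict-config` for a project-specific key, as the note above suggests, is to register it in `conftest.py` (sketch; `acceptable_latency_ms` is a made-up option name, but `parser.addini` is pytest's real registration hook):

```python
# conftest.py (sketch)
def pytest_addoption(parser):
    # Once registered, --strict-config no longer flags this INI key
    # as unknown when it appears in pyproject.toml or pytest.ini.
    parser.addini(
        "acceptable_latency_ms",
        help="Hypothetical project-specific threshold read by tests.",
        default="250",
    )
```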

1 change: 1 addition & 0 deletions tests/__init__.py
@@ -0,0 +1 @@
# Tests package
131 changes: 131 additions & 0 deletions tests/integration/IMPLEMENTATION_SUMMARY.md
@@ -0,0 +1,131 @@
# LangGraph Integration Tests - Implementation Summary

This document summarizes the implementation of LangGraph integration tests for AgentUnit (Issue #24).

## ✅ Completed Tasks

### 1. Created Integration Test Structure
- ✅ Created `tests/integration/` directory
- ✅ Added `__init__.py` and `conftest.py` for proper test configuration
- ✅ Configured pytest markers for integration and LangGraph tests

### 2. Simple LangGraph Agent Implementation
- ✅ Created `simple_langgraph_agent.py` with a working LangGraph agent
- ✅ Implemented fallback behavior when LangGraph is not installed
- ✅ Agent handles multiple query types (quantum, python, weather, general)
- ✅ Compatible with AgentUnit's payload format

### 3. Comprehensive Integration Tests
- ✅ Created `test_langgraph_integration.py` with full test suite
- ✅ Tests scenario creation from callable agents and Python files
- ✅ Tests full evaluation cycle with multiple test cases
- ✅ Tests metrics integration (when available)
- ✅ Tests error handling and retry functionality
- ✅ Tests multiple scenarios running together

### 4. Pytest Configuration
- ✅ Added pytest markers to `pyproject.toml`
- ✅ Configured automatic test marking for integration tests
- ✅ Tests are properly skipped when LangGraph is not installed

### 5. Documentation
- ✅ Created comprehensive `README.md` for integration tests
- ✅ Documented prerequisites and running instructions
- ✅ Added CI configuration example
- ✅ Updated main project README with integration test information

## ✅ Acceptance Criteria Met

### Integration tests pass with LangGraph installed
- Tests are designed to pass when LangGraph is available
- Comprehensive test coverage of AgentUnit + LangGraph integration

### Tests are skipped gracefully without LangGraph
- Uses `pytest.importorskip()` to skip tests when LangGraph is not available
- Provides clear skip messages
- Fallback mock responses work without LangGraph
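The fallback typically relies on a guarded import (sketch; `invoke_agent` mirrors the agent module's entry point, and the mock text is illustrative):

```python
try:
    from langgraph.graph import StateGraph  # noqa: F401  (used by the real path)
    HAS_LANGGRAPH = True
except ImportError:
    HAS_LANGGRAPH = False


def invoke_agent(payload: dict) -> dict:
    """Return a real answer when LangGraph is installed, else a mock."""
    query = payload.get("query", "")
    if not HAS_LANGGRAPH:
        # Deterministic canned reply so tests can still exercise the pipeline.
        return {"output": f"[mock] echo: {query}", "mock": True}
    # A real implementation would build and run a StateGraph here.
    return {"output": f"echo: {query}", "mock": False}
```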

### CI optionally runs integration tests
- Provided example CI configuration in `ci-example.yml`
- Shows how to run integration tests conditionally
- Demonstrates selective test execution with pytest markers

## 📁 Files Created

```
tests/integration/
├── __init__.py # Package initialization
├── conftest.py # Test configuration and markers
├── simple_langgraph_agent.py # Simple LangGraph agent for testing
├── test_langgraph_integration.py # Main integration tests
├── test_integration_basic.py # Basic structure tests
├── README.md # Documentation
├── ci-example.yml # CI configuration example
└── IMPLEMENTATION_SUMMARY.md # This file
```

## 🧪 Test Coverage

The integration tests cover:

1. **Scenario Creation**
- From callable functions
- From Python files
- With custom configurations

2. **Full Evaluation Cycle**
- Multiple test cases
- Success and failure scenarios
- Metrics calculation
- Trace logging

3. **Error Handling**
- Agent failures
- Retry logic
- Graceful degradation

4. **Framework Integration**
- LangGraph adapter registration
- Multiple scenario execution
- Scenario cloning and modification

## 🚀 Usage Examples

### Run all integration tests:
```bash
pytest tests/integration/
```

### Run only LangGraph tests:
```bash
pytest tests/integration/ -m langgraph
```

### Skip integration tests:
```bash
pytest -m "not integration"
```

### Install LangGraph for testing:
```bash
# Install optional integration test dependencies
poetry install --extras integration-tests
```

## 🔧 Technical Implementation Details

- **Graceful Dependency Handling**: Uses `pytest.importorskip()` and try/except imports
- **Mock Fallbacks**: Provides mock responses when dependencies are unavailable
- **Pytest Markers**: Proper test categorization and selective execution
- **AgentUnit Integration**: Full compatibility with AgentUnit's Scenario and Runner APIs
- **CI Ready**: Designed for optional execution in continuous integration
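`pytest.importorskip` covers the per-test case: it raises a Skipped exception at call time, so the test is reported as skipped rather than failing with an ImportError (sketch; the test name is illustrative):

```python
import pytest


@pytest.mark.integration
@pytest.mark.langgraph
def test_langgraph_adapter_smoke():
    # Skips (not fails) this single test when the dependency is absent.
    langgraph = pytest.importorskip("langgraph")
    assert langgraph is not None
```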

## 🎯 Next Steps

The integration test framework is now ready for:
1. Adding more framework integrations (CrewAI, AutoGen, etc.)
2. Expanding test coverage with more complex scenarios
3. Integration with CI/CD pipelines
4. Performance and load testing scenarios

This implementation fully addresses Issue #24 and provides a solid foundation for future integration testing needs.
97 changes: 97 additions & 0 deletions tests/integration/README.md
@@ -0,0 +1,97 @@
# Integration Tests

This directory contains integration tests that verify AgentUnit works with real framework implementations.

## LangGraph Integration Tests

The LangGraph integration tests verify that AgentUnit can properly evaluate LangGraph agents through a complete evaluation cycle.

### Prerequisites

To run LangGraph integration tests, you need to install LangGraph:

```bash
# Install optional integration test dependencies
poetry install --extras integration-tests
```

Or install LangGraph manually:

```bash
poetry add langgraph --group dev
```

### Running Integration Tests

#### Run all integration tests:
```bash
pytest tests/integration/
```

#### Run only LangGraph tests:
```bash
pytest tests/integration/ -m langgraph
```

#### Skip integration tests (run only unit tests):
```bash
pytest -m "not integration"
```

#### Run with verbose output:
```bash
pytest tests/integration/ -v
```

### Test Structure

- `simple_langgraph_agent.py` - Contains a simple LangGraph agent implementation for testing
- `test_langgraph_integration.py` - Integration tests for LangGraph adapter
- `conftest.py` - Test configuration and markers

### What the Tests Cover

1. **Scenario Creation**: Tests creating scenarios from callable agents and Python files
2. **Full Evaluation Cycle**: Tests running complete evaluation cycles with multiple test cases
3. **Metrics Integration**: Tests that metrics can be calculated (when available)
4. **Error Handling**: Tests graceful handling of agent failures
5. **Retry Logic**: Tests scenario retry functionality
6. **Multiple Scenarios**: Tests running multiple scenarios together

### CI Integration

The integration tests are designed to be optionally run in CI:

- Tests are automatically skipped if LangGraph is not installed
- Use pytest markers to selectively run or skip integration tests
- All tests are marked with `@pytest.mark.integration` and `@pytest.mark.langgraph`

Comment on lines +65 to +68

⚠️ Potential issue | 🟡 Minor

Doc claim about markers is likely too strong.
Basic integration tests (e.g., fallback behavior) shouldn’t necessarily be @pytest.mark.langgraph. Consider rephrasing to: “All tests are marked integration; LangGraph-dependent tests are marked langgraph.”

🤖 Prompt for AI Agents
In tests/integration/README.md around lines 64 to 67, the README overstates that
all integration tests are marked with both `integration` and `langgraph`; change
the wording to clarify that all integration tests use the `integration` marker
and only tests that depend on LangGraph use the `langgraph` marker. Update the
three bullet points to state that tests are skipped if LangGraph is not
installed, use pytest markers to selectively run or skip integration tests, and
that tests are marked `integration` while LangGraph-dependent tests are
additionally marked `langgraph`.

### Adding New Integration Tests

When adding integration tests for other frameworks:

1. Create a simple agent implementation in the framework
2. Create test cases that cover the full evaluation cycle
3. Use appropriate pytest markers (e.g., `@pytest.mark.crewai`)
4. Ensure tests are skipped gracefully when dependencies are not available
5. Document the prerequisites and running instructions

### Example Usage

```python
import pytest
from agentunit import Scenario, run_suite
from tests.integration.simple_langgraph_agent import invoke_agent

@pytest.mark.langgraph
@pytest.mark.integration
def test_my_langgraph_scenario():
    scenario = Scenario.load_langgraph(
        path=invoke_agent,
        dataset=my_dataset,
        name="my-test",
    )

    result = run_suite([scenario])
    assert len(result.scenarios) == 1
```
Comment on lines +79 to +97

⚠️ Potential issue | 🟡 Minor

Ensure all fenced code blocks have explicit language specifications.

Per markdownlint (MD040), all fenced code blocks should declare their language. Review the code blocks in the Example Usage section and earlier to ensure they are properly specified (e.g., `python`, `bash`, or `yaml`).

🤖 Prompt for AI Agents
In tests/integration/README.md around lines 79 to 97 the fenced code block(s)
under "Example Usage" are missing explicit language identifiers; update each
triple-backtick fence to include the appropriate language (e.g., ```python for
the Python snippet) and scan earlier sections for any other fenced blocks
without language tags, adding the correct language specifiers (bash, yaml, etc.)
to satisfy markdownlint MD040.

1 change: 1 addition & 0 deletions tests/integration/__init__.py
@@ -0,0 +1 @@
"""Integration tests for AgentUnit with real frameworks."""
59 changes: 59 additions & 0 deletions tests/integration/ci-example.yml
@@ -0,0 +1,59 @@
# Example CI configuration for running integration tests
# This shows how to optionally run integration tests in CI

name: Tests

on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          poetry install

      - name: Run unit tests (excluding integration)
        run: |
          poetry run pytest -m "not integration" --cov=agentunit --cov-report=xml

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4

  integration-tests:
    runs-on: ubuntu-latest
    # Only run integration tests on main branch or when explicitly requested
    if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-integration-tests')

⚠️ Potential issue | 🟠 Major

Guard pull_request context in the job if: to avoid push-event evaluation failures.

On push, github.event.pull_request isn’t present; rewrite to avoid touching it unless event_name == 'pull_request'.

-    if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-integration-tests')
+    if: github.ref == 'refs/heads/main' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-integration-tests'))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-integration-tests')
if: github.ref == 'refs/heads/main' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-integration-tests'))
🤖 Prompt for AI Agents
In tests/integration/ci-example.yml around line 38, the job if: currently
accesses github.event.pull_request.labels unguarded which will fail on push
events; change the condition to first check the event name before touching
pull_request (e.g. keep the main-branch check, and combine with a guarded
pull_request check using github.event_name == 'pull_request' &&
contains(github.event.pull_request.labels.*.name, 'run-integration-tests')).
Ensure the final boolean expression evaluates safely on push events by only
referencing github.event.pull_request when github.event_name is 'pull_request'.


    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies including integration test deps
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          poetry install --extras integration-tests

      - name: Run integration tests
        run: |
          poetry run pytest tests/integration/ -v

      - name: Run LangGraph specific tests
        run: |
          poetry run pytest tests/integration/ -m langgraph -v
23 changes: 23 additions & 0 deletions tests/integration/conftest.py
@@ -0,0 +1,23 @@
"""Configuration for integration tests."""

from __future__ import annotations

import pytest


def pytest_configure(config):
    """Configure pytest markers for integration tests."""
    config.addinivalue_line(
        "markers",
        "integration: marks tests as integration tests (deselect with '-m \"not integration\"')",
    )
    config.addinivalue_line(
        "markers", "langgraph: marks tests as requiring LangGraph (skipped if not installed)"
    )


def pytest_collection_modifyitems(config, items):
    """Automatically mark integration tests."""
    for item in items:
        if "integration" in str(item.fspath):
            item.add_marker(pytest.mark.integration)