
Add LangGraph integration tests (#24) #36

Merged
aviralgarg05 merged 9 commits into aviralgarg05:main from sshekhar563:feature/langgraph-integration-tests
Dec 13, 2025

Conversation

@sshekhar563 (Contributor) commented Dec 12, 2025

🎯 Overview #24

This PR implements comprehensive integration tests for LangGraph with AgentUnit, addressing issue #24.

✅ Tasks Completed

  • Create a simple LangGraph agent in tests/integration/
  • Write integration test that runs a full evaluation cycle
  • Use pytest markers to skip if LangGraph not installed
  • Document how to run integration tests

🧪 What's Added

New Files

  • tests/integration/ - New integration test directory
  • simple_langgraph_agent.py - Mock LangGraph agent for testing
  • test_langgraph_integration.py - Comprehensive integration tests
  • test_integration_basic.py - Basic structure validation tests
  • conftest.py - Pytest configuration and markers
  • README.md - Complete documentation
  • ci-example.yml - CI configuration example

Updated Files

  • pyproject.toml - Added pytest markers configuration
  • README.md - Updated testing section with integration test info

🔧 Key Features

Graceful Dependency Handling

  • Tests automatically skip with clear messages when LangGraph isn't installed
  • Uses pytest.importorskip() for proper dependency checking
  • Provides mock responses when dependencies are unavailable
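
As a minimal sketch of the module-level guard (demonstrated with a stand-in stdlib module, since `pytest.importorskip` would skip immediately here if LangGraph were absent):

```python
import pytest

def module_guard(module_name: str):
    """Skip the whole test module if the dependency is missing.

    pytest.importorskip raises a Skipped exception (honored at collection
    time) instead of ImportError, so the run reports a clean skip reason.
    """
    return pytest.importorskip(module_name)

# In test_langgraph_integration.py this would be:
#   langgraph = pytest.importorskip("langgraph")
# Demonstrated here with a module that is always available:
mod = module_guard("json")
print(mod.__name__)  # → json
```

Placing the call at module level skips every test collected from the file, which is why the skip message is visible in the summary line of the test report.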

Comprehensive Test Coverage

  • Scenario creation from callables and Python files
  • Full evaluation cycle with multiple test cases
  • Error handling and retry functionality
  • Metrics integration testing
  • Multiple scenario execution

CI Ready

  • Pytest markers for selective test execution
  • Example CI configuration included
  • Proper test categorization (integration, langgraph)
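
A hypothetical CI step (illustrative only, not the shipped ci-example.yml) that gates on the markers might look like:

```yaml
# Illustrative GitHub Actions fragment: run only marker-selected tests
- name: Run integration tests
  run: poetry run pytest tests/integration/ -m "integration and langgraph" -v
```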

🚀 Usage

```bash
# Run all integration tests
pytest tests/integration/

# Run only LangGraph tests
pytest tests/integration/ -m langgraph

# Skip integration tests (unit tests only)
pytest -m "not integration"

# Install LangGraph for full testing
pip install langgraph
```



<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Optional LangGraph integration test suite with a lightweight runtime fallback when LangGraph is unavailable.

* **Documentation**
  * Expanded testing docs: renamed test-run heading, added run examples, an integration test guide and implementation summary, verification notes, and a CI example for gated integration runs. (Duplicated content consolidated.)

* **Tests**
  * pytest config with custom markers and strict options; comprehensive integration tests covering scenarios, execution flows, metrics, retries, errors, and parallel runs.

* **Chores**
  * Broadened LangChain constraint, added optional LangGraph extra, and added dev/test tooling dependencies.

<sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

- Create tests/integration/ directory with comprehensive LangGraph tests
- Add simple_langgraph_agent.py with mock LangGraph agent for testing
- Implement test_langgraph_integration.py with full evaluation cycle tests
- Configure pytest markers for integration and langgraph tests
- Add graceful dependency handling - tests skip when LangGraph not installed
- Include comprehensive documentation and CI examples
- Update main README with integration test instructions

Fixes aviralgarg05#24
@continue

continue Bot commented Dec 12, 2025

All Green - Keep your PRs mergeable

Learn more

All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts

@coderabbitai

coderabbitai Bot commented Dec 12, 2025

Walkthrough

Adds a LangGraph-focused integration test suite: pytest markers and hooks, a simple LangGraph agent with runtime fallback, comprehensive integration tests and docs (README, IMPLEMENTATION_SUMMARY), pyproject test/config updates, and a CI workflow example. No production APIs or exported-public declarations changed.

Changes

Cohort / File(s) Summary
Documentation
README.md, tests/integration/README.md, tests/integration/IMPLEMENTATION_SUMMARY.md
Update test-run instructions (heading -> “Run the test suite”), add Integration Tests narrative and verification notes; add integration README and implementation summary describing LangGraph tests, layout, usage, and acceptance criteria.
Project config / Test discovery
pyproject.toml
Widen langchain upper bound to <0.4.0; add optional langgraph = "^0.2.0" and integration-tests extra; add dev deps pytest-cov, sphinx; add [tool.pytest.ini_options] with markers (integration, langgraph), testpaths, naming patterns, and strict addopts.
Integration test package & hooks
tests/__init__.py, tests/integration/__init__.py, tests/integration/conftest.py
Create tests package and integration package; conftest.py registers integration/langgraph markers and auto-marks tests whose file path contains "integration" during collection.
LangGraph test agent & fallback
tests/integration/simple_langgraph_agent.py
New module providing AgentState TypedDict, create_simple_agent(), invoke_agent(), module-level graph, LANGGRAPH_AVAILABLE handling, and fallback mock implementations enabling tests to run without LangGraph; deterministic responses based on query content.
Integration test suites
tests/integration/test_integration_basic.py, tests/integration/test_langgraph_integration.py
Add smoke and comprehensive LangGraph-focused integration tests: directory structure checks, fallback behavior, scenario creation (callable/file), full evaluation runs, metrics/error/retry handling, cloning, adapter registry checks, and parallel runs.
CI example
tests/integration/ci-example.yml
Add GitHub Actions example workflows: unit-tests matrix and gated integration-tests workflow (runs on main or when PR labeled); installs integration extras and runs integration-specific markers.
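
The runtime fallback described for simple_langgraph_agent.py can be sketched as follows (a hedged reconstruction; the real module's function names and response wording may differ):

```python
try:
    from langgraph.graph import StateGraph, END  # real optional dependency
    LANGGRAPH_AVAILABLE = True
except ImportError:  # fall back to a deterministic mock
    LANGGRAPH_AVAILABLE = False

def invoke_agent(query: str) -> dict:
    """Answer a query; use a mock path when LangGraph is unavailable."""
    if not LANGGRAPH_AVAILABLE:
        return {"response": f"Mock response to: {query}", "used_fallback": True}
    # Real path (sketch): build a one-node StateGraph and invoke it.
    return {"response": f"Graph response to: {query}", "used_fallback": False}

result = invoke_agent("What is LangGraph?")
print(result["response"])
```

Keeping the response structure identical in both branches is what lets the same assertions run with or without the dependency installed.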

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas needing attention:
    • tests/integration/simple_langgraph_agent.py — conditional imports, fallback mocks, deterministic response logic, and public API surface for tests.
    • tests/integration/test_langgraph_integration.py — scenario construction, assertions around runs/metrics/errors/retries and parallel execution.
    • pyproject.toml & tests/integration/conftest.py — pytest configuration, markers, and auto-marking semantics.
    • tests/integration/ci-example.yml — CI gating and dependency installation details.

Possibly related issues

  • Add LangGraph integration tests #24 — Matches: implements LangGraph integration tests, fallback agent, pytest markers/skips, documentation, and CI example described in the issue.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely describes the main change: adding LangGraph integration tests to address issue #24, which aligns with the primary objective of the changeset.
Description check ✅ Passed The PR description includes most required sections: overview, completed tasks, files added/updated, key features, and usage examples. However, several template sections are not addressed: Type of Change (checkbox selection), Testing section with test results, Code Quality checklist, and Dependencies section.
Docstring Coverage ✅ Passed Docstring coverage is 95.45% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (10)
tests/integration/conftest.py (2)

8-15: Avoid marker definition drift (already in pyproject.toml).
With --strict-markers and global marker definitions in pyproject.toml, keeping the same strings here is redundant and can drift over time. Consider removing these lines (or ensure they’re generated from a single source).


18-22: Auto-marking via substring match can mislabel tests.
if "integration" in str(item.fspath) may accidentally mark non-integration tests whose path happens to contain that substring. Consider matching a path segment instead.

 def pytest_collection_modifyitems(config, items):
     """Automatically mark integration tests."""
     for item in items:
-        if "integration" in str(item.fspath):
+        # Match directory name, not substring
+        from pathlib import Path
+        if "integration" in Path(str(item.fspath)).parts:
             item.add_marker(pytest.mark.integration)
tests/integration/IMPLEMENTATION_SUMMARY.md (1)

92-112: Keep install/run commands consistent with the repo’s Poetry workflow.
The examples are helpful; just ensure “pip install langgraph” vs “poetry add …” guidance matches how CI/devshell actually runs pytest (Poetry venv vs system pip).

README.md (1)

213-213: “Latest verification” will go stale quickly.
Consider linking to CI artifacts / a badge instead, or moving this to a dated changelog entry to avoid implying it’s current.

tests/integration/README.md (1)

80-96: Prefer relative import in example to avoid tests package ambiguity.
from tests.integration.simple_langgraph_agent ... can collide if a different tests package is on PYTHONPATH. Since this doc lives inside the integration package, a relative import example is more robust.

-from tests.integration.simple_langgraph_agent import invoke_agent
+from .simple_langgraph_agent import invoke_agent
tests/integration/test_integration_basic.py (2)

23-47: Use relative import to avoid tests package ambiguity.
This reduces risk of importing the wrong tests module in some environments.

 def test_simple_langgraph_agent_fallback():
     """Test that the simple agent works without LangGraph installed."""
-    from tests.integration.simple_langgraph_agent import invoke_agent, LANGGRAPH_AVAILABLE
+    from .simple_langgraph_agent import invoke_agent, LANGGRAPH_AVAILABLE

49-55: test_pytest_markers_configured is a no-op; assert markers via pytestconfig.
You can make this test actually validate marker registration under --strict-markers.

 @pytest.mark.integration
-def test_pytest_markers_configured():
+def test_pytest_markers_configured(pytestconfig):
     """Test that pytest markers are properly configured."""
-    import pytest
-    
-    # This test itself should have the integration marker
-    # We can't easily test this programmatically, but the test runner will validate it
+    markers = "\n".join(pytestconfig.getini("markers"))
+    assert "integration:" in markers
+    assert "langgraph:" in markers
tests/integration/test_langgraph_integration.py (2)

89-102: Consider using a path relative to the test file.

The hardcoded path "tests/integration/simple_langgraph_agent.py" assumes tests are run from the repository root. This is typically fine for CI but could fail if tests are run from a different directory.

Consider using Path(__file__).parent / "simple_langgraph_agent.py" for a more robust path resolution:

     def test_langgraph_scenario_from_python_file(self):
         """Test creating a scenario from a Python file."""
-        # Create a temporary Python file with the agent
-        agent_file = Path("tests/integration/simple_langgraph_agent.py")
+        # Use path relative to this test file
+        agent_file = Path(__file__).parent / "simple_langgraph_agent.py"

171-189: Retry test doesn't verify actual retry behavior.

The test sets scenario.retries = 2 but uses an agent that never fails, so retry logic is never exercised. This only verifies that setting retries doesn't break the happy path.

Consider adding a test with an agent that fails on first attempts to actually verify retry behavior:

def test_scenario_with_retries_actually_retrying(self):
    """Test that retries actually work when agent fails intermittently."""
    call_count = 0
    
    def flaky_agent(payload):
        nonlocal call_count
        call_count += 1
        if call_count < 2:
            raise ValueError("Temporary failure")
        return {"result": f"Success on attempt {call_count}"}
    
    scenario = Scenario.load_langgraph(
        path=flaky_agent,
        dataset=SimpleDataset(),
        name="flaky-retry-test"
    )
    scenario.retries = 3
    
    result = run_suite([scenario])
    # Verify that retries allowed eventual success (the flaky agent failed once)
    assert call_count >= 2
tests/integration/simple_langgraph_agent.py (1)

74-92: Consider simplifying the graph structure.

The should_continue function always returns END, making the conditional edge unnecessary. For a single-step agent, you could use a direct edge instead of conditional edges.

If intentional for demonstration purposes, this is fine. Otherwise, consider simplifying:

-    def should_continue(state: AgentState) -> str:
-        """Determine if the agent should continue processing."""
-        return END
-
     # Create the graph
     workflow = StateGraph(AgentState)
     
     # Add nodes
     workflow.add_node("process", process_query)
     
     # Set entry point
     workflow.set_entry_point("process")
     
-    # Add edges
-    workflow.add_conditional_edges(
-        "process",
-        should_continue,
-        {END: END}
-    )
+    # Add edge to end
+    workflow.add_edge("process", END)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a2bce5 and cd8f7f1.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/integration/IMPLEMENTATION_SUMMARY.md (1 hunks)
  • tests/integration/README.md (1 hunks)
  • tests/integration/__init__.py (1 hunks)
  • tests/integration/ci-example.yml (1 hunks)
  • tests/integration/conftest.py (1 hunks)
  • tests/integration/simple_langgraph_agent.py (1 hunks)
  • tests/integration/test_integration_basic.py (1 hunks)
  • tests/integration/test_langgraph_integration.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/integration/simple_langgraph_agent.py (1)
tests/test_framework_adapters.py (1)
  • agent (90-92)
tests/integration/test_integration_basic.py (1)
tests/integration/simple_langgraph_agent.py (1)
  • invoke_agent (98-132)
🔇 Additional comments (15)
tests/integration/__init__.py (1)

1-1: Good package initializer docstring.
Clear intent; no functional risk.

README.md (1)

196-207: Test commands/readability look good.
Nice, explicit split between unit vs integration runs.

tests/integration/test_langgraph_integration.py (9)

1-13: LGTM on module setup and skip mechanism.

The use of pytest.importorskip() at module level is the correct approach for conditionally skipping all tests when LangGraph is not installed. This provides clear feedback to developers about why tests are skipped.


16-46: LGTM on SimpleDataset implementation.

The dataset correctly implements iter_cases() as a generator yielding well-structured test cases with varied query types for comprehensive coverage.
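
A minimal sketch of such a dataset (class and field names are assumptions, not AgentUnit's actual base API):

```python
class SimpleDataset:
    """Tiny in-memory dataset exposing iter_cases() as a generator."""

    def iter_cases(self):
        # Varied query types for broader coverage, as noted above
        yield {"query": "What is LangGraph?", "expected_keyword": "LangGraph"}
        yield {"query": "Hello there", "expected_keyword": "Hello"}
        yield {"query": "2 + 2", "expected_keyword": "4"}

cases = list(SimpleDataset().iter_cases())
print(len(cases))  # → 3
```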


54-87: LGTM on callable scenario test.

The test properly verifies scenario creation, adapter preparation, and execution. The assertion order (success → output not None → content check) is correct.


104-129: LGTM on full evaluation cycle test.

Comprehensive test that validates the entire evaluation pipeline, including result structure and individual run outcomes. Good coverage of the happy path.


131-148: LGTM on metrics test.

The test appropriately validates the structure of metrics results without being overly strict about the metric values, allowing for graceful handling when specific metrics are unavailable.


150-169: LGTM on error handling test.

Good test coverage for the failure path. Verifying that errors are captured and recorded without crashing the test suite is essential for robust integration testing.


191-212: LGTM on clone test with minor note.

Good verification of clone immutability. The assertion original.retries == 1 assumes the default value; consider documenting this expectation or retrieving the default dynamically.


230-251: LGTM on multiple scenarios test.

Clean test for batch scenario execution. The list comprehension for scenario creation and iteration over results is well-structured.


215-227: LGTM on adapter registry test.

Good verification of adapter registration and alias resolution. Testing both the primary name and alias ensures the registry is properly configured. The langgraph_graph alias is explicitly registered in src/agentunit/adapters/langgraph.py and intentionally tested here.

tests/integration/simple_langgraph_agent.py (4)

7-31: LGTM on conditional imports with fallback.

The try/except pattern with LANGGRAPH_AVAILABLE flag is a clean approach for graceful degradation. The fallback stubs enable type checking and import without runtime usage.


33-39: LGTM on AgentState definition.

Clean TypedDict definition that captures the agent state structure appropriately.


98-132: LGTM on invoke_agent implementation.

The function correctly handles both the LangGraph-available and fallback paths. The response structure is consistent between real and mock implementations, which is important for test reliability.

One minor observation: the agent is created fresh on each invoke_agent call (line 107). This is acceptable for integration tests but worth noting if performance becomes a concern.
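
If rebuild cost ever matters, one option (a sketch, not a change request; the builder here is a stand-in) is to memoize the compiled graph:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_agent():
    """Build the agent once and reuse it across invocations."""
    # Stand-in for create_simple_agent(); the real builder compiles the graph.
    print("building agent")
    return object()

first, second = get_agent(), get_agent()
print(first is second)  # → True ("building agent" is printed only once)
```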


135-139: LGTM on module-level graph instance.

Clean conditional initialization that provides a pre-built graph for direct usage while handling the unavailable case gracefully.

Comment thread pyproject.toml
Comment on lines +57 to +71
```toml
[tool.pytest.ini_options]
markers = [
    "integration: marks tests as integration tests (deselect with '-m \"not integration\"')",
    "langgraph: marks tests as requiring LangGraph (skipped if not installed)",
]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = [
    "--strict-markers",
    "--strict-config",
    "-ra",
]
```


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

pytest --strict-config flag behavior and plugin compatibility

💡 Result:

Short answer

  • --strict-config (or the ini option strict_config) turns unknown/unsupported pytest INI keys from warnings into errors (UsageError) when enabled. [1][2]
  • You can enable it via CLI (--strict-config), in pytest.ini/pyproject.toml (strict_config = true), or enable the umbrella strict = true which enables strict_config plus other strict options. [1][2]

What it actually does

  • During config parsing pytest computes unknown ini keys and, if strict_config is true, raises a UsageError instead of issuing a PytestConfigWarning. If strict is set, strict_config is considered true. [2][3]

Plugin compatibility notes (practical implications)

  • Plugins that relied on unregistered/typoed INI keys, or that add custom ini keys but didn’t register them with parser.addini, will start failing under strict_config. Such plugins should register their ini options via pytest's parser.addini (or document the keys) to remain compatible. [2][3]
  • If you enable strict (the global flag) you may pick up future strictness checks too — enable it only with a pinned pytest version or be prepared to opt out of individual strict options. [1]

Versions / changelog pointers

  • pytest added the umbrella strict option and strict_config handling in recent releases; the reintroduced/changed behavior of --strict/strict mode is documented in the changelog and deprecations pages. Check your pytest version’s docs/changelog if you rely on older behavior. [1][2]

If you want, I can:

  • show the exact error message you’ll get with --strict-config, or
  • scan your project/plugins (paste pytest.ini and plugin names) and list likely incompatibilities.

Sources
[1] pytest docs — Good Integration Practices / strict mode.
[2] pytest source/docs — Config parsing and _warn_or_fail_if_strict implementation (strict_config behavior) and changelog notes.
[3] pytest changelog (notes about strict/strict_config).


Ensure --strict-config compatibility before merging.
The --strict-config flag will error on unknown INI keys from incompatible plugins or legacy configs. Before enabling, audit third-party plugins to confirm they properly register custom INI options, and add a CI check to catch config issues early. Document the stricter requirements for contributors if your project uses external plugins.

🤖 Prompt for AI Agents
In pyproject.toml around lines 57 to 71, adding pytest's --strict-config may
cause test runs to fail if third-party or legacy plugins define unknown INI
keys; audit installed pytest plugins and project config for any non-registered
INI options, update or remove offending plugins/config keys, or register custom
options via pytest_addoption in conftest.py as appropriate, add a CI job that
runs pytest with --strict-config to catch regressions, and update
CONTRIBUTING/README to document the stricter INI key requirements for
contributors.

Comment thread tests/integration/ci-example.yml Outdated
Comment on lines +47 to +54
```yaml
- name: Install dependencies including integration test deps
  run: |
    python -m pip install --upgrade pip
    pip install poetry
    poetry install
    # Install optional integration test dependencies
    pip install langgraph
```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n tests/integration/ci-example.yml | head -70

Repository: aviralgarg05/agentunit

Length of output: 2233


🌐 Web query:

current recommended version actions/setup-python 2024 2025 latest

💡 Result:

Recommended: use actions/setup-python@v6 (current latest release v6.1.0). [1][2][3]

Sources:

  • action repo README (examples use @v6). [1]
  • releases (v6.1.0). [2]
  • GitHub Marketplace docs / usage examples. [3]

🌐 Web query:

codecov/codecov-action latest version recommended 2024 2025

💡 Result:

Recommendation: use codecov/codecov-action@v5 (the v5 line). The repository README states "@v5 is recommended" and v5.x is the current major with recent patch releases (e.g., v5.5.1). [1][2]

Notes:

  • v5 uses the Codecov CLI/uploader and is the supported path for tokenless/global-upload features; v3 and earlier do not include CLI features. [1][3]

Sources:
[1] codecov/codecov-action README (recommended @v5).
[2] codecov/codecov-action releases (v5.x current; example v5.5.1).
[3] Codecov docs on the Codecov Uploader/CLI.


CI example installs langgraph into the wrong environment (will be invisible to poetry run) and uses outdated action versions.

pip install langgraph runs against the base Python environment and won't be visible to poetry run pytest, which executes within Poetry's managed venv. Install via poetry run pip install langgraph or add langgraph to Poetry dependencies/extras. Additionally, upgrade action versions: actions/setup-python@v6 (current latest) and codecov/codecov-action@v5 (v3 is deprecated).

     - name: Install dependencies including integration test deps
       run: |
         python -m pip install --upgrade pip
         pip install poetry
         poetry install
         # Install optional integration test dependencies
-        pip install langgraph
+        poetry run pip install langgraph

Also update action versions:

  • Line 18, 43: actions/setup-python@v4actions/setup-python@v6
  • Line 33: codecov/codecov-action@v3codecov/codecov-action@v5
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
 - name: Install dependencies including integration test deps
   run: |
     python -m pip install --upgrade pip
     pip install poetry
     poetry install
     # Install optional integration test dependencies
-    pip install langgraph
+    poetry run pip install langgraph
```

Comment on lines +64 to +67
- Tests are automatically skipped if LangGraph is not installed
- Use pytest markers to selectively run or skip integration tests
- All tests are marked with `@pytest.mark.integration` and `@pytest.mark.langgraph`


⚠️ Potential issue | 🟡 Minor

Doc claim about markers is likely too strong.
Basic integration tests (e.g., fallback behavior) shouldn’t necessarily be @pytest.mark.langgraph. Consider rephrasing to: “All tests are marked integration; LangGraph-dependent tests are marked langgraph.”

🤖 Prompt for AI Agents
In tests/integration/README.md around lines 64 to 67, the README overstates that
all integration tests are marked with both `integration` and `langgraph`; change
the wording to clarify that all integration tests use the `integration` marker
and only tests that depend on LangGraph use the `langgraph` marker. Update the
three bullet points to state that tests are skipped if LangGraph is not
installed, use pytest markers to selectively run or skip integration tests, and
that tests are marked `integration` while LangGraph-dependent tests are
additionally marked `langgraph`.

- Add langgraph as optional dependency in pyproject.toml
- Update installation instructions to use poetry extras
- Ensure LangGraph is properly managed within Poetry environment
- Address coderabbit review feedback about pip vs poetry installation

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/integration/README.md (1)

65-67: Clarify pytest marker claim—not all integration tests require the langgraph marker.

Line 67 overstates that "All tests are marked with @pytest.mark.integration and @pytest.mark.langgraph". Basic integration tests (e.g., fallback behavior, graceful skipping) should only use @pytest.mark.integration; the @pytest.mark.langgraph marker should only apply to tests that directly depend on LangGraph functionality. Update the three bullet points under "CI Integration" to accurately reflect this distinction.

Apply this diff to correct the claim:

- Tests are automatically skipped if LangGraph is not installed
- Use pytest markers to selectively run or skip integration tests
- All tests are marked with `@pytest.mark.integration` and `@pytest.mark.langgraph`
+ Tests are automatically skipped if LangGraph is not installed
+ Use pytest markers to selectively run or skip integration tests
+ All tests are marked `integration`; LangGraph-dependent tests are additionally marked `langgraph`
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd8f7f1 and c88d027.

📒 Files selected for processing (4)
  • pyproject.toml (2 hunks)
  • tests/integration/IMPLEMENTATION_SUMMARY.md (1 hunks)
  • tests/integration/README.md (1 hunks)
  • tests/integration/ci-example.yml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/IMPLEMENTATION_SUMMARY.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
tests/integration/README.md

55-55: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment thread tests/integration/ci-example.yml Outdated
```yaml
steps:
  - uses: actions/checkout@v4
  - name: Set up Python ${{ matrix.python-version }}
    uses: actions/setup-python@v4
```

⚠️ Potential issue | 🟠 Major

Update GitHub Actions to current recommended versions.

The workflow uses outdated action versions. Update to the current stable releases for security, feature parity, and future compatibility.

Apply these diffs to modernize the action versions:

    - name: Set up Python ${{ matrix.python-version }}
-     uses: actions/setup-python@v4
+     uses: actions/setup-python@v6
      with:
        python-version: ${{ matrix.python-version }}
    - name: Upload coverage to Codecov
-     uses: codecov/codecov-action@v3
+     uses: codecov/codecov-action@v5
    - name: Set up Python
-     uses: actions/setup-python@v4
+     uses: actions/setup-python@v6
      with:
        python-version: "3.11"

Also applies to: 43-43, 33-33

🤖 Prompt for AI Agents
In tests/integration/ci-example.yml around lines 18, 33 and 43, the workflow is
using outdated action versions; update each "uses:" entry to the current
recommended stable release tags (for example upgrade any actions/checkout,
actions/setup-python, actions/cache, actions/setup-node, etc., to their latest
major stable versions) by replacing the old version suffixes with the current
stable tags, verify compatibility in the action release notes, run the workflow
locally or via a test branch to confirm no breaking changes, and pin to a
specific major tag (not a floating tag) for future stability.


### What the Tests Cover

1. **Scenario Creation**: Tests creating scenarios from callable agents and Python files

⚠️ Potential issue | 🟡 Minor

Add language specification to fenced code block.

The bash code block is missing a language identifier, which breaks syntax highlighting and violates Markdown best practices.

The markdownlint hint (MD040) points at line 55, but every fence visible in this annotated context already carries a `bash` or `python` tag, so the offending fence could not be pinpointed from the diff alone. Please scan the file, add a language identifier to any opening fence that lacks one, and confirm each fence is properly closed.
<details>
<summary>🤖 Prompt for AI Agents</summary>

In tests/integration/README.md around lines 54-54, there are fenced code blocks
missing a language specifier; update the offending fences (e.g., the bash block
starting near lines ~13-16 and any other unlabelled fences) by adding the
appropriate language identifier (bash or python) to the opening fence,
ensure each fence is properly closed, and scan the file to confirm every fenced
code block includes a language tag.


</details>


<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +79 to +97
### Example Usage

```python
import pytest
from agentunit import Scenario, run_suite
from tests.integration.simple_langgraph_agent import invoke_agent

@pytest.mark.langgraph
@pytest.mark.integration
def test_my_langgraph_scenario():
scenario = Scenario.load_langgraph(
path=invoke_agent,
dataset=my_dataset,
name="my-test"
)

result = run_suite([scenario])
assert len(result.scenarios) == 1
```

⚠️ Potential issue | 🟡 Minor

Ensure all fenced code blocks have explicit language specifications.

Per markdownlint (MD040), all fenced code blocks should declare their language. Review the code blocks in the Example Usage section and earlier to ensure they are properly specified (e.g., ```python, ```bash, ```yaml).

🤖 Prompt for AI Agents
In tests/integration/README.md around lines 79 to 97 the fenced code block(s)
under "Example Usage" are missing explicit language identifiers; update each
triple-backtick fence to include the appropriate language (e.g., ```python for
the Python snippet) and scan earlier sections for any other fenced blocks
without language tags, adding the correct language specifiers (bash, yaml, etc.)
to satisfy markdownlint MD040.
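As a quick supplement to markdownlint, a small script can locate opening fences that lack a language tag. This is a hypothetical helper for illustration, not part of the repository; the function name and logic are assumptions:

```python
import re

# Matches a fence line: three or more backticks, then an optional language tag.
FENCE_RE = re.compile(r"^(`{3,})(\S*)\s*$")

def unlabelled_fences(markdown_text: str) -> list[int]:
    """Return 1-based line numbers of opening fences with no language tag."""
    hits, inside = [], False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE_RE.match(line.strip())
        if not match:
            continue
        if inside:
            inside = False  # this fence closes the current block
        else:
            inside = True   # this fence opens a new block
            if not match.group(2):
                hits.append(lineno)
    return hits
```

Running it over a README and fixing each reported line would satisfy MD040 without guessing at offset line numbers.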

- Update actions/setup-python from v4 to v5
- Update codecov/codecov-action from v3 to v4
- Address coderabbit security and compatibility recommendations

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/integration/ci-example.yml (1)

18-18: Update GitHub Actions to current recommended majors (setup-python / codecov).

This repeats prior review feedback: actions/setup-python@v5 and codecov/codecov-action@v4 may be behind current recommended majors; please confirm latest and bump accordingly (typically setup-python@v6, codecov-action@v5).

-      uses: actions/setup-python@v5
+      uses: actions/setup-python@v6
@@
-      uses: codecov/codecov-action@v4
+      uses: codecov/codecov-action@v5
@@
-      uses: actions/setup-python@v5
+      uses: actions/setup-python@v6
What is the current recommended major version for actions/setup-python?
What is the current recommended major version for codecov/codecov-action?

Also applies to: 33-33, 43-43

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c88d027 and 7a71405.

📒 Files selected for processing (1)
  • tests/integration/ci-example.yml (1 hunks)

Comment thread tests/integration/ci-example.yml Outdated
integration-tests:
runs-on: ubuntu-latest
# Only run integration tests on main branch or when explicitly requested
if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-integration-tests')

⚠️ Potential issue | 🟠 Major

Guard pull_request context in the job if: to avoid push-event evaluation failures.

On push, github.event.pull_request isn’t present; rewrite to avoid touching it unless event_name == 'pull_request'.

-    if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-integration-tests')
+    if: github.ref == 'refs/heads/main' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-integration-tests'))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if: github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'run-integration-tests')
if: github.ref == 'refs/heads/main' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-integration-tests'))
🤖 Prompt for AI Agents
In tests/integration/ci-example.yml around line 38, the job if: currently
accesses github.event.pull_request.labels unguarded which will fail on push
events; change the condition to first check the event name before touching
pull_request (e.g. keep the main-branch check, and combine with a guarded
pull_request check using github.event_name == 'pull_request' &&
contains(github.event.pull_request.labels.*.name, 'run-integration-tests')).
Ensure the final boolean expression evaluates safely on push events by only
referencing github.event.pull_request when github.event_name is 'pull_request'.

- Correct indentation for steps sections in both jobs
- Ensure proper YAML syntax for GitHub Actions workflow
- Address coderabbit YAML formatting feedback
- Update langchain version constraint to be compatible with langgraph
- Resolve langchain-core version conflicts between langchain and langgraph
- Update poetry.lock with resolved dependencies
- Fix CI test failures caused by outdated lock file
@sshekhar563
Contributor Author

@aviralgarg05 I have fixed the issue.

- Change absolute imports to relative imports in test files
- Add missing tests/__init__.py for proper package structure
- Fix all linting errors (whitespace, formatting, imports)
- Improve code quality in simple_langgraph_agent.py
- Resolve ModuleNotFoundError issues in CI

This addresses the test import failures and ensures proper
Python package structure for the tests directory.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/integration/test_integration_basic.py (2)

42-47: Simplify redundant assertion logic.

The assertion on line 47 is somewhat redundant with the check on lines 42-44. When LANGGRAPH_AVAILABLE is False, line 44 already asserts "Mock response" is in the result. The comment on line 46 suggests checking if the query appears in the response, but the assertion specifically checks for "quantum" rather than validating the query content generically.

Consider simplifying to:

-    # Should contain the query in the response
-    assert "quantum" in result["result"].lower() or "Mock response" in result["result"]
+    # Response should be relevant to the query
+    assert any(term in result["result"].lower() for term in ["quantum", "mock"])

50-55: Placeholder test provides no validation.

This test has no assertions and doesn't validate that pytest markers are configured. Consider either implementing actual validation or removing it.

You could verify markers programmatically:

 @pytest.mark.integration
 def test_pytest_markers_configured():
     """Test that pytest markers are properly configured."""
-
-    # This test itself should have the integration marker
-    # We can't easily test this programmatically, but the test runner will validate it
+    import pytest
+    
+    # Verify markers are registered in pytest config
+    config = pytest.Config.fromdictargs({}, [])
+    markers = {entry.split(":", 1)[0].strip() for entry in config.getini("markers")}
+    
+    assert "integration" in markers
+    assert "langgraph" in markers
tests/integration/test_langgraph_integration.py (3)

13-14: Unused variable from pytest.importorskip.

The langgraph variable is assigned but never used. The importorskip call will skip tests regardless of assignment.

-# Skip all tests in this module if LangGraph is not available
-langgraph = pytest.importorskip("langgraph", reason="LangGraph not installed")
+# Skip all tests in this module if LangGraph is not available
+pytest.importorskip("langgraph", reason="LangGraph not installed")

90-104: Use relative path instead of hardcoded path.

Line 93 uses a hardcoded path that assumes tests run from the repository root. This can fail if tests are executed from a different directory.

Apply this diff to use a relative path:

     def test_langgraph_scenario_from_python_file(self):
         """Test creating a scenario from a Python file."""
-        # Create a temporary Python file with the agent
-        agent_file = Path("tests/integration/simple_langgraph_agent.py")
+        agent_file = Path(__file__).parent / "simple_langgraph_agent.py"

172-191: Test doesn't verify retry behavior.

This test sets retries = 2 but uses invoke_agent which always succeeds. The retry logic is never exercised or validated. Consider adding a test with an agent that fails initially but succeeds on retry.

Add a test that actually triggers retries:

def test_scenario_with_retries(self):
    """Test scenario retry functionality."""
    attempt_count = {"value": 0}
    
    def flaky_agent(payload):
        """Agent that fails first time, succeeds on retry."""
        attempt_count["value"] += 1
        if attempt_count["value"] == 1:
            raise ValueError("First attempt fails")
        return {"result": f"Success on attempt {attempt_count['value']}"}
    
    scenario = Scenario.load_langgraph(
        path=flaky_agent,
        dataset=SimpleDataset(),
        name="retry-test"
    )
    scenario.retries = 2
    
    result = run_suite([scenario])
    scenario_result = result.scenarios[0]
    
    # Should succeed after retry
    for run in scenario_result.runs:
        assert run.success is True
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db0f198 and bea72ef.

📒 Files selected for processing (6)
  • tests/__init__.py (1 hunks)
  • tests/integration/__init__.py (1 hunks)
  • tests/integration/conftest.py (1 hunks)
  • tests/integration/simple_langgraph_agent.py (1 hunks)
  • tests/integration/test_integration_basic.py (1 hunks)
  • tests/integration/test_langgraph_integration.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/conftest.py
  • tests/integration/__init__.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/integration/test_langgraph_integration.py (5)
src/agentunit/core/scenario.py (3)
  • Scenario (21-270)
  • load_langgraph (41-53)
  • clone (257-270)
tests/integration/simple_langgraph_agent.py (1)
  • invoke_agent (99-133)
src/agentunit/core/trace.py (1)
  • TraceLog (22-72)
src/agentunit/adapters/langgraph.py (1)
  • LangGraphAdapter (34-142)
src/agentunit/adapters/registry.py (1)
  • resolve_adapter (39-45)
tests/integration/simple_langgraph_agent.py (2)
src/agentunit/cli/__init__.py (1)
  • get (80-89)
tests/test_framework_adapters.py (1)
  • agent (90-92)
tests/integration/test_integration_basic.py (1)
tests/integration/simple_langgraph_agent.py (1)
  • invoke_agent (99-133)
🔇 Additional comments (11)
tests/integration/test_langgraph_integration.py (6)

17-48: LGTM!

The SimpleDataset class provides good test coverage with three diverse test cases covering different query types and difficulty levels.


105-131: LGTM!

Comprehensive test of the full evaluation cycle with proper validation of results, including success status, duration, and error handling.


151-171: LGTM!

The error handling test properly validates that failures are recorded with error messages and don't crash the test suite.


233-253: LGTM!

Well-structured test for running multiple independent scenarios in parallel, with proper validation that each scenario executes correctly.


216-229: The adapter alias is properly registered.

The "langgraph_graph" alias is correctly defined in src/agentunit/adapters/langgraph.py at line 145 via register_adapter(LangGraphAdapter, aliases=("langgraph_graph",)). The test assertion will pass as expected.


132-150: Use a valid metric name instead of "response_length".

The metric "response_length" is not defined in the system. The available metrics are: faithfulness, tool_success, answer_correctness, hallucination_rate, and retrieval_quality. Using an undefined metric will raise a KeyError in resolve_metrics() before the test can reach the assertion. Replace the metric name with one of the valid options or mock the metrics registry for this test.

Likely an incorrect or invalid review comment.

tests/integration/simple_langgraph_agent.py (5)

1-32: LGTM!

Excellent use of graceful dependency handling with LANGGRAPH_AVAILABLE flag and fallback types. This allows tests to run even when LangGraph is not installed, with clear distinction between mock and real behavior.


34-41: LGTM!

The AgentState TypedDict properly defines the expected state structure for the agent.


43-97: LGTM!

The agent implementation is appropriately simple for testing purposes. Deterministic responses based on query keywords make test assertions predictable. The graph structure is minimal but correctly follows LangGraph patterns.


99-133: LGTM!

The invoke_agent function correctly handles both mock and real execution paths. The response extraction logic safely handles edge cases with proper None checks.


136-140: LGTM!

Module-level graph initialization is appropriately conditional and provides a convenient import target for tests.

Comment on lines +82 to +88
from agentunit.core.trace import TraceLog
trace = TraceLog()
outcome = scenario.adapter.execute(test_case, trace)

assert outcome.success is True
assert outcome.output is not None
assert "quantum" in outcome.output.lower()

⚠️ Potential issue | 🟡 Minor

Add type guard before string operation.

Line 88 calls .lower() on outcome.output without verifying it's a string. According to the AdapterOutcome structure, output can be None.

Apply this diff to add a type guard:

     outcome = scenario.adapter.execute(test_case, trace)
 
     assert outcome.success is True
     assert outcome.output is not None
+    assert isinstance(outcome.output, str)
     assert "quantum" in outcome.output.lower()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from agentunit.core.trace import TraceLog
trace = TraceLog()
outcome = scenario.adapter.execute(test_case, trace)
assert outcome.success is True
assert outcome.output is not None
assert "quantum" in outcome.output.lower()
from agentunit.core.trace import TraceLog
trace = TraceLog()
outcome = scenario.adapter.execute(test_case, trace)
assert outcome.success is True
assert outcome.output is not None
assert isinstance(outcome.output, str)
assert "quantum" in outcome.output.lower()
🤖 Prompt for AI Agents
In tests/integration/test_langgraph_integration.py around lines 82 to 88,
outcome.output may be None but the test calls outcome.output.lower(); add a type
guard before calling .lower() by asserting outcome.output is a string (e.g.,
assert isinstance(outcome.output, str)) or checking isinstance(outcome.output,
str) and then perform the lower() check only inside that branch so the test
won’t raise when output is None.

@sshekhar563
Contributor Author

@aviralgarg05 Could you please run the tests/workflows for this PR again? Thank you.

- Add proper __init__ method that calls super() with required name and loader
- Change iter_cases to _generate_cases method that returns a list
- This fixes the TypeError about missing positional arguments

Resolves the 8 failing LangGraph integration tests.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
tests/integration/test_langgraph_integration.py (1)

86-92: Guard outcome.output type before calling .lower() to avoid AttributeError.

This matches the earlier review note: AdapterOutcome.output may be non-string (or None), so .lower() can raise.

         outcome = scenario.adapter.execute(test_case, trace)

         assert outcome.success is True
         assert outcome.output is not None
+        assert isinstance(outcome.output, str)
         assert "quantum" in outcome.output.lower()
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bea72ef and b5c522f.

📒 Files selected for processing (1)
  • tests/integration/test_langgraph_integration.py (1 hunks)

Comment on lines +96 to +104
# Create a temporary Python file with the agent
agent_file = Path("tests/integration/simple_langgraph_agent.py")

scenario = Scenario.load_langgraph(
path=agent_file,
dataset=SimpleDataset(),
name="test-langgraph-file",
callable="invoke_agent" # Specify the callable name
)

⚠️ Potential issue | 🟠 Major

Fix cwd-dependent test path: build the agent file path relative to this test file.

Path("tests/integration/simple_langgraph_agent.py") will fail if pytest is invoked from a different working directory (common in CI).

-        agent_file = Path("tests/integration/simple_langgraph_agent.py")
+        agent_file = Path(__file__).resolve().parent / "simple_langgraph_agent.py"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Create a temporary Python file with the agent
agent_file = Path("tests/integration/simple_langgraph_agent.py")
scenario = Scenario.load_langgraph(
path=agent_file,
dataset=SimpleDataset(),
name="test-langgraph-file",
callable="invoke_agent" # Specify the callable name
)
# Create a temporary Python file with the agent
agent_file = Path(__file__).resolve().parent / "simple_langgraph_agent.py"
scenario = Scenario.load_langgraph(
path=agent_file,
dataset=SimpleDataset(),
name="test-langgraph-file",
callable="invoke_agent" # Specify the callable name
)
🤖 Prompt for AI Agents
In tests/integration/test_langgraph_integration.py around lines 96 to 104, the
test constructs agent_file using a cwd-dependent relative path which breaks when
pytest runs from another working directory; instead build the path relative to
the test file by using the test file's directory (e.g.
Path(__file__).resolve().parent / "simple_langgraph_agent.py") so the test
locates the agent file regardless of current working directory.

Comment on lines +131 to +134
for run in scenario_result.runs:
assert run.success is True
assert run.duration_ms > 0
assert run.error is None

⚠️ Potential issue | 🟡 Minor

Avoid flaky timing assertion: duration_ms can legitimately be 0ms.

         for run in scenario_result.runs:
             assert run.success is True
-            assert run.duration_ms > 0
+            assert run.duration_ms >= 0
             assert run.error is None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for run in scenario_result.runs:
assert run.success is True
assert run.duration_ms > 0
assert run.error is None
for run in scenario_result.runs:
assert run.success is True
assert run.duration_ms >= 0
assert run.error is None
🤖 Prompt for AI Agents
In tests/integration/test_langgraph_integration.py around lines 131 to 134, the
assertion "assert run.duration_ms > 0" is flaky because duration can
legitimately be 0ms; change this to check for non-negativity instead (e.g.,
"assert run.duration_ms >= 0") or remove the timing assertion entirely if timing
isn't critical, ensuring the test only asserts that duration is not negative
rather than strictly positive.

Comment on lines +207 to +218
cloned = original.clone(
name="cloned",
retries=3,
timeout=120.0
)

assert cloned.name == "cloned"
assert cloned.retries == 3
assert cloned.timeout == 120.0
assert original.name == "original" # Original unchanged
assert original.retries == 1 # Default value


⚠️ Potential issue | 🟡 Minor

Don’t hardcode default Scenario.retries in test; assert “unchanged” instead.

assert original.retries == 1 bakes in an implementation default and will break on harmless default changes.

         cloned = original.clone(
             name="cloned",
             retries=3,
             timeout=120.0
         )

         assert cloned.name == "cloned"
         assert cloned.retries == 3
         assert cloned.timeout == 120.0
         assert original.name == "original"  # Original unchanged
-        assert original.retries == 1  # Default value
+        assert original.retries != cloned.retries
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cloned = original.clone(
name="cloned",
retries=3,
timeout=120.0
)
assert cloned.name == "cloned"
assert cloned.retries == 3
assert cloned.timeout == 120.0
assert original.name == "original" # Original unchanged
assert original.retries == 1 # Default value
cloned = original.clone(
name="cloned",
retries=3,
timeout=120.0
)
assert cloned.name == "cloned"
assert cloned.retries == 3
assert cloned.timeout == 120.0
assert original.name == "original" # Original unchanged
assert original.retries != cloned.retries
🤖 Prompt for AI Agents
In tests/integration/test_langgraph_integration.py around lines 207 to 218, the
test currently asserts a hardcoded default original.retries == 1 which pins
behavior to an implementation detail; instead capture the original retries value
before cloning (e.g. original_retries = original.retries) and after cloning
assert original.retries == original_retries (or assert that the original value
is unchanged) rather than asserting a specific numeric default.

Comment on lines +254 to +257
assert len(result.scenarios) == 3
for i, scenario_result in enumerate(result.scenarios):
assert scenario_result.name == f"scenario-{i}"
assert len(scenario_result.runs) == 3 # Each has 3 test cases

⚠️ Potential issue | 🟠 Major

Don’t rely on run_suite() preserving scenario order; assert by name set.

If run_suite() parallelizes or sorts internally, ordering may change and this becomes flaky.

         result = run_suite(scenarios)

         assert len(result.scenarios) == 3
-        for i, scenario_result in enumerate(result.scenarios):
-            assert scenario_result.name == f"scenario-{i}"
-            assert len(scenario_result.runs) == 3  # Each has 3 test cases
+        names = {sr.name for sr in result.scenarios}
+        assert names == {f"scenario-{i}" for i in range(3)}
+        for scenario_result in result.scenarios:
+            assert len(scenario_result.runs) == 3  # Each has 3 test cases
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert len(result.scenarios) == 3
for i, scenario_result in enumerate(result.scenarios):
assert scenario_result.name == f"scenario-{i}"
assert len(scenario_result.runs) == 3 # Each has 3 test cases
assert len(result.scenarios) == 3
names = {sr.name for sr in result.scenarios}
assert names == {f"scenario-{i}" for i in range(3)}
for scenario_result in result.scenarios:
assert len(scenario_result.runs) == 3 # Each has 3 test cases
🤖 Prompt for AI Agents
In tests/integration/test_langgraph_integration.py around lines 254 to 257, the
test currently assumes run_suite() preserves scenario order which can be flaky;
change the assertions to validate scenario names as a set and then look up
scenarios by name to assert their runs. Specifically, assert that the set of
scenario_result.name values equals {"scenario-0","scenario-1","scenario-2"},
then build a mapping from name to scenario_result and for each expected name
assert len(mapping[name].runs) == 3.

- Replace invalid 'response_length' metric with 'faithfulness' metric
- Fix missing newline at end of tests/__init__.py file
- This should resolve the last failing test and linting issues

All LangGraph integration tests should now pass successfully.
@sshekhar563
Contributor Author

@aviralgarg05 If possible, could you please re-run the tests/workflows for this PR? Thank you for your time and support.

- Run ruff format on all integration test files
- Fix formatting for conftest.py, simple_langgraph_agent.py,
  test_integration_basic.py, and test_langgraph_integration.py
- This resolves the 'Run ruff formatter check' CI failure

All linting and formatting checks should now pass.
@sshekhar563
Contributor Author

@aviralgarg05 I have fixed the lint & format issues; could you please re-run the tests/workflows for this PR? Thank you for your time and support.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/integration/simple_langgraph_agent.py (1)

75-89: Consider simplifying the edge pattern.

Since should_continue always returns END, the conditional edge pattern at line 89 could be simplified to a direct edge: workflow.add_edge("process", END). The current implementation works correctly but is more complex than necessary for this deterministic flow.

Apply this diff to simplify:

-    def should_continue(state: AgentState) -> str:
-        """Determine if the agent should continue processing."""
-        return END
-
     # Create the graph
     workflow = StateGraph(AgentState)
 
     # Add nodes
     workflow.add_node("process", process_query)
 
     # Set entry point
     workflow.set_entry_point("process")
 
     # Add edges
-    workflow.add_conditional_edges("process", should_continue, {END: END})
+    workflow.add_edge("process", END)
tests/integration/test_integration_basic.py (1)

50-55: Consider removing or enhancing this placeholder test.

This test contains no assertions and always passes. While pytest markers can be hard to test programmatically, consider either:

  • Adding runtime assertions (e.g., checking pytest.mark registry or test node metadata via a pytest hook in conftest.py)
  • Documenting in the docstring why this is intentionally minimal
  • Removing the test if it serves no verification purpose
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c17003 and f383761.

📒 Files selected for processing (4)
  • tests/integration/conftest.py (1 hunks)
  • tests/integration/simple_langgraph_agent.py (1 hunks)
  • tests/integration/test_integration_basic.py (1 hunks)
  • tests/integration/test_langgraph_integration.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/integration/conftest.py
  • tests/integration/test_langgraph_integration.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/integration/simple_langgraph_agent.py (1)
tests/test_framework_adapters.py (1)
  • agent (90-92)
tests/integration/test_integration_basic.py (1)
tests/integration/simple_langgraph_agent.py (1)
  • invoke_agent (95-129)
🔇 Additional comments (5)
tests/integration/simple_langgraph_agent.py (4)

8-34: Good fallback pattern for optional dependencies.

The conditional import with fallback stubs is well-implemented and allows the test module to be imported regardless of LangGraph availability. The simplified add_messages fallback is acceptable for test scenarios.
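The pattern being praised can be sketched in isolation. The names below mirror the test agent's flag and imports, but the stub bodies are illustrative assumptions, not the repository's actual code:

```python
try:
    from langgraph.graph import StateGraph, END  # real dependency, if installed
    LANGGRAPH_AVAILABLE = True
except ImportError:
    LANGGRAPH_AVAILABLE = False
    END = "__end__"  # stub sentinel so module-level references still resolve

    class StateGraph:  # minimal stand-in so this module imports cleanly
        def __init__(self, state_type):
            self.state_type = state_type

def invoke_agent(payload: dict) -> dict:
    """Return a mock response when LangGraph is absent, so tests can still run."""
    if not LANGGRAPH_AVAILABLE:
        return {"result": f"Mock response for: {payload.get('query', '')}"}
    raise NotImplementedError("real graph invocation omitted in this sketch")
```

The key property is that the module always imports, and callers can branch on `LANGGRAPH_AVAILABLE` (or let `pytest.importorskip` skip the whole file) without ever hitting an `ImportError` at collection time.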


36-44: LGTM!

The AgentState TypedDict provides clear structure for the agent's state and aligns well with typical LangGraph patterns.


95-129: Well-structured invocation logic with proper fallback.

The function correctly handles both LangGraph-available and unavailable cases, with matching response structures. The safe extraction of the final message (lines 118-121) properly handles edge cases.


132-136: Module-level graph initialization is acceptable for test code.

Creating the graph at module import time is a reasonable approach for a test agent, as create_simple_agent() is lightweight and deterministic.

tests/integration/test_integration_basic.py (1)

23-48: LGTM! Comprehensive fallback testing.

The test effectively validates both the mock response (when LangGraph is unavailable) and the real response structure, with appropriate assertions for each case.

@sshekhar563
Contributor Author

All required checks are passing.
Kindly proceed with merging this pull request when you have time. Thank you for your support.

@aviralgarg05 aviralgarg05 merged commit 4eb821c into aviralgarg05:main Dec 13, 2025
12 checks passed
@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also enable project coverage checks and project coverage reporting in the pull request comment.

Thanks for integrating Codecov - We've got you covered ☂️

dharapandya85 pushed a commit to dharapandya85/agentunit that referenced this pull request Dec 24, 2025
* Add LangGraph integration tests

- Create tests/integration/ directory with comprehensive LangGraph tests
- Add simple_langgraph_agent.py with mock LangGraph agent for testing
- Implement test_langgraph_integration.py with full evaluation cycle tests
- Configure pytest markers for integration and langgraph tests
- Add graceful dependency handling - tests skip when LangGraph not installed
- Include comprehensive documentation and CI examples
- Update main README with integration test instructions

Fixes aviralgarg05#24

* Fix LangGraph installation instructions for Poetry environment

- Add langgraph as optional dependency in pyproject.toml
- Update installation instructions to use poetry extras
- Ensure LangGraph is properly managed within Poetry environment
- Address coderabbit review feedback about pip vs poetry installation

* Update GitHub Actions to current recommended versions

- Update actions/setup-python from v4 to v5
- Update codecov/codecov-action from v3 to v4
- Address coderabbit security and compatibility recommendations

* Fix YAML indentation in CI example file

- Correct indentation for steps sections in both jobs
- Ensure proper YAML syntax for GitHub Actions workflow
- Address coderabbit YAML formatting feedback

* Fix dependency conflicts and update poetry.lock

- Update langchain version constraint to be compatible with langgraph
- Resolve langchain-core version conflicts between langchain and langgraph
- Update poetry.lock with resolved dependencies
- Fix CI test failures caused by outdated lock file

* Fix import issues and linting errors in integration tests

- Change absolute imports to relative imports in test files
- Add missing tests/__init__.py for proper package structure
- Fix all linting errors (whitespace, formatting, imports)
- Improve code quality in simple_langgraph_agent.py
- Resolve ModuleNotFoundError issues in CI

This addresses the test import failures and ensures proper
Python package structure for the tests directory.

* Fix SimpleDataset constructor to properly inherit from DatasetSource

- Add proper __init__ method that calls super() with required name and loader
- Change iter_cases to _generate_cases method that returns a list
- This fixes the TypeError about missing positional arguments

Resolves the 8 failing LangGraph integration tests.

* Fix final test issue and linting error

- Replace invalid 'response_length' metric with 'faithfulness' metric
- Fix missing newline at end of tests/__init__.py file
- This should resolve the last failing test and linting issues

All LangGraph integration tests should now pass successfully.

* Fix code formatting issues in integration tests

- Run ruff format on all integration test files
- Fix formatting for conftest.py, simple_langgraph_agent.py,
  test_integration_basic.py, and test_langgraph_integration.py
- This resolves the 'Run ruff formatter check' CI failure

All linting and formatting checks should now pass.
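The marker configuration and graceful-skip behavior mentioned in these commits can be sketched as below. The marker names match those discussed in the PR; the test body and the exact `pyproject.toml` entries are assumptions.

```python
# Sketch of marker-based test selection with a clean skip when the
# optional dependency is missing. The pyproject.toml side would
# register the markers, e.g.:
#
#   [tool.pytest.ini_options]
#   markers = [
#       "integration: integration tests",
#       "langgraph: tests requiring LangGraph",
#   ]
#
import pytest

# Apply both markers to every test in this module:
pytestmark = [pytest.mark.integration, pytest.mark.langgraph]


def test_graph_runs():
    # Skips the test (rather than erroring) when LangGraph is absent:
    langgraph = pytest.importorskip("langgraph")
    assert langgraph is not None
```

With this in place, `pytest -m "not langgraph"` deselects the LangGraph-dependent tests entirely, and `pytest -m integration` runs only the integration suite.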