Add quickstart instructions for running CI checks locally #41
Conversation
📝 Walkthrough

This PR enhances documentation across the project by adding a new example script demonstrating a basic evaluation workflow and expanding docstrings throughout core modules with clarified parameter descriptions and behaviors, without altering any runtime logic or control flow.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches

❌ Failed checks (4 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Codecov Report

✅ All modified and coverable lines are covered by tests.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/agentunit/core/scenario.py (1)
82-84: Inconsistent docstring coverage across factory methods.

Only `from_crewai` and `from_autogen` have docstrings, while similar factory methods like `load_langgraph`, `from_openai_agents`, `from_haystack`, etc., lack documentation. Consider applying docstrings consistently across all factory methods for uniform API clarity.

Also applies to: 100-102
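For illustration only, a Google-style docstring in the same spirit could be added to the undocumented factory methods. The sketch below is hypothetical: the real `from_haystack` signature and parameters in `src/agentunit/core/scenario.py` may differ.

```python
from __future__ import annotations


class Scenario:
    # Hypothetical excerpt; the real class lives in src/agentunit/core/scenario.py.

    @classmethod
    def from_haystack(cls, pipeline, name: str | None = None) -> "Scenario":
        """Create a Scenario that wraps a Haystack pipeline.

        Args:
            pipeline: The Haystack pipeline to evaluate (assumed parameter).
            name: Optional human-readable scenario name (assumed parameter).

        Returns:
            A Scenario configured with an adapter around the pipeline.
        """
        raise NotImplementedError  # illustrates the docstring style only
```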
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- README.md (1 hunks)
- examples/basic_evaluation.py (1 hunks)
- src/agentunit/adapters/base.py (1 hunks)
- src/agentunit/core/__init__.py (1 hunks)
- src/agentunit/core/exceptions.py (1 hunks)
- src/agentunit/core/replay.py (2 hunks)
- src/agentunit/core/runner.py (1 hunks)
- src/agentunit/core/scenario.py (4 hunks)
- src/agentunit/core/trace.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/basic_evaluation.py (1)
src/agentunit/adapters/base.py (1)
BaseAdapter(28-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test (Python 3.10)
🔇 Additional comments (9)
src/agentunit/core/runner.py (1)
1-3: Docstring formatting change unrelated to PR objective.

This docstring formatting improvement is fine, but it's unrelated to issue #3, which is specifically about adding CI instructions to the README. Consider keeping PRs focused on a single objective to make reviews easier.
src/agentunit/core/trace.py (1)
1-3: Docstring formatting changes unrelated to PR objective.

These docstring improvements are appropriate, but they're outside the scope of issue #3 (adding CI instructions). Consider submitting documentation improvements separately from feature additions.
Also applies to: 16-18, 27-29
src/agentunit/core/exceptions.py (1)
1-3: Docstring formatting changes unrelated to PR objective.

These docstring improvements follow best practices, but they're unrelated to issue #3. Please keep PRs focused on their stated objectives.
Also applies to: 9-11, 15-17, 21-23
src/agentunit/core/replay.py (1)
1-3: Docstring formatting changes unrelated to PR objective.

These changes are fine but outside the scope of issue #3. Consider grouping documentation improvements into a separate PR.
Also applies to: 13-15
src/agentunit/adapters/base.py (1)
35-42: Docstring enhancements unrelated to PR objective.

These docstring improvements add valuable detail to the BaseAdapter interface, but they're outside the scope of issue #3. Consider separating documentation improvements from feature work.
Also applies to: 46-55, 58-66
src/agentunit/core/__init__.py (1)
1-3: Docstring formatting change unrelated to PR objective.

This formatting change is fine but outside the scope of issue #3. Please keep PRs focused on their stated objectives for easier review and clearer git history.
src/agentunit/core/scenario.py (3)
1-3: Module docstring improved for clarity.

The expanded module docstring clearly describes the purpose of the module. This change improves discoverability and onboarding for contributors.
24-26: Class docstring concise and descriptive.

The expanded `Scenario` class docstring clearly conveys its purpose without being verbose. Good improvement for API documentation.
1-283: Verify PR scope: Documentation vs. CI instructions mismatch.

The PR objectives state the goal is to add quickstart CI instructions to README.md or CONTRIBUTING.md with specific Poetry commands. However, the provided file contains only docstring improvements to a Python module. Please clarify:
- Are there other files (README.md, CONTRIBUTING.md, or example scripts) included in this PR that were not provided for review?
- Does this PR scope include both docstring improvements and CI documentation, or is there a mismatch?
The code changes themselves are sound, but confirming alignment with stated PR objectives will ensure completeness.
```python
from agentunit.core.adapters import BaseAdapter
from agentunit.core.evaluator import Evaluator
```
Critical: Import paths are incorrect - example will fail.
These imports are incorrect and will cause ImportError:
- Line 10: `BaseAdapter` is in `agentunit.adapters.base`, not `agentunit.core.adapters`
- Line 11: There's no `Evaluator` class in `agentunit.core.evaluator` based on the provided codebase
Apply this diff to fix the imports:
```diff
-from agentunit.core.adapters import BaseAdapter
-from agentunit.core.evaluator import Evaluator
+from agentunit.adapters.base import BaseAdapter, AdapterOutcome
+from agentunit.core.scenario import Scenario
+from agentunit.datasets.base import DatasetCase
+from agentunit.core.trace import TraceLog
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 10-11, the current imports point to
non-existent modules and will raise ImportError; change the imports to use the
correct modules by importing BaseAdapter from agentunit.adapters.base and
importing Evaluator from the correct top-level evaluator module
(agentunit.evaluator) so the file reads imports from agentunit.adapters.base
import BaseAdapter and from agentunit.evaluator import Evaluator.
```python
class FakeAdapter(BaseAdapter):
    """
    A simple mock adapter used only for demonstration.
    It returns a predictable output so evaluation is easy to understand.
    """

    def generate(self, prompt: str) -> str:
        # Always returns the same answer for simplicity
        return "Hello, this is a fake response!"
```
Critical: FakeAdapter doesn't implement BaseAdapter interface.
The FakeAdapter doesn't properly implement the BaseAdapter abstract interface. According to src/agentunit/adapters/base.py, adapters must implement:
- `prepare() -> None`
- `execute(case: DatasetCase, trace: TraceLog) -> AdapterOutcome`
- `cleanup() -> None`

Your `generate(prompt: str) -> str` method doesn't match this interface.
Apply this diff to fix the implementation:
```diff
 class FakeAdapter(BaseAdapter):
     """
     A simple mock adapter used only for demonstration.
     It returns a predictable output so evaluation is easy to understand.
     """
-    def generate(self, prompt: str) -> str:
-        # Always returns the same answer for simplicity
-        return "Hello, this is a fake response!"
+    def prepare(self) -> None:
+        """No setup required for the fake adapter."""
+        pass
+
+    def execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome:
+        """Return a fixed response for demonstration."""
+        trace.record("agent_response", content="Hello, this is a fake response!")
+        return AdapterOutcome(
+            success=True,
+            output="Hello, this is a fake response!"
+        )
+
+    def cleanup(self) -> None:
+        """No cleanup required for the fake adapter."""
+        pass
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 14 to 22, FakeAdapter currently
defines generate(prompt: str) -> str which does not match the BaseAdapter
interface; replace it with implementations for prepare(self) -> None,
execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome, and
cleanup(self) -> None: make prepare and cleanup no-ops (pass), implement execute
to produce the same predictable response (e.g., "Hello, this is a fake
response!") wrapped in an AdapterOutcome object along with any minimal trace
entries (using TraceLog) as required by the adapter contract, and ensure you
import or reference DatasetCase, TraceLog, and AdapterOutcome types for proper
typing.
```python
def main():
    # Step 1 — Prepare the adapter
    adapter = FakeAdapter()

    # Step 2 — Create the evaluator
    evaluator = Evaluator(adapter=adapter)

    # Step 3 — Prepare an example prompt
    prompt = "Say hello!"

    # Step 4 — Run the evaluation
    result = evaluator.evaluate(prompt)

    # Step 5 — Print the output
    print("Prompt:", prompt)
    print("Model Output:", result.output)
    print("Evaluation Score:", result.score)
```
Critical: Example uses non-existent Evaluator class.
The main() function uses an Evaluator class that doesn't exist in the AgentUnit codebase. Based on the actual API shown in other files, you should use Scenario and Runner instead.
Apply this diff to fix the example workflow:
```diff
 def main():
-    # Step 1 — Prepare the adapter
+    # Step 1 — Create test data
+    test_case = DatasetCase(
+        id="hello_1",
+        query="Say hello!",
+        expected_output="Hello, this is a fake response!"
+    )
+
+    # Step 2 — Prepare the adapter
     adapter = FakeAdapter()

-    # Step 2 — Create the evaluator
-    evaluator = Evaluator(adapter=adapter)
-
-    # Step 3 — Prepare an example prompt
-    prompt = "Say hello!"
+    # Step 3 — Create a scenario
+    scenario = Scenario(
+        name="Basic Test",
+        adapter=adapter,
+        dataset=[test_case]
+    )

-    # Step 4 — Run the evaluation
-    result = evaluator.evaluate(prompt)
+    # Step 4 — Run the scenario
+    from agentunit.core.runner import Runner
+    runner = Runner(scenarios=[scenario])
+    result = runner.run()

     # Step 5 — Print the output
-    print("Prompt:", prompt)
-    print("Model Output:", result.output)
-    print("Evaluation Score:", result.score)
+    print("Scenario:", scenario.name)
+    print("Success rate:", f"{len([r for r in result.scenarios[0].runs if r.success])}/{len(result.scenarios[0].runs)}")
+    for run in result.scenarios[0].runs:
+        print(f"  Case {run.case_id}: {'✓' if run.success else '✗'}")
```

🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 25 to 41, the example incorrectly
instantiates a non-existent Evaluator class; replace that workflow to use the
actual API: construct a Scenario (with the prompt and any expected config) and a
Runner (passing the FakeAdapter), run the scenario via Runner.run() to obtain
results, then print prompt, model output and evaluation score from the
Runner/Scenario result; ensure imports for Scenario and Runner are added at the
top and remove the Evaluator usage.
````markdown
### Running CI Checks Locally

Before opening a pull request, you can run the same checks locally that are executed in CI.

#### Requirements
- Python **3.10 or higher**
- [Poetry](https://python-poetry.org/) installed

#### Setup
Install dependencies (including dev tools):

```bash
poetry install --with dev
```
````
Critical: Section is incomplete - missing CI check commands.
This section addresses issue #3 but is incomplete. According to the acceptance criteria, you need to document how to run:
- `poetry check` - verify Poetry config
- `poetry run ruff check .` - run linting
- `poetry run ruff format --check .` - verify formatting
- `poetry run pytest` - run tests

Currently, only the setup (`poetry install --with dev`) is shown. Additionally, the AI summary mentions this content "appears twice in the diff" - please verify there's no duplication in the README.
Apply this diff to complete the section:
````diff
 #### Setup
 Install dependencies (including dev tools):

 ```bash
 poetry install --with dev
 ```
+
+#### Verify Your Changes
+
+Run these commands before opening a pull request:
+
+```bash
+# Verify pyproject.toml and lock file consistency
+poetry check
+
+# Lint code with ruff
+poetry run ruff check .
+
+# Verify code formatting
+poetry run ruff format --check .
+
+# Run all tests
+poetry run pytest
+```
+
+If formatting issues are detected, you can auto-fix them:
+
+```bash
+poetry run ruff format .
+```
````
🤖 Prompt for AI Agents
In README.md around lines 206 to 219, the "Running CI Checks Locally" section
only shows setup but is missing the CI verification commands; add a "Verify Your
Changes" subsection immediately after the existing "poetry install --with dev"
block that lists the required commands (poetry check, poetry run ruff check .,
poetry run ruff format --check ., poetry run pytest) and mention how to auto-fix
formatting with poetry run ruff format ., and also scan the README for any
duplicate copies of this section and remove the duplicate so the instructions
appear only once.
…05#41)

* Add basic evaluation example script
* Fix typos and improve clarity in docstrings across core modules
* Add Google-style docstrings to BaseAdapter methods
* Format base adapter using ruff
* docs: add instructions for running CI checks locally
* Remove example file unrelated to CI documentation

This PR adds a new section to README.md explaining how contributors can
run the same lint, format, and test checks locally that CI runs.
This helps beginners validate changes before opening a PR (a convenience script is sketched below).
closes #3
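For convenience, the commands from the new README section can be chained into one small script so a contributor runs everything in a single step. This is only a suggested sketch, not part of the PR; the script name is hypothetical and it assumes the Poetry/Ruff setup described above.

```bash
#!/usr/bin/env bash
# ci-local.sh (hypothetical helper): run the same checks locally that CI runs.
# Assumes `poetry install --with dev` has already been executed.
set -e                              # stop at the first failing check

poetry check                        # verify pyproject.toml and lock file consistency
poetry run ruff check .             # lint
poetry run ruff format --check .    # formatting check (auto-fix with: poetry run ruff format .)
poetry run pytest                   # run the test suite
```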
Summary by CodeRabbit
New Features
Documentation