
Add quickstart instructions for running CI checks locally #41

Merged

aviralgarg05 merged 6 commits into aviralgarg05:main from Jagriti-student:add-local-ci-instructions on Dec 19, 2025
Conversation

@Jagriti-student (Contributor) commented Dec 17, 2025

This PR adds a new section to README.md explaining how contributors can
locally run the same lint, format, and test checks that CI runs.

This helps beginners validate changes before opening a PR.

closes #3

Summary by CodeRabbit

  • New Features

    • Added a new example script demonstrating a basic evaluation workflow with AgentUnit.
  • Documentation

    • Added a "Running CI Checks Locally" section to the README with setup prerequisites and instructions.
    • Enhanced docstrings across core modules for improved clarity and API documentation.



@coderabbitai Bot commented Dec 17, 2025

📝 Walkthrough

This PR improves documentation across the project by adding a new example script that demonstrates a basic evaluation workflow and by expanding docstrings throughout the core modules with clarified parameter descriptions and behaviors, without altering any runtime logic or control flow.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| **Documentation and Examples**<br>`README.md`, `examples/basic_evaluation.py` | README updated with a "Running CI Checks Locally" section covering prerequisites and setup instructions. New example script `basic_evaluation.py` demonstrates a minimal evaluation workflow using AgentUnit with a `FakeAdapter` implementation. |
| **Core Module Documentation**<br>`src/agentunit/core/__init__.py`, `src/agentunit/core/exceptions.py`, `src/agentunit/core/replay.py`, `src/agentunit/core/runner.py`, `src/agentunit/core/scenario.py`, `src/agentunit/core/trace.py` | Module and class docstrings converted from single-line to multi-line triple-quoted format across all files. No functional changes; formatting and documentation-clarity improvements only. |
| **Adapter Documentation**<br>`src/agentunit/adapters/base.py` | `BaseAdapter` method docstrings expanded with detailed descriptions of behavior, parameters, and return values for `prepare()`, `execute()`, and `cleanup()`. No signature or logic changes. |
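To illustrate the shape of these expanded docstrings, here is a minimal sketch of the `BaseAdapter` interface with Google-style docstrings. The signatures follow the adapter contract quoted in the review comments below; the docstring text itself is illustrative and may differ from what the PR actually shipped.

```python
# Minimal sketch of the documented BaseAdapter interface (illustrative wording).
from abc import ABC, abstractmethod


class BaseAdapter(ABC):
    """Abstract base class that framework adapters implement."""

    @abstractmethod
    def prepare(self) -> None:
        """Perform one-time setup before any cases are executed.

        Called once per scenario run, before the first call to execute().
        """

    @abstractmethod
    def execute(self, case, trace):
        """Run a single dataset case against the agent under test.

        Args:
            case: The DatasetCase to execute.
            trace: A TraceLog collecting events emitted during the run.

        Returns:
            An AdapterOutcome describing success or failure and the output.
        """

    @abstractmethod
    def cleanup(self) -> None:
        """Release any resources acquired in prepare()."""
```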

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Majority of changes are cosmetic docstring formatting and documentation updates requiring minimal verification
  • New example script is straightforward with simple, single-responsibility logic (FakeAdapter implementation and basic evaluation flow)
  • No complex logic, control flow modifications, or API surface changes to validate

Pre-merge checks and finishing touches

❌ Failed checks (4 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is minimal and does not follow the repository's comprehensive template. Required sections like Type of Change, Changes Made, Testing, Code Quality, and Documentation are not completed. | Complete the PR description template by filling in Type of Change (Documentation update), the Changes Made list, the Testing sections, and other applicable checklist items to meet repository standards. |
| Linked Issues check | ⚠️ Warning | The PR partially addresses issue #3 by adding documentation to README.md, but it duplicates the CI instruction section and includes unrelated changes (docstring formatting, a new example file) beyond the scope. | Remove the duplicated README section and the unrelated changes (docstring updates, examples/basic_evaluation.py) to focus solely on the CI instructions requirement from issue #3. |
| Out of Scope Changes check | ⚠️ Warning | The PR contains multiple out-of-scope changes: duplicated README content, extensive docstring formatting updates across multiple files, and a new example script (basic_evaluation.py) unrelated to the CI instructions objective. | Remove the out-of-scope changes: deduplicate the README, remove docstring reformatting from all core modules, and exclude examples/basic_evaluation.py. Keep only the CI instructions addition. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main objective of the PR: adding quickstart instructions for running CI checks locally. It is concise, specific, and directly related to the primary change. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
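As an illustration of what generated unit tests for this PR's example could look like, here is a hypothetical pytest sketch. The stand-in types below mirror the corrected `FakeAdapter` interface discussed in the review comments further down; they are not the real `agentunit` classes.

```python
# Hypothetical pytest sketch; stand-in types are NOT the real agentunit classes.
from dataclasses import dataclass


@dataclass
class AdapterOutcome:
    """Stand-in for agentunit.adapters.base.AdapterOutcome (assumed shape)."""
    success: bool
    output: str


class FakeAdapter:
    """Stand-in adapter that always returns the same fixed response."""

    def prepare(self) -> None:
        pass  # no setup needed

    def execute(self, case, trace) -> AdapterOutcome:
        return AdapterOutcome(success=True, output="Hello, this is a fake response!")

    def cleanup(self) -> None:
        pass  # no teardown needed


def test_fake_adapter_returns_fixed_response():
    adapter = FakeAdapter()
    adapter.prepare()
    outcome = adapter.execute(case=None, trace=None)
    adapter.cleanup()
    assert outcome.success
    assert outcome.output == "Hello, this is a fake response!"
```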

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter commented

⚠️ Please install the Codecov GitHub App to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.



@coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
src/agentunit/core/scenario.py (1)

82-84: Inconsistent docstring coverage across factory methods.

Only from_crewai and from_autogen have docstrings, while similar factory methods like load_langgraph, from_openai_agents, from_haystack, etc., lack documentation. Consider applying docstrings consistently across all factory methods for uniform API clarity.

Also applies to: 100-102
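For illustration, a uniform docstring for one of the undocumented factory methods might look like the following sketch. The stub class and signature here are hypothetical stand-ins, not the real `scenario.py` code.

```python
# Hypothetical sketch of the uniform docstring pattern suggested above.
class Scenario:
    @classmethod
    def from_haystack(cls, pipeline, dataset):
        """Create a Scenario that evaluates a Haystack pipeline.

        Args:
            pipeline: The Haystack pipeline under test.
            dataset: Dataset cases to run the pipeline against.

        Returns:
            Scenario: A scenario configured to execute the pipeline.
        """
        raise NotImplementedError("illustrative stub only")
```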

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fcd78c8 and 0669ffd.

📒 Files selected for processing (9)
  • README.md (1 hunks)
  • examples/basic_evaluation.py (1 hunks)
  • src/agentunit/adapters/base.py (1 hunks)
  • src/agentunit/core/__init__.py (1 hunks)
  • src/agentunit/core/exceptions.py (1 hunks)
  • src/agentunit/core/replay.py (2 hunks)
  • src/agentunit/core/runner.py (1 hunks)
  • src/agentunit/core/scenario.py (4 hunks)
  • src/agentunit/core/trace.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/basic_evaluation.py (1)
src/agentunit/adapters/base.py (1)
  • BaseAdapter (28-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test (Python 3.10)
🔇 Additional comments (9)
src/agentunit/core/runner.py (1)

1-3: Docstring formatting change unrelated to PR objective.

This docstring formatting improvement is fine, but it's unrelated to issue #3, which is specifically about adding CI instructions to the README. Consider keeping PRs focused on a single objective to make reviews easier.

src/agentunit/core/trace.py (1)

1-3: Docstring formatting changes unrelated to PR objective.

These docstring improvements are appropriate, but they're outside the scope of issue #3 (adding CI instructions). Consider submitting documentation improvements separately from feature additions.

Also applies to: 16-18, 27-29

src/agentunit/core/exceptions.py (1)

1-3: Docstring formatting changes unrelated to PR objective.

These docstring improvements follow best practices, but they're unrelated to issue #3. Please keep PRs focused on their stated objectives.

Also applies to: 9-11, 15-17, 21-23

src/agentunit/core/replay.py (1)

1-3: Docstring formatting changes unrelated to PR objective.

These changes are fine but outside the scope of issue #3. Consider grouping documentation improvements into a separate PR.

Also applies to: 13-15

src/agentunit/adapters/base.py (1)

35-42: Docstring enhancements unrelated to PR objective.

These docstring improvements add valuable detail to the BaseAdapter interface, but they're outside the scope of issue #3. Consider separating documentation improvements from feature work.

Also applies to: 46-55, 58-66

src/agentunit/core/__init__.py (1)

1-3: Docstring formatting change unrelated to PR objective.

This formatting change is fine but outside the scope of issue #3. Please keep PRs focused on their stated objectives for easier review and clearer git history.

src/agentunit/core/scenario.py (3)

1-3: Module docstring improved for clarity.

The expanded module docstring clearly describes the purpose of the module. This change improves discoverability and onboarding for contributors.


24-26: Class docstring concise and descriptive.

The expanded Scenario class docstring clearly conveys its purpose without being verbose. Good improvement for API documentation.


1-283: Verify PR scope: Documentation vs. CI instructions mismatch.

The PR objectives state the goal is to add quickstart CI instructions to README.md or CONTRIBUTING.md with specific Poetry commands. However, the provided file contains only docstring improvements to a Python module. Please clarify:

  1. Are there other files (README.md, CONTRIBUTING.md, or example scripts) included in this PR that were not provided for review?
  2. Does this PR scope include both docstring improvements and CI documentation, or is there a mismatch?

The code changes themselves are sound, but confirming alignment with stated PR objectives will ensure completeness.

Comment thread: examples/basic_evaluation.py (Outdated)
Comment on lines +10 to +11

```python
from agentunit.core.adapters import BaseAdapter
from agentunit.core.evaluator import Evaluator
```

⚠️ Potential issue | 🔴 Critical

Critical: Import paths are incorrect - example will fail.

These imports are incorrect and will cause ImportError:

  1. Line 10: BaseAdapter is in agentunit.adapters.base, not agentunit.core.adapters
  2. Line 11: There's no Evaluator class in agentunit.core.evaluator based on the provided codebase

Apply this diff to fix the imports:

```diff
-from agentunit.core.adapters import BaseAdapter
-from agentunit.core.evaluator import Evaluator
+from agentunit.adapters.base import BaseAdapter, AdapterOutcome
+from agentunit.core.scenario import Scenario
+from agentunit.datasets.base import DatasetCase
+from agentunit.core.trace import TraceLog
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```diff
-from agentunit.core.adapters import BaseAdapter
-from agentunit.core.evaluator import Evaluator
+from agentunit.adapters.base import BaseAdapter, AdapterOutcome
+from agentunit.core.scenario import Scenario
+from agentunit.datasets.base import DatasetCase
+from agentunit.core.trace import TraceLog
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 10-11, the current imports point to
non-existent modules and will raise ImportError; change the imports to use the
correct modules by importing BaseAdapter from agentunit.adapters.base and
importing Evaluator from the correct top-level evaluator module
(agentunit.evaluator) so the file reads imports from agentunit.adapters.base
import BaseAdapter and from agentunit.evaluator import Evaluator.

Comment thread: examples/basic_evaluation.py (Outdated)
Comment on lines +14 to +22

```python
class FakeAdapter(BaseAdapter):
    """
    A simple mock adapter used only for demonstration.
    It returns a predictable output so evaluation is easy to understand.
    """

    def generate(self, prompt: str) -> str:
        # Always returns the same answer for simplicity
        return "Hello, this is a fake response!"
```

⚠️ Potential issue | 🔴 Critical

Critical: FakeAdapter doesn't implement BaseAdapter interface.

The FakeAdapter doesn't properly implement the BaseAdapter abstract interface. According to src/agentunit/adapters/base.py, adapters must implement:

  • prepare() -> None
  • execute(case: DatasetCase, trace: TraceLog) -> AdapterOutcome
  • cleanup() -> None

Your generate(prompt: str) -> str method doesn't match this interface.

Apply this diff to fix the implementation:

```diff
 class FakeAdapter(BaseAdapter):
     """
     A simple mock adapter used only for demonstration.
     It returns a predictable output so evaluation is easy to understand.
     """
 
-    def generate(self, prompt: str) -> str:
-        # Always returns the same answer for simplicity
-        return "Hello, this is a fake response!"
+    def prepare(self) -> None:
+        """No setup required for the fake adapter."""
+        pass
+
+    def execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome:
+        """Return a fixed response for demonstration."""
+        trace.record("agent_response", content="Hello, this is a fake response!")
+        return AdapterOutcome(
+            success=True,
+            output="Hello, this is a fake response!"
+        )
+
+    def cleanup(self) -> None:
+        """No cleanup required for the fake adapter."""
+        pass
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```python
class FakeAdapter(BaseAdapter):
    """
    A simple mock adapter used only for demonstration.
    It returns a predictable output so evaluation is easy to understand.
    """

    def prepare(self) -> None:
        """No setup required for the fake adapter."""
        pass

    def execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome:
        """Return a fixed response for demonstration."""
        trace.record("agent_response", content="Hello, this is a fake response!")
        return AdapterOutcome(
            success=True,
            output="Hello, this is a fake response!"
        )

    def cleanup(self) -> None:
        """No cleanup required for the fake adapter."""
        pass
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 14 to 22, FakeAdapter currently
defines generate(prompt: str) -> str which does not match the BaseAdapter
interface; replace it with implementations for prepare(self) -> None,
execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome, and
cleanup(self) -> None: make prepare and cleanup no-ops (pass), implement execute
to produce the same predictable response (e.g., "Hello, this is a fake
response!") wrapped in an AdapterOutcome object along with any minimal trace
entries (using TraceLog) as required by the adapter contract, and ensure you
import or reference DatasetCase, TraceLog, and AdapterOutcome types for proper
typing.

Comment thread: examples/basic_evaluation.py (Outdated)
Comment on lines +25 to +41

```python
def main():
    # Step 1 — Prepare the adapter
    adapter = FakeAdapter()

    # Step 2 — Create the evaluator
    evaluator = Evaluator(adapter=adapter)

    # Step 3 — Prepare an example prompt
    prompt = "Say hello!"

    # Step 4 — Run the evaluation
    result = evaluator.evaluate(prompt)

    # Step 5 — Print the output
    print("Prompt:", prompt)
    print("Model Output:", result.output)
    print("Evaluation Score:", result.score)
```

⚠️ Potential issue | 🔴 Critical

Critical: Example uses non-existent Evaluator class.

The main() function uses an Evaluator class that doesn't exist in the AgentUnit codebase. Based on the actual API shown in other files, you should use Scenario and Runner instead.

Apply this diff to fix the example workflow:

```diff
 def main():
-    # Step 1 — Prepare the adapter
+    # Step 1 — Create test data
+    test_case = DatasetCase(
+        id="hello_1",
+        query="Say hello!",
+        expected_output="Hello, this is a fake response!"
+    )
+
+    # Step 2 — Prepare the adapter
     adapter = FakeAdapter()
 
-    # Step 2 — Create the evaluator
-    evaluator = Evaluator(adapter=adapter)
-
-    # Step 3 — Prepare an example prompt
-    prompt = "Say hello!"
+    # Step 3 — Create a scenario
+    scenario = Scenario(
+        name="Basic Test",
+        adapter=adapter,
+        dataset=[test_case]
+    )
 
-    # Step 4 — Run the evaluation
-    result = evaluator.evaluate(prompt)
+    # Step 4 — Run the scenario
+    from agentunit.core.runner import Runner
+    runner = Runner(scenarios=[scenario])
+    result = runner.run()
 
     # Step 5 — Print the output
-    print("Prompt:", prompt)
-    print("Model Output:", result.output)
-    print("Evaluation Score:", result.score)
+    print("Scenario:", scenario.name)
+    print("Success rate:", f"{len([r for r in result.scenarios[0].runs if r.success])}/{len(result.scenarios[0].runs)}")
+    for run in result.scenarios[0].runs:
+        print(f"  Case {run.case_id}: {'✓' if run.success else '✗'}")
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 25 to 41, the example incorrectly
instantiates a non-existent Evaluator class; replace that workflow to use the
actual API: construct a Scenario (with the prompt and any expected config) and a
Runner (passing the FakeAdapter), run the scenario via Runner.run() to obtain
results, then print prompt, model output and evaluation score from the
Runner/Scenario result; ensure imports for Scenario and Runner are added at the
top and remove the Evaluator usage.

Comment thread: README.md
Comment on lines +206 to +219
### Running CI Checks Locally

Before opening a pull request, you can run the same checks locally that are executed in CI.

#### Requirements
- Python **3.10 or higher**
- [Poetry](https://python-poetry.org/) installed

#### Setup
Install dependencies (including dev tools):

```bash
poetry install --with dev
```

⚠️ Potential issue | 🔴 Critical

Critical: Section is incomplete - missing CI check commands.

This section addresses issue #3 but is incomplete. According to the acceptance criteria, you need to document how to run:

  • poetry check - verify Poetry config
  • poetry run ruff check . - run linting
  • poetry run ruff format --check . - verify formatting
  • poetry run pytest - run tests

Currently, only the setup (poetry install --with dev) is shown. Additionally, the AI summary mentions this content "appears twice in the diff" - please verify there's no duplication in the README.

Apply this diff to complete the section:

````diff
 #### Setup
 Install dependencies (including dev tools):
 
 ```bash
 poetry install --with dev
 ```
+
+#### Verify Your Changes
+
+Run these commands before opening a pull request:
+
+```bash
+# Verify pyproject.toml and lock file consistency
+poetry check
+
+# Lint code with ruff
+poetry run ruff check .
+
+# Verify code formatting
+poetry run ruff format --check .
+
+# Run all tests
+poetry run pytest
+```
+
+If formatting issues are detected, you can auto-fix them:
+
+```bash
+poetry run ruff format .
+```
````


🤖 Prompt for AI Agents

In README.md around lines 206 to 219, the "Running CI Checks Locally" section only shows setup but is missing the CI verification commands; add a "Verify Your Changes" subsection immediately after the existing "poetry install --with dev" block that lists the required commands (poetry check, poetry run ruff check ., poetry run ruff format --check ., poetry run pytest) and mention how to auto-fix formatting with poetry run ruff format ., and also scan the README for any duplicate copies of this section and remove the duplicate so the instructions appear only once.
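For contributors who want a single entry point, the four commands above can also be wrapped in a small helper script. The following is a hypothetical sketch (not part of this PR; the script name `scripts/run_checks.py` is illustrative) that shells out to the same Poetry commands:

```python
#!/usr/bin/env python3
"""Hypothetical helper (e.g. scripts/run_checks.py) that runs the local CI checks."""

import subprocess
import sys

# The same commands CI runs, per the review comment above.
CHECKS = [
    ["poetry", "check"],                                  # verify pyproject.toml / lock file
    ["poetry", "run", "ruff", "check", "."],              # lint
    ["poetry", "run", "ruff", "format", "--check", "."],  # check formatting
    ["poetry", "run", "pytest"],                          # run tests
]


def main() -> int:
    failed = False
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        # Run each check to completion so every failure is reported in one pass.
        if subprocess.run(cmd).returncode != 0:
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```

Running every check instead of stopping at the first failure mirrors how CI reports all failing jobs at once.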


@aviralgarg05 (Owner) left a comment

LGTM!

@aviralgarg05 merged commit 315a9a4 into aviralgarg05:main on Dec 19, 2025
12 checks passed
dharapandya85 pushed a commit to dharapandya85/agentunit that referenced this pull request Dec 24, 2025
…05#41)

* Add basic evaluation example script

* Fix typos and improve clarity in docstrings across core modules

* Add Google-style docstrings to BaseAdapter methods

* Format base adapter using ruff

* docs: add instructions for running CI checks locally

* Remove example file unrelated to CI documentation