Add quickstart instructions for running CI checks locally #41
Conversation
📝 Walkthrough

This PR enhances documentation across the project by adding a new example script demonstrating a basic evaluation workflow and expanding docstrings throughout core modules with clarified parameter descriptions and behaviors, without altering any runtime logic or control flow.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches

❌ Failed checks (4 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Codecov Report

✅ All modified and coverable lines are covered by tests.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/agentunit/core/scenario.py (1)
82-84: Inconsistent docstring coverage across factory methods.

Only `from_crewai` and `from_autogen` have docstrings, while similar factory methods like `load_langgraph`, `from_openai_agents`, `from_haystack`, etc., lack documentation. Consider applying docstrings consistently across all factory methods for uniform API clarity.

Also applies to: 100-102
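For illustration only, a Google-style docstring in the same spirit could be added to the undocumented factory methods. The sketch below is hypothetical: the real `from_haystack` signature and parameters in `src/agentunit/core/scenario.py` may differ.

```python
from __future__ import annotations


class Scenario:
    # Hypothetical excerpt; the real class lives in src/agentunit/core/scenario.py.

    @classmethod
    def from_haystack(cls, pipeline, name: str | None = None) -> "Scenario":
        """Create a Scenario that wraps a Haystack pipeline.

        Args:
            pipeline: The Haystack pipeline to evaluate (assumed parameter).
            name: Optional human-readable scenario name (assumed parameter).

        Returns:
            A Scenario configured with an adapter around the pipeline.
        """
        raise NotImplementedError  # illustrates the docstring style only
```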
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- README.md (1 hunks)
- examples/basic_evaluation.py (1 hunks)
- src/agentunit/adapters/base.py (1 hunks)
- src/agentunit/core/__init__.py (1 hunks)
- src/agentunit/core/exceptions.py (1 hunks)
- src/agentunit/core/replay.py (2 hunks)
- src/agentunit/core/runner.py (1 hunks)
- src/agentunit/core/scenario.py (4 hunks)
- src/agentunit/core/trace.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
examples/basic_evaluation.py (1)
src/agentunit/adapters/base.py (1)
BaseAdapter(28-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test (Python 3.10)
🔇 Additional comments (9)
src/agentunit/core/runner.py (1)
1-3: Docstring formatting change unrelated to PR objective.

This docstring formatting improvement is fine, but it's unrelated to issue #3, which is specifically about adding CI instructions to the README. Consider keeping PRs focused on a single objective to make reviews easier.
src/agentunit/core/trace.py (1)
1-3: Docstring formatting changes unrelated to PR objective.

These docstring improvements are appropriate, but they're outside the scope of issue #3 (adding CI instructions). Consider submitting documentation improvements separately from feature additions.
Also applies to: 16-18, 27-29
src/agentunit/core/exceptions.py (1)
1-3: Docstring formatting changes unrelated to PR objective.

These docstring improvements follow best practices, but they're unrelated to issue #3. Please keep PRs focused on their stated objectives.
Also applies to: 9-11, 15-17, 21-23
src/agentunit/core/replay.py (1)
1-3: Docstring formatting changes unrelated to PR objective.

These changes are fine but outside the scope of issue #3. Consider grouping documentation improvements into a separate PR.
Also applies to: 13-15
src/agentunit/adapters/base.py (1)
35-42: Docstring enhancements unrelated to PR objective.

These docstring improvements add valuable detail to the BaseAdapter interface, but they're outside the scope of issue #3. Consider separating documentation improvements from feature work.
Also applies to: 46-55, 58-66
src/agentunit/core/__init__.py (1)
1-3: Docstring formatting change unrelated to PR objective.

This formatting change is fine but outside the scope of issue #3. Please keep PRs focused on their stated objectives for easier review and clearer git history.
src/agentunit/core/scenario.py (3)
1-3: Module docstring improved for clarity.

The expanded module docstring clearly describes the purpose of the module. This change improves discoverability and onboarding for contributors.
24-26: Class docstring concise and descriptive.

The expanded `Scenario` class docstring clearly conveys its purpose without being verbose. Good improvement for API documentation.
1-283: Verify PR scope: Documentation vs. CI instructions mismatch.

The PR objectives state the goal is to add quickstart CI instructions to README.md or CONTRIBUTING.md with specific Poetry commands. However, the provided file contains only docstring improvements to a Python module. Please clarify:
- Are there other files (README.md, CONTRIBUTING.md, or example scripts) included in this PR that were not provided for review?
- Does this PR scope include both docstring improvements and CI documentation, or is there a mismatch?
The code changes themselves are sound, but confirming alignment with stated PR objectives will ensure completeness.
```python
from agentunit.core.adapters import BaseAdapter
from agentunit.core.evaluator import Evaluator
```
Critical: Import paths are incorrect - example will fail.
These imports are incorrect and will cause ImportError:
- Line 10: `BaseAdapter` is in `agentunit.adapters.base`, not `agentunit.core.adapters`
- Line 11: There's no `Evaluator` class in `agentunit.core.evaluator` based on the provided codebase
Apply this diff to fix the imports:
```diff
-from agentunit.core.adapters import BaseAdapter
-from agentunit.core.evaluator import Evaluator
+from agentunit.adapters.base import BaseAdapter, AdapterOutcome
+from agentunit.core.scenario import Scenario
+from agentunit.datasets.base import DatasetCase
+from agentunit.core.trace import TraceLog
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 10-11, the current imports point to
non-existent modules and will raise ImportError; change the imports to use the
correct modules by importing BaseAdapter from agentunit.adapters.base and
importing Evaluator from the correct top-level evaluator module
(agentunit.evaluator) so the file reads imports from agentunit.adapters.base
import BaseAdapter and from agentunit.evaluator import Evaluator.
```python
class FakeAdapter(BaseAdapter):
    """
    A simple mock adapter used only for demonstration.
    It returns a predictable output so evaluation is easy to understand.
    """

    def generate(self, prompt: str) -> str:
        # Always returns the same answer for simplicity
        return "Hello, this is a fake response!"
```
Critical: FakeAdapter doesn't implement BaseAdapter interface.
The FakeAdapter doesn't properly implement the BaseAdapter abstract interface. According to src/agentunit/adapters/base.py, adapters must implement:
- `prepare() -> None`
- `execute(case: DatasetCase, trace: TraceLog) -> AdapterOutcome`
- `cleanup() -> None`

Your `generate(prompt: str) -> str` method doesn't match this interface.
Apply this diff to fix the implementation:
```diff
 class FakeAdapter(BaseAdapter):
     """
     A simple mock adapter used only for demonstration.
     It returns a predictable output so evaluation is easy to understand.
     """
-    def generate(self, prompt: str) -> str:
-        # Always returns the same answer for simplicity
-        return "Hello, this is a fake response!"
+    def prepare(self) -> None:
+        """No setup required for the fake adapter."""
+        pass
+
+    def execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome:
+        """Return a fixed response for demonstration."""
+        trace.record("agent_response", content="Hello, this is a fake response!")
+        return AdapterOutcome(
+            success=True,
+            output="Hello, this is a fake response!"
+        )
+
+    def cleanup(self) -> None:
+        """No cleanup required for the fake adapter."""
+        pass
```
🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 14 to 22, FakeAdapter currently
defines generate(prompt: str) -> str which does not match the BaseAdapter
interface; replace it with implementations for prepare(self) -> None,
execute(self, case: DatasetCase, trace: TraceLog) -> AdapterOutcome, and
cleanup(self) -> None: make prepare and cleanup no-ops (pass), implement execute
to produce the same predictable response (e.g., "Hello, this is a fake
response!") wrapped in an AdapterOutcome object along with any minimal trace
entries (using TraceLog) as required by the adapter contract, and ensure you
import or reference DatasetCase, TraceLog, and AdapterOutcome types for proper
typing.
```python
def main():
    # Step 1 — Prepare the adapter
    adapter = FakeAdapter()

    # Step 2 — Create the evaluator
    evaluator = Evaluator(adapter=adapter)

    # Step 3 — Prepare an example prompt
    prompt = "Say hello!"

    # Step 4 — Run the evaluation
    result = evaluator.evaluate(prompt)

    # Step 5 — Print the output
    print("Prompt:", prompt)
    print("Model Output:", result.output)
    print("Evaluation Score:", result.score)
```
Critical: Example uses non-existent Evaluator class.
The main() function uses an Evaluator class that doesn't exist in the AgentUnit codebase. Based on the actual API shown in other files, you should use Scenario and Runner instead.
Apply this diff to fix the example workflow:
```diff
 def main():
-    # Step 1 — Prepare the adapter
+    # Step 1 — Create test data
+    test_case = DatasetCase(
+        id="hello_1",
+        query="Say hello!",
+        expected_output="Hello, this is a fake response!"
+    )
+
+    # Step 2 — Prepare the adapter
     adapter = FakeAdapter()

-    # Step 2 — Create the evaluator
-    evaluator = Evaluator(adapter=adapter)
-
-    # Step 3 — Prepare an example prompt
-    prompt = "Say hello!"
+    # Step 3 — Create a scenario
+    scenario = Scenario(
+        name="Basic Test",
+        adapter=adapter,
+        dataset=[test_case]
+    )

-    # Step 4 — Run the evaluation
-    result = evaluator.evaluate(prompt)
+    # Step 4 — Run the scenario
+    from agentunit.core.runner import Runner
+    runner = Runner(scenarios=[scenario])
+    result = runner.run()

     # Step 5 — Print the output
-    print("Prompt:", prompt)
-    print("Model Output:", result.output)
-    print("Evaluation Score:", result.score)
+    print("Scenario:", scenario.name)
+    print("Success rate:", f"{len([r for r in result.scenarios[0].runs if r.success])}/{len(result.scenarios[0].runs)}")
+    for run in result.scenarios[0].runs:
+        print(f"  Case {run.case_id}: {'✓' if run.success else '✗'}")
```

🤖 Prompt for AI Agents
In examples/basic_evaluation.py around lines 25 to 41, the example incorrectly
instantiates a non-existent Evaluator class; replace that workflow to use the
actual API: construct a Scenario (with the prompt and any expected config) and a
Runner (passing the FakeAdapter), run the scenario via Runner.run() to obtain
results, then print prompt, model output and evaluation score from the
Runner/Scenario result; ensure imports for Scenario and Runner are added at the
top and remove the Evaluator usage.
````markdown
### Running CI Checks Locally

Before opening a pull request, you can run the same checks locally that are executed in CI.

#### Requirements
- Python **3.10 or higher**
- [Poetry](https://python-poetry.org/) installed

#### Setup
Install dependencies (including dev tools):

```bash
poetry install --with dev
```
````
Critical: Section is incomplete - missing CI check commands.
This section addresses issue #3 but is incomplete. According to the acceptance criteria, you need to document how to run:
- `poetry check` - verify Poetry config
- `poetry run ruff check .` - run linting
- `poetry run ruff format --check .` - verify formatting
- `poetry run pytest` - run tests

Currently, only the setup (`poetry install --with dev`) is shown. Additionally, the AI summary mentions this content "appears twice in the diff" - please verify there's no duplication in the README.
Apply this diff to complete the section:
````diff
 #### Setup
 Install dependencies (including dev tools):

 ```bash
 poetry install --with dev
 ```
+
+#### Verify Your Changes
+
+Run these commands before opening a pull request:
+
+```bash
+# Verify pyproject.toml and lock file consistency
+poetry check
+
+# Lint code with ruff
+poetry run ruff check .
+
+# Verify code formatting
+poetry run ruff format --check .
+
+# Run all tests
+poetry run pytest
+```
+
+If formatting issues are detected, you can auto-fix them:
+
+```bash
+poetry run ruff format .
+```
````
🤖 Prompt for AI Agents
In README.md around lines 206 to 219, the "Running CI Checks Locally" section
only shows setup but is missing the CI verification commands; add a "Verify Your
Changes" subsection immediately after the existing "poetry install --with dev"
block that lists the required commands (poetry check, poetry run ruff check .,
poetry run ruff format --check ., poetry run pytest) and mention how to auto-fix
formatting with poetry run ruff format ., and also scan the README for any
duplicate copies of this section and remove the duplicate so the instructions
appear only once.
…05#41)

* Add basic evaluation example script
* Fix typos and improve clarity in docstrings across core modules
* Add Google-style docstrings to BaseAdapter methods
* Format base adapter using ruff
* docs: add instructions for running CI checks locally
* Remove example file unrelated to CI documentation

This PR adds a new section to README.md explaining how contributors can
run the same lint, format, and test checks locally that CI runs.
This helps beginners validate changes before opening a PR (a convenience script is sketched below).
closes #3
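For convenience, the commands from the new README section can be chained into one small script so a contributor runs everything in a single step. This is only a suggested sketch, not part of the PR; the script name is hypothetical and it assumes the Poetry/Ruff setup described above.

```bash
#!/usr/bin/env bash
# ci-local.sh (hypothetical helper): run the same checks locally that CI runs.
# Assumes `poetry install --with dev` has already been executed.
set -e                              # stop at the first failing check

poetry check                        # verify pyproject.toml and lock file consistency
poetry run ruff check .             # lint
poetry run ruff format --check .    # formatting check (auto-fix with: poetry run ruff format .)
poetry run pytest                   # run the test suite
```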
Summary by CodeRabbit
New Features
Documentation