Add CSV export support for SuiteResult (Add csv reports, #62)
Conversation
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Walkthrough

Adds CSV export to SuiteResult with metric flattening, refactors JUnit output to a single testsuite and adjusts time reporting, enhances Markdown rendering helpers, and updates pyproject.toml (removes numpy, reorders dependencies, adds an integration-tests extra).
Sequence Diagram(s): omitted (changes are internal export additions and formatting; no multi-component sequential flow requiring visualization).

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 3 failed (2 warnings, 1 inconclusive)
Codecov Report: ❌ Patch coverage is
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/agentunit/reporting/results.py (2)
83-95: One-level flattening is intentional but should be documented.

The helper only flattens one level of nesting. If `metrics` contains deeper structures like `{"outer": {"inner": {"deep": 1}}}`, the `inner` dict becomes a cell value (e.g., `"{'deep': 1}"`), which may not be spreadsheet-friendly. If deeper nesting is a realistic scenario, consider making this recursive or adding a docstring clarifying the depth limit.
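If a recursive variant is ever needed, it could look like the sketch below. The `metric_` prefix follows the naming used in this PR; the function name and the underscore-joined key scheme are assumptions, not the PR's actual code.

```python
from typing import Any


def flatten_metrics(metrics: dict[str, Any], prefix: str = "metric_") -> dict[str, Any]:
    """Recursively flatten nested metric dicts into prefixed scalar columns."""
    flat: dict[str, Any] = {}
    for key, value in metrics.items():
        if isinstance(value, dict):
            # Recurse, joining nested keys with "_" so every leaf gets its own column.
            flat.update(flatten_metrics(value, prefix=f"{prefix}{key}_"))
        else:
            flat[f"{prefix}{key}"] = value
    return flat
```

With this shape, `{"outer": {"inner": {"deep": 1}}}` becomes a single `metric_outer_inner_deep` column instead of a stringified dict cell.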
217-219: Column order is alphabetical — consider prioritizing core columns.

Sorting fieldnames alphabetically places `case_id` before `scenario_name` and interleaves `metric_*` columns unpredictably. For spreadsheet usability, leading with fixed columns (`scenario_name`, `case_id`, `success`, `duration_ms`, `error`) followed by sorted metric columns is more intuitive.

Proposed fix:

```diff
-        fieldnames = sorted(
-            {key for row in rows for key in row.keys()}
-        )
+        base_fields = ["scenario_name", "case_id", "success", "duration_ms", "error"]
+        metric_fields = sorted(
+            {key for row in rows for key in row.keys()} - set(base_fields)
+        )
+        fieldnames = base_fields + metric_fields
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- pyproject.toml
- src/agentunit/reporting/results.py
- tests/test_reporting.py
💤 Files with no reviewable changes (1)
- tests/test_reporting.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/reporting/results.py (3)
- src/agentunit/reporting/html.py (1): render_html_report (11-102)
- src/agentunit/core/runner.py (1): run (45-54)
- src/agentunit/datasets/base.py (1): name (38-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test (Python 3.12)
- GitHub Check: Test (Python 3.10)
🔇 Additional comments (4)
pyproject.toml (1)
39-41: LGTM — extras group aligns with test markers.

The new `integration-tests` extras entry correctly groups `langgraph` for optional testing scenarios, matching the pytest marker defined at line 70.
133-177: LGTM — JUnit structure is now a single testsuite element.

The refactored XML uses a single `<testsuite>` with nested `<testcase>` elements, which is a more standard JUnit format. Time reporting and failure message handling are implemented correctly.
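For reference, the single-testsuite shape being described can be sketched with the standard library. Element and attribute values here are schematic, not the PR's exact output.

```python
import xml.etree.ElementTree as ET

# One <testsuite> root with nested <testcase> children; a failing case
# carries a <failure> child with a message attribute.
suite = ET.Element("testsuite", name="agentunit", tests="2", failures="1", time="0.512")
ET.SubElement(suite, "testcase", name="case-1", time="0.250")
failing = ET.SubElement(suite, "testcase", name="case-2", time="0.262")
ET.SubElement(failing, "failure", message="assertion failed")

xml = ET.tostring(suite, encoding="unicode")
```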
189-226: CSV export implementation looks solid overall.

The method correctly:

- Creates parent directories
- Flattens metrics for spreadsheet compatibility
- Uses `DictWriter` with proper encoding and newline handling

Minor improvements noted in separate comments regarding column ordering and empty-file behavior.
268-272: No changes needed — the type annotation is correct.

`ScenarioRun.metrics` is properly typed as `dict[str, float | None]`, which is consistent with the `.2f` format used in the markdown rendering. The `_flatten_metrics` function handles nested dicts, but it's used only in CSV export (line 210), not in the markdown rendering at lines 268-272. The markdown code correctly assumes float or None values.

Likely an incorrect or invalid review comment.
```python
        if not rows:
            return target
```
Inconsistent behavior: empty suite doesn't create file.

When there are no runs, this returns early without writing any file. Other export methods (`to_json`, `to_markdown`, `to_junit`) always create the output file even when empty. This inconsistency could surprise callers who expect the file to exist after a successful call.

Proposed fix: create an empty CSV with headers, or at minimum an empty file:

```diff
         if not rows:
+            target.touch()
             return target
```

Or, to maintain header consistency for empty results, define a fixed set of base fieldnames.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        if not rows:
            target.touch()
            return target
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In src/agentunit/reporting/results.py:
- Line 206: The comprehension building fieldnames uses `row.keys` (a method object) instead of calling it; update the expression to call `row.keys()` so it reads `fieldnames = sorted({key for row in rows for key in row.keys()})`, avoiding a TypeError and correctly collecting keys from each row in rows.
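The failure mode is easy to reproduce in isolation (a hypothetical minimal repro, not the PR's actual code):

```python
rows = [{"case_id": "c1"}, {"case_id": "c2", "error": "timeout"}]

# Buggy: row.keys is a bound method object, not an iterable of keys,
# so iterating it raises TypeError.
try:
    sorted({key for row in rows for key in row.keys})
except TypeError:
    pass  # this branch is always taken

# Fixed: call the method so each row's keys are actually iterated.
fieldnames = sorted({key for row in rows for key in row.keys()})
```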
🧹 Nitpick comments (1)
src/agentunit/reporting/results.py (1)
203-204: Consider documenting the empty-suite behavior.

When there are no runs, the method returns without creating a file. This is reasonable since fieldnames are derived from row data, but callers might expect a file to exist. Consider documenting this in the docstring.
📝 Suggested docstring update
```diff
     def to_csv(self, path: str | Path) -> Path:
         """
         Export suite results to CSV. One row per scenario run.
+
+        Note: If there are no scenario runs, no file is created.
         """
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/agentunit/reporting/results.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/reporting/results.py (1)
- src/agentunit/core/runner.py (1): run (45-54)
🔇 Additional comments (5)
src/agentunit/reporting/results.py (5)
5-5: LGTM!

The new imports (the `csv` module and the `Any` type) are appropriate for the CSV export functionality being added.

Also applies to: 10-10
82-92: Implementation handles single-level nesting only.

The helper correctly flattens one level of nested dictionaries. If metrics contain deeper nesting (e.g., `{"a": {"b": {"c": 1}}}`), inner dicts would be serialized as strings rather than flattened. This is likely sufficient for current use cases where `ScenarioRun.metrics` is typed as `dict[str, float | None]`.
116-119: LGTM!

Explicit `encoding="utf-8"` is good practice for cross-platform consistency.
131-166: LGTM!

The simplified JUnit output with a single `<testsuite>` root element is a valid format widely supported by CI tools.
241-261: LGTM!

The explicit type hints and formatting improvements enhance readability without changing behavior.

Summary
This PR adds a `to_csv()` method to `SuiteResult` in src/agentunit/reporting/results.py. It allows exporting all test results to a CSV file, flattening nested metrics into separate columns (e.g., `metric_ExactMatch`, `metric_Latency`) for easy analysis in spreadsheets.
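The column flattening can be illustrated with the standard library alone. The scenario and metric values below are made up; only the `metric_` prefix convention comes from this PR.

```python
import csv
import io

# One scenario run whose metrics are flattened one level into metric_* columns.
run = {"scenario_name": "demo", "case_id": "c1", "success": True}
metrics = {"ExactMatch": 1.0, "Latency": 0.42}
row = {**run, **{f"metric_{name}": value for name, value in metrics.items()}}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=sorted(row))
writer.writeheader()
writer.writerow(row)
```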
Changes

- `_flatten_metrics()` helper (already existing, improved if needed)
- `SuiteResult.to_csv(path: str | Path)` method

Testing
Verified manually using: