feat(metrics): loop metrics tracking and ralph-stats analytics (#21) #238
…mmand (#21)
- Add track_metrics() to ralph_loop.sh: appends JSON Lines to .ralph/logs/metrics.jsonl with timestamp, loop number, duration, success, and API call count
- Integrate timing in main loop: captures epoch seconds before/after execute_claude_code
- Add print_metrics_summary(), called on graceful exit to log aggregate stats
- Create ralph-stats.sh: standalone analytics command using jq to compute total_loops, successful, avg_duration, and total_calls from metrics.jsonl
- Update install.sh to install ralph-stats wrapper and copy ralph-stats.sh
- Add 4 BATS unit tests covering all acceptance criteria
- Update integration install tests with ralph-stats mock and assertions
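To make the JSON Lines format concrete, here is a minimal sketch of what a writer like `track_metrics()` could look like. It is not the PR's actual code: the `LOG_DIR` default and the `timestamp` and `loop` key names are assumptions for illustration, while `duration`, `success`, and `calls` follow the jq filters quoted later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a JSON Lines metrics writer; not the PR's code.
# LOG_DIR and the "timestamp"/"loop" key names are assumptions.
LOG_DIR="${LOG_DIR:-.ralph/logs}"

track_metrics() {
    local loop_number="$1"
    local duration="$2"
    local success="$3"   # literal true or false
    local calls="$4"
    local timestamp
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

    mkdir -p "$LOG_DIR"
    # jq -n builds the object from typed arguments, so escaping is left to jq;
    # -c emits exactly one compact line per record, which is what JSONL needs.
    jq -n -c \
        --arg ts "$timestamp" \
        --argjson loop "$loop_number" \
        --argjson duration "$duration" \
        --argjson success "$success" \
        --argjson calls "$calls" \
        '{timestamp: $ts, loop: $loop, duration: $duration, success: $success, calls: $calls}' \
        >> "$LOG_DIR/metrics.jsonl"
}
```

Building each record with `jq -n` rather than `printf` is a design choice in this sketch: a malformed number or boolean makes jq fail loudly instead of writing an invalid line.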
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
Walkthrough
This PR implements metrics tracking and analytics for Ralph loop execution. It introduces
Changes
Sequence Diagram
sequenceDiagram
actor User
participant RalphLoop as Ralph Loop<br/>(ralph_loop.sh)
participant MetricsFile as metrics.jsonl
participant jq as jq
participant StatsCmd as ralph-stats<br/>Command
User->>RalphLoop: Start Ralph execution
loop Each loop iteration
RalphLoop->>RalphLoop: Execute Claude code
RalphLoop->>RalphLoop: Measure duration & success
RalphLoop->>RalphLoop: track_metrics(...)
RalphLoop->>MetricsFile: Append JSONL line
end
RalphLoop->>RalphLoop: On exit, print_metrics_summary()
RalphLoop->>MetricsFile: Read all records
RalphLoop->>jq: jq -s (aggregate)
jq-->>RalphLoop: Aggregated JSON
RalphLoop->>User: Output summary
User->>StatsCmd: Run ralph-stats
StatsCmd->>MetricsFile: Read metrics.jsonl
StatsCmd->>jq: jq -s (aggregate)
jq-->>StatsCmd: Aggregated JSON
StatsCmd-->>User: Display analytics
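The "jq -s (aggregate)" step in the diagram can be sketched roughly as follows. The function name `summarize_metrics` is hypothetical, and the filter only covers the fields whose meaning is not in dispute in this thread; `total_calls` is left out on purpose because its aggregation (max versus sum) is discussed separately.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the aggregate step; the function name is illustrative.
summarize_metrics() {
    local metrics_file="$1"
    # A missing or empty file yields an all-zero summary instead of an error.
    if [ ! -s "$metrics_file" ]; then
        echo '{"total_loops":0,"successful":0,"avg_duration":0}'
        return 0
    fi
    # -s slurps every JSONL record into a single array before the filter runs.
    jq -s -c '{
        total_loops: length,
        successful: (map(select(.success == true)) | length),
        avg_duration: (if length > 0 then (map(.duration) | add) / length else 0 end)
    }' "$metrics_file"
}
```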
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Review: feat(metrics): loop metrics tracking and ralph-stats analytics
This is a clean, focused implementation. The code follows existing patterns well, and the TDD approach with 4 new tests and updated integration tests is solid. A few observations:
Minor Issues
No. 1: Semantic issue: In both
total_calls: (map(.calls) | max // 0)
The PR description notes this is intentional: "total_calls = max(calls) from metrics file (cumulative counter at last loop)". Since
total_calls: (.[length-1].calls // 0)
No. 2: In
local loop_duration=$(( $(get_epoch_seconds) - loop_start_epoch ))
Combining
local loop_duration
loop_duration=$(( $(get_epoch_seconds) - loop_start_epoch ))
Test Coverage Gap
The 4 tests cover the happy path well. Consider adding edge cases in a follow-on:
Positive Notes
Overall this is a well-structured, minimal implementation that fits the codebase conventions. The semantic
Actionable comments posted: 2
🧹 Nitpick comments (3)
ralph_loop.sh (1)
2071-2071: Address SC2155: Declare and assign separately.
The static analysis correctly flags that declaring and assigning in one statement masks the return value of `cat`. While the fallback `|| echo "0"` mitigates risk here, separating declaration and assignment follows best practices.
🔧 Separate declaration from assignment
- local calls_made=$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")
+ local calls_made
+ calls_made=$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` at line 2071, the line combining declaration and command substitution should be split to satisfy SC2155: first declare the local variable (local calls_made) and then assign it on the next line using the command substitution that reads CALL_COUNT_FILE with the fallback; update the code around the existing variable and CALL_COUNT_FILE usage (variable name calls_made and CALL_COUNT_FILE) so the assignment is done separately and still preserves the "0" fallback and redirects stderr to /dev/null.
tests/unit/test_metrics_tracking.bats (1)
94-107: print_metrics_summary test could be more specific.
The test checks for presence of "total_loops" and "2" in output, but these are substring matches that could pass for unrelated reasons. Consider checking specific JSON field values.
💡 More specific assertions
  run print_metrics_summary
  assert_success
- [[ "$output" == *"total_loops"* ]]
- [[ "$output" == *"2"* ]]
+ # Output contains JSON summary - extract and validate
+ [[ "$output" == *'"total_loops": 2'* ]] || [[ "$output" == *'"total_loops":2'* ]]
Or parse the JSON from output for precise validation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_metrics_tracking.bats` around lines 94 - 107, the test uses loose substring checks for print_metrics_summary output; instead parse the command output as JSON and assert exact field values: after running print_metrics_summary (function print_metrics_summary) load "$output" and use a JSON parser (e.g. jq) to assert fields like .total_loops == 2 and other expected fields (e.g. .total_calls, .avg_duration) rather than checking for the substrings "total_loops" and "2"; update the assertions to read from the generated "$LOG_DIR/metrics.jsonl" fixture or the captured "$output" and perform exact equality checks on the JSON fields.
tests/integration/test_installation.bats (1)
557-606: Consider adding ralph-stats to the end-to-end workflow test.
The "complete installation workflow end-to-end" test verifies most commands but doesn't explicitly check for `ralph-stats`. While other tests cover it, adding assertions here would provide more comprehensive E2E coverage.
📝 Optional: Add ralph-stats verification
  # Verify all commands installed
  assert_file_exists "$TEST_INSTALL_DIR/ralph"
  assert_file_exists "$TEST_INSTALL_DIR/ralph-monitor"
  assert_file_exists "$TEST_INSTALL_DIR/ralph-setup"
  assert_file_exists "$TEST_INSTALL_DIR/ralph-import"
  assert_file_exists "$TEST_INSTALL_DIR/ralph-migrate"
+ assert_file_exists "$TEST_INSTALL_DIR/ralph-stats"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_installation.bats` around lines 557 - 606, add checks for the ralph-stats binary to the "complete installation workflow end-to-end" test: in the "Verify all commands installed" block add an assertion to check the file exists (referencing the TEST_INSTALL_DIR/ralph-stats), and in the "Verify all permissions correct" block add an executable permission test for TEST_INSTALL_DIR/ralph-stats; update the test named "complete installation workflow end-to-end" so these two assertions run alongside the other command existence and permission checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/test_metrics_tracking.bats`:
- Around line 60-74: The per-line JSON validation loop in the "track_metrics:
appends one entry per loop iteration" test silently ignores jq failures; update
the while-read loop that iterates over "$LOG_DIR/metrics.jsonl" (used after
calling track_metrics) so each line is validated with jq in a way that
propagates failures (e.g., use jq -e . or check jq's exit status) and cause the
test to fail if any line is invalid JSON; ensure the loop exits non-zero on the
first jq error so the test correctly fails when track_metrics produces invalid
JSON.
- Around line 42-58: The jq validation in the test "track_metrics: writes valid
JSON Lines to metrics.jsonl" silently ignores parse errors because its output is
redirected to /dev/null and stderr is merged (the jq exit code can be lost);
update the test in tests/unit/test_metrics_tracking.bats so the jq invocation
for JSON parsing (used after reading "$LOG_DIR/metrics.jsonl") does not redirect
stderr/stdout to /dev/null, allowing jq to return a non-zero exit and cause the
BATS test to fail on invalid JSON—locate the jq call in that test and remove the
redirection so failures are propagated.
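As an illustration of the two inline comments above, a per-line validator that lets jq's exit status decide the outcome might look like the sketch below. `validate_jsonl` is a hypothetical helper name, and checking for `type == "object"` is an assumption about what a valid metrics record is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a JSON Lines file record by record and
# return non-zero at the first line that is not a JSON object.
validate_jsonl() {
    local file="$1"
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        # stdout is discarded only to keep test output quiet; stderr and the
        # exit status are left alone. jq -e exits non-zero on a parse error,
        # on empty input, and when the filter's result is false or null.
        if ! printf '%s\n' "$line" | jq -e 'type == "object"' >/dev/null; then
            return 1
        fi
    done < "$file"
}
```

In a BATS test, calling this helper directly (rather than inside a loop whose status is ignored) would cause the test to fail on the first invalid line.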
---
Nitpick comments:
In `@ralph_loop.sh`:
- Line 2071: The line combining declaration and command substitution should be
split to satisfy SC2155: first declare the local variable (local calls_made) and
then assign it on the next line using the command substitution that reads
CALL_COUNT_FILE with the fallback; update the code around the existing variable
and CALL_COUNT_FILE usage (variable name calls_made and CALL_COUNT_FILE) so the
assignment is done separately and still preserves the "0" fallback and redirects
stderr to /dev/null.
In `@tests/integration/test_installation.bats`:
- Around line 557-606: Add checks for the ralph-stats binary to the "complete
installation workflow end-to-end" test: in the "Verify all commands installed"
block add an assertion to check the file exists (referencing the
TEST_INSTALL_DIR/ralph-stats), and in the "Verify all permissions correct" block
add an executable permission test for TEST_INSTALL_DIR/ralph-stats; update the
test named "complete installation workflow end-to-end" so these two assertions
run alongside the other command existence and permission checks.
In `@tests/unit/test_metrics_tracking.bats`:
- Around line 94-107: The test uses loose substring checks for
print_metrics_summary output; instead parse the command output as JSON and
assert exact field values—after running print_metrics_summary (function
print_metrics_summary) load "$output" and use a JSON parser (e.g. jq) to assert
fields like .total_loops == 2 and other expected fields (e.g. .total_calls,
.avg_duration) rather than checking for the substrings "total_loops" and "2";
update the assertions to read from the generated "$LOG_DIR/metrics.jsonl"
fixture or the captured "$output" and perform exact equality checks on the JSON
fields.
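One way to replace the substring checks with exact field checks is a small assertion helper, sketched here. `assert_json_field` is a hypothetical name, and the sketch assumes the text passed in is a single JSON document; if `print_metrics_summary` wraps the JSON in a log line, the JSON would have to be extracted first. The `generated_at` key in the sample is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical assertion helper for BATS-style tests.
assert_json_field() {
    local json="$1"
    local filter="$2"
    local expected="$3"
    local actual
    # Fails if jq cannot parse the input or evaluate the filter.
    actual=$(printf '%s\n' "$json" | jq -r "$filter") || return 1
    if [ "$actual" != "$expected" ]; then
        echo "expected $filter == $expected, got $actual" >&2
        return 1
    fi
}

# A substring match on "2" would succeed here because of the date, even
# though total_loops is 3; the field-level check is not fooled.
sample='{"total_loops": 3, "generated_at": "2025-01-01"}'
assert_json_field "$sample" '.total_loops' 3
```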
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d998b11-3098-497c-af69-beaf6506314d
📒 Files selected for processing (6)
- CLAUDE.md
- install.sh
- ralph-stats.sh
- ralph_loop.sh
- tests/integration/test_installation.bats
- tests/unit/test_metrics_tracking.bats
- Remove 2>&1 on jq validation so failures surface in tests
- Separate local declaration from assignment for SC2155 in ralph_loop.sh
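For readers unfamiliar with SC2155, the following self-contained demonstration shows the behavior the warning is about. The function names are illustrative and unrelated to ralph_loop.sh.

```shell
#!/usr/bin/env bash
# "local" is itself a command, so when declaration and assignment are
# combined, its (successful) exit status replaces that of the substitution.
combined() {
    local value=$(false)   # $? below reports on "local", not on "false"
    echo $?
}

separate() {
    local value
    value=$(false)         # $? below reports the failure of "false"
    echo $?
}

combined   # prints 0
separate   # prints 1
```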
Review: fix(metrics): address CodeRabbit review feedback
The implementation is clean and follows existing conventions well. The TDD approach with 4 unit tests and updated integration tests is solid. Good to see the CodeRabbit feedback being worked through iteratively. A few items still remain.
Still needs attention
No. 1 - Weak assertion in print_metrics_summary test (CodeRabbit raised this)
In tests/unit/test_metrics_tracking.bats lines 103-104, the substring match on "2" is too loose: it could match a timestamp like "2025-01-01". Since print_metrics_summary outputs structured JSON, extract and assert the specific field value using jq or a tighter match pattern (e.g., checking for "total_loops":2 specifically).
No. 2 - SC2155: combined local declaration and command substitution (CodeRabbit raised this)
In the metrics block added to ralph_loop.sh, the line
New concern
No. 3 - No rotation for metrics.jsonl
ralph.log has rotation via rotate_logs() in lib/log_utils.sh (10 MB threshold, 4 archives). metrics.jsonl will grow unbounded. After thousands of loops on a long-running project, jq -s in print_metrics_summary and ralph-stats.sh slurps the entire file into memory on each invocation. Consider either adding rotation for metrics.jsonl in rotate_logs(), or capping summarization to the last N entries (e.g., tail -1000 before piping to jq).
Acknowledged from prior review
No. 4 - total_calls uses max (raised previously)
The PR description documents this: "total_calls in summary = max(calls) from metrics file (cumulative counter at last loop)." Since the call counter is monotonically increasing within a run, max is equivalent to last. Acceptable as-is, but a short inline comment in ralph-stats.sh would clarify intent for future readers.
Overall this is a well-structured feature. Resolving items 1 and 2 (both already flagged by CodeRabbit) would close out the feedback loop before merge.
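The "cap to the last N entries" option from the rotation concern could be sketched as below. `summarize_recent` and its 1000-record default are hypothetical, and the filter is reduced to a single field to keep the example short.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: bound how many records jq -s has to slurp.
summarize_recent() {
    local metrics_file="$1"
    local max_records="${2:-1000}"
    [ -f "$metrics_file" ] || { echo '{"total_loops":0}'; return 0; }
    # tail limits the input to the newest records regardless of file size.
    tail -n "$max_records" "$metrics_file" | jq -s -c '{total_loops: length}'
}
```

The trade-off is that the summary then describes only the most recent window rather than the full history, which is why rotation in `rotate_logs()` is listed as the alternative.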
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ralph_loop.sh`:
- Around line 2074-2087: The code reads CALL_COUNT_FILE (hourly-reset) when
calling track_metrics, causing total_calls to drop; change to use a persistent
cumulative counter (e.g. TOTAL_CALLS_FILE) that is not rotated hourly: after
execute_claude_code and determining exec_result, increment the persistent
counter when a call occurs (or on success/failure as desired), persist it to
TOTAL_CALLS_FILE, then read that persistent value and pass it into track_metrics
instead of reading CALL_COUNT_FILE; update references around
execute_claude_code, exec_result, and track_metrics to use and maintain
TOTAL_CALLS_FILE.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 895abb1c-513e-40e1-979a-eec287c13692
📒 Files selected for processing (2)
- ralph_loop.sh
- tests/unit/test_metrics_tracking.bats
✅ Files skipped from review due to trivial changes (1)
- tests/unit/test_metrics_tracking.bats
# Capture loop start time for duration tracking (Issue #21)
local loop_start_epoch
loop_start_epoch=$(get_epoch_seconds)

# Execute Claude Code
execute_claude_code "$loop_count"
local exec_result=$?

# Record metrics for this loop (Issue #21)
local loop_duration
loop_duration=$(( $(get_epoch_seconds) - loop_start_epoch ))
local loop_success="false"
[ $exec_result -eq 0 ] && loop_success="true"
track_metrics "$loop_count" "$loop_duration" "$loop_success" "$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo 0)"
calls metric is not cumulative across hourly resets.
At Line 2087, the value passed to track_metrics is read from .call_count, which is reset every hour. That breaks the cumulative-call expectation and can underreport total_calls analytics.
💡 Proposed fix
@@
- while true; do
+ local total_calls_lifetime=0
+ while true; do
@@
- local calls_made=$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")
+ local calls_made
+ calls_made=$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")
update_status "$loop_count" "$calls_made" "executing" "running"
@@
- execute_claude_code "$loop_count"
+ local calls_before_exec="$calls_made"
+ execute_claude_code "$loop_count"
local exec_result=$?
@@
- track_metrics "$loop_count" "$loop_duration" "$loop_success" "$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo 0)"
+ local calls_after_exec
+ calls_after_exec=$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")
+ local calls_delta=0
+ if [[ "$calls_after_exec" -ge "$calls_before_exec" ]]; then
+ calls_delta=$((calls_after_exec - calls_before_exec))
+ else
+ # hourly reset occurred
+ calls_delta=$calls_after_exec
+ fi
+ total_calls_lifetime=$((total_calls_lifetime + calls_delta))
+       track_metrics "$loop_count" "$loop_duration" "$loop_success" "$total_calls_lifetime"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 2074 - 2087, The code reads CALL_COUNT_FILE
(hourly-reset) when calling track_metrics, causing total_calls to drop; change
to use a persistent cumulative counter (e.g. TOTAL_CALLS_FILE) that is not
rotated hourly: after execute_claude_code and determining exec_result, increment
the persistent counter when a call occurs (or on success/failure as desired),
persist it to TOTAL_CALLS_FILE, then read that persistent value and pass it into
track_metrics instead of reading CALL_COUNT_FILE; update references around
execute_claude_code, exec_result, and track_metrics to use and maintain
TOTAL_CALLS_FILE.
… hourly resets
- Track calls_before_exec and compute calls_this_loop as delta instead of reading cumulative CALL_COUNT_FILE (which resets hourly)
- Update ralph-stats to sum .calls instead of max for true lifetime total
- Update test assertion: total_calls = sum(2+5+7) = 14
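The per-loop delta described in this commit can be illustrated with a small helper. `calls_delta` is a hypothetical name; the reset branch mirrors the fallback in CodeRabbit's proposed fix and assumes that a counter going backwards means an hourly reset happened during the loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the counter value before and after a loop,
# print the number of calls attributed to that loop.
calls_delta() {
    local before="$1"
    local after="$2"
    if [ "$after" -ge "$before" ]; then
        echo $(( after - before ))
    else
        # Counter went backwards: a reset is assumed, so only the calls
        # counted since the reset can be attributed to this loop.
        echo "$after"
    fi
}

calls_delta 3 5   # prints 2
calls_delta 9 1   # prints 1 (reset assumed)
```

Because each record then stores a delta rather than a running total, summing `.calls` across records gives the lifetime total, which is why the commit also switches ralph-stats from max to sum.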
Review: fix(metrics): per-loop call delta for total_calls
Good progress on this PR. The per-loop call delta approach (using
Bug:
♻️ Duplicate comments (1)
ralph_loop.sh (1)
553-558: ⚠️ Potential issue | 🟠 Major
total_calls aggregation is incorrect (uses max instead of sum).
Line 557 currently returns only the largest per-loop value, which underreports aggregate calls when metrics store per-loop deltas.
💡 Suggested fix
  summary=$(jq -s '{
      total_loops: length,
      successful: (map(select(.success==true)) | length),
      avg_duration: (if length > 0 then (map(.duration) | add) / length else 0 end),
-     total_calls: (map(.calls) | max // 0)
+     total_calls: (map(.calls // 0) | add // 0)
  }' "$metrics_file" 2>/dev/null)
#!/bin/bash
set -euo pipefail
tmp_file="$(mktemp)"
cat > "$tmp_file" <<'EOF'
{"calls":2}
{"calls":5}
{"calls":7}
EOF
echo "Current logic (max):"
jq -s '{total_calls:(map(.calls) | max // 0)}' "$tmp_file"
echo "Expected logic (sum):"
jq -s '{total_calls:(map(.calls // 0) | add // 0)}' "$tmp_file"
rm -f "$tmp_file"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 553 - 558, the aggregation for total_calls in the jq -s block is using max which undercounts; change the expression for total_calls to sum per-loop call values and handle missing values and empty arrays (e.g. replace (map(.calls) | max // 0) with (map(.calls // 0) | add // 0)) so the summary variable correctly totals calls from metrics_file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@ralph_loop.sh`:
- Around line 553-558: The aggregation for total_calls in the jq -s block is
using max which undercounts; change the expression for total_calls to sum
per-loop call values and handle missing values and empty arrays (e.g. replace
(map(.calls) | max // 0) with (map(.calls // 0) | add // 0)) so the summary
variable correctly totals calls from metrics_file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 27b06437-c812-4d47-bc6f-c91dd5233614
📒 Files selected for processing (3)
- ralph-stats.sh
- ralph_loop.sh
- tests/unit/test_metrics_tracking.bats
✅ Files skipped from review due to trivial changes (1)
- ralph-stats.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/test_metrics_tracking.bats
PR Review: feat(metrics): loop metrics tracking and ralph-stats analytics
This is a clean, well-scoped implementation. The architecture follows existing patterns, the tests cover the main behaviors, and the installation wiring is consistent with how other commands are set up. A few things worth addressing:
Bug: Off-by-one in
Summary
Implements #21: [P3] Phase 3.7: Implement metrics and analytics
- Add `track_metrics()` to `ralph_loop.sh`: appends a JSON Lines entry to `.ralph/logs/metrics.jsonl` after each loop with timestamp, loop number, duration (seconds), success flag, and cumulative API call count
- Integrate timing: `get_epoch_seconds()` before/after `execute_claude_code()` in the main loop to compute duration
- Add `print_metrics_summary()`, called on graceful exit: logs aggregate stats (total_loops, successful, avg_duration, total_calls)
- Create `ralph-stats.sh` standalone script: reads `metrics.jsonl` and outputs a JSON summary via `jq`
- Update `install.sh` to install `ralph-stats` wrapper command and copy `ralph-stats.sh` to `~/.ralph/`
Acceptance Criteria
- `ralph-stats` command works
Test Plan
- `tests/unit/test_metrics_tracking.bats` (TDD, written before implementation)
- `ralph-stats` mock, executable check, and uninstall assertion
Implementation Notes
- `track_metrics()` placed after `increment_call_counter()` (~line 529) following codebase conventions
- `get_iso_timestamp()` and `get_epoch_seconds()` from existing `lib/date_utils.sh`
- `ralph-stats.sh` uses the same wrapper pattern as all other Ralph commands in `install.sh`
- `total_calls` in summary = `max(calls)` from metrics file (cumulative counter at last loop)
Closes #21
Summary by CodeRabbit
New Features
- `ralph-stats` command providing metrics summaries of loop execution analytics, including total loops, success count, average duration, and total calls made
Tests