feat: PAAL-233 refactor gauges to differentiate metrics and user_input#12
Conversation
Force-pushed from 5fad9e3 to 5eceb19
Force-pushed from 45f1fc1 to e9ad6c0
Force-pushed from 7db9c80 to a5bed35
@claude please review
Claude finished @felixk101's task in 1m 59s

**Code Review Complete**

I've reviewed the PR that refactors gauge metrics to differentiate between metrics and user_input. Here's my comprehensive feedback:

### Overview

**What Changed:**
**Architecture Change:**
### Critical Issues

#### 1. High Cardinality Risk in OpenTelemetry Metrics 🔴

Location:

```python
attributes = {
    "name": metric_name,
    "workflow_name": workflow_name,
    "user_input": user_input,  # ⚠️ HIGH CARDINALITY
    "trace_id": trace_id,      # ⚠️ HIGH CARDINALITY
}
metric_gauge.set(score, attributes)
```

Problem: Using `user_input` and `trace_id` as metric attributes creates a new time series for every unique value, so cardinality grows without bound.

Impact:
Best Practices:
Recommendation:

```python
# Option 1: Use aggregated metrics + traces for details
# Keep metrics for aggregations only
attributes = {
    "name": metric_name,
    "workflow_name": workflow_name,
    # Don't include user_input or trace_id
}

# Option 2: Use exemplars (if supported by LGTM stack)
# Exemplars let you attach trace IDs to metrics without creating new time series
```

Alternative: If you need per-sample visibility, consider:
#### 2. Incomplete OTLP Endpoint Migration

Location:

```yaml
otlpEndpoint:
  default: "http://otel-collector.monitoring.svc.cluster.local:4318"
```

Problem: You changed the default OTLP endpoint from its previous value to the address shown above.

Questions:

Recommendation: Either:
### Code Quality Issues

#### 3. Missing Error Handling for Invalid Trace IDs

Location:

```python
trace_id = result.get("trace_id", "unknown")
```

Issue: Falling back to a silent "unknown" default hides samples whose trace ID is missing.

Recommendation:

```python
trace_id = result.get("trace_id")
if not trace_id:
    logger.warning(f"Missing trace_id for sample: {user_input[:50]}...")
    trace_id = "missing-trace-id"
```

#### 4. Silent Skipping of Invalid Metrics

Location:

```python
if not _is_metric_value(score):
    continue
```

Issue: Invalid metrics are silently skipped. This could hide data quality issues.

Recommendation:

```python
if not _is_metric_value(score):
    logger.debug(f"Skipping invalid metric value for {metric_name} in sample {user_input[:50]}: {score}")
    continue
```

#### 5. Inconsistent Error Handling in Tests

Location: The mock provider raises an exception but doesn't follow the same mocking pattern as the other tests.

Recommendation: Make mock patterns consistent across all tests.

### Minor Issues & Suggestions

#### 6. Unused Overall Scores

You removed the overall scores, which appear to have been unused.

#### 7. Metric Naming Convention

Location:

```python
name="testbench_evaluation_metric",
```

Observation: You changed the metric name to `testbench_evaluation_metric`.

Recommendations:
#### 8. CI Improvements Are Good ✅

The CI fixes should help with the flaky Tilt setup mentioned in the code.

#### 9. Test Coverage Looks Comprehensive ✅

The test updates are thorough. Good job maintaining test quality!

#### 10. Code Style Issue
This is explicitly desired. The issue will be mitigated by a limit to how many times our
This was modified in ba92327 (not part of my changes).
✅ addressed
Force-pushed from a5bed35 to aa44bff
```diff
  k8s_resource('ai-gateway-litellm', port_forwards=['11001:4000'])
  k8s_resource('weather-agent', port_forwards='11010:8000', labels=['agents'], resource_deps=['agent-runtime'])
- k8s_resource('lgtm', port_forwards=['11000:3000'])
+ k8s_resource('lgtm', port_forwards=['11000:3000', '4318:4318'])
```
Why is this port needed?
I have dropped the commit from Jil's old branch. Fixed.
Nevermind, this is needed: we want to be able to test `run.py` locally (see the README). That only works if our locally running Python script can send spans to the otel-collector running in our Tiltfile.
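For context, a minimal local-run sketch of the setup described above; `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OpenTelemetry SDK variable, but whether `run.py` actually reads it is an assumption, not something this PR confirms:

```shell
# Tilt forwards the collector's OTLP/HTTP port 4318 to localhost
# (via the k8s_resource('lgtm', ...) port_forwards change).
# Assumed: run.py picks up the standard OTel environment variable.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
python run.py
```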
Force-pushed from 2d794d0 to 8eeab05
Force-pushed from 8eeab05 to 3ebbe26
I am closing this PR in favor of doing everything all-at-once in #13. This is because dashboard changes are tightly bound to metric changes.
This branch is based on @qa-jil-kamerling's feature branch (not main). This PR should stay in draft until `feature/PAAL-212-trace-ids-in-testworkflow` is merged.