feature/PAAL-212-trace-ids-in-testworkflow#11

Merged
qa-jil-kamerling merged 12 commits into main from feature/PAAL-212-trace-ids-in-testworkflow on Dec 15, 2025

@qa-jil-kamerling commented Dec 11, 2025

OTEL Tracing Implementation

Phase 2: test run

  • Added an otel_setup.py module and instrumented run.py so that each agent query runs inside its own span, with trace context propagated over HTTPX; this enables end-to-end trace visualization in Tempo/Grafana
  • Captured trace IDs in experiment results
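For readers less familiar with how trace context travels over HTTPX: with OpenTelemetry Python's default propagators (`tracecontext,baggage`), the HTTPX instrumentation injects a W3C Trace Context `traceparent` header into each outgoing request, and a trace ID is conventionally written out as its 128-bit value rendered as 32 lowercase hex characters. The sketch below builds and parses that header using only the standard library. It illustrates the header format only; it is not the code in otel_setup.py or run.py, and the helper names are hypothetical. The sample IDs are the example values from the W3C Trace Context specification.

```python
import re

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags.
# This sketch only accepts version 00.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_trace_id(trace_id: int) -> str:
    """Render a 128-bit trace ID as 32 lowercase hex characters."""
    return format(trace_id, "032x")


def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Build a traceparent value like the one injected into outgoing requests."""
    flags = "01" if sampled else "00"
    return f"00-{format_trace_id(trace_id)}-{span_id:016x}-{flags}"


def parse_traceparent(header: str) -> tuple[str, str, bool]:
    """Return (trace_id, parent_span_id, sampled); raise ValueError if malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero trace or span IDs are invalid per the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace or span ID")
    # Bit 0 of the flags byte is the "sampled" flag.
    sampled = (int(flags, 16) & 0x01) == 0x01
    return trace_id, span_id, sampled


if __name__ == "__main__":
    header = build_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
    print(header)  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    print(parse_traceparent(header))
```

The trace ID part of the header is the same 32-character string that Tempo/Grafana use to look up a trace, which is why storing it in the experiment results is enough to jump from a test case to its trace.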

Phase 3: evaluation

  • Each test case in ragas_experiment.jsonl now includes a trace_id field linking evaluation results to distributed traces,
    enabling correlation between test execution and agent behavior
  • Fixed trace ID preservation through evaluation: RAGAS drops custom fields when it processes experiment data, so evaluate.py now re-attaches the trace IDs manually, ensuring evaluation_scores.json keeps the trace context needed to debug failed tests
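The trace ID fix in evaluate.py can be pictured as a merge step: keep the original experiment rows, let RAGAS score them, then copy each `trace_id` back onto the matching scored row. This is a minimal sketch, assuming the evaluator returns exactly one scored row per input row in input order and that both sides carry a `user_input` field (field name assumed); `reattach_trace_ids` is a hypothetical helper, not the PR's actual code.

```python
from typing import Any


def reattach_trace_ids(
    experiment_rows: list[dict[str, Any]],
    scored_rows: list[dict[str, Any]],
    key: str = "user_input",
) -> list[dict[str, Any]]:
    """Copy trace_id from the original rows onto the scored rows.

    Assumes one scored row per input row, in input order; the `key` field
    is compared on every row so that a misalignment raises instead of
    silently attaching the wrong trace.
    """
    if len(experiment_rows) != len(scored_rows):
        raise ValueError(
            f"row count mismatch: {len(experiment_rows)} vs {len(scored_rows)}"
        )
    merged: list[dict[str, Any]] = []
    for original, scored in zip(experiment_rows, scored_rows):
        if original.get(key) != scored.get(key):
            raise ValueError(f"rows out of order at {original.get(key)!r}")
        # Build a new dict so the evaluator's output is left untouched.
        merged.append({**scored, "trace_id": original.get("trace_id")})
    return merged
```

Checking the key field on every row matters here: a shifted trace ID would send someone debugging a failed test to an unrelated trace without any visible error.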

Dependencies added: opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp-proto-http, opentelemetry-instrumentation-httpx

@qa-jil-kamerling marked this pull request as ready for review December 12, 2025 10:39
Comment on lines +49 to +81
```shell
# Required environment variables for local testing
export OPENAI_API_BASE="http://localhost:11001" # AI Gateway endpoint
export GOOGLE_API_KEY="your-api-key" # Required for Gemini models
```

### Running the 4-Phase Pipeline Locally

```shell
# Phase 1: Download and convert dataset to RAGAS format
uv run python3 scripts/setup.py "http://localhost:11020/dataset.csv"

# Phase 2: Execute queries through agent via A2A protocol
uv run python3 scripts/run.py "http://localhost:11010"

# Phase 3: Evaluate responses using RAGAS metrics
uv run python3 scripts/evaluate.py gemini-2.5-flash-lite "faithfulness answer_relevancy"

# Phase 4: Publish metrics to OTLP endpoint
uv run python3 scripts/publish.py "workflow-name"
```

### Testkube Execution

```shell
# Run complete evaluation workflow in Kubernetes
kubectl testkube run testworkflow ragas-evaluation-workflow \
--config datasetUrl="http://data-server.data-server:8000/dataset.csv" \
--config agentUrl="http://weather-agent.sample-agents:8000" \
--config metrics="nv_accuracy context_recall" \
--config workflowName="Test-Run" \
-n testkube

# Watch workflow execution
```

Contributor

If you like, you could simply use @README.md.

Contributor Author

Thanks, but we decided to keep the Claude.md files and the READMEs separate.
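The four local commands from the quoted pipeline section can also be chained from a single script that stops at the first failing phase. This is a hypothetical helper: `build_pipeline`, `run_pipeline`, and checking both environment variables up front are assumptions, not part of the PR. It reuses the exact script paths and arguments shown in the hunk, passing the metric names as one space-separated argument as in the `evaluate.py` example.

```python
import os
import subprocess
from collections.abc import Mapping, Sequence

# Assumption: both variables from the quoted setup section are checked up front,
# even though not every phase may need both.
REQUIRED_ENV_VARS = ("OPENAI_API_BASE", "GOOGLE_API_KEY")
UV_PYTHON = ["uv", "run", "python3"]


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def build_pipeline(
    dataset_url: str,
    agent_url: str,
    model: str,
    metrics: Sequence[str],
    workflow_name: str,
) -> list[list[str]]:
    """Return one argv list per phase, mirroring the documented commands."""
    return [
        UV_PYTHON + ["scripts/setup.py", dataset_url],  # Phase 1: dataset
        UV_PYTHON + ["scripts/run.py", agent_url],  # Phase 2: agent queries
        UV_PYTHON + ["scripts/evaluate.py", model, " ".join(metrics)],  # Phase 3
        UV_PYTHON + ["scripts/publish.py", workflow_name],  # Phase 4: publish
    ]


def run_pipeline(commands: Sequence[Sequence[str]]) -> None:
    """Run phases in order; check=True raises CalledProcessError on the first failure."""
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"missing environment variables: {', '.join(missing)}")
    for argv in commands:
        subprocess.run(argv, check=True)
```

Usage, with the values from the local example: `run_pipeline(build_pipeline("http://localhost:11020/dataset.csv", "http://localhost:11010", "gemini-2.5-flash-lite", ["faithfulness", "answer_relevancy"], "workflow-name"))`. Passing argv lists rather than a shell string keeps the space-separated metrics argument intact without extra quoting.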


# Save to file
with open(output_file, "w") as f:
    json.dump(asdict(evaluation_scores), f, indent=2)

Contributor

@felixk101 Dec 15, 2025

I've run into problems with this code before: it sometimes writes NaN as a number into the JSON output file, which isn't valid JSON. 😕

In my code I had to add the following to publish.py, which isn't very elegant:

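import math
from typing import Any, TypeGuard  # TypeGuard needs Python 3.10+


def _is_metric_value(value: Any) -> TypeGuard[int | float]:
    """Check if a value is a valid metric score (numeric and not NaN)."""
    # bool is a subclass of int, so reject True/False explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True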

(That isn't part of this ticket, though.)

FYI @fmallmann
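One way to avoid the NaN problem at the source, sketched under assumptions: Python's `json.dump` writes `NaN`, `Infinity`, and `-Infinity` by default (`allow_nan=True`), and none of these are valid JSON. Replacing non-finite floats with `None` before writing yields `null`, and passing `allow_nan=False` makes any value that slips through raise `ValueError` instead of producing an invalid file. `sanitize_non_finite` and `dump_scores` are hypothetical names, and mapping NaN to `null` is a design choice, not something the PR does.

```python
import json
import math
from typing import Any


def sanitize_non_finite(value: Any) -> Any:
    """Recursively replace NaN/Infinity floats with None (serialized as JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: sanitize_non_finite(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists; JSON encodes both as arrays anyway.
        return [sanitize_non_finite(v) for v in value]
    return value


def dump_scores(data: Any, path: str) -> None:
    """Write strictly valid JSON; allow_nan=False makes any leftover NaN raise ValueError."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sanitize_non_finite(data), f, indent=2, allow_nan=False)
```

In evaluate.py this would replace the `json.dump(asdict(evaluation_scores), f, indent=2)` call with `dump_scores(asdict(evaluation_scores), output_file)`. On the reading side, `json.load` turns `null` into `None`, which `_is_metric_value` already rejects.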

@qa-jil-kamerling merged commit b79bfe5 into main Dec 15, 2025
6 checks passed
@qa-jil-kamerling deleted the feature/PAAL-212-trace-ids-in-testworkflow branch December 15, 2025 13:31

3 participants