Skip to content

add OpenAI Agents SDK zero-code example#72

Merged
krisztianfekete merged 6 commits intoagentevals-dev:mainfrom
shahar-dagan:feat/openai-agents-upstream
Mar 31, 2026
Merged

add OpenAI Agents SDK zero-code example#72
krisztianfekete merged 6 commits intoagentevals-dev:mainfrom
shahar-dagan:feat/openai-agents-upstream

Conversation

@shahar-dagan
Copy link
Copy Markdown
Contributor

Adds a zero-code OTLP integration example for the OpenAI Agents SDK.

What's included

  • examples/zero-code-examples/openai-agents/run.py — a dice-rolling agent wired to OTel with no agentevals SDK imports
  • examples/zero-code-examples/openai-agents/requirements.txt
  • examples/zero-code-examples/openai-agents/eval_set.json — golden multi-turn eval case (greeting, die roll, prime check)
  • TestOpenAIAgentsZeroCode in tests/integration/test_live_agents.py — 3 e2e tests covering session creation, invocation extraction, and API visibility

Notes

  • Uses result.to_input_list() to thread conversation history across turns (preserves tool-call context; raw role/content dicts lose it)
  • force_flush() is in a try/finally so spans are always sent even if a turn raises
  • E2e tests skip automatically when OPENAI_API_KEY is absent

🤖 Generated with Claude Code

shahar-dagan and others added 4 commits March 29, 2026 20:14
Self-contained dice-rolling agent showing zero-code OTLP integration with
openai-agents>=0.3.3 via opentelemetry-instrumentation-openai-agents-v2.
Includes run.py, requirements.txt, a golden multi-turn eval_set.json, and
TestOpenAIAgentsZeroCode e2e tests for session/span/invocation/API verification.

Uses result.to_input_list() for correct conversation context threading across
turns, and try/finally to guarantee force_flush() even on API errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…example

add OpenAI Agents SDK zero-code example
Self-contained dice-rolling agent showing zero-code OTLP integration with
openai-agents>=0.3.3 via opentelemetry-instrumentation-openai-agents-v2.
Includes run.py, requirements.txt, a golden multi-turn eval_set.json, and
TestOpenAIAgentsZeroCode e2e tests for session/span/invocation/API verification.

Uses result.to_input_list() for correct conversation context threading across
turns, and try/finally to guarantee force_flush() even on API errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@krisztianfekete krisztianfekete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

Can you please clean it up as per the comments, and make sure to follow our AI usage guidlines mentioned here: https://github.com/agentevals-dev/agentevals/blob/main/CONTRIBUTING.md#responsible-ai-usage?

Comment thread pyproject.toml Outdated
Comment thread CHANGELOG.md Outdated
Comment thread TODOS.md Outdated
Comment thread examples/zero-code-examples/openai-agents/eval_set.json Outdated
Comment thread examples/zero-code-examples/openai-agents/run.py
Comment thread examples/zero-code-examples/openai-agents/run.py Outdated
@shahar-dagan
Copy link
Copy Markdown
Contributor Author

Will do thanks and will push once done.

- remove CHANGELOG.md, TODOS.md, and eval_set.json (not needed)
- revert pyproject.toml version bump to 0.5.2
- reduce comments in openai-agents run.py to match other examples
- remove LoggerProvider: openai-agents instrumentation only emits spans

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@shahar-dagan
Copy link
Copy Markdown
Contributor Author

Done. Lmk if you'd like me to make any other changes!

Comment thread examples/zero-code-examples/openai-agents/run.py Outdated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@krisztianfekete krisztianfekete merged commit 8b1bb39 into agentevals-dev:main Mar 31, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants