feat: add honeyhive-tracing and honeyhive-evaluators skills for coding agents by devin-ai-integration[bot] · Pull Request #301 · honeyhiveai/python-sdk

devin-ai-integration · 2026-03-04T04:37:20Z

feat: add honeyhive-tracing and honeyhive-evaluators agent skills

Summary

Adds two new reusable SKILL.md files under .agents/skills/ that coding agents can invoke when integrating HoneyHive tracing or evaluators into Python applications. The content is based on the Mintlify docs (fix/docs-concepts branch) and the SDK implementation on this branch.

honeyhive-tracing/SKILL.md (~516 lines) covers:

Tracer initialization patterns (scripts, web servers, serverless/Lambda)
Auto-instrumentation via OpenInference instrumentors
Custom spans with @trace decorator and context managers
Trace enrichment (all namespaces: config, metadata, metrics, feedback, user_properties, etc.)
Distributed tracing (session ID passing and W3C context propagation)
Multi-turn conversation session management

honeyhive-evaluators/SKILL.md (~460 lines) covers:

Experiment structure (dataset + function + evaluators)
Writing client-side evaluator functions
Running experiments with evaluate()
Multi-step pipeline evaluation (session-level vs span-level metrics)
Integration with tracing (auto-tracing inside evaluate(), the "no global tracer" constraint)
Adding client-side metrics to production traces

No functional code changes — documentation/skill files only.

Updates since last revision

Fixed two bugs caught by automated review:

dataset= → dataset_id= for managed dataset references in evaluate() — the dataset param expects List[Dict], not a string ID
project must be passed explicitly to evaluate() — it does NOT auto-read from HH_PROJECT env var (defaults to "default" silently). Updated all evaluate() examples to include project=os.getenv("HH_PROJECT", "my-project")
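
To make the two fixes above concrete, here is a minimal caller-side sketch. It does not import the SDK: `build_evaluate_kwargs` is a hypothetical helper (not part of honeyhive) that only assembles keyword arguments, choosing `dataset_id` for a string ID and `dataset` for an inline list, and resolving `project` explicitly. The `"my-project"` fallback mirrors the value used in the updated examples.

```python
import os
from typing import Any, Dict, List, Union


def build_evaluate_kwargs(
    name: str,
    data: Union[str, List[Dict[str, Any]]],
    default_project: str = "my-project",
) -> Dict[str, Any]:
    """Assemble keyword arguments for an evaluate() call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "name": name,
        # Resolve the project ourselves: per the review, evaluate() would
        # otherwise fall back to "default" without reading HH_PROJECT.
        "project": os.getenv("HH_PROJECT", default_project),
    }
    if isinstance(data, str):
        # A managed dataset is referenced by its string ID.
        kwargs["dataset_id"] = data
    else:
        # An inline dataset is a list of dicts.
        kwargs["dataset"] = list(data)
    return kwargs
```

The resulting dict would then be merged with `function=` and `evaluators=` before calling `evaluate()`; that call itself is left out here because it was not executed as part of this PR.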

Review & Testing Checklist for Human

Spot-check code examples against actual SDK — none of the code examples were executed. The automated bot verified API signatures exist, but runtime behavior is unverified. Recommend running the "Complete Example" from each skill against staging to confirm they work end-to-end.
Verify distributed tracing imports — the tracing skill references from honeyhive.tracer.processing.context import enrich_span_context, inject_context_into_carrier, with_distributed_trace_context. Confirm these exist with the documented call signatures.
Verify acreate_session(skip_api_call=True) pattern — used in the multi-turn conversation example. Confirm this parameter exists and behaves as documented (sets context without making an API call).
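
One way to work through the import and parameter checks in this list is a small introspection script. This is a sketch only: `missing_symbols` and `accepts_parameter` are hypothetical helpers built on the standard library, and the module path and names are the ones quoted in the checklist, not independently confirmed here. If the SDK is not installed, every name is simply reported as missing.

```python
import importlib
import inspect
from typing import Any, Callable, Iterable, List


def missing_symbols(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the given module."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module not importable: treat every requested name as missing.
        return list(names)
    return [name for name in names if not hasattr(module, name)]


def accepts_parameter(func: Callable[..., Any], parameter: str) -> bool:
    """True if func names the parameter or accepts arbitrary **kwargs."""
    params = inspect.signature(func).parameters
    if parameter in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Names taken from the checklist item on distributed tracing imports.
missing = missing_symbols(
    "honeyhive.tracer.processing.context",
    ["enrich_span_context", "inject_context_into_carrier", "with_distributed_trace_context"],
)
print("missing:", missing)
```

`accepts_parameter` could be pointed at a tracer's `acreate_session` method to confirm that `skip_api_call` appears in its signature. It checks the signature only, so the documented behavior (setting context without an API call) still needs a run against staging.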

Notes

These skills are consumed by coding agents (Devin, Claude Code, Cursor, etc.) as step-by-step procedures. Incorrect API patterns here will propagate into generated code, so accuracy is critical.
Source material: honeyhive-ai-docs repo (fix/docs-concepts branch) and this SDK branch's source code.
No tests to run — these are .md files only. CI passes trivially.
The automated Claude bot review flagged a potential SDK inconsistency: BaseEvaluator.__call__ uses arg order (inputs, outputs, ground_truth) while _run_evaluators() calls free functions as (outputs, inputs, ground_truth). Not a docs issue (the skills correctly document the free-function signature), but worth noting for SDK maintainers.

Link to Devin session: https://app.devin.ai/sessions/2389f4ddad45413780f8b3b3efaca64c
Requested by: @dhruv-hhai

devin-ai-integration · 2026-03-04T04:37:24Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

snyk-io · 2026-03-04T04:37:35Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

claude · 2026-03-04T04:37:35Z

Claude finished @devin-ai-integration[bot]'s task in 4m 42s

Review Complete

Reviewed both SKILL.md files against the actual SDK source code in src/honeyhive/.

Review Checklist

Read the SKILL.md files added in this PR
Verify API signatures against actual SDK source code
Check evaluate() interface against implementation
Verify distributed tracing imports and signatures
Check "no global tracer with evaluate()" guidance
Verify acreate_session(skip_api_call=True) pattern
Post review feedback

Summary

All API signatures verified — high accuracy overall. Found 2 issues:

Bug (honeyhive-evaluators/SKILL.md:100): dataset="dataset_id_here" should be dataset_id="dataset_id_here". The dataset param expects List[Dict], so passing a string ID there is reported to raise a TypeError.
Inaccuracy (honeyhive-evaluators/SKILL.md:215-218): The comment says project is auto-read from the HH_PROJECT env var, but evaluate() does not read it from the environment and silently defaults to "default".

See top-level review comment for full details including the API verification table.

claude · 2026-03-04T04:41:56Z

.agents/skills/honeyhive-evaluators/SKILL.md

+    # api_key and project are read from HH_API_KEY / HH_PROJECT env vars
+    # or pass explicitly:
+    # api_key=os.getenv("HH_API_KEY"),
+    # project=os.getenv("HH_PROJECT"),


Inaccuracy: project is NOT auto-read from env vars by evaluate()

Looking at the evaluate() implementation in src/honeyhive/experiments/core.py:812, the project parameter defaults to "default" — it does not auto-read from HH_PROJECT or HONEYHIVE_PROJECT. Only api_key and server_url are loaded from env vars (lines 923-932).

This comment will mislead coding agents into omitting project, which will silently use "default" instead of the user's actual project.

Suggest changing to:

Suggested change

-    # api_key and project are read from HH_API_KEY / HH_PROJECT env vars
-    # or pass explicitly:
-    # api_key=os.getenv("HH_API_KEY"),
-    # project=os.getenv("HH_PROJECT"),
+    name="intent-classifier-v1",
+    # api_key is read from HH_API_KEY env var if not provided
+    # project must be passed explicitly (does not auto-read from env):
+    project=os.getenv("HH_PROJECT", "my-project"),
+    # api_key=os.getenv("HH_API_KEY"),

Alternatively, fix the SDK's evaluate() to actually read HH_PROJECT from env (which would be the better long-term fix).
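
For the SDK-side alternative, a resolution order along these lines might be what is meant. This is only a sketch under stated assumptions: `resolve_project` is a hypothetical function, not code from `src/honeyhive/`, and the precedence (explicit argument, then `HH_PROJECT`, then `HONEYHIVE_PROJECT`, then `"default"`) is an illustrative choice rather than a confirmed design.

```python
import os
from typing import Mapping, Optional


def resolve_project(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a project name: explicit argument first, then env vars, then "default"."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    # Assumed precedence between the two variable names mentioned in the review.
    for var in ("HH_PROJECT", "HONEYHIVE_PROJECT"):
        value = env.get(var)
        if value:
            return value
    return "default"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.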

devin-ai-integration · 2026-03-04

Fixed in 1475f38. Updated all evaluate() calls to pass project= explicitly. Also updated the comment to clarify that only api_key is auto-read from env, not project.

claude · 2026-03-04T04:42:16Z

Overall Review

Good work overall. Both SKILL.md files are well-structured, comprehensive, and the vast majority of API signatures and patterns have been verified against the actual SDK source. The tracing skill covers all key patterns (scripts, web servers, serverless, distributed tracing) and the evaluators skill provides a clear progression from dataset → function → evaluators → experiments.

API accuracy is high. I cross-referenced every public API referenced in these files against the implementation:

API	Verified	Notes
`HoneyHiveTracer.init()`	✅	All params match (`api_key`, `project`, `session_name`, `source`, `server_url`, `disable_batch`, `test_mode`)
`create_session()` / `acreate_session()`	✅	Including `skip_api_call=True` for multi-turn
`enrich_span()` / `enrich_session()`	✅	All namespaces correct
`@trace` decorator	✅	Unified sync/async, `event_type`, `event_name`, `tracer` params
`evaluate()`	✅	`function`, `dataset`, `evaluators`, `name` params match
Evaluator signature `(outputs, inputs, ground_truth)`	✅	Matches `_run_evaluators()` call convention
`tracer.provider`	✅	OTel `TracerProvider` attribute
Distributed tracing imports	✅	`enrich_span_context`, `inject_context_into_carrier`, `with_distributed_trace_context` all exist
`result.run_id`	✅	`ExperimentResultSummary.run_id` field exists

Two issues found (see inline comments):

Bug (honeyhive-evaluators/SKILL.md:100): dataset="dataset_id_here" should be dataset_id="dataset_id_here". The dataset param expects List[Dict], not a string. This will cause a TypeError at runtime.
Inaccuracy (honeyhive-evaluators/SKILL.md:215-218): Comment says project is auto-read from HH_PROJECT env var, but evaluate() defaults project to "default" and never reads from env. Only api_key and server_url are loaded from env vars. This will cause traces to land in the wrong project silently.

Aside — SDK inconsistency worth noting: BaseEvaluator.__call__ uses arg order (inputs, outputs, ground_truth) while _run_evaluators() calls free-function evaluators as (outputs, inputs, ground_truth). This means BaseEvaluator subclasses passed to evaluate() would receive arguments in the wrong order. Not a docs issue (the SKILL.md correctly documents the free-function signature), but worth fixing in the SDK.
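
The argument-order mismatch described in the aside can be reproduced without the SDK. In this sketch, `run_evaluator` and `ClassStyleEvaluator` are hypothetical stand-ins for `_run_evaluators()` and a `BaseEvaluator` subclass. They copy only the positional orders quoted above, not the real implementations.

```python
from typing import Any, Callable, Dict, Optional


def run_evaluator(
    evaluator: Callable[..., Any],
    outputs: Dict[str, Any],
    inputs: Dict[str, Any],
    ground_truth: Optional[Dict[str, Any]],
) -> Any:
    """Stand-in runner: calls evaluators positionally as (outputs, inputs, ground_truth)."""
    return evaluator(outputs, inputs, ground_truth)


def free_function_evaluator(outputs, inputs, ground_truth=None):
    """Free-function evaluator using the order the skills document."""
    return {"saw_outputs": outputs, "saw_inputs": inputs}


class ClassStyleEvaluator:
    """Stand-in for a class whose __call__ takes (inputs, outputs, ground_truth)."""

    def __call__(self, inputs, outputs, ground_truth=None):
        return {"saw_outputs": outputs, "saw_inputs": inputs}


inputs = {"question": "2+2?"}
outputs = {"answer": "4"}

free_result = run_evaluator(free_function_evaluator, outputs, inputs, None)
class_result = run_evaluator(ClassStyleEvaluator(), outputs, inputs, None)

# The free function sees the values where it expects them; the class sees them swapped.
print(free_result["saw_outputs"] == outputs)
print(class_result["saw_outputs"] == inputs)
```

Calling evaluators with keyword arguments in the runner would remove the dependence on positional order, though whether that fits the SDK's design is for its maintainers to decide.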

feat: add honeyhive-tracing and honeyhive-evaluators skills for codin…

09fbfbb

…g agents Co-Authored-By: Dhruv <dhruv@honeyhive.ai>

devin-ai-integration bot requested a review from a team March 4, 2026 04:37

devin-ai-integration bot assigned dhruv-hhai Mar 4, 2026

devin-ai-integration bot requested a review from dhruv-hhai March 4, 2026 04:37

claude bot reviewed Mar 4, 2026

claude bot reviewed Mar 4, 2026

fix: correct evaluate() parameter names and project env var handling …

1475f38

…per bot review Co-Authored-By: Dhruv <dhruv@honeyhive.ai>

devin-ai-integration[bot] wants to merge 2 commits into federated-sdk-release-candidate from devin/1772598811-honeyhive-tracing-evaluator-skills
