feat: add honeyhive-tracing and honeyhive-evaluators skills for coding agents #301

Open
devin-ai-integration[bot] wants to merge 2 commits into federated-sdk-release-candidate from
devin/1772598811-honeyhive-tracing-evaluator-skills

@devin-ai-integration
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 4, 2026

feat: add honeyhive-tracing and honeyhive-evaluators agent skills

Summary

Adds two new reusable SKILL.md files under .agents/skills/ that coding agents can invoke when integrating HoneyHive tracing or evaluators into Python applications. Content is grounded in the Mintlify docs (fix/docs-concepts branch) and the SDK implementation on this branch.

honeyhive-tracing/SKILL.md (~516 lines) covers:

  • Tracer initialization patterns (scripts, web servers, serverless/Lambda)
  • Auto-instrumentation via OpenInference instrumentors
  • Custom spans with @trace decorator and context managers
  • Trace enrichment (all namespaces: config, metadata, metrics, feedback, user_properties, etc.)
  • Distributed tracing (session ID passing and W3C context propagation)
  • Multi-turn conversation session management
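
To make the "W3C context propagation" item concrete, here is a minimal sketch of the `traceparent` header that W3C Trace Context carries between services. It uses only the standard library and does not call the HoneyHive SDK; the helper names `make_traceparent` and `parse_traceparent` are hypothetical and exist only for this illustration.

```python
import re
import secrets
from typing import Dict, Optional

# W3C Trace Context "traceparent" layout: version-traceid-spanid-flags,
# all lowercase hex (2, 32, 16 and 2 characters respectively).
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id: Optional[str] = None, sampled: bool = True) -> str:
    """Build a version-00 traceparent value (hypothetical helper)."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> Dict[str, object]:
    """Split a traceparent value into its fields, rejecting malformed input."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    version, trace_id, span_id, flags = match.groups()
    # The spec treats all-zero trace and span IDs as invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace_id or span_id is invalid")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

A caller would place the value from `make_traceparent()` in an outgoing request header, and the receiving service would run `parse_traceparent()` on it to continue the same trace.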

honeyhive-evaluators/SKILL.md (~460 lines) covers:

  • Experiment structure (dataset + function + evaluators)
  • Writing client-side evaluator functions
  • Running experiments with evaluate()
  • Multi-step pipeline evaluation (session-level vs span-level metrics)
  • Integration with tracing (auto-tracing inside evaluate(), the "no global tracer" constraint)
  • Adding client-side metrics to production traces
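
As an illustration of the "client-side evaluator functions" item, the sketch below follows the free-function argument order (outputs, inputs, ground_truth) that the review on this PR confirms. Everything else is an assumption: the name `exact_match`, the string-valued `outputs`, and the `"expected"` key in `ground_truth` are hypothetical and are not taken from the skill files.

```python
from typing import Any, Dict, Optional


def exact_match(
    outputs: Any,
    inputs: Dict[str, Any],
    ground_truth: Optional[Dict[str, Any]] = None,
) -> float:
    """Return 1.0 when the output equals the expected label, else 0.0.

    Assumes (hypothetically) that `outputs` is a string and that the
    expected label lives under ground_truth["expected"].
    """
    if not ground_truth or "expected" not in ground_truth:
        return 0.0  # nothing to compare against
    predicted = str(outputs).strip().lower()
    expected = str(ground_truth["expected"]).strip().lower()
    return 1.0 if predicted == expected else 0.0
```

The `inputs` argument is unused here but kept so the function matches the positional call order.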

No functional code changes — documentation/skill files only.

Updates since last revision

Fixed two bugs caught by automated review:

  1. dataset= → dataset_id= for managed dataset references in evaluate(): the dataset param expects List[Dict], not a string ID
  2. project must be passed explicitly to evaluate() — it does NOT auto-read from HH_PROJECT env var (defaults to "default" silently). Updated all evaluate() examples to include project=os.getenv("HH_PROJECT", "my-project")
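
The two fixes can be sketched together as a small helper that assembles keyword arguments for evaluate(). The parameter names (function, dataset, dataset_id, evaluators, name, project) come from this PR's discussion; the helper `build_evaluate_kwargs` itself is hypothetical, and the sketch deliberately stops short of importing or calling the SDK.

```python
import os
from typing import Any, Callable, Dict, List, Union


def build_evaluate_kwargs(
    function: Callable[..., Any],
    dataset: Union[str, List[Dict[str, Any]]],
    evaluators: List[Callable[..., Any]],
    name: str,
) -> Dict[str, Any]:
    """Assemble kwargs for evaluate() (hypothetical helper, not SDK code)."""
    kwargs: Dict[str, Any] = {
        "function": function,
        "evaluators": evaluators,
        "name": name,
        # Passed explicitly: per the review, evaluate() does not read HH_PROJECT.
        "project": os.getenv("HH_PROJECT", "my-project"),
    }
    if isinstance(dataset, str):
        # A managed dataset is referenced by ID, so it goes in dataset_id.
        kwargs["dataset_id"] = dataset
    else:
        # Inline data is a list of dicts and goes in dataset.
        kwargs["dataset"] = list(dataset)
    return kwargs
```

The returned dict is intended to be unpacked into the call, as in `evaluate(**kwargs)`.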

Review & Testing Checklist for Human

  • Spot-check code examples against actual SDK — none of the code examples were executed. The automated bot verified API signatures exist, but runtime behavior is unverified. Recommend running the "Complete Example" from each skill against staging to confirm they work end-to-end.
  • Verify distributed tracing imports — the tracing skill references from honeyhive.tracer.processing.context import enrich_span_context, inject_context_into_carrier, with_distributed_trace_context. Confirm these exist with the documented call signatures.
  • Verify acreate_session(skip_api_call=True) pattern — used in the multi-turn conversation example. Confirm this parameter exists and behaves as documented (sets context without making an API call).
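
One way to carry out the import and signature checks above is a short introspection script. This is a sketch using only the standard library; `describe_exports` is a hypothetical helper name, and it confirms only that names exist and what their signatures look like, not their runtime behavior.

```python
import importlib
import inspect
from typing import Dict, Iterable


def describe_exports(module_name: str, names: Iterable[str]) -> Dict[str, str]:
    """Map each name to its signature string, or to a MISSING marker."""
    names = list(names)
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {name: "MISSING (module not importable)" for name in names}

    report: Dict[str, str] = {}
    for name in names:
        obj = getattr(module, name, None)
        if obj is None:
            report[name] = "MISSING"
            continue
        try:
            report[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Some builtins and non-callables expose no signature.
            report[name] = "present (no introspectable signature)"
    return report
```

For this PR it would be called with the module path honeyhive.tracer.processing.context and the three names listed in the checklist, in an environment where the SDK branch is installed.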

Notes

  • These skills are consumed by coding agents (Devin, Claude Code, Cursor, etc.) as step-by-step procedures. Incorrect API patterns here will propagate into generated code, so accuracy is critical.
  • Source material: honeyhive-ai-docs repo (fix/docs-concepts branch) and this SDK branch's source code.
  • No tests to run — these are .md files only. CI passes trivially.
  • The automated Claude bot review flagged a potential SDK inconsistency: BaseEvaluator.__call__ uses arg order (inputs, outputs, ground_truth) while _run_evaluators() calls free functions as (outputs, inputs, ground_truth). Not a docs issue (the skills correctly document the free-function signature), but worth noting for SDK maintainers.

Link to Devin session: https://app.devin.ai/sessions/2389f4ddad45413780f8b3b3efaca64c
Requested by: @dhruv-hhai

…g agents

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@snyk-io

snyk-io bot commented Mar 4, 2026

Snyk checks have passed. No issues have been found so far.

Scanner | Critical | High | Medium | Low | Total (0)
Open Source Security | 0 | 0 | 0 | 0 | 0 issues
Licenses | 0 | 0 | 0 | 0 | 0 issues

@claude
Contributor

claude bot commented Mar 4, 2026

Claude finished @devin-ai-integration[bot]'s task in 4m 42s

Review Complete

Reviewed both SKILL.md files against the actual SDK source code in src/honeyhive/.

Review Checklist

  • Read the SKILL.md files added in this PR
  • Verify API signatures against actual SDK source code
  • Check evaluate() interface against implementation
  • Verify distributed tracing imports and signatures
  • Check "no global tracer with evaluate()" guidance
  • Verify acreate_session(skip_api_call=True) pattern
  • Post review feedback

Summary

All API signatures verified — high accuracy overall. Found 2 issues:

  1. Bug (honeyhive-evaluators/SKILL.md:100): dataset="dataset_id_here" should be dataset_id="dataset_id_here" (wrong parameter name will cause TypeError)
  2. Inaccuracy (honeyhive-evaluators/SKILL.md:215-218): Comment says project is auto-read from HH_PROJECT env var, but evaluate() doesn't read it from env; it defaults to "default" silently

See top-level review comment for full details including the API verification table.

Comment on lines +215 to +218
# api_key and project are read from HH_API_KEY / HH_PROJECT env vars
# or pass explicitly:
# api_key=os.getenv("HH_API_KEY"),
# project=os.getenv("HH_PROJECT"),
Contributor

Inaccuracy: project is NOT auto-read from env vars by evaluate()

Looking at the evaluate() implementation in src/honeyhive/experiments/core.py:812, the project parameter defaults to "default"; it does not auto-read from HH_PROJECT or HONEYHIVE_PROJECT. Only api_key and server_url are loaded from env vars (lines 923-932).

This comment will mislead coding agents into omitting project, which will silently use "default" instead of the user's actual project.

Suggest changing to:

Suggested change
- # api_key and project are read from HH_API_KEY / HH_PROJECT env vars
- # or pass explicitly:
- # api_key=os.getenv("HH_API_KEY"),
- # project=os.getenv("HH_PROJECT"),
+ name="intent-classifier-v1",
+ # api_key is read from HH_API_KEY env var if not provided
+ # project must be passed explicitly (does not auto-read from env):
+ project=os.getenv("HH_PROJECT", "my-project"),
+ # api_key=os.getenv("HH_API_KEY"),

Alternatively, fix the SDK's evaluate() to actually read HH_PROJECT from env (which would be the better long-term fix).
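
A rough sketch of what such an env-var fallback could look like, assuming the two variable names mentioned above. This is not the SDK's code; `resolve_project` is a hypothetical helper, and raising when nothing is configured (instead of falling back to "default") is a design choice made for this illustration.

```python
import os
from typing import Mapping, Optional


def resolve_project(
    project: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the project name: explicit argument first, then env vars."""
    env = os.environ if env is None else env
    if project:
        return project
    # Variable names taken from the review comment; lookup order is assumed.
    for var in ("HH_PROJECT", "HONEYHIVE_PROJECT"):
        value = env.get(var)
        if value:
            return value
    # Fail loudly so results never land in an unintended project.
    raise ValueError("project not set: pass project= or set HH_PROJECT")
```

Accepting `env` as a parameter keeps the function easy to test without mutating the real process environment.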

Contributor Author

Fixed in 1475f38. Updated all evaluate() calls to pass project= explicitly. Also updated the comment to clarify that only api_key is auto-read from env, not project.

@claude
Contributor

claude bot commented Mar 4, 2026

Overall Review

Good work overall. Both SKILL.md files are well-structured, comprehensive, and the vast majority of API signatures and patterns have been verified against the actual SDK source. The tracing skill covers all key patterns (scripts, web servers, serverless, distributed tracing) and the evaluators skill provides a clear progression from dataset → function → evaluators → experiments.

API accuracy is high. I cross-referenced every public API referenced in these files against the implementation:

API (verified) | Notes
HoneyHiveTracer.init() | All params match (api_key, project, session_name, source, server_url, disable_batch, test_mode)
create_session() / acreate_session() | Including skip_api_call=True for multi-turn
enrich_span() / enrich_session() | All namespaces correct
@trace decorator | Unified sync/async, event_type, event_name, tracer params
evaluate() | function, dataset, evaluators, name params match
Evaluator signature (outputs, inputs, ground_truth) | Matches _run_evaluators() call convention
tracer.provider | OTel TracerProvider attribute
Distributed tracing imports | enrich_span_context, inject_context_into_carrier, with_distributed_trace_context all exist
result.run_id | ExperimentResultSummary.run_id field exists

Two issues found (see inline comments):

  1. Bug (honeyhive-evaluators/SKILL.md:100): dataset="dataset_id_here" should be dataset_id="dataset_id_here". The dataset param expects List[Dict], not a string. This will cause a TypeError at runtime.

  2. Inaccuracy (honeyhive-evaluators/SKILL.md:215-218): Comment says project is auto-read from HH_PROJECT env var, but evaluate() defaults project to "default" and never reads from env. Only api_key and server_url are loaded from env vars. This will cause traces to land in the wrong project silently.

Aside — SDK inconsistency worth noting: BaseEvaluator.__call__ uses arg order (inputs, outputs, ground_truth) while _run_evaluators() calls free-function evaluators as (outputs, inputs, ground_truth). This means BaseEvaluator subclasses passed to evaluate() would receive arguments in the wrong order. Not a docs issue (the SKILL.md correctly documents the free-function signature), but worth fixing in the SDK.
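
The effect of that argument-order mismatch can be reproduced with stand-in code. The sketch below does not use the SDK: `ClassStyleEvaluator` and `run_evaluators` are hypothetical stand-ins that only mirror the two call conventions described above.

```python
from typing import Any, Callable, Dict, List, Optional


class ClassStyleEvaluator:
    """Stand-in for a class-based evaluator whose __call__ takes inputs first."""

    def __call__(self, inputs: Any, outputs: Any, ground_truth: Optional[Any] = None) -> Dict[str, Any]:
        return {"inputs": inputs, "outputs": outputs}


def free_function_evaluator(outputs: Any, inputs: Any, ground_truth: Optional[Any] = None) -> Dict[str, Any]:
    """Free-function evaluator using the documented (outputs, inputs, ...) order."""
    return {"inputs": inputs, "outputs": outputs}


def run_evaluators(
    evaluators: List[Callable[..., Dict[str, Any]]],
    outputs: Any,
    inputs: Any,
    ground_truth: Optional[Any] = None,
) -> List[Dict[str, Any]]:
    """Stand-in runner: calls every evaluator positionally, outputs first."""
    return [evaluator(outputs, inputs, ground_truth) for evaluator in evaluators]


results = run_evaluators(
    [free_function_evaluator, ClassStyleEvaluator()],
    outputs="model answer",
    inputs={"question": "What is 2 + 2?"},
)
# results[0] sees the values where it expects them; results[1] sees them swapped.
```

Calling evaluators by keyword instead of by position would be one way to make both styles receive the right values.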

…per bot review

Co-Authored-By: Dhruv <dhruv@honeyhive.ai>