feat(aws-sdk, llmobs): support Bedrock Converse and ConverseStream#8079
feat(aws-sdk, llmobs): support Bedrock Converse and ConverseStream#8079PROFeNoM wants to merge 12 commits intoalex/llmobs-tool-definitionsfrom
Conversation
Overall package sizeSelf size: 5.56 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | import-in-the-middle | 3.0.1 | 82.56 kB | 817.39 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: ad1cd0f | Docs | Datadog PR Page | Give us feedback! |
BenchmarksBenchmark execution time: 2026-04-27 13:30:56 Comparing candidate commit ad1cd0f in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 1343 metrics, 101 unstable metrics. |
48ae827 to
7da447f
Compare
abcf58c to
6f9ab4d
Compare
7da447f to
6486db6
Compare
6f9ab4d to
21f766e
Compare
6486db6 to
f22c2ab
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a39f390575
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
cc5a8f8 to
d6bf034
Compare
| const tracesPromise = agent.assertFirstTraceSpan({ | ||
| meta: { | ||
| 'aws.operation': 'converseStream', | ||
| 'aws.bedrock.request.model': converseRequest.modelId.split('.')[1], | ||
| 'aws.bedrock.request.model_provider': converseRequest.provider.toLowerCase(), | ||
| }, | ||
| }) | ||
|
|
||
| const result = await bedrockRuntimeClient.send(command) | ||
| for await (const _event of result.stream) { // eslint-disable-line no-unused-vars | ||
| // drain | ||
| } | ||
| await tracesPromise |
There was a problem hiding this comment.
This is a very small risk for having an unhandled rejection in case the stream fails.
This is unlikely and the fix is not nice: the tracePromise needs a catch handler above to not trigger the unhandled rejection.
9654ef9 to
5c97b1b
Compare
Adds APM span tagging and LLMObs enrichment for ConverseCommand and ConverseStreamCommand on the Bedrock runtime client. - tracing: extend operation whitelist - utils: Converse request/response parsers plus stream-event aggregator for text + toolUse content blocks - instrumentation: wrap `result.stream` as well as `result.body` so ConverseStream events flow through the streamed-chunk channel - llmobs: branch on Converse in setLLMObsTags, surface stop_reason, emit structured output messages with tool_calls - tests + VCR cassettes for system + tool path against Claude 3 Haiku Ref: MLOB-3509
1cd9ac5 to
1451e6c
Compare
…ult variants Emit placeholder markers instead of dropping data silently: - buildToolResult: non-text/non-json content items (document, image, video, searchResult) now produce `[<type> content]` instead of ''. - buildConverseStreamGeneration: ContentBlockDelta events that are neither text nor toolUse (citation, image, reasoningContent, toolResult) now append `[Unsupported delta: <type>]` at their contentBlockIndex, so streamed responses using these variants no longer tag as empty output. Ref: MLOB-3509
When a ConverseCommand or ConverseStreamCommand request includes a
`toolConfig.tools` array, map each `toolSpec` to LLMObs'
`{ name, description, schema }` shape and attach it to the span via
the new `tagToolDefinitions` API (see parent PR #8082).
- utils.js: new `extractConverseToolDefinitions` helper.
- llmobs/plugins/bedrockruntime.js: call it on Converse requests.
- spec: assert on the `toolDefinitions` round-trip.
Brings Node parity with dd-trace-py's Bedrock Converse integration
for the "Available Tools" UI section.
Ref: MLOB-3509
Malformed Converse payloads (e.g. `content: [null]`, `tools: [null]`) would throw in `extractMessagesFromConverseContent` and `extractConverseToolDefinitions`. Plugin subscriber exceptions trigger auto-disable via `this.configure(false)`, turning off Bedrock LLMObs for the rest of the process. Skip invalid entries instead of throwing, matching the defensive optional-chaining style already used by this file's InvokeModel extractors. Flagged by Codex adversarial review.
Align `buildToolResult` fallback wording with dd-trace-py's `[Unsupported content type(s): <keys>]` format (see `dd-trace-py/ddtrace/llmobs/_integrations/utils.py:266`).
Dropping the `[Unsupported delta: <type>]` fallback: it spammed the output on every delta event (thinking-mode responses stream reasoningContent as dozens of small deltas) and had no parity in dd-trace-py. Match Python's silent-drop behavior for unhandled ContentBlockDelta variants; the non-stream `[Unsupported content type: ...]` markers already cover the debuggability need.
- `buildToolCall`: extract `parseToolInput` helper so `args` is set once in a single expression (inputStr wins over input explicitly). - `buildToolResult`: extract `resolveToolResultItem` helper so the three cases (text / json / unsupported) are a flat if-chain instead of a nested ternary inside `.map`. - `extractConverseToolDefinitions`: rename `defs` -> `toolDefinitions`, `spec` -> `toolSpec` to match the AWS SDK field names. - `extractTextAndResponseReasonConverse`: rename the raw-response- handle `message` -> `outputMessage` so it no longer collides with the `messages` LLMObs array built right after.
The original `setLLMObsTags` mixed InvokeModel and Converse flows,
branching on `isConverse` / `isStream` in three places (request
params, generation extraction, stop_reason emission). Split into
two specialized methods so each path reads top-to-bottom:
- `#tagConverseSpan` owns the Converse-only concerns:
tool_definitions, stop_reason metadata, Converse request/response
extractors.
- `#tagInvokeModelSpan` owns the per-provider InvokeModel extractors.
- `#tagCommon` handles the truly shared tagging (temperature,
max_tokens, I/O, metrics) with no shape branching.
To make `#tagCommon` shape-agnostic, `Generation` now auto-derives
its structured `messages` field from `{message, role}` when the
caller doesn't pass `messages` explicitly. The plugin reads
`generation.messages` unconditionally; InvokeModel behavior is
unchanged (same `[{content, role}]` wrapper, just produced at the
class boundary instead of inline at the tag call site).
Also: `stop_reason` moves out of the shared metadata path into
`#tagConverseSpan`. It was leaking onto InvokeModel spans through
the shared code; this restores pre-PR behavior for InvokeModel.
…ctors Stream events describe the same content-block structure as the non- stream response, just chunked across start/delta events. Reassemble the stream chunks into a normalized `ContentBlock[]` shape and reuse the non-stream extractor via a shared `toOutputMessages` helper. - `buildConverseStreamGeneration`: replaces the parallel `textByIdx` / `toolByIdx` maps (and the dedicated `assembleStreamMessage` helper, now deleted) with a single `blocksByIdx` Map of reassembled blocks. One code path for content-block -> output-message regardless of stream vs non-stream. - `toOutputMessages`: new helper that wraps `extractMessagesFromConverseContent` with the "always emit at least one output message" fallback, shared by both paths. - `extractMessagesFromConverseContent`: early-return when nothing extracted; end-of-function double-check replaced with a plain `return [message]`. - `extractTextAndResponseReason` (InvokeModel Amazon path): replace the four explicit token fields with `...buildUsage(body.usage)`. Incidentally fixes a latent bug where `cacheRead/WriteInputTokenCount` were passed to a `Generation` constructor that only destructures `cacheRead/WriteTokens`, silently dropping those values.
The ContentBlockDelta union has 6 variants (text, toolUse, toolResult,
reasoningContent, citation, image). Only text and toolUse.input were
decoded; the rest are intended to be silently ignored per Python
parity.
The previous implementation leaked empty `{}` blocks into
`blocksByIdx` when a delta arrived at a new contentBlockIndex with no
matching handler (e.g. Claude 3.7 thinking mode emitting
reasoningContent deltas). Downstream,
`extractMessagesFromConverseContent` saw the empty block and emitted
`[Unsupported content type: unknown]`.
Fix: only write into `blocksByIdx` inside the branches that actually
populate a block. Unhandled deltas now leave the Map untouched.
Flagged by review on utils.js:647-651.
If `send` (or stream iteration) rejects before the test reaches `await tracesPromise`, the trace-assertion promise stays pending and later settles with no observer, triggering an unhandled-rejection warning in some Node versions / CI configs. Attach a no-op `.catch()` to the original `tracesPromise` right after creation. This satisfies the "promise was handled" check without mutating the source — the subsequent `await tracesPromise` still sees the rejection and propagates it to the test runner. Flagged on review.
…sent `tagToolDefinitions` now logs a failure for non-array / empty input (see #8082). The Bedrock plugin previously called it unconditionally on every Converse request — `extractConverseToolDefinitions` returns `[]` when there's no `toolConfig`, which would now produce noisy `invalid_tool_definitions` logs on every tool-less Converse call. Gate the call on `length > 0`.
5c97b1b to
ad1cd0f
Compare
Adds LLMObs support for AWS Bedrock
ConverseCommandandConverseStreamCommand, bringing Node.js to parity with dd-trace-py (PRs DataDog/dd-trace-py#12560, DataDog/dd-trace-py#12873, DataDog/dd-trace-py#13756).👉 Live Datadog UI traces — Python (existing integration) vs Node.js (this PR), same operations side-by-side
Sample comparison (Left=Python; right=NodeJS)
What gets emitted
One
llm-kind LLMObs span per Converse call, namedbedrock-runtime.command. Stream and non-stream produce the same span shape; the stream path assembles it from theConverseStreamOutputevent union (messageStart→contentBlockStart→contentBlockDelta→contentBlockStop→messageStop→metadata).Example span for a Converse call with a system prompt + tool:
{ "name": "bedrock-runtime.command", "meta": { "span.kind": "llm", "model_name": "anthropic.claude-3-haiku-20240307-v1:0", "model_provider": "amazon_bedrock", "input": { "messages": [ { "role": "system", "content": "You are an expert swe..." }, { "role": "user", "content": "Explain distributed tracing." } ] }, "output": { "messages": [ { "role": "assistant", "tool_calls": [{ "name": "fetch_concept", "arguments": { "concept": "distributed tracing" }, "tool_id": "tooluse_...", "type": "toolUse" }] } ] }, "tool_definitions": [ { "name": "fetch_concept", "description": "...", "schema": { "json": {...} } } ], "metadata": { "temperature": 0.5, "max_tokens": 512, "stop_reason": "tool_use" } }, "metrics": { "input_tokens": 364, "output_tokens": 55, "total_tokens": 419, "cache_read_input_tokens": 0, "cache_write_input_tokens": 0 } }Content in both input and output follows Bedrock's
ContentBlocktagged union. Supported variants (text,toolUse,toolResult) are decoded into the shape above. Unsupported ones (image,document,video,reasoningContent,citationsContent,searchResult,cachePoint) become inline markers on non-stream responses only —[Unsupported content type: <type>]for top-level blocks (dd-trace-py utils.py:279) and[Unsupported content type(s): <keys>]inside tool results (dd-trace-py utils.py:266). The stream aggregator silently ignores unhandledContentBlockDeltavariants, matching dd-trace-py.Known disparities with dd-trace-py not addressed in this PR
Verified side-by-side against
dd-trace-pyon a demo app running both tracers end-to-end against the Datadog UI. The differences below were identified and noted, but ignored.Service name
Python hardcodes
aws.bedrock-runtime(ignoresDD_SERVICE; usesDD_BOTOCORE_SERVICE=aws+ endpoint-name suffix). Node respectsDD_SERVICE, as shown in the screenshot above.Why not acted on: pre-existing divergence, unrelated to Converse. Changing either side is a service-naming policy call, and a breaking change; not something that belongs to that PR.
model_name/model_providerPython splits via
parse_model_id→(anthropic, claude-3-haiku-20240307-v1:0); backend matchesclaude-3-haiku/anthropic. Node sends the full lowercasedmodelId+ literalamazon_bedrockprovider; the backend falls back to the raw string becauseamazon_bedrockisn't a recognized foundation-model provider.Why not acted on: pre-existing Node behavior from PR #7952 (
fix(llmobs): fix missing estimated cost on Bedrock LLM spans, commit2bac2030d), unchanged by this PR.stop_reasononConverseStreamPython captures
stopReasonfor non-streamConversebut drops it on the stream path (bedrock.py:304-308). Node emitsstop_reasonuniformly for both paths.Why not acted on: Node is doing better parity than Python here tbh.
APM span name, resource case, tag prefix
bedrock-runtime.commandaws.requestConverseStream(PascalCase)converseStream(camelCase)bedrock.request.*aws.bedrock.request.*Why not acted on: not introduced by this PR. Unchanged since the Bedrock plugin was introduced. A rename would be a breaking change.
Test plan
PLUGINS="aws-sdk" SPEC="bedrockruntime.spec.js" npm run test:plugins— greenpackages/dd-trace/test/llmobs/plugins/aws-sdk/bedrockruntime.spec.js— greennpm run linton touched files — cleandemo-apps/bedrock-converse) — see live-traces link at the top of this description