Skip to content

Conversation

@lievan
Copy link
Contributor

@lievan lievan commented Feb 27, 2025

This PR supports instrumenting LLM spans for bedrock's Converse method. This PR does not touch ConverseStream, but we document it’s behavior in test_llmobs_converse_stream.

Its helpful to review the bedrock request syntax, and the response syntax

Example bedrock code snippet:

  response = bedrock_runtime.converse(
      system=[{
          "text": "You are an app that creates play lists for a radio station that plays rock and pop music. Only return song names and the artist. "
      }],
      modelId=MODEL_ID,
      messages=messages,
      inferenceConfig=…
      toolConfig=…
  )

Manual QA

Example with tool calls
Example without tool calls

Data this PR traces

  • System prompts in meta.input.messages[0].content with system role
  • Text based input in meta.input.messages[i].content with user role
  • Text based output in meta.input.messages[i].content with assistant role
  • Tool call outputs in meta.output.messages[0].tool_calls
  • Inference parameter metadata max_tokens and temperature
  • stop_reason

Implementation details:

We register a separate trace handler for processing bedrock converse responses.

core.on("botocore.bedrock.process_response_converse", _on_botocore_bedrock_process_response_converse)

This is to avoid the code-path that does extra post-processing of invoke model responses before it's ready for llmobs_set_tags.

Converse still relies on the same trace handler for processing 1) request input 2) bedrock exceptions.

Cassettes

I chose to use cassettes since there were some difficulties with mocking out the bedrock calls with respx. There are some authentication steps that happen within the botocore library before the mocked LLM call, leading me to run into errors like:

E           botocore.exceptions.ClientError: An error occurred (UnrecognizedClientException) when calling the Converse operation: The security token included in the request is invalid.
E           botocore.exceptions.ClientError: An error occurred (MissingAuthenticationTokenException) when calling the Converse operation: Missing Authentication Token

This means we needed to mock out or find a way to skip the internal authentication steps, which would cause the test to be dependent on non-bedrock parts of the botocore library which may be subject to change. In my opinion, this makes cassettes the better option.

To Do

  • Support converse stream
  • Support more inference params like top_p and stop_sequences

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2025

CODEOWNERS have been resolved as:

.riot/requirements/15f7356.txt                                          @DataDog/apm-python
.riot/requirements/1ecd900.txt                                          @DataDog/apm-python
.riot/requirements/5295cd7.txt                                          @DataDog/apm-python
.riot/requirements/df0b19d.txt                                          @DataDog/apm-python
.riot/requirements/e1342cb.txt                                          @DataDog/apm-python
releasenotes/notes/bedrock-converse-api-20dd255c1ee18cf4.yaml           @DataDog/apm-python
tests/contrib/botocore/bedrock_cassettes/bedrock_converse.yaml          @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/bedrock_converse_error.yaml    @DataDog/ml-observability
tests/contrib/botocore/bedrock_cassettes/bedrock_converse_stream.yaml   @DataDog/ml-observability
tests/snapshots/tests.contrib.botocore.test_bedrock.test_converse.json  @DataDog/apm-python
ddtrace/_trace/trace_handlers.py                                        @DataDog/apm-sdk-api-python
ddtrace/contrib/internal/botocore/patch.py                              @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/contrib/internal/botocore/services/bedrock.py                   @DataDog/ml-observability
ddtrace/llmobs/_integrations/bedrock.py                                 @DataDog/ml-observability
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
tests/contrib/botocore/bedrock_utils.py                                 @DataDog/ml-observability
tests/contrib/botocore/test.py                                          @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/botocore/test_bedrock.py                                  @DataDog/ml-observability
tests/contrib/botocore/test_bedrock_llmobs.py                           @DataDog/ml-observability

@pr-commenter
Copy link

pr-commenter bot commented Feb 28, 2025

Benchmarks

Benchmark execution time: 2025-03-12 21:37:42

Comparing candidate commit 224d6b0 in PR branch evan.li/claude-code-converse-api with baseline commit 8723688 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 282 metrics, 2 unstable metrics.

@lievan lievan changed the title feat(llmobs): trace bedrock converse api feat(llmobs): trace text-based bedrock converse api Feb 28, 2025
@datadog-dd-trace-py-rkomorn
Copy link

Datadog Report

Branch report: evan.li/claude-code-converse-api
Commit report: e700fca
Test service: dd-trace-py

✅ 0 Failed, 43 Passed, 290 Skipped, 49.39s Total duration (5m 5.68s time saved)

Copy link
Contributor

@Yun-Kim Yun-Kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one suggestion but otherwise LGTM!

@lievan lievan enabled auto-merge (squash) March 12, 2025 21:27
@lievan lievan merged commit d5f52da into main Mar 12, 2025
821 of 822 checks passed
@lievan lievan deleted the evan.li/claude-code-converse-api branch March 12, 2025 21:39
lievan pushed a commit that referenced this pull request Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants