-
Notifications
You must be signed in to change notification settings - Fork 476
feat(llmobs): trace text-based bedrock converse api #12560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
BenchmarksBenchmark execution time: 2025-03-12 21:37:42 Comparing candidate commit 224d6b0 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 282 metrics, 2 unstable metrics. |
Datadog ReportBranch report: ✅ 0 Failed, 43 Passed, 290 Skipped, 49.39s Total duration (5m 5.68s time saved) |
…aude-code-converse-api
Yun-Kim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one suggestion but otherwise LGTM!
This PR supports instrumenting LLM spans for bedrock's
Conversemethod. This PR does not touchConverseStream, but we document it’s behavior intest_llmobs_converse_stream.Its helpful to review the bedrock request syntax, and the response syntax
Example bedrock code snippet:
Manual QA
Example with tool calls
Example without tool calls
Data this PR traces
systemroleuserroleassistantrolemax_tokensandtemperaturestop_reasonImplementation details:
We register a separate trace handler for processing bedrock converse responses.
core.on("botocore.bedrock.process_response_converse", _on_botocore_bedrock_process_response_converse)This is to avoid the code-path that does extra post-processing of invoke model responses before it's ready for
llmobs_set_tags.Converse still relies on the same trace handler for processing 1) request input 2) bedrock exceptions.
Cassettes
I chose to use cassettes since there were some difficulties with mocking out the bedrock calls with respx. There are some authentication steps that happen within the botocore library before the mocked LLM call, leading me to run into errors like:
This means we needed to mock out or find a way to skip the internal authentication steps, which would cause the test to be dependent on non-bedrock parts of the botocore library which may be subject to change. In my opinion, this makes cassettes the better option.
To Do
Checklist
Reviewer Checklist