⚡️ Speed up method NativeCallbackHandler._extract_token_usage by 96% in PR #11689 (aka/traces-v0) #11944
Closed
codeflash-ai[bot] wants to merge 72 commits into
v0 for traces includes: - filters: status, token usage range and datetime - accordion rows per trace Could add: - more filter options. Examples: session_id, trace_id and latency range
add sidebar buttons for logs and trace; remove logs canvas control
hopefully fix duplicate trace ID insertion on Windows
update tests and alembic tables for uts
was flow_name - trace_id now flow_name - flow_id
address gabriel simple changes in traces.py and native.py
model name is now set using name = f"{operation} {model_name}" if model_name else operation
* feat: use uv sources for CPU-only PyTorch Configure [tool.uv.sources] with pytorch-cpu index to avoid ~6GB CUDA dependencies in Docker images. This replaces hardcoded wheel URLs with a cleaner index-based approach. - Add pytorch-cpu index with explicit = true - Add torch/torchvision to [tool.uv.sources] - Add explicit torch/torchvision deps to trigger source override - Regenerate lockfile without nvidia/cuda/triton packages - Add required-environments for multi-platform support Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: update regex to only replace name in [project] section The previous regex matched all lines starting with `name = "..."`, which incorrectly renamed the UV index `pytorch-cpu` to `langflow-nightly` during nightly builds. This caused `uv lock` to fail with: "Package torch references an undeclared index: pytorch-cpu" The new regex specifically targets the name field within the [project] section only, avoiding unintended replacements in other sections like [[tool.uv.index]]. * style: fix ruff quote style * fix: remove required-environments to fix Python 3.13 macOS x86_64 CI The required-environments setting was causing hard failures when packages like torch didn't have wheels for specific platform/Python combinations. Without this setting, uv resolves optimistically and handles missing wheels gracefully at runtime instead of failing during resolution. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* LE-270: add fix hydration issues * LE-270: fix disable field on max token on language model --------- Co-authored-by: Olayinka Adelakun <olayinkaadelakun@mac.war.can.ibm.com>
* Add wait for selector in mcp server tests * [autofix.ci] apply automated fixes * Add more wait for selectors * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Reduce lag in frontend by batching React events and reducing minimal visual build time * Cleanup * [autofix.ci] apply automated fixes * add tests and improve code read * [autofix.ci] apply automated fixes * Remove debug log --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: cristhianzl <cristhian.lousa@gmail.com>
* Lazy load imports for language model component Ensures that only the necessary dependencies are required. For example, if OpenAI provider is used, it will now only import langchain_openai, rather than requiring langchain_anthropic, langchain_ibm, etc. * Add backwards-compat functions * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Add exception handling * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * comp index * docs: azure default temperature (#11829) * change-azure-openai-default-temperature-to-1.0 * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * fix unit test? * add no-group dev to docker builds * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Hamza Rashid <74062092+HzaRashid@users.noreply.github.com> Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
address backend code rabbit comments
address code rabbit frontend comments
test_native_tracer minor fix address c1
address C2 + C3
address H1-H5
update test_native_tracer
address m2
… into aka/traces-v0
address M1
fix 422 spam and clean comments
address M12
address M3
address M4
address M5
clean up for M7, M9, M11
address L2,L4,L5 and L6 + any test
alembic + comment clean up
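The nightly-build fix in the commit list scopes the `name` rewrite to the `[project]` table, so `[[tool.uv.index]]` entries such as `pytorch-cpu` are left alone. A minimal sketch of that idea, assuming a hypothetical `rename_project` helper; it tracks the current TOML table line by line with the standard `re` module and is not the regex used in the actual commit.

```python
import re

# A `name = "..."` assignment at the start of a line.
_NAME_LINE = re.compile(r'^(\s*name\s*=\s*)"[^"]*"', re.MULTILINE)
# A TOML table header such as [project] or [[tool.uv.index]].
_HEADER = re.compile(r"^\s*\[\[?([^\]]+)\]\]?\s*$")


def rename_project(pyproject: str, new_name: str) -> str:
    """Rewrite `name = "..."` only while inside the [project] table (hypothetical helper)."""
    out = []
    section = None
    for line in pyproject.splitlines(keepends=True):
        header = _HEADER.match(line)
        if header:
            # Remember which table subsequent lines belong to.
            section = header.group(1).strip()
        elif section == "project":
            # A callable replacement avoids backslash escapes in new_name.
            line = _NAME_LINE.sub(lambda m: f'{m.group(1)}"{new_name}"', line, count=1)
        out.append(line)
    return "".join(out)
```

Because the rename is gated on the current table, the index name under `[[tool.uv.index]]` keeps its original value and `uv lock` can still resolve `pytorch-cpu`.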
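The lazy-import commit for the language model component makes each provider package load only when that provider is used. A minimal sketch with `importlib.import_module`, assuming a hypothetical `PROVIDER_MODULES` mapping and `load_provider_module` helper (the provider labels other than "OpenAI" are assumptions); a missing package raises an `ImportError` that names it.

```python
import importlib
from types import ModuleType

# Hypothetical provider -> package mapping; labels other than "OpenAI" are assumptions.
PROVIDER_MODULES = {
    "OpenAI": "langchain_openai",
    "Anthropic": "langchain_anthropic",
    "IBM": "langchain_ibm",
}


def load_provider_module(provider: str, modules: dict = PROVIDER_MODULES) -> ModuleType:
    """Import only the integration package for `provider`, at call time."""
    try:
        module_name = modules[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    try:
        # import_module reuses sys.modules, so repeated calls do not re-import.
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Provider {provider!r} needs the {module_name!r} package; install it to use this provider."
        ) from exc
```

With this shape, selecting "OpenAI" never touches `langchain_anthropic` or `langchain_ibm`, so those packages only need to be installed by users who pick those providers.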
The optimized code achieves a **~96% speedup** (from 806 μs to 412 μs) by eliminating redundant operations in the token usage extraction logic.
**Key Optimizations:**
1. **Removed Lambda Function Creation** - The original code created a lambda function on every iteration when `usage` was not a dict:
```python
_get = usage.get if isinstance(usage, dict) else lambda k, d=None, u=usage: getattr(u, k, d)
```
This lambda was called 3 times per iteration. The optimized version uses explicit `if/else` branching to handle dict vs object cases separately, avoiding lambda overhead entirely.
2. **Eliminated Redundant Dictionary Fallbacks** - The original code substituted an empty dict with `or {}` before the value was type-checked:
```python
# Original
resp_meta = getattr(message, "response_metadata", None) or {}
gen_info = getattr(gen, "generation_info", None) or {}
```
Each `or {}` allocates a fresh empty dict whenever the attribute is missing or falsy. The optimized version removes the `or {}`, since the subsequent `isinstance()` check already rejects `None`.
3. **Reduced Dictionary Accesses in Fallback Chains** - In `resp_meta.get("token_usage") or resp_meta.get("usage", {})`, the second `.get()` runs whenever `token_usage` is missing or falsy (otherwise `or` short-circuits), and each such call allocates its `{}` default even when `usage` is present. The optimized version uses `or resp_meta.get("usage")` without the empty dict default, letting the subsequent `isinstance()` check filter out `None` values.
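The three changes can be checked in isolation. A minimal sketch, assuming LangChain-style field names (`input_tokens`, `output_tokens`, `total_tokens`, `response_metadata`, `token_usage`, `usage`) and hypothetical helper names; it reproduces the patterns rather than the handler itself, and shows each optimized form returning the same result while doing less work.

```python
from types import SimpleNamespace
from typing import Any, Optional


# 1. Lambda accessor vs. explicit dict/object branching.
def counts_with_lambda(usage: Any) -> tuple:
    # Allocates a new lambda on every call when `usage` is an object, not a dict.
    _get = usage.get if isinstance(usage, dict) else lambda k, d=None, u=usage: getattr(u, k, d)
    return (_get("input_tokens", 0), _get("output_tokens", 0), _get("total_tokens", 0))


def counts_with_branch(usage: Any) -> tuple:
    # Branches once and reads the fields directly; no closure is created.
    if isinstance(usage, dict):
        return (usage.get("input_tokens", 0), usage.get("output_tokens", 0), usage.get("total_tokens", 0))
    return (getattr(usage, "input_tokens", 0), getattr(usage, "output_tokens", 0), getattr(usage, "total_tokens", 0))


# 2. `or {}` before the type check vs. letting isinstance() reject None.
def token_usage_with_or(message: Any) -> Optional[dict]:
    resp_meta = getattr(message, "response_metadata", None) or {}  # new {} when missing or falsy
    return resp_meta.get("token_usage") if isinstance(resp_meta, dict) else None


def token_usage_with_isinstance(message: Any) -> Optional[dict]:
    resp_meta = getattr(message, "response_metadata", None)
    return resp_meta.get("token_usage") if isinstance(resp_meta, dict) else None


# 3. Fallback chain with and without a `{}` default.
class CountingDict(dict):
    """dict that counts .get() calls, to make `or` short-circuiting visible."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.gets = 0

    def get(self, key: Any, default: Any = None) -> Any:
        self.gets += 1
        return super().get(key, default)


def total_with_default(resp_meta: dict) -> Optional[int]:
    # The second .get() (and its {} default) runs only if "token_usage" is missing or falsy.
    usage = resp_meta.get("token_usage") or resp_meta.get("usage", {})
    return usage.get("total_tokens") if isinstance(usage, dict) else None


def total_without_default(resp_meta: dict) -> Optional[int]:
    usage = resp_meta.get("token_usage") or resp_meta.get("usage")
    return usage.get("total_tokens") if isinstance(usage, dict) else None


obj_usage = SimpleNamespace(input_tokens=12, output_tokens=30, total_tokens=42)
assert counts_with_lambda(obj_usage) == counts_with_branch(obj_usage) == (12, 30, 42)

meta = CountingDict(token_usage={"total_tokens": 9})
assert total_with_default(meta) == 9 and meta.gets == 1  # the second .get() never ran
```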
**Why This Matters:**
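The line profiler shows the loops running on the order of 1,000 to 2,000 times per call (1006 hits on the `gen_list` loop line and 2004 hits on the `gen` loop line). The original code had these costly operations in the hot path:
- Lambda creation: only 2 hits in this profile, because the lambda is built only for generations whose `usage` is an object rather than a dict
- Redundant `or {}` operations: 1002 + 1000 = 2002 evaluations, each allocating a fresh empty dict whenever the attribute is missing or falsy
- Extra `.get()` calls with `{}` defaults in the fallback chains: thousands of avoidable lookups and allocations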
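A comparison like the one the profile describes can be reproduced locally with `timeit`. This is a sketch under stated assumptions: `scan_original`, `scan_optimized`, the `generation_info["token_usage"]` fallback, and the flat list of generations are hypothetical simplifications of the handler, and timings vary by machine, so no numbers are shown.

```python
import timeit
from functools import partial
from types import SimpleNamespace
from typing import Optional

# Hypothetical input: 1,000 generations with no usage anywhere, then one with usage.
EMPTY = SimpleNamespace(message=SimpleNamespace(response_metadata=None), generation_info=None)
LAST = SimpleNamespace(message=SimpleNamespace(response_metadata={"token_usage": {"total_tokens": 15}}), generation_info=None)
GENS = [EMPTY] * 1000 + [LAST]


def scan_original(gens: list) -> Optional[dict]:
    for gen in gens:
        resp_meta = getattr(gen.message, "response_metadata", None) or {}  # {} per empty generation
        usage = resp_meta.get("token_usage") or resp_meta.get("usage", {})  # another {} default
        if not usage:
            gen_info = getattr(gen, "generation_info", None) or {}
            usage = gen_info.get("token_usage", {})
        if isinstance(usage, dict) and usage:
            return usage
    return None


def scan_optimized(gens: list) -> Optional[dict]:
    for gen in gens:
        resp_meta = getattr(gen.message, "response_metadata", None)
        usage = None
        if isinstance(resp_meta, dict):  # None is rejected without allocating anything
            usage = resp_meta.get("token_usage") or resp_meta.get("usage")
        if not usage:
            gen_info = getattr(gen, "generation_info", None)
            if isinstance(gen_info, dict):
                usage = gen_info.get("token_usage")
        if isinstance(usage, dict) and usage:
            return usage
    return None


if __name__ == "__main__":
    for fn in (scan_original, scan_optimized):
        seconds = timeit.timeit(partial(fn, GENS), number=200)
        print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} µs per call")
```

Both helpers return the same record on dict-or-`None` inputs, so any timing gap comes from the avoided allocations and lookups rather than from different results.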
The optimized version specifically benefits test cases with:
- **Many generations without usage data** (test_large_number_of_generations_with_usage_at_end): Reduced wasted work per empty generation
- **Usage in standardized locations** (test_generations_usage_metadata_overrides_legacy_when_missing_total, test_usage_metadata_as_object_with_attributes_instead_of_dict): Faster object vs dict handling without lambda overhead
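The two input shapes named in these test cases (usage only at the end of many generations, and `usage_metadata` as an object with attributes) can be exercised against a simplified extractor. This is a hypothetical sketch, not the handler's code: `extract_token_usage` and `_normalize` are illustrative names, the sketch returns the first usage record found, reads only `input_tokens`/`output_tokens`/`total_tokens`, treats `response_metadata["token_usage"]`/`["usage"]` as the legacy location, and fills a missing total from its parts as an assumption.

```python
from types import SimpleNamespace
from typing import Any, Optional


def _normalize(usage: Any) -> Optional[dict]:
    """Read counts from a dict- or object-shaped usage record; None if it has none."""
    if usage is None:
        return None
    if isinstance(usage, dict):
        inp, out, total = usage.get("input_tokens"), usage.get("output_tokens"), usage.get("total_tokens")
    else:
        inp = getattr(usage, "input_tokens", None)
        out = getattr(usage, "output_tokens", None)
        total = getattr(usage, "total_tokens", None)
    if inp is None and out is None and total is None:
        return None
    if total is None and inp is not None and out is not None:
        total = inp + out  # assumption: derive a missing total from its parts
    return {"input_tokens": inp, "output_tokens": out, "total_tokens": total}


def extract_token_usage(generations: list) -> Optional[dict]:
    """Scan LangChain-style nested generations and return the first usage found."""
    for gen_list in generations:
        for gen in gen_list:
            message = getattr(gen, "message", None)
            # Standardized location first: message.usage_metadata (dict or object).
            found = _normalize(getattr(message, "usage_metadata", None))
            if found is None:
                # Assumed legacy location: response_metadata["token_usage"] or ["usage"].
                resp_meta = getattr(message, "response_metadata", None)
                if isinstance(resp_meta, dict):
                    found = _normalize(resp_meta.get("token_usage") or resp_meta.get("usage"))
            if found is not None:
                return found
    return None


# Usage-at-the-end scenario: 1,000 generations without usage, then one with it.
empty = SimpleNamespace(message=SimpleNamespace(usage_metadata=None, response_metadata={}))
last = SimpleNamespace(message=SimpleNamespace(usage_metadata={"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}))
print(extract_token_usage([[empty]] * 1000 + [[last]]))
# {'input_tokens': 10, 'output_tokens': 5, 'total_tokens': 15}
```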
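Since this is a callback handler for LangChain tracing, it is likely called frequently during LLM operations. The roughly 394 μs saved per call was measured on inputs with about 1,000 generations; calls with only a few generations will likely save less in absolute terms, so the gain matters most for applications with high LLM usage rates.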
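The headline figures follow directly from the two reported runtimes. A quick check of the arithmetic, using only the 806 μs and 412 μs values from this PR: the saving is 394 μs and the relative speedup is 95.6%, which rounds to the 96% in the PR title.

```python
before_us, after_us = 806, 412  # runtimes reported in this PR (best of 18 runs)

saved_us = before_us - after_us                 # absolute saving per call
speedup_pct = (before_us / after_us - 1) * 100  # relative speedup over the new runtime

print(saved_us, f"{speedup_pct:.1f}%", f"{before_us / after_us:.2f}x")
# prints: 394 95.6% 1.96x
```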
Codecov Report ✅ All modified and coverable lines are covered by tests. ❌ Your project check has failed because the head coverage (41.46%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files:

| Coverage Diff | aka/traces-v0 | #11944 | +/- |
|---|---|---|---|
| Coverage | 36.53% | 36.53% | |
| Files | 1580 | 1580 | |
| Lines | 77116 | 77116 | |
| Branches | 11778 | 11778 | |
| Hits | 28178 | 28178 | |
| Misses | 47325 | 47325 | |
| Partials | 1613 | 1613 | |
Flags with carried forward coverage won't be shown.
⚡️ This pull request contains optimizations for PR #11689
If you approve this dependent PR, these changes will be merged into the original PR branch `aka/traces-v0`.
📄 96% speedup (about 1.96x as fast) for `NativeCallbackHandler._extract_token_usage` in `src/backend/base/langflow/services/tracing/native_callback.py`
⏱️ Runtime: 806 microseconds → 412 microseconds (best of 18 runs)
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-pr11689-2026-02-28T01.48.16` and push.