
⚡️ Speed up method NativeCallbackHandler._extract_token_usage by 96% in PR #11689 (aka/traces-v0)#11944

Closed
codeflash-ai[bot] wants to merge 72 commits into main from codeflash/optimize-pr11689-2026-02-28T01.48.16


@codeflash-ai (bot) commented Feb 28, 2026

⚡️ This pull request contains optimizations for PR #11689

If you approve this dependent PR, these changes will be merged into the original PR branch aka/traces-v0.

This PR will be automatically closed if the original PR is merged.


📄 96% speedup (about 1.96x as fast) for NativeCallbackHandler._extract_token_usage in src/backend/base/langflow/services/tracing/native_callback.py

⏱️ Runtime: 806 microseconds → 412 microseconds (best of 18 runs)
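The title's 96% and the explanation's 95% come from the same two measurements. Codeflash appears to report the speedup as (old - new) / new, which a quick calculation on the reported runtimes confirms:

```python
# Runtimes reported above, in microseconds.
before_us = 806
after_us = 412

speedup = before_us / after_us                                  # ratio of old to new runtime
percent_faster = (before_us - after_us) / after_us * 100        # the "96%" in the title (95.6 unrounded)
percent_time_saved = (before_us - after_us) / before_us * 100   # share of the original runtime removed

print(f"{speedup:.2f}x, {percent_faster:.1f}% faster, {percent_time_saved:.1f}% less time")
# prints "1.96x, 95.6% faster, 48.9% less time"
```

In other words, the optimized method takes roughly half the time of the original on these inputs.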

📝 Explanation and details

The optimized code achieves a roughly 96% speedup (from 806μs to 412μs, about 1.96x as fast) by eliminating redundant operations in the token usage extraction logic.

Key Optimizations:

  1. Removed Lambda Function Creation - The original code created a lambda function on every iteration when usage was not a dict:

    _get = usage.get if isinstance(usage, dict) else lambda k, d=None, u=usage: getattr(u, k, d)

    This lambda was called 3 times per iteration. The optimized version uses explicit if/else branching to handle dict vs object cases separately, avoiding lambda overhead entirely.

  2. Eliminated Redundant Dictionary Fallbacks - The original code used or {} patterns even though a later isinstance() check already rejects None:

    # Original
    resp_meta = getattr(message, "response_metadata", None) or {}
    gen_info = getattr(gen, "generation_info", None) or {}

    These created unnecessary empty dict objects. The optimized version removes the or {} since the subsequent isinstance() check handles None correctly.

  3. Dropped the Empty-Dict Default in Fallback Chains - In resp_meta.get("token_usage") or resp_meta.get("usage", {}), the second .get() runs only when token_usage is missing or falsy, because or short-circuits; but each time it does run, it builds a fresh empty dict as its default. The optimized version uses or resp_meta.get("usage") without the default and lets the subsequent isinstance() check filter out None values.
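This page does not include the before and after source of _extract_token_usage, so the following is only a hypothetical reconstruction of the three patterns described above. The function names, the key names read, and the final "non-empty dict" check are assumptions made for illustration; each original/optimized pair is meant to return the same result while doing less allocation.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple

# Hypothetical key names for the standardized usage_metadata shape.
KEYS = ("input_tokens", "output_tokens", "total_tokens")
Triple = Tuple[Optional[Any], Optional[Any], Optional[Any]]


# Pattern 1, original style: pick an accessor, creating a new lambda for non-dict usage objects.
def fields_with_lambda(usage: Any) -> Triple:
    _get = usage.get if isinstance(usage, dict) else lambda k, d=None, u=usage: getattr(u, k, d)
    return _get(KEYS[0]), _get(KEYS[1]), _get(KEYS[2])


# Pattern 1, optimized style: branch once and read the fields directly, no function object built.
def fields_with_branching(usage: Any) -> Triple:
    if isinstance(usage, dict):
        return usage.get(KEYS[0]), usage.get(KEYS[1]), usage.get(KEYS[2])
    return getattr(usage, KEYS[0], None), getattr(usage, KEYS[1], None), getattr(usage, KEYS[2], None)


# Patterns 2 and 3, original style: `or {}` placeholder plus a `{}` default on the fallback lookup.
def token_usage_original(message: Any) -> Optional[dict]:
    resp_meta = getattr(message, "response_metadata", None) or {}  # new {} when missing or empty
    usage = resp_meta.get("token_usage") or resp_meta.get("usage", {})  # new {} default on fallback
    return usage if isinstance(usage, dict) and usage else None


# Patterns 2 and 3, optimized style: no placeholder dicts; isinstance() rejects None instead.
def token_usage_optimized(message: Any) -> Optional[dict]:
    resp_meta = getattr(message, "response_metadata", None)
    if not isinstance(resp_meta, dict):
        return None
    usage = resp_meta.get("token_usage") or resp_meta.get("usage")
    return usage if isinstance(usage, dict) and usage else None


if __name__ == "__main__":
    obj = SimpleNamespace(input_tokens=11, output_tokens=22, total_tokens=33)
    print(fields_with_lambda(obj), fields_with_branching(obj))
    msg = SimpleNamespace(response_metadata={"usage": {"completion_tokens": 5}})
    print(token_usage_original(msg), token_usage_optimized(msg))
```

The optimized variants trade a little extra code (explicit branches) for fewer temporary objects per generation, which only pays off in a loop that runs many times.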

Why This Matters:

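The line profile (collected while running the generated tests) recorded 1,006 hits on the outer gen_list loop and 2,004 on the inner gen loop, so the loop body dominates runtime for large inputs. The original code did avoidable work there:

  • Lambda creation: the accessor line was hit only twice in the profile, but every hit with a non-dict usage object builds a new function object
  • Redundant or {} operations: up to 1,002 + 1,000 = 2,002 empty-dict allocations, one for each missing or empty response_metadata or generation_info
  • Fallback .get("usage", {}) calls: each one built a fresh empty-dict default whenever token_usage was absent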
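To observe the allocation cost locally, a timeit harness over input shaped like test_large_number_of_generations_with_usage_at_end can compare the two metadata-lookup styles. The scan functions below are hypothetical stand-ins for the handler's loop, not its actual code, and absolute timings will vary by machine and Python version.

```python
import timeit
from types import SimpleNamespace


def scan_with_fallback(messages):
    """Original-style scan: `or {}` allocates a new dict for every empty or missing metadata."""
    found = 0
    for message in messages:
        meta = getattr(message, "response_metadata", None) or {}
        if isinstance(meta, dict) and meta.get("token_usage"):
            found += 1
    return found


def scan_without_fallback(messages):
    """Optimized-style scan: the isinstance() guard alone handles None."""
    found = 0
    for message in messages:
        meta = getattr(message, "response_metadata", None)
        if isinstance(meta, dict) and meta.get("token_usage"):
            found += 1
    return found


if __name__ == "__main__":
    # 999 messages with empty (falsy) metadata, then one carrying usage, as in the generated test.
    messages = [SimpleNamespace(response_metadata={}) for _ in range(999)]
    messages.append(SimpleNamespace(response_metadata={"token_usage": {"total_tokens": 300}}))
    for scan in (scan_with_fallback, scan_without_fallback):
        seconds = timeit.timeit(lambda: scan(messages), number=200)
        print(f"{scan.__name__}: {seconds * 1e3:.1f} ms for 200 scans")
```

Both scans return the same count; only the number of temporary dicts differs.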

The optimized version specifically benefits test cases with:

  • Many generations without usage data (test_large_number_of_generations_with_usage_at_end): Reduced wasted work per empty generation
  • Usage in standardized locations (test_generations_usage_metadata_overrides_legacy_when_missing_total, test_usage_metadata_as_object_with_attributes_instead_of_dict): Faster object vs dict handling without lambda overhead

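Since this is a callback handler for LangChain tracing, it can run on every LLM call. The 806μs and 412μs figures appear to be aggregated over the generated test inputs, several of which contain 1,000 or more generations; typical responses carry only a few generations, so the real-world saving per call is likely much smaller than ~400μs.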

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 18 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests
from types import SimpleNamespace  # lightweight objects with attribute access for test inputs
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.services.tracing.native_callback import NativeCallbackHandler


def test_llm_output_token_usage_simple():
    # Create a handler instance. The tracer argument is not used by _extract_token_usage, so None is fine.
    handler = NativeCallbackHandler(tracer=None)

    # Build a response with llm_output containing token_usage in the legacy location.
    response = SimpleNamespace(
        llm_output={"token_usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}}
    )

    # Call the method and assert it extracts the values from llm_output correctly.
    prompt, completion, total = handler._extract_token_usage(response)
    assert (prompt, completion, total) == (1, 2, 3)


def test_generations_usage_metadata_overrides_legacy_when_missing_total():
    handler = NativeCallbackHandler(tracer=None)

    # Simulate a generation message with usage_metadata in the new standardized location.
    message = SimpleNamespace(usage_metadata={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30})
    gen = SimpleNamespace(message=message)  # a single generation with a message field
    # generations is a list of lists of generation objects (per LangChain shape)
    response = SimpleNamespace(llm_output={}, generations=[[gen]])

    # Expect values pulled from message.usage_metadata
    prompt, completion, total = handler._extract_token_usage(response)
    assert (prompt, completion, total) == (10, 20, 30)


def test_response_metadata_provider_specific_fallback():
    handler = NativeCallbackHandler(tracer=None)

    # Some providers place usage in message.response_metadata -> token_usage.
    resp_meta = {"token_usage": {"prompt_tokens": 4, "completion_tokens": 5, "total_tokens": 9}}
    message = SimpleNamespace(response_metadata=resp_meta)
    gen = SimpleNamespace(message=message)
    response = SimpleNamespace(llm_output=None, generations=[[gen]])

    prompt, completion, total = handler._extract_token_usage(response)


def test_generation_info_usage_fallback_for_older_providers():
    handler = NativeCallbackHandler(tracer=None)

    # Some older adapters put usage in generation_info
    gen_info = {"usage": {"input_tokens": 7, "output_tokens": 8, "total_tokens": 15}}
    gen = SimpleNamespace(generation_info=gen_info)
    response = SimpleNamespace(llm_output=None, generations=[[gen]])

    prompt, completion, total = handler._extract_token_usage(response)


def test_partial_values_preserve_prior_values_when_not_overridden():
    handler = NativeCallbackHandler(tracer=None)

    # Legacy llm_output has prompt_tokens and total_tokens, but generation only provides completion_tokens.
    response = SimpleNamespace(
        llm_output={"token_usage": {"prompt_tokens": 1, "total_tokens": 3}},
        generations=[
            [
                SimpleNamespace(
                    message=SimpleNamespace(response_metadata={"usage": {"completion_tokens": 5}})
                )
            ]
        ],
    )

    # Expect prompt stays 1, completion updated to 5, total stays 3.
    prompt, completion, total = handler._extract_token_usage(response)
    assert (prompt, completion, total) == (1, 5, 3)


def test_no_usage_anywhere_returns_nones():
    handler = NativeCallbackHandler(tracer=None)

    # No llm_output and empty generations should yield (None, None, None)
    response = SimpleNamespace(llm_output=None, generations=[])
    prompt, completion, total = handler._extract_token_usage(response)
    assert (prompt, completion, total) == (None, None, None)


def test_usage_metadata_as_object_with_attributes_instead_of_dict():
    handler = NativeCallbackHandler(tracer=None)

    # usage_metadata might be an object with attributes rather than a dict. The code handles that by
    # reading each field with getattr() on the object instead of dict.get().
    usage_obj = SimpleNamespace(input_tokens=11, output_tokens=22, total_tokens=33)
    message = SimpleNamespace(usage_metadata=usage_obj)
    gen = SimpleNamespace(message=message)
    response = SimpleNamespace(llm_output=None, generations=[[gen]])

    prompt, completion, total = handler._extract_token_usage(response)


def test_large_number_of_generations_with_usage_at_end():
    handler = NativeCallbackHandler(tracer=None)

    # Build 1000 generation lists, each containing a single gen object without usage.
    gens = []
    for _ in range(999):
        gens.append([SimpleNamespace(message=SimpleNamespace(response_metadata={}))])

    # Place the real usage in the final generation to ensure code scans through many elements.
    final_message = SimpleNamespace(response_metadata={"token_usage": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300}})
    gens.append([SimpleNamespace(message=final_message)])

    response = SimpleNamespace(llm_output=None, generations=gens)

    prompt, completion, total = handler._extract_token_usage(response)


def test_large_number_of_generations_with_early_break():
    handler = NativeCallbackHandler(tracer=None)

    # Place usage in the very first generation to ensure the function breaks out early despite large size.
    first_message = SimpleNamespace(response_metadata={"token_usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}})
    gens = [[SimpleNamespace(message=first_message)]]

    # Add many more empty generation lists which should not be traversed after the function finds a total.
    for _ in range(1000):
        gens.append([SimpleNamespace(message=SimpleNamespace(response_metadata={}))])

    response = SimpleNamespace(llm_output=None, generations=gens)

    prompt, completion, total = handler._extract_token_usage(response)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from uuid import UUID

# imports
import pytest
from langchain_core.outputs import LLMResult
from langchain_core.messages import BaseMessage, HumanMessage
from langflow.services.tracing.native import NativeTracer
from langflow.services.tracing.native_callback import NativeCallbackHandler


# fixtures
@pytest.fixture
def native_tracer():
    """Create a real NativeTracer instance for testing."""
    return NativeTracer()


@pytest.fixture
def callback_handler(native_tracer):
    """Create a real NativeCallbackHandler instance for testing."""
    return NativeCallbackHandler(tracer=native_tracer)

To edit these changes, run git checkout codeflash/optimize-pr11689-2026-02-28T01.48.16 and push.

Codeflash

Adam-Aghili and others added 30 commits February 9, 2026 16:12
v0 for traces includes:
- filters: status, token usage range and datetime
- accordion rows per trace

Could add:
- more filter options. Examples: session_id, trace_id and latency range
add sidebar buttons for logs and trace
remove logs canvas control
hopefully fix duplicate trace ID insertion on windows
update tests and alembic tables for uts
was flow_name - trace_id
now flow_name - flow_id
address Gabriel's simple changes in traces.py and native.py
model name is now set using name = f"{operation} {model_name}" if model_name else operation
* feat: use uv sources for CPU-only PyTorch

Configure [tool.uv.sources] with pytorch-cpu index to avoid ~6GB CUDA
dependencies in Docker images. This replaces hardcoded wheel URLs with
a cleaner index-based approach.

- Add pytorch-cpu index with explicit = true
- Add torch/torchvision to [tool.uv.sources]
- Add explicit torch/torchvision deps to trigger source override
- Regenerate lockfile without nvidia/cuda/triton packages
- Add required-environments for multi-platform support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: update regex to only replace name in [project] section

The previous regex matched all lines starting with `name = "..."`,
which incorrectly renamed the UV index `pytorch-cpu` to `langflow-nightly`
during nightly builds. This caused `uv lock` to fail with:
"Package torch references an undeclared index: pytorch-cpu"

The new regex specifically targets the name field within the [project]
section only, avoiding unintended replacements in other sections like
[[tool.uv.index]].

* style: fix ruff quote style

* fix: remove required-environments to fix Python 3.13 macOS x86_64 CI

The required-environments setting was causing hard failures when packages
like torch didn't have wheels for specific platform/Python combinations.
Without this setting, uv resolves optimistically and handles missing wheels
gracefully at runtime instead of failing during resolution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
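The regex fix in the commit above does not show the new pattern or the nightly-build script. As a hypothetical sketch in Python (the pyproject.toml excerpt, both patterns, and the function names are assumptions), the difference between a line-anchored match and a section-scoped one looks like this:

```python
import re

# Illustrative pyproject.toml excerpt with `name = "..."` lines in two different sections.
PYPROJECT = """\
[project]
name = "langflow"
version = "1.0.0"

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true
"""

# Naive pattern: matches every line that starts with `name = "..."`, in any section.
NAIVE = re.compile(r'^name = "[^"]*"', re.MULTILINE)

# Scoped pattern: start at the [project] header, lazily skip whole lines that do not
# open a new `[` section, then match the first `name = "..."` line.
SCOPED = re.compile(r'(^\[project\]\n(?:(?!\[)[^\n]*\n)*?)name = "[^"]*"', re.MULTILINE)


def rename_naive(text: str, new_name: str) -> str:
    return NAIVE.sub(lambda _m: f'name = "{new_name}"', text)


def rename_project(text: str, new_name: str) -> str:
    # count=1: only the name inside the [project] table is rewritten.
    return SCOPED.sub(lambda m: f'{m.group(1)}name = "{new_name}"', text, count=1)
```

With this input, rename_project(PYPROJECT, "langflow-nightly") leaves name = "pytorch-cpu" untouched, whereas rename_naive rewrites it as well, which reproduces the "undeclared index: pytorch-cpu" failure mode the commit describes.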
* LE-270: fix hydration issues

* LE-270: fix disabled field for max tokens on language model

---------

Co-authored-by: Olayinka Adelakun <olayinkaadelakun@mac.war.can.ibm.com>
* Add wait for selector in mcp server tests

* [autofix.ci] apply automated fixes

* Add more waits for selectors

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Reduce lag in frontend by batching React events and reducing minimal visual build time

* Cleanup

* [autofix.ci] apply automated fixes

* add tests and improve code readability

* [autofix.ci] apply automated fixes

* Remove debug log

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: cristhianzl <cristhian.lousa@gmail.com>
* Lazy load imports for language model component

Ensures that only the necessary dependencies are required.
For example, if OpenAI provider is used, it will now only
import langchain_openai, rather than requiring langchain_anthropic,
langchain_ibm, etc.

* Add backwards-compat functions

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* Add exception handling

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* comp index

* docs: azure default temperature (#11829)

* change-azure-openai-default-temperature-to-1.0

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* [autofix.ci] apply automated fixes (attempt 3/3)

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* fix unit test?

* add no-group dev to docker builds

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Hamza Rashid <74062092+HzaRashid@users.noreply.github.com>
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
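The "Lazy load imports for language model component" commit above describes importing only the selected provider's package. A minimal sketch of that idea with importlib follows; the PROVIDERS table and load_model_class helper are hypothetical (the langchain_openai, langchain_anthropic and langchain_ibm packages and their ChatOpenAI, ChatAnthropic and ChatWatsonx classes are real, but this mapping is not Langflow's code).

```python
import importlib
from functools import lru_cache

# Hypothetical provider table: provider name -> (module to import, class inside it).
# Illustrative only; not Langflow's actual component mapping.
PROVIDERS = {
    "OpenAI": ("langchain_openai", "ChatOpenAI"),
    "Anthropic": ("langchain_anthropic", "ChatAnthropic"),
    "IBM watsonx.ai": ("langchain_ibm", "ChatWatsonx"),
}


@lru_cache(maxsize=None)
def load_model_class(provider: str):
    """Import only the integration package for `provider`, on first use."""
    try:
        module_name, class_name = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    try:
        module = importlib.import_module(module_name)  # deferred until this provider is requested
    except ImportError as exc:
        # Surface a clear message instead of failing when the component module is imported.
        raise ImportError(
            f"Provider {provider!r} requires the {module_name!r} package; install it to use this model."
        ) from exc
    return getattr(module, class_name)
```

Under this sketch, calling load_model_class("OpenAI") imports langchain_openai only; langchain_anthropic and langchain_ibm are never imported unless those providers are requested. The lru_cache is a small convenience, since importlib.import_module is already cheap once a module is in sys.modules.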
Adam-Aghili and others added 22 commits February 27, 2026 13:32
address backend code rabbit comments
address code rabbit frontend comments
test_native_tracer minor fix address c1
address C2 + C3
address H1-H5
update test_native_tracer
address m2
address M1
fix 422 spam and clean comments
address M12
 address M3
address M4
address M5
clean up for M7, M9, M11
address L2,L4,L5 and L6 + any test
alembic + comment clean up
@codeflash-ai codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Feb 28, 2026
@github-actions github-actions Bot added the community Pull Request from an external contributor label Feb 28, 2026
@github-actions
Contributor

Frontend Unit Test Coverage Report

Coverage Summary

Lines Statements Branches Functions
Coverage: 22%
22.76% (7929/34837) 15.34% (4188/27287) 15.5% (1138/7341)

Unit Test Results

Tests: 2599, Skipped: 0 💤, Failures: 0 ❌, Errors: 0 🔥, Time: 42.144s ⏱️

@codecov

codecov Bot commented Feb 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.53%. Comparing base (1e205f4) to head (63ed364).

❌ Your project check has failed because the head coverage (41.46%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files


@@              Coverage Diff               @@
##           aka/traces-v0   #11944   +/-   ##
==============================================
  Coverage          36.53%   36.53%           
==============================================
  Files               1580     1580           
  Lines              77116    77116           
  Branches           11778    11778           
==============================================
  Hits               28178    28178           
  Misses             47325    47325           
  Partials            1613     1613           
Flag Coverage Δ
frontend 20.37% <ø> (ø)
lfx 41.46% <ø> (ø)

Flags with carried forward coverage won't be shown.


Base automatically changed from aka/traces-v0 to main March 2, 2026 20:30
@codeflash-ai codeflash-ai Bot closed this Mar 2, 2026
@codeflash-ai
Contributor Author

codeflash-ai Bot commented Mar 2, 2026

This PR has been automatically closed because the original PR #11689 by Adam-Aghili was closed.

@codeflash-ai codeflash-ai Bot deleted the codeflash/optimize-pr11689-2026-02-28T01.48.16 branch March 2, 2026 20:30

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI community Pull Request from an external contributor


6 participants