⚡️ Speed up method `NativeCallbackHandler._extract_token_usage` by 96% in PR #11689 (`aka/traces-v0`) by codeflash-ai[bot] · Pull Request #11944 · langflow-ai/langflow

codeflash-ai · 2026-02-28T01:48:22Z

⚡️ This pull request contains optimizations for PR #11689

If you approve this dependent PR, these changes will be merged into the original PR branch aka/traces-v0.

This PR will be automatically closed if the original PR is merged.

📄 96% (0.96x) speedup for `NativeCallbackHandler._extract_token_usage` in `src/backend/base/langflow/services/tracing/native_callback.py`

⏱️ Runtime : 806 microseconds → 412 microseconds (best of 18 runs)
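
The headline percentage is consistent with dividing the time saved by the optimized runtime. That formula is inferred from the two reported runtimes, not quoted from Codeflash documentation, and the helper names below are hypothetical. A minimal sketch of the arithmetic:

```python
def speedup_percent(original_us: float, optimized_us: float) -> float:
    """Time saved relative to the optimized runtime (assumed definition)."""
    return (original_us - optimized_us) / optimized_us * 100


def runtime_reduction_percent(original_us: float, optimized_us: float) -> float:
    """Time saved relative to the original runtime."""
    return (original_us - optimized_us) / original_us * 100


print(round(speedup_percent(806, 412)))            # 96
print(round(runtime_reduction_percent(806, 412)))  # 49
```

Under this reading, a 96% speedup corresponds to roughly half the original wall time.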

📝 Explanation and details

The optimized code achieves a 96% speedup (from 806μs to 412μs) by eliminating redundant operations in the token usage extraction logic.

Key Optimizations:

1. Removed Lambda Function Creation - The original code created a lambda function on every iteration when `usage` was not a dict:
   ```python
   _get = usage.get if isinstance(usage, dict) else lambda k, d=None, u=usage: getattr(u, k, d)
   ```
   This lambda was called 3 times per iteration. The optimized version uses explicit `if/else` branching to handle dict vs object cases separately, avoiding lambda overhead entirely.
2. Eliminated Redundant Dictionary Fallbacks - The original code used `or {}` patterns even though the value was type-checked afterwards:
   ```python
   # Original
   resp_meta = getattr(message, "response_metadata", None) or {}
   gen_info = getattr(gen, "generation_info", None) or {}
   ```
   These created unnecessary empty dict objects. The optimized version removes the `or {}` since the subsequent `isinstance()` check handles `None` correctly.
3. Reduced Dictionary Accesses in Fallback Chains - When checking `resp_meta.get("token_usage") or resp_meta.get("usage", {})`, the original code ran the second `.get()` and built a new empty dict as its default every time the first lookup came back falsy. The optimized version uses `or resp_meta.get("usage")` without the empty dict default, letting the subsequent `isinstance()` check filter out `None` values.
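
The three changes above can be sketched together as follows. This is an illustration only, not the code from `native_callback.py`: `read_usage` and `find_provider_usage` are hypothetical helper names, and the key names are taken from the generated regression tests.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple

Usage = Tuple[Optional[int], Optional[int], Optional[int]]


def read_usage(usage: Any) -> Usage:
    """Read token counts with explicit branching instead of a per-call lambda."""
    if isinstance(usage, dict):
        return (
            usage.get("input_tokens"),
            usage.get("output_tokens"),
            usage.get("total_tokens"),
        )
    # Attribute-style object (or None): getattr with a default never raises.
    return (
        getattr(usage, "input_tokens", None),
        getattr(usage, "output_tokens", None),
        getattr(usage, "total_tokens", None),
    )


def find_provider_usage(message: Any) -> Optional[dict]:
    """Look up provider-specific usage without `or {}` or `{}` defaults."""
    resp_meta = getattr(message, "response_metadata", None)  # no `or {}`
    if not isinstance(resp_meta, dict):
        return None  # covers None and non-dict values in one check
    usage = resp_meta.get("token_usage") or resp_meta.get("usage")  # no `{}` default
    return usage if isinstance(usage, dict) else None


print(read_usage({"input_tokens": 10, "output_tokens": 20, "total_tokens": 30}))
print(read_usage(SimpleNamespace(input_tokens=11, output_tokens=22, total_tokens=33)))
print(find_provider_usage(SimpleNamespace(response_metadata={"usage": {"total_tokens": 9}})))
```

The `isinstance()` checks do the work that the empty-dict fallbacks did before, so a missing attribute costs one type check rather than a fresh dict allocation.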

Why This Matters:

The line profiler shows the nested loops iterate ~1000-2000 times per call (1006 gen_list iterations × 2004 gen iterations). The original code had these expensive operations in the hot path:

- Lambda creation: 2 hits but conceptually happens every time `usage` exists
- Redundant `or {}` operations: 1002 + 1000 = 2002 empty dict creations
- Extra `.get()` calls with dict defaults: thousands of unnecessary lookups

The optimized version specifically benefits test cases with:

- Many generations without usage data (`test_large_number_of_generations_with_usage_at_end`): Reduced wasted work per empty generation
- Usage in standardized locations (`test_generations_usage_metadata_overrides_legacy_when_missing_total`, `test_usage_metadata_as_object_with_attributes_instead_of_dict`): Faster object vs dict handling without lambda overhead

Since this is a callback handler for LangChain tracing, it's likely called frequently during LLM operations. The ~400μs reduction per call can significantly impact applications with high LLM usage rates.
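
One way to check the lambda-overhead claim locally is a small `timeit` comparison. The sketch below uses two hypothetical stand-in functions, `with_lambda` and `with_branch`, that mirror the original and optimized patterns described above; they are not the langflow code, and absolute timings will vary by machine and Python version.

```python
import timeit
from types import SimpleNamespace

usage = SimpleNamespace(input_tokens=11, output_tokens=22, total_tokens=33)


def with_lambda(u):
    # Original pattern: build an accessor, possibly a new lambda, on every call.
    _get = u.get if isinstance(u, dict) else lambda k, d=None, u=u: getattr(u, k, d)
    return _get("input_tokens"), _get("output_tokens"), _get("total_tokens")


def with_branch(u):
    # Optimized pattern: branch once, then read the three fields directly.
    if isinstance(u, dict):
        return u.get("input_tokens"), u.get("output_tokens"), u.get("total_tokens")
    return (
        getattr(u, "input_tokens", None),
        getattr(u, "output_tokens", None),
        getattr(u, "total_tokens", None),
    )


# Both variants must agree before their timings are worth comparing.
assert with_lambda(usage) == with_branch(usage)

t_lambda = timeit.timeit(lambda: with_lambda(usage), number=50_000)
t_branch = timeit.timeit(lambda: with_branch(usage), number=50_000)
print(f"lambda: {t_lambda:.4f}s  branch: {t_branch:.4f}s")
```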

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 18 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests

from types import SimpleNamespace  # lightweight objects with attribute access for test inputs
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.services.tracing.native_callback import NativeCallbackHandler


def test_llm_output_token_usage_simple():
    # Create a handler instance. The tracer argument is not used by _extract_token_usage, so None is fine.
    handler = NativeCallbackHandler(tracer=None)

    # Build a response with llm_output containing token_usage in the legacy location.
    response = SimpleNamespace(
        llm_output={"token_usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}}
    )

    # Call the method and assert it extracts the values from llm_output correctly.
    prompt, completion, total = handler._extract_token_usage(response)


def test_generations_usage_metadata_overrides_legacy_when_missing_total():
    handler = NativeCallbackHandler(tracer=None)

    # Simulate a generation message with usage_metadata in the new standardized location.
    message = SimpleNamespace(usage_metadata={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30})
    gen = SimpleNamespace(message=message)  # a single generation with a message field
    # generations is a list of lists of generation objects (per LangChain shape)
    response = SimpleNamespace(llm_output={}, generations=[[gen]])

    # Expect values pulled from message.usage_metadata
    prompt, completion, total = handler._extract_token_usage(response)


def test_response_metadata_provider_specific_fallback():
    handler = NativeCallbackHandler(tracer=None)

    # Some providers place usage in message.response_metadata -> token_usage.
    resp_meta = {"token_usage": {"prompt_tokens": 4, "completion_tokens": 5, "total_tokens": 9}}
    message = SimpleNamespace(response_metadata=resp_meta)
    gen = SimpleNamespace(message=message)
    response = SimpleNamespace(llm_output=None, generations=[[gen]])

    prompt, completion, total = handler._extract_token_usage(response)


def test_generation_info_usage_fallback_for_older_providers():
    handler = NativeCallbackHandler(tracer=None)

    # Some older adapters put usage in generation_info
    gen_info = {"usage": {"input_tokens": 7, "output_tokens": 8, "total_tokens": 15}}
    gen = SimpleNamespace(generation_info=gen_info)
    response = SimpleNamespace(llm_output=None, generations=[[gen]])

    prompt, completion, total = handler._extract_token_usage(response)


def test_partial_values_preserve_prior_values_when_not_overridden():
    handler = NativeCallbackHandler(tracer=None)

    # Legacy llm_output has prompt_tokens and total_tokens, but generation only provides completion_tokens.
    response = SimpleNamespace(
        llm_output={"token_usage": {"prompt_tokens": 1, "total_tokens": 3}},
        generations=[
            [
                SimpleNamespace(
                    message=SimpleNamespace(response_metadata={"usage": {"completion_tokens": 5}})
                )
            ]
        ],
    )

    # Expect prompt stays 1, completion updated to 5, total stays 3.
    prompt, completion, total = handler._extract_token_usage(response)


def test_no_usage_anywhere_returns_nones():
    handler = NativeCallbackHandler(tracer=None)

    # No llm_output and empty generations should yield (None, None, None)
    response = SimpleNamespace(llm_output=None, generations=[])
    prompt, completion, total = handler._extract_token_usage(response)


def test_usage_metadata_as_object_with_attributes_instead_of_dict():
    handler = NativeCallbackHandler(tracer=None)

    # usage_metadata might be an object with attributes rather than a dict. The code handles that by
    # creating a lambda that uses getattr on the object.
    usage_obj = SimpleNamespace(input_tokens=11, output_tokens=22, total_tokens=33)
    message = SimpleNamespace(usage_metadata=usage_obj)
    gen = SimpleNamespace(message=message)
    response = SimpleNamespace(llm_output=None, generations=[[gen]])

    prompt, completion, total = handler._extract_token_usage(response)


def test_large_number_of_generations_with_usage_at_end():
    handler = NativeCallbackHandler(tracer=None)

    # Build 999 generation lists without usage; the final one appended below brings the total to 1000.
    gens = []
    for _ in range(999):
        gens.append([SimpleNamespace(message=SimpleNamespace(response_metadata={}))])

    # Place the real usage in the final generation to ensure code scans through many elements.
    final_message = SimpleNamespace(response_metadata={"token_usage": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300}})
    gens.append([SimpleNamespace(message=final_message)])

    response = SimpleNamespace(llm_output=None, generations=gens)

    prompt, completion, total = handler._extract_token_usage(response)


def test_large_number_of_generations_with_early_break():
    handler = NativeCallbackHandler(tracer=None)

    # Place usage in the very first generation to ensure the function breaks out early despite large size.
    first_message = SimpleNamespace(response_metadata={"token_usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}})
    gens = [[SimpleNamespace(message=first_message)]]

    # Add many more empty generation lists which should not be traversed after the function finds a total.
    for _ in range(1000):
        gens.append([SimpleNamespace(message=SimpleNamespace(response_metadata={}))])

    response = SimpleNamespace(llm_output=None, generations=gens)

    prompt, completion, total = handler._extract_token_usage(response)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from uuid import UUID

# imports
import pytest
from langchain.schema import LLMResult
from langchain.schema.messages import BaseMessage, HumanMessage
from langflow.services.tracing.native import NativeTracer
from langflow.services.tracing.native_callback import NativeCallbackHandler


# fixtures
@pytest.fixture
def native_tracer():
    """Create a real NativeTracer instance for testing."""
    return NativeTracer()


@pytest.fixture
def callback_handler(native_tracer):
    """Create a real NativeCallbackHandler instance for testing."""
    return NativeCallbackHandler(tracer=native_tracer)
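
The two large-input tests depend on whether the scan stops as soon as a total is known. A minimal sketch of such an early exit is shown here. `scan_for_total` and `make_gen` are hypothetical stand-ins that look only at `response_metadata["token_usage"]`; they are not the langflow implementation, which checks more locations.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple


def scan_for_total(generations: List[List[Any]]) -> Tuple[Optional[int], int]:
    """Return (total_tokens, generations visited), stopping at the first total found."""
    visited = 0
    for gen_list in generations:
        for gen in gen_list:
            visited += 1
            message = getattr(gen, "message", None)
            resp_meta = getattr(message, "response_metadata", None)
            usage = resp_meta.get("token_usage") if isinstance(resp_meta, dict) else None
            if isinstance(usage, dict) and usage.get("total_tokens") is not None:
                return usage["total_tokens"], visited  # early exit
    return None, visited


def make_gen(meta: dict) -> SimpleNamespace:
    """Build a generation-like object shaped like the ones in the tests."""
    return SimpleNamespace(message=SimpleNamespace(response_metadata=meta))


found = {"token_usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3}}
early = [[make_gen(found)]] + [[make_gen({})] for _ in range(1000)]
late = [[make_gen({})] for _ in range(999)] + [[make_gen(found)]]

print(scan_for_total(early))  # (3, 1)
print(scan_for_total(late))   # (3, 1000)
```

Returning the visit count alongside the result lets a test assert on the amount of work done, which the generated tests above do not check.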

To edit these changes, run `git checkout codeflash/optimize-pr11689-2026-02-28T01.48.16` and push.

* feat: use uv sources for CPU-only PyTorch

  Configure [tool.uv.sources] with pytorch-cpu index to avoid ~6GB CUDA dependencies in Docker images. This replaces hardcoded wheel URLs with a cleaner index-based approach.

  - Add pytorch-cpu index with explicit = true
  - Add torch/torchvision to [tool.uv.sources]
  - Add explicit torch/torchvision deps to trigger source override
  - Regenerate lockfile without nvidia/cuda/triton packages
  - Add required-environments for multi-platform support

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: update regex to only replace name in [project] section

  The previous regex matched all lines starting with `name = "..."`, which incorrectly renamed the UV index `pytorch-cpu` to `langflow-nightly` during nightly builds. This caused `uv lock` to fail with: "Package torch references an undeclared index: pytorch-cpu"

  The new regex specifically targets the name field within the [project] section only, avoiding unintended replacements in other sections like [[tool.uv.index]].

* style: fix ruff quote style

* fix: remove required-environments to fix Python 3.13 macOS x86_64 CI

  The required-environments setting was causing hard failures when packages like torch didn't have wheels for specific platform/Python combinations. Without this setting, uv resolves optimistically and handles missing wheels gracefully at runtime instead of failing during resolution.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Add wait for selector in mcp server tests
* [autofix.ci] apply automated fixes
* Add more wait for selectors
* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* Reduce lag in frontend by batching react events and reducing minimal visual build time
* Cleanup
* [autofix.ci] apply automated fixes
* add tests and improve code read
* [autofix.ci] apply automated fixes
* Remove debug log

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: cristhianzl <cristhian.lousa@gmail.com>

* Lazy load imports for language model component Ensures that only the necessary dependencies are required. For example, if OpenAI provider is used, it will now only import langchain_openai, rather than requiring langchain_anthropic, langchain_ibm, etc. * Add backwards-compat functions * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Add exception handling * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * comp index * docs: azure default temperature (#11829) * change-azure-openai-default-temperature-to-1.0 * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes (attempt 3/3) * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * fix unit test? * add no-group dev to docker builds * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Hamza Rashid <74062092+HzaRashid@users.noreply.github.com> Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>

github-actions · 2026-02-28T01:50:39Z

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| | 22.76% (7929/34837) | 15.34% (4188/27287) | 15.5% (1138/7341) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2599 | 0 💤 | 0 ❌ | 0 🔥 | 42.144s ⏱️ |

codecov · 2026-02-28T01:50:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.53%. Comparing base (1e205f4) to head (63ed364).

❌ Your project check has failed because the head coverage (41.46%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@              Coverage Diff               @@
##           aka/traces-v0   #11944   +/-   ##
==============================================
  Coverage          36.53%   36.53%           
==============================================
  Files               1580     1580           
  Lines              77116    77116           
  Branches           11778    11778           
==============================================
  Hits               28178    28178           
  Misses             47325    47325           
  Partials            1613     1613

| Flag | Coverage Δ |
|---|---|
| frontend | `20.37% <ø> (ø)` |
| lfx | `41.46% <ø> (ø)` |

Flags with carried forward coverage won't be shown.

codeflash-ai · 2026-03-02T20:30:22Z

This PR has been automatically closed because the original PR #11689 by Adam-Aghili was closed.

Adam-Aghili and others added 30 commits February 9, 2026 16:12

feat: traces v0

8e7f901

v0 for traces includes:
- filters: status, token usage range, and datetime
- accordion rows per trace

Could add:
- more filter options, for example session_id, trace_id, and latency range

fix: token range

76ab2db

Merge branch 'main' into aka/traces-v0

4361126

feat: create sidebar buttons for logs and trace

47ddd89

add sidebar buttons for logs and trace; remove logs canvas control

fix: fix duplicate trace ID insertion

6600166

hopefully fix duplicate trace ID insertion on windows

Merge branch 'main' into aka/traces-v0

514dc93

fix: update tests and alembic tables for uts

71dd799

update tests and alembic tables for uts

chore: add session_id

dfee4f4

chore: allo grouping by session_id and flow_id

79a61e9

chore: update race input output

5faf212

chore: change run name to flow_name - flow_id

bee6d9b

was flow_name - trace_id now flow_name - flow_id

facelift

517bddd

Merge remote-tracking branch 'origin/main' into aka/traces-v0

6b4f3c1

clean up and add testcases

5e37372

clean up and add testcases

3b5da15

Merge remote-tracking branch 'origin/main' into aka/traces-v0

560f044

merge Alembic detected multiple heads

fd9234b

[autofix.ci] apply automated fixes

b943827

improve testcases

6029e75

remodel files

8a52f21

chore: address gabriel simple changes

8d5d0ae

address gabriel simple changes in traces.py and native.py

clean up and testcases

df8d48f

chore: address OTel and PG status comments

394285e

#11689 (comment) #11689 (comment)

chore: OTel span naming convention

55033d2

model name is now set using name = f"{operation} {model_name}" if model_name else operation

add traces

a9fbef8

LE-270: Hydration and Console Log error (#11628)

85b0fbe

* LE-270: add fix hydration issues * LE-270: fix disable field on max token on language model --------- Co-authored-by: Olayinka Adelakun <olayinkaadelakun@mac.war.can.ibm.com>

Adam-Aghili and others added 22 commits February 27, 2026 13:32

Merge branch 'main' into aka/traces-v0

bfc2e17

chore: address backend code rabbit comments

66c7e88

address backend code rabbit comments

chore: address code rabbit frontend comments

68ffc34

address code rabbit frontend comments

chore: test_native_tracer minor fix address c1

a40ab4e

test_native_tracer minor fix address c1

chore: address C2 + C3

296db40

address C2 + C3

chore: address H1-H5

8ad0751

address H1-H5

test: update test_native_tracer

6234f55

update test_native_tracer

fixes

628077e

chore: address M2

2793167

address m2

Merge branch 'aka/traces-v0' of https://github.com/langflow-ai/langflow…

e905896

… into aka/traces-v0

chore: address M1

661b08a

address M1

dry changes, factorization

bc06289

chore: fix 422 spam and clean comments

e2810ff

fix 422 spam and clean comments

chore: address M12

2272a17

address M12

chore: address M3

7554e68

address M3

chore: address M4

92c4526

address M4

chore: address M5

a081bd6

address M5

chore: clean up for M7, M9, M11

7341ce0

clean up for M7, M9, M11

chore: address L2,L4,L5,L6 + any test

7166829

address L2,L4,L5 and L6 + any test

Merge branch 'main' into aka/traces-v0

938b9ea

chore: alembic + comment clean up

1e205f4

alembic + comment clean up

codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Feb 28, 2026

github-actions Bot added the community Pull Request from an external contributor label Feb 28, 2026

Base automatically changed from aka/traces-v0 to main March 2, 2026 20:30

codeflash-ai Bot closed this Mar 2, 2026

codeflash-ai Bot deleted the codeflash/optimize-pr11689-2026-02-28T01.48.16 branch March 2, 2026 20:30

codeflash-ai[bot] wants to merge 72 commits into `main` from `codeflash/optimize-pr11689-2026-02-28T01.48.16`

6 participants
