⚡️ Speed up method NativeTracer._map_trace_type by 136% in PR #11934 (temp-branch) #11937
Closed
codeflash-ai[bot] wants to merge 1 commit into
Brief: The optimized version speeds up _map_trace_type and tracer initialization by removing per-call allocations and reducing string scanning. The key wins are (1) moving the trace-type dict to a module-level constant so it is not rebuilt on every call, and (2) using str.rpartition(" - ") for flow_id extraction so that trace_name is not scanned twice. Together these reduce CPU work, temporary allocations, and Python bytecode executed per call, producing the measured 136% speedup.
What changed (concrete):
- Moved the mapping dict from inside _map_trace_type to a module-level _TYPE_MAP constant. The static method now does a single _TYPE_MAP.get(trace_type.lower(), SpanType.CHAIN).
- Replaced the flow_id fallback logic flow_id or (trace_name.split(" - ")[-1] if " - " in trace_name else trace_name) with flow_id or trace_name.rpartition(" - ")[-1].
- Minor reorganization of imports and annotations (no behavioral change).
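The two changes above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the SpanType members, the keys in _TYPE_MAP, and the resolve_flow_id helper are assumptions made for the example. Only the .get(trace_type.lower(), SpanType.CHAIN) lookup and the rpartition(" - ")[-1] fallback come from the description.

```python
from enum import Enum


class SpanType(Enum):
    # Hypothetical stand-in for the project's SpanType; members are illustrative.
    CHAIN = "chain"
    LLM = "llm"
    TOOL = "tool"


# Built once at import time instead of on every call.
# The keys are assumed for illustration, not the PR's real mapping.
_TYPE_MAP = {
    "chain": SpanType.CHAIN,
    "llm": SpanType.LLM,
    "tool": SpanType.TOOL,
}


class NativeTracer:
    @staticmethod
    def _map_trace_type(trace_type: str) -> SpanType:
        # One lower() plus one dict lookup; unknown types fall back to CHAIN.
        return _TYPE_MAP.get(trace_type.lower(), SpanType.CHAIN)


def resolve_flow_id(flow_id, trace_name):
    # Hypothetical helper that isolates the fallback expression:
    # use flow_id when given, else the text after the last " - ".
    return flow_id or trace_name.rpartition(" - ")[-1]
```

Keeping the mapping at module level also means it is shared by every tracer instance, so it should be treated as read-only.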
Why this is faster:
- Module-level mapping: a dict literal inside the function body is rebuilt on every call, which allocates a new dict and inserts every entry again. The line profile of the original code shows significant time spent on those dict literal lines. With the dict created once at import time, each call to _map_trace_type only does a lower() and a dict lookup (both cheap), eliminating the repeated allocations and deallocations.
- rpartition vs "in" + split: the original code did an "in" test followed by split(" - ")[-1], which scans the string twice when the separator is present and allocates a list with one element per segment. rpartition scans once and returns a fixed 3-tuple, which reduces CPU and allocations when building flow_id.
- Fewer Python-level operations: fewer bytecode instructions and attribute lookups per call. The optimized _map_trace_type is one line, and the line profiler confirms that nearly all of its time is now the .lower() + .get() cost.
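To check the first point locally, a rough microbenchmark along these lines may help. It is a sketch under assumptions: the mapping contents and function names are made up for illustration, and absolute timings will vary by machine and Python version, so no expected numbers are given.

```python
import timeit

# Hypothetical mapping; the values stand in for SpanType members.
_SHARED_MAP = {"chain": 1, "llm": 2, "tool": 3, "agent": 4, "retriever": 5}


def lookup_rebuilding(key):
    # Mirrors the original shape: the dict literal is rebuilt on each call.
    mapping = {"chain": 1, "llm": 2, "tool": 3, "agent": 4, "retriever": 5}
    return mapping.get(key.lower(), 1)


def lookup_shared(key):
    # Mirrors the optimized shape: only lower() and one dict lookup.
    return _SHARED_MAP.get(key.lower(), 1)


def compare(number=200_000):
    # Returns total seconds for `number` calls of each variant.
    rebuilt = timeit.timeit(lambda: lookup_rebuilding("LLM"), number=number)
    shared = timeit.timeit(lambda: lookup_shared("LLM"), number=number)
    return rebuilt, shared


if __name__ == "__main__":
    rebuilt, shared = compare()
    print(f"rebuilt per call: {rebuilt:.4f}s, module-level: {shared:.4f}s")
```

The gap between the two variants should grow with the number of entries in the mapping, since the rebuilt dict pays for every entry on every call.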
How this affects workloads:
- Big benefit when _map_trace_type is called many times (hot path, loops, repeated mapping). The annotated tests that call the mapper thousands of times (large-scale deterministic tests and repeated calls) are the cases that show the largest improvements.
- If _map_trace_type is only called occasionally (e.g., once per process start), the user-visible effect is small. But if many tracers are created or trace types are resolved repeatedly, the improvement compounds.
- The rpartition change speeds up tracer initialization where flow_id fallback is executed; again, more beneficial if many tracer objects are constructed.
Behavioral compatibility:
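- Semantics are preserved for the inputs discussed here: case-insensitive matching via .lower() is unchanged, unknown values still default to SpanType.CHAIN, and rpartition returns the full trace_name when the separator is absent (same as the original split behavior). The one divergence is a trace_name with overlapping separators such as "a - - b", where split(" - ")[-1] gives "- b" and rpartition(" - ")[-1] gives "b".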
- No change to error behavior for non-string inputs (calling .lower() on None will still raise AttributeError), so tests that assert current behavior still pass.
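The flow_id equivalence can be spot-checked with a small comparison like the one below. The two expressions are copied from the description; the function names and the sample strings are hypothetical. The sketch also records the overlapping-separator case, where the two expressions disagree because split consumes separators left to right while rpartition searches from the right.

```python
def flow_id_original(flow_id, trace_name):
    # Expression quoted in the PR description for the original code.
    return flow_id or (
        trace_name.split(" - ")[-1] if " - " in trace_name else trace_name
    )


def flow_id_optimized(flow_id, trace_name):
    # Expression quoted in the PR description for the optimized code.
    return flow_id or trace_name.rpartition(" - ")[-1]


# Illustrative inputs: separator present, repeated, absent, and at the edges.
SAMPLES = [
    "My Flow - abc123",
    "a - b - c",
    "no separator",
    "",
    "trailing - ",
    " - leading",
]


def mismatches(samples=SAMPLES):
    # Names for which the two expressions return different flow IDs.
    return [
        name
        for name in samples
        if flow_id_original(None, name) != flow_id_optimized(None, name)
    ]


# Overlapping separators are the known exception.
OVERLAP = "a - - b"
```

For the samples listed, mismatches() returns an empty list, while OVERLAP yields "- b" from the original expression and "b" from the optimized one. Whether that case matters depends on how trace names are generated, which the description does not state.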
Tests that benefit most:
- Repeated-call and large-scale tests (the 1000-entry deterministic mapping and repeated-call loops): these are the scenarios where the profiler and runtime show the greatest gains.
Codecov Report
❌ Patch coverage is
❌ Your project check has failed because the head coverage (42.40%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@             Coverage Diff              @@
##           temp-branch   #11937   +/-   ##
===============================================
- Coverage        35.74%   35.73%   -0.01%
===============================================
  Files             1532     1532
  Lines            74562    74562
  Branches         11146    11146
===============================================
- Hits             26651    26645       -6
- Misses           46475    46481       +6
  Partials          1436     1436
⚡️ This pull request contains optimizations for PR #11934
If you approve this dependent PR, these changes will be merged into the original PR branch temp-branch.
📄 136% (1.36x) speedup for NativeTracer._map_trace_type in src/backend/base/langflow/services/tracing/native.py
⏱️ Runtime: 1.23 milliseconds → 523 microseconds (best of 295 runs)
To edit these changes, git checkout codeflash/optimize-pr11934-2026-02-27T10.49.30 and push.