⚡️ Speed up function `calculate_text_metrics` by 16% in PR #6732 (`better-langflow-base`) by codeflash-ai[bot] · Pull Request #11332 · langflow-ai/langflow

codeflash-ai · 2026-01-16T20:00:40Z

⚡️ This pull request contains optimizations for PR #6732

If you approve this dependent PR, these changes will be merged into the original PR branch better-langflow-base.

This PR will be automatically closed if the original PR is merged.

📄 16% (0.16x) speedup for `calculate_text_metrics` in `src/backend/base/langflow/api/v1/knowledge_bases.py`

⏱️ Runtime : 46.5 milliseconds → 40.0 milliseconds (best of 96 runs)
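As a quick check of how the headline percentage relates to the two runtimes, the sketch below assumes the speedup is computed as old runtime divided by new runtime, minus one. That formula is an assumption about the reporting convention, not something the report states.

```python
# Runtimes reported above, in milliseconds.
original_ms = 46.5
optimized_ms = 40.0

# Assumed convention: speedup = old / new - 1.
speedup = original_ms / optimized_ms - 1

# Fraction of the original runtime that was saved (a different, smaller number).
time_saved = (original_ms - optimized_ms) / original_ms

print(f"speedup: {speedup:.1%}, time saved: {time_saved:.1%}")
```

Under this convention the "0.16x" figure reads as a fractional gain over the original, not as a ratio of the two runtimes.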

📝 Explanation and details

The optimized code achieves a 16% speedup (from 46.5ms to 40.0ms) through two key algorithmic improvements:

1. Vectorized Regex Word Counting (Primary Optimization)

What changed:

- Original: `text_series.str.split().str.len().sum()` splits every string into a Python list of words, then counts list lengths
- Optimized: `text_series.str.count(_WORD_RE).sum()` with the precompiled regex `r'\S+'` counts non-whitespace sequences directly without materializing lists

Why it's faster:
The original approach creates intermediate Python list objects for every row during .str.split(), which triggers significant memory allocation and garbage collection overhead. The optimized version uses pandas' vectorized regex counting that operates at the C level, avoiding the costly list materialization step entirely.

Performance impact from profiler:

- Original word counting: 73.2ms (42.2% of total time)
- Optimized word counting: 43.5ms (30.6% of total time)
- ~41% reduction in this operation alone

The precompiled regex _WORD_RE is defined once at module load, eliminating repeated pattern compilation on every call.
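The two word-counting approaches can be compared side by side in a minimal sketch. The function names and sample data are hypothetical, and the pattern is passed as a plain string rather than the precompiled `_WORD_RE` described above; this is not the Langflow source.

```python
import pandas as pd


def count_words_split(text_series: pd.Series) -> int:
    """Original approach: build a list of words per row, then sum the list lengths."""
    return int(text_series.str.split().str.len().sum())


def count_words_regex(text_series: pd.Series) -> int:
    """Optimized approach: count runs of non-whitespace characters per row."""
    return int(text_series.str.count(r"\S+").sum())


# Hypothetical sample data with mixed whitespace and an empty string.
sample = pd.Series(["hello world", "  spaced   out\ttext\n", "", "single"])

split_total = count_words_split(sample)
regex_total = count_words_regex(sample)
print(split_total, regex_total)
```

Both functions should report the same total for this sample; only the intermediate work differs.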

2. Set-Based Column Membership Check

What changed:

- Original: `if col not in df.columns` checks membership against the pandas Index
- Optimized: `columns_set = set(df.columns)` followed by `if col not in columns_set`

Why it's faster:
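Membership tests on a Python set are O(1) hash lookups with very little per-call overhead, whereas `col in df.columns` goes through the pandas Index lookup machinery on every check. With multiple columns to check, this adds up.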

Performance impact from profiler:

- Original column checks: 2.25ms (1.3% of total time)
- Optimized column checks: 0.08ms (0.1% of total time)
- ~96% reduction in this operation
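The membership-check change can be sketched inside a small stand-in function. `text_metrics_sketch` is a hypothetical name and body, assembled from the expressions quoted in this description and the `.astype(str).fillna("")` conversion mentioned in the generated tests; the real `calculate_text_metrics` may differ.

```python
import pandas as pd


def text_metrics_sketch(df: pd.DataFrame, text_columns: list[str]) -> tuple[int, int]:
    """Hypothetical stand-in for calculate_text_metrics; not the Langflow source."""
    columns_set = set(df.columns)  # built once, reused for every membership check
    total_words = 0
    total_chars = 0
    for col in text_columns:
        if col not in columns_set:
            continue  # columns missing from the DataFrame are skipped
        text_series = df[col].astype(str).fillna("")
        total_words += int(text_series.str.count(r"\S+").sum())
        total_chars += int(text_series.str.len().sum())
    return total_words, total_chars


# Hypothetical data: two real columns plus one name that does not exist.
df = pd.DataFrame({"a": ["one two", "three"], "b": ["x", "y z"]})
words, chars = text_metrics_sketch(df, ["a", "missing", "b"])
print(words, chars)
```

Building the set costs one pass over the column labels, so the change mainly helps when several column names are checked per call.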

Test Case Performance

The optimization excels across all test categories:

- Large-scale tests (500+ rows): maximum benefit from vectorized operations avoiding per-row list creation
- Multiple column tests: the one-time cost of building the set pays off when checking multiple columns
- Unicode/emoji tests: the regex approach handles these correctly while maintaining performance
- Edge cases (empty strings, None values): behavior preserved via `.fillna("")` and regex semantics

The optimization maintains correctness because `\S+` (non-whitespace sequences) matches the same word boundaries as `.split()` for all practical text inputs, while being significantly more efficient at the pandas/numpy vectorization level.
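The equivalence claim can be spot-checked with the standard library alone. The sample strings below are illustrative (several are borrowed from the generated tests), and `WORD_RE` is a stand-in name for the pattern described above; agreement on these inputs is a sanity check, not a proof.

```python
import re

WORD_RE = re.compile(r"\S+")  # illustrative name; same pattern as described above

samples = [
    "hello world",
    "  hello   world \n tab\tend  ",
    "emoji 🙂 test",
    "中文 字",
    "",
    "   ",
    "a\u00a0b",  # non-breaking space between two letters
]

# Pair the whitespace-split count with the regex match count for each string.
counts = [(len(s.split()), len(WORD_RE.findall(s))) for s in samples]

# Collect any string where the two counts disagree.
mismatches = [s for s, (by_split, by_regex) in zip(samples, counts) if by_split != by_regex]
print(counts, mismatches)
```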

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 51 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 91.7% |

🌀 Generated Regression Tests

# imports
import numpy as np
import pandas as pd
import pytest  # used for our unit tests
from langflow.api.v1.knowledge_bases import calculate_text_metrics


# helper for normalizing results before comparison
def _to_int(value) -> int:
    """Convert a pandas/numpy scalar to int, handling both old and new pandas versions."""
    # Newer pandas returns native Python types, older versions return numpy scalars with .item()
    if hasattr(value, "item"):
        return int(value.item())
    return int(value)


# unit tests

def test_basic_single_column():
    # Basic functionality: one text column with simple sentences
    df = pd.DataFrame({
        "text": ["hello world", "foo"]
    })
    # "hello world" -> 2 words, length 11; "foo" -> 1 word, length 3
    expected_words = 3
    expected_chars = len("hello world") + len("foo")  # 11 + 3 = 14

    words, chars = calculate_text_metrics(df, ["text"])
    assert _to_int(words) == expected_words
    assert _to_int(chars) == expected_chars


def test_basic_multiple_columns():
    # Multiple text columns should aggregate across columns
    df = pd.DataFrame({
        "a": ["one two", "three"],
        "b": ["x", "y z"]
    })
    # Column 'a': 3 words, lengths = len("one two") + len("three") = 7 + 5 = 12
    # Column 'b': 3 words, lengths = len("x") + len("y z") = 1 + 3 = 4
    expected_words = 3 + 3
    expected_chars = 12 + 4

    words, chars = calculate_text_metrics(df, ["a", "b"])
    assert _to_int(words) == expected_words
    assert _to_int(chars) == expected_chars


def test_missing_columns_ignored():
    # If a column name in text_columns is missing from df, it should be ignored
    df = pd.DataFrame({"text": ["alpha beta"]})
    # Provide a missing column name "missing" as well
    words, chars = calculate_text_metrics(df, ["missing", "text", "also_missing"])


def test_non_string_and_nan_handling():
    # The function uses .astype(str).fillna("") - check conversion of non-string values
    df = pd.DataFrame({
        "text": [None, np.nan, 123, True, ""]
    })

    # Note: astype(str) converts None -> "None" and np.nan -> "nan" (string)
    # So we expect them to be counted as strings.
    expected_strings = [str(None), str(np.nan), str(123), str(True), ""]
    # Compute expected counts explicitly to avoid depending on implementation
    expected_chars = sum(len(s) for s in expected_strings)
    expected_words = sum(len(s.split()) for s in expected_strings)  # split on whitespace

    words, chars = calculate_text_metrics(df, ["text"])


def test_whitespace_and_multispace_handling():
    # Multiple spaces, newlines and tabs should be treated as whitespace by .str.split()
    df = pd.DataFrame({
        "text": ["  hello   world \n tab\tend  ", "\n\nsingle"]
    })

    # For first string: words are ["hello", "world", "tab", "end"] -> 4
    # For second string: ["single"] -> 1
    expected_words = 4 + 1
    # Characters counted including whitespace characters
    expected_chars = len("  hello   world \n tab\tend  ") + len("\n\nsingle")

    words, chars = calculate_text_metrics(df, ["text"])
    assert _to_int(words) == expected_words
    assert _to_int(chars) == expected_chars


def test_unicode_and_emoji_handling():
    # Ensure unicode (including emojis) are counted correctly.
    df = pd.DataFrame({
        "text": ["café naïve", "emoji 🙂 test", "中文 字"]
    })
    # Words: "café naïve" -> 2, "emoji 🙂 test" -> 3 (emoji is its own token), "中文 字" -> 2
    expected_words = 2 + 3 + 2
    # Characters measured in Python are codepoints (len counts codepoints)
    expected_chars = len("café naïve") + len("emoji 🙂 test") + len("中文 字")

    words, chars = calculate_text_metrics(df, ["text"])
    assert _to_int(words) == expected_words
    assert _to_int(chars) == expected_chars


def test_duplicate_columns_counted_twice():
    # If same column appears multiple times in text_columns, it should be processed each time
    df = pd.DataFrame({
        "text": ["a b", "c"]
    })
    # Single column would give: words = 3, chars = len("a b") + len("c") = 3 + 1 = 4
    single_words, single_chars = calculate_text_metrics(df, ["text"])

    # If we provide the column twice, the totals should double
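    double_words, double_chars = calculate_text_metrics(df, ["text", "text"])
    assert _to_int(double_words) == 2 * _to_int(single_words)
    assert _to_int(double_chars) == 2 * _to_int(single_chars)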


def test_empty_text_columns_and_empty_df():
    # If text_columns is empty, result should be zeros regardless of df
    df = pd.DataFrame({"a": ["one"]})
    words, chars = calculate_text_metrics(df, [])

    # If df is empty, counts should be zero even if columns are listed
    empty_df = pd.DataFrame(columns=["text"])
    words, chars = calculate_text_metrics(empty_df, ["text"])


def test_large_scale_correctness_and_types():
    # Large-scale-style test; the DataFrame is kept under 1000 cells
    num_rows = 250
    # Use repeating short strings to make expected values simple to compute
    base_strings = ["alpha beta", "gamma"]
    # Build DataFrame with two text columns and 250 rows: total cells = 250 * 2 = 500 (<1000)
    df = pd.DataFrame({
        "col1": [base_strings[0]] * num_rows,
        "col2": [base_strings[1]] * num_rows
    })

    # For col1: each row -> 2 words, len("alpha beta") = 10
    # For col2: each row -> 1 word, len("gamma") = 5
    expected_words = num_rows * (2 + 1)
    expected_chars = num_rows * (10 + 5)

    words, chars = calculate_text_metrics(df, ["col1", "col2"])
    assert _to_int(words) == expected_words
    assert _to_int(chars) == expected_chars
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pandas as pd
import pytest
from langflow.api.v1.knowledge_bases import calculate_text_metrics


class TestCalculateTextMetricsBasic:
    """Basic test cases for calculate_text_metrics function under normal conditions."""

    def test_single_column_single_row(self):
        """Test with a single text column and single row of data."""
        df = pd.DataFrame({"text": ["hello world"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_single_column_multiple_rows(self):
        """Test with a single column containing multiple rows."""
        df = pd.DataFrame({"text": ["hello", "world", "test"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_multiple_columns(self):
        """Test with multiple text columns."""
        df = pd.DataFrame({
            "col1": ["hello world"],
            "col2": ["foo bar"]
        })
        words, characters = calculate_text_metrics(df, ["col1", "col2"])

    def test_single_word_entries(self):
        """Test with entries that contain only single words."""
        df = pd.DataFrame({"text": ["hello", "world", "python"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_text_with_punctuation(self):
        """Test that punctuation is counted as part of characters."""
        df = pd.DataFrame({"text": ["hello, world!"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_text_with_numbers(self):
        """Test text containing numeric values."""
        df = pd.DataFrame({"text": ["test 123 example"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_mixed_whitespace(self):
        """Test handling of different whitespace characters."""
        df = pd.DataFrame({"text": ["hello  world"]})  # double space
        words, characters = calculate_text_metrics(df, ["text"])


class TestCalculateTextMetricsEdgeCases:
    """Edge case test cases for unusual or extreme conditions."""

    def test_empty_dataframe(self):
        """Test with empty dataframe."""
        df = pd.DataFrame({"text": []})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_empty_string_entries(self):
        """Test with empty string values."""
        df = pd.DataFrame({"text": ["", "", ""]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_none_values_in_column(self):
        """Test with None/NaN values in the column."""
        df = pd.DataFrame({"text": ["hello", None, "world"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_column_not_in_dataframe(self):
        """Test when specified column doesn't exist in dataframe."""
        df = pd.DataFrame({"other_col": ["hello"]})
        words, characters = calculate_text_metrics(df, ["missing_col"])

    def test_mix_of_existing_and_missing_columns(self):
        """Test when some columns exist and some don't."""
        df = pd.DataFrame({"col1": ["hello world"], "col2": ["foo"]})
        words, characters = calculate_text_metrics(df, ["col1", "missing", "col2"])

    def test_whitespace_only_strings(self):
        """Test with strings that contain only whitespace."""
        df = pd.DataFrame({"text": ["   ", "\t", "\n"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_single_character_entries(self):
        """Test with single character entries."""
        df = pd.DataFrame({"text": ["a", "b", "c"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_numeric_datatype_column(self):
        """Test conversion of numeric columns to string."""
        df = pd.DataFrame({"numbers": [123, 456, 789]})
        words, characters = calculate_text_metrics(df, ["numbers"])

    def test_float_datatype_column(self):
        """Test conversion of float columns to string."""
        df = pd.DataFrame({"floats": [1.5, 2.7, 3.14]})
        words, characters = calculate_text_metrics(df, ["floats"])

    def test_boolean_datatype_column(self):
        """Test conversion of boolean columns to string."""
        df = pd.DataFrame({"bools": [True, False, True]})
        words, characters = calculate_text_metrics(df, ["bools"])

    def test_very_long_single_word(self):
        """Test with extremely long single word."""
        long_word = "a" * 1000
        df = pd.DataFrame({"text": [long_word]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_empty_column_list(self):
        """Test with empty column list."""
        df = pd.DataFrame({"text": ["hello world"]})
        words, characters = calculate_text_metrics(df, [])

    def test_duplicate_column_names_in_list(self):
        """Test when same column is specified multiple times."""
        df = pd.DataFrame({"text": ["hello"]})
        words, characters = calculate_text_metrics(df, ["text", "text"])

    def test_case_sensitivity_in_column_names(self):
        """Test that column name lookup is case-sensitive."""
        df = pd.DataFrame({"Text": ["hello"]})
        words, characters = calculate_text_metrics(df, ["text"])  # lowercase

    def test_special_characters_and_unicode(self):
        """Test with special characters and unicode characters."""
        df = pd.DataFrame({"text": ["héllo wørld 你好"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_tabs_as_word_separators(self):
        """Test that tabs are treated as word separators."""
        df = pd.DataFrame({"text": ["hello\tworld\ttest"]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_newlines_as_word_separators(self):
        """Test that newlines are treated as word separators."""
        df = pd.DataFrame({"text": ["hello\nworld\ntest"]})
        words, characters = calculate_text_metrics(df, ["text"])


class TestCalculateTextMetricsLargeScale:
    """Large scale test cases for performance and scalability assessment."""

    def test_large_number_of_rows(self):
        """Test with large number of rows in dataframe."""
        # Create 500 rows with varying text content
        texts = ["hello world"] * 500
        df = pd.DataFrame({"text": texts})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_large_text_content(self):
        """Test with large text content in single entry."""
        # Create a single entry with many words
        large_text = " ".join(["word"] * 500)
        df = pd.DataFrame({"text": [large_text]})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_many_columns(self):
        """Test with many text columns."""
        # Create 50 columns with text
        data = {f"col_{i}": ["hello world"] for i in range(50)}
        df = pd.DataFrame(data)
        column_names = [f"col_{i}" for i in range(50)]
        words, characters = calculate_text_metrics(df, column_names)

    def test_mixed_content_large_dataframe(self):
        """Test with mixed content types in large dataframe."""
        # Create 300 rows with mixed content
        texts = [
            "hello world",
            "foo bar baz",
            "single",
            "",
            None,
            "a b c d e f g h i j",
        ] * 50
        df = pd.DataFrame({"text": texts})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_multiple_columns_large_dataframe(self):
        """Test with multiple columns and large dataframe."""
        # Create 200 rows with 5 columns
        df = pd.DataFrame({
            "col1": ["hello"] * 200,
            "col2": ["world test"] * 200,
            "col3": ["foo bar baz"] * 200,
            "col4": ["single"] * 200,
            "col5": ["a b c"] * 200,
        })
        words, characters = calculate_text_metrics(df, ["col1", "col2", "col3", "col4", "col5"])

    def test_very_long_words_in_large_data(self):
        """Test performance with very long words in large dataset."""
        # Create 100 rows with long words
        long_word = "a" * 100
        texts = [f"{long_word} {long_word}"] * 100
        df = pd.DataFrame({"text": texts})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_sparse_data_large_dataframe(self):
        """Test with sparse data (many empty/None values) in large dataframe."""
        # Create 400 rows where 75% are empty or None
        texts = ["hello world"] * 100 + [""] * 150 + [None] * 150
        df = pd.DataFrame({"text": texts})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_categorical_data_large_scale(self):
        """Test with categorical-like data in large dataset."""
        # Create 300 rows with repeated categories
        categories = ["product review", "customer feedback", "user comment"] * 100
        df = pd.DataFrame({"text": categories})
        words, characters = calculate_text_metrics(df, ["text"])

    def test_incremental_data_accumulation(self):
        """Test that metrics accumulate correctly across rows."""
        # Create incrementally larger dataset and verify accumulation
        for size in [10, 50, 100, 200]:
            df = pd.DataFrame({"text": ["hello world"] * size})
            words, characters = calculate_text_metrics(df, ["text"])

    def test_memory_efficient_processing(self):
        """Test that function processes data efficiently without excessive memory use."""
        # Create a moderately large dataframe
        df = pd.DataFrame({
            "text1": ["hello world test"] * 300,
            "text2": ["foo bar baz"] * 300,
            "text3": ["single word"] * 300,
        })
        # This should complete without issues
        words, characters = calculate_text_metrics(df, ["text1", "text2", "text3"])

    def test_consistency_across_multiple_calls(self):
        """Test that multiple calls return consistent results."""
        df = pd.DataFrame({"text": ["hello world"] * 250})
        # Call function multiple times
        codeflash_output = calculate_text_metrics(df, ["text"]); result1 = codeflash_output
        codeflash_output = calculate_text_metrics(df, ["text"]); result2 = codeflash_output
        codeflash_output = calculate_text_metrics(df, ["text"]); result3 = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-pr6732-2026-01-16T20.00.34` and push.

coderabbitai · 2026-01-16T20:00:52Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the `@coderabbitai review` command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

Comment `@coderabbitai help` to get the list of available commands and usage tips.

codecov · 2026-01-16T20:03:59Z

Codecov Report

❌ Patch coverage is 25.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.24%. Comparing base (6b4f946) to head (12c5144).
⚠️ Report is 162 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rc/backend/base/langflow/api/v1/knowledge_bases.py | 30.00% | 7 Missing ⚠️ |
| src/backend/base/langflow/api/build.py | 0.00% | 1 Missing ⚠️ |
| src/lfx/src/lfx/base/models/unified_models.py | 0.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (25.00%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (40.80%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11332      +/-   ##
==========================================
- Coverage   34.24%   34.24%   -0.01%     
==========================================
  Files        1409     1409              
  Lines       66929    66936       +7     
  Branches     9877     9877              
==========================================
+ Hits        22918    22919       +1     
- Misses      42810    42816       +6     
  Partials     1201     1201

| Flag | Coverage Δ | |
|---|---|---|
| backend | `53.52% <27.27%> (-0.02%)` | ⬇️ |
| lfx | `40.80% <0.00%> (ø)` | |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/backend/base/langflow/api/build.py | `71.01% <0.00%> (-0.73%)` | ⬇️ |
| src/lfx/src/lfx/base/models/unified_models.py | `23.74% <0.00%> (ø)` | |
| ...rc/backend/base/langflow/api/v1/knowledge_bases.py | `17.03% <30.00%> (+0.68%)` | ⬆️ |

... and 4 files with indirect coverage changes

ogabrielluiz · 2026-03-03T18:12:40Z

Closing automated codeflash PR.

ogabrielluiz and others added 29 commits June 12, 2025 13:49

- `8a9d8d0` fix: Remove redundant dependencies from pyproject.toml
- `7bad8a0` feat: Refactor optional dependencies in pyproject.toml for improved organization and clarity
- `95955e0` feat: Update optional dependencies in pyproject.toml for enhanced functionality and organization
- `b44539f` feat: Consolidate dependencies in pyproject.toml for better management
- `a155e4d` feat: Expand complete installation dependencies in pyproject.toml for comprehensive support
- `63ff4d9` new lock
- `8287a81` feat: Update package versions and refactor complete installation dependencies in pyproject.toml
- `21773b4` update lock
- `36ad1f0` feat: Update dependencies in pyproject.toml to use 'local' installation and restructure complete installation groups
- `00f3bfb` feat: Add langchain_elasticsearch dependency to elasticsearch in pyproject.toml
- `5c7d577` feat: Add types-cachetools dependency and update version constraints for existing packages in pyproject.toml
- `3ce5450` feat: Update pyproject.toml to change langflow-base dependency from 'local' to 'complete' installation and remove dev extra
- `fb7e6ee` Update pyproject.toml to modify dependencies and improve installation structure
- `32bfdfa` refactor: Remove 'deploy' and 'dev' extras from complete installation in pyproject.toml
- `1f5b25f` Merge remote-tracking branch 'origin/main' into better-langflow-base (conflicts: pyproject.toml, src/backend/base/pyproject.toml, uv.lock)
- `48c001c` add missing deps
- `782ac70` fix: update langchain-chroma dependency version to 0.2.6
- `bc488d4` Merge branch 'main' into better-langflow-base
- `d15266f` Remove clickhouse and pypdf from main deps
- `4e52530` move clickhouse
- `6729c81` add lock
- `4afe06c` update lockfile
- `56c987c` Add utility function to convert pandas/numpy scalars to int
- `4457f48` Update component index
- `002895f` Merge branch 'main' into better-langflow-base (conflicts: src/lfx/src/lfx/_assets/component_index.json)
- `ac10b8d` Update component index
- `8b9e7ec` fix: update no_leaks decorator to include threads for test isolation
- `a308f3f` fix: refine model_is_empty condition for better clarity

codeflash-ai Bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) Jan 16, 2026

github-actions Bot added the "community" label (Pull Request from an external contributor) Jan 16, 2026

Base automatically changed from better-langflow-base to main January 21, 2026 23:20

ogabrielluiz closed this Mar 3, 2026

codeflash-ai Bot deleted the codeflash/optimize-pr6732-2026-01-16T20.00.34 branch March 3, 2026 18:12

codeflash-ai[bot] wants to merge 29 commits into `main` from `codeflash/optimize-pr6732-2026-01-16T20.00.34`
