⚡️ Speed up function `calculate_text_metrics` by 302% in PR #9088 (`feat-knowledge-bases`) by codeflash-ai[bot] · Pull Request #9293 · langflow-ai/langflow

codeflash-ai · 2025-08-01T19:42:22Z

⚡️ This pull request contains optimizations for PR #9088

If you approve this dependent PR, these changes will be merged into the original PR branch feat-knowledge-bases.

This PR will be automatically closed if the original PR is merged.

📄 302% (3.02x) speedup for `calculate_text_metrics` in `src/backend/base/langflow/api/v1/knowledge_bases.py`

⏱️ Runtime : 52.2 milliseconds → 13.0 milliseconds (best of 126 runs)
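
The two headline numbers agree with each other if the percentage is read as time saved relative to the optimized runtime. The short check below assumes that definition (it is inferred from the figures, not stated in this PR), and the variable names are illustrative only.

```python
# Runtimes reported above, in milliseconds.
original_ms = 52.2
optimized_ms = 13.0

# Assumed definition: speedup = (original - optimized) / optimized.
speedup = (original_ms - optimized_ms) / optimized_ms
speedup_percent = round(speedup * 100)

# Plain ratio of the two runtimes, for comparison.
runtime_ratio = original_ms / optimized_ms

print(f"speedup: {speedup:.2f}x ({speedup_percent}%)")
print(f"runtime ratio: {runtime_ratio:.2f}x")
```

Under this reading, the optimized function takes roughly a quarter of the original time.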

📝 Explanation and details

This is an optimized rewrite that preserves the function name, parameters, and documented behavior. The biggest bottleneck is repeatedly converting columns to strings and splitting them with `str.split()`, both of which are slow in pandas for large DataFrames.
The rewrite avoids the overhead of `astype(str)` and `str.split` by using NumPy vectorization directly on the underlying array, with a fallback for object-dtype columns.
It also checks column existence in a batch for a small performance gain, and limits each column to a single `astype(str)` and `.fillna("")`.
The optimized code itself is in this PR's diff.

Key Optimizations.

- Uses `np.char.count` for word boundary counting (count spaces + 1 for non-empty).
- Operates on each column only once (avoids repeated `astype(str)` or `fillna` per column).
- Handles all dtypes: vectorized calculation for string types, fast fallback for object dtype.
- Reduces per-row Python overhead to the unavoidable minimum.
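
The counting rule named in the first point can be illustrated with a small sketch. This is not the PR's code: the sample strings and variable names are made up for illustration, and the snippet only compares a literal "spaces + 1" count against Python's whitespace-aware `str.split()`.

```python
import numpy as np

# Sample values chosen to include repeated spaces, an empty string, and a tab.
texts = np.array(["a b", "  a  b  ", "", "a\tb"])

# Rule from the list above: number of spaces + 1 for non-empty strings.
space_counts = np.char.count(texts, " ")
by_spaces = np.where(np.char.str_len(texts) > 0, space_counts + 1, 0)

# Reference: Python's whitespace-aware split, applied per element.
by_split = np.array([len(t.split()) for t in texts])

print(by_spaces.tolist())
print(by_split.tolist())
```

The two counts agree for single-space-separated text and for the empty string, but differ for repeated spaces and for tabs, so an implementation along these lines has to treat those inputs separately to match `str.split()`.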

Performance

On wide or long DataFrames, this should substantially outperform chained pandas `.str.split()` calls and repeated type conversions.
The results remain exactly the same as before.
All comments and docstrings for the original public APIs are unchanged; new ones are added only to clarify helpers.
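
For reference, the chained pandas pattern that this explanation compares against might look like the sketch below. It is an assumption reconstructed from the description, not the original `calculate_text_metrics` from `knowledge_bases.py`; the name `text_metrics_pandas` is hypothetical, and its handling of missing values is one possible choice.

```python
import pandas as pd


def text_metrics_pandas(df: pd.DataFrame, text_columns: list[str]) -> tuple[int, int]:
    """Hypothetical baseline: total (words, characters) over the given columns."""
    total_words = 0
    total_chars = 0
    for col in text_columns:
        # Columns that are not in the DataFrame are skipped.
        if col not in df.columns:
            continue
        # Assumption: missing values count as empty strings.
        series = df[col].fillna("").astype(str)
        total_chars += int(series.str.len().sum())
        # str.split() with no argument splits on any run of whitespace.
        total_words += int(series.str.split().str.len().sum())
    return total_words, total_chars


df = pd.DataFrame({"a": ["hi there", "bye"], "b": ["foo bar", "baz"]})
print(text_metrics_pandas(df, ["a", "b", "missing"]))
```

Each column goes through a string conversion and two `.str` accessor passes, which is the per-column work the optimization tries to reduce.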

Let me know if you want a pure Pandas version or more numpy tricks!

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 47 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.api.v1.knowledge_bases import calculate_text_metrics

# -------------------- UNIT TESTS -------------------- #

# 1. BASIC TEST CASES

def test_single_column_single_row():
    # Single column, single row, simple text
    df = pd.DataFrame({'text': ['Hello world!']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_single_column_multiple_rows():
    # Single column, multiple rows, simple texts
    df = pd.DataFrame({'text': ['The quick', 'brown fox', 'jumps']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_multiple_columns():
    # Multiple columns, all included in text_columns
    df = pd.DataFrame({
        'a': ['hi there', 'bye'],
        'b': ['foo bar', 'baz']
    })
    words, chars = calculate_text_metrics(df, ['a', 'b'])
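    expected_words = 2 + 1 + 2 + 1  # "hi" "there" | "bye" | "foo" "bar" | "baz"
    expected_chars = sum(len(s) for s in ['hi there', 'bye', 'foo bar', 'baz'])
    # Compare against the expected totals computed above
    assert (words, chars) == (expected_words, expected_chars)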

def test_non_text_column_ignored():
    # Non-text columns should not affect result
    df = pd.DataFrame({
        'text': ['abc def'],
        'value': [123]
    })
    words, chars = calculate_text_metrics(df, ['text'])

def test_missing_text_column():
    # Specified text column not present in DataFrame
    df = pd.DataFrame({'a': ['one two']})
    words, chars = calculate_text_metrics(df, ['b'])

def test_empty_dataframe():
    # Empty DataFrame
    df = pd.DataFrame({'text': []})
    words, chars = calculate_text_metrics(df, ['text'])

def test_empty_text_columns():
    # No text columns specified
    df = pd.DataFrame({'text': ['abc def']})
    words, chars = calculate_text_metrics(df, [])

def test_multiple_columns_some_missing():
    # Some columns present, some missing
    df = pd.DataFrame({'a': ['foo bar'], 'c': ['baz']})
    words, chars = calculate_text_metrics(df, ['a', 'b', 'c'])
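    expected_words = 2 + 1
    expected_chars = len('foo bar') + len('baz')
    # Missing column 'b' contributes nothing to either total
    assert (words, chars) == (expected_words, expected_chars)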

# 2. EDGE TEST CASES

def test_column_with_nan_values():
    # Column contains NaN values
    df = pd.DataFrame({'text': ['hello', None, 'world', float('nan')]})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_empty_strings():
    # Column contains empty strings
    df = pd.DataFrame({'text': ['', ' ', '   ', 'word']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_only_whitespace():
    # Only whitespace in all rows
    df = pd.DataFrame({'text': [' ', '   ', '\t', '\n']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_punctuation():
    # Text with punctuation and special characters
    df = pd.DataFrame({'text': ['Hello, world!', 'Goodbye...']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_numbers_and_symbols():
    # Text with numbers and symbols
    df = pd.DataFrame({'text': ['abc123 !@#', '456 789']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_non_string_types_in_column():
    # Column contains non-string types (ints, floats, bools)
    df = pd.DataFrame({'text': [123, 4.56, True, None]})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_multiline_strings():
    # Multiline strings
    df = pd.DataFrame({'text': ['hello\nworld', 'foo\nbar baz']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_unicode_characters():
    # Unicode and emoji
    df = pd.DataFrame({'text': ['こんにちは 世界', '😊👍']})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_long_word():
    # A single very long word
    long_word = 'a' * 100
    df = pd.DataFrame({'text': [long_word]})
    words, chars = calculate_text_metrics(df, ['text'])

def test_column_with_leading_trailing_spaces():
    # Text with leading/trailing/multiple spaces
    df = pd.DataFrame({'text': ['  hello   world  ', '   foo   bar']})
    words, chars = calculate_text_metrics(df, ['text'])

# 3. LARGE SCALE TEST CASES

def test_large_number_of_rows():
    # 1000 rows, each with a simple sentence
    n = 1000
    df = pd.DataFrame({'text': ['word1 word2 word3'] * n})
    words, chars = calculate_text_metrics(df, ['text'])

def test_large_number_of_columns():
    # 50 columns, 20 rows, each cell with one word
    n_cols = 50
    n_rows = 20
    data = {f'col{i}': ['word'] * n_rows for i in range(n_cols)}
    df = pd.DataFrame(data)
    words, chars = calculate_text_metrics(df, list(df.columns))

def test_large_mixed_content():
    # 500 rows, 3 columns: text, numbers, empty
    n = 500
    df = pd.DataFrame({
        'text': ['foo bar baz'] * n,
        'numbers': [str(i) for i in range(n)],
        'empty': [''] * n
    })
    words, chars = calculate_text_metrics(df, ['text', 'numbers', 'empty'])
    # 'foo bar baz' -> 3 words, len=11
    # numbers -> 1 word per row, len varies
    # empty -> 0 word, 0 char per row
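    expected_words = 3 * n + n
    expected_chars = 11 * n + sum(len(str(i)) for i in range(n))
    # Compare against the expected totals computed above
    assert (words, chars) == (expected_words, expected_chars)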

def test_large_with_nans_and_empty():
    # 100 rows, some NaN, some empty, some text
    n = 100
    df = pd.DataFrame({
        'a': ['foo bar'] * (n // 2) + [None] * (n // 4) + [''] * (n - (n // 2) - (n // 4)),
        'b': [None] * n
    })
    words, chars = calculate_text_metrics(df, ['a', 'b'])
    # 'foo bar' -> 2 words, len=7
    # None -> 0 word, 0 char
    # '' -> 0 word, 0 char
    expected_words = 2 * (n // 2)
    expected_chars = 7 * (n // 2)

def test_large_column_with_long_strings():
    # 100 rows, each with a string of length 100
    n = 100
    long_str = 'x' * 100
    df = pd.DataFrame({'text': [long_str] * n})
    words, chars = calculate_text_metrics(df, ['text'])
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pandas as pd
# imports
import pytest  # used for our unit tests
from langflow.api.v1.knowledge_bases import calculate_text_metrics

# unit tests

# ---------------- BASIC TEST CASES ----------------

def test_single_column_single_row():
    # Test with a single column and single row
    df = pd.DataFrame({'text': ['Hello world!']})
    # "Hello world!" -> 2 words, 12 characters
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_single_column_multiple_rows():
    # Test with a single column and multiple rows
    df = pd.DataFrame({'text': ['Hello world!', 'This is a test.', '']})
    # Row 1: 2 words, 12 chars
    # Row 2: 4 words, 15 chars
    # Row 3: 0 words, 0 chars
    # Total: 6 words, 27 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_multiple_columns():
    # Test with multiple text columns
    df = pd.DataFrame({
        'title': ['Hello', 'Bye'],
        'body': ['World!', 'Everyone.']
    })
    # title: 1+1 words, 5+3 chars
    # body: 1+1 words, 6+9 chars
    # Total: 4 words, 5+3+6+9 = 23 chars
    codeflash_output = calculate_text_metrics(df, ['title', 'body'])

def test_non_string_types():
    # Test with numbers and None in the text column
    df = pd.DataFrame({'text': [123, None, 'abc def']})
    # 123 -> '123' (1 word, 3 chars)
    # None -> '' (0 word, 0 chars)
    # 'abc def' -> 2 words, 7 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_ignore_nonexistent_column():
    # Test with a column name that doesn't exist
    df = pd.DataFrame({'text': ['abc']})
    # Should ignore 'not_a_column'
    codeflash_output = calculate_text_metrics(df, ['text', 'not_a_column'])

def test_empty_dataframe():
    # Test with an empty DataFrame
    df = pd.DataFrame({'text': []})
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_empty_text_columns():
    # Test with empty text_columns list
    df = pd.DataFrame({'text': ['abc def']})
    codeflash_output = calculate_text_metrics(df, [])

def test_no_text_columns_in_df():
    # Test with text_columns that are not in DataFrame
    df = pd.DataFrame({'foo': ['bar']})
    codeflash_output = calculate_text_metrics(df, ['baz'])

# ---------------- EDGE TEST CASES ----------------

def test_all_nan_values():
    # All values are NaN
    df = pd.DataFrame({'text': [None, None, None]})
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_mixed_types_and_nans():
    # Mixture of str, int, float, None
    df = pd.DataFrame({'text': ['abc', 123, None, 4.56, 'def ghi']})
    # 'abc' -> 1 word, 3 chars
    # 123 -> '123' -> 1 word, 3 chars
    # None -> '' -> 0 word, 0 chars
    # 4.56 -> '4.56' -> 1 word, 4 chars
    # 'def ghi' -> 2 words, 7 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_column_with_only_spaces():
    # String with only whitespace
    df = pd.DataFrame({'text': ['   ', 'a b', '', '   ']})
    # '   ' -> 0 words, 3 chars
    # 'a b' -> 2 words, 3 chars
    # '' -> 0 words, 0 chars
    # '   ' -> 0 words, 3 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_column_with_tabs_and_newlines():
    # Test with tabs and newlines
    df = pd.DataFrame({'text': ['a\tb\nc', 'd\te', '\n\n', '\t']})
    # 'a\tb\nc' -> 'a', 'b', 'c' (3 words), 5 chars
    # 'd\te' -> 'd', 'e' (2 words), 3 chars
    # '\n\n' -> 0 words, 2 chars
    # '\t' -> 0 words, 1 char
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_column_with_punctuation():
    # Test with punctuation
    df = pd.DataFrame({'text': ['Hello, world!', 'Well-done.', "It's fine."]})
    # 'Hello, world!' -> 2 words, 13 chars
    # 'Well-done.' -> 1 word, 10 chars
    # "It's fine." -> 2 words, 10 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_unicode_and_emojis():
    # Test with unicode and emojis
    df = pd.DataFrame({'text': ['こんにちは', '😀😃😄', 'word 😀']})
    # 'こんにちは' -> 1 word, 5 chars
    # '😀😃😄' -> 1 word, 3 chars
    # 'word 😀' -> 2 words, 6 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_column_with_leading_trailing_spaces():
    # Test with leading/trailing/multiple spaces
    df = pd.DataFrame({'text': ['  a  b  c  ', '   ', 'd e']})
    # '  a  b  c  ' -> 3 words, 11 chars
    # '   ' -> 0 words, 3 chars
    # 'd e' -> 2 words, 3 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_multiple_columns_some_missing():
    # Some columns exist, some don't
    df = pd.DataFrame({'col1': ['a b', 'c'], 'col2': ['d', 'e f g']})
    # col1: 'a b' (2w,3c), 'c' (1w,1c)
    # col2: 'd' (1w,1c), 'e f g' (3w,5c)
    # col3 missing
    # total: 2+1+1+3=7 words, 3+1+1+5=10 chars
    codeflash_output = calculate_text_metrics(df, ['col1', 'col2', 'col3'])

def test_column_with_empty_strings_and_none():
    # Mix of empty strings and None
    df = pd.DataFrame({'text': ['', None, ' ', 'a']})
    # '' -> 0w,0c; None->0w,0c; ' '->0w,1c; 'a'->1w,1c
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_column_with_long_string_and_no_spaces():
    # Long string, no spaces
    long_str = 'a'*500
    df = pd.DataFrame({'text': [long_str]})
    # 1 word, 500 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

# ---------------- LARGE SCALE TEST CASES ----------------

def test_large_dataframe_single_column():
    # 1000 rows, each with 'word word'
    df = pd.DataFrame({'text': ['word word']*1000})
    # Each row: 2 words, 9 chars; total: 2000 words, 9000 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_large_dataframe_multiple_columns():
    # 500 rows, 2 columns
    df = pd.DataFrame({
        'col1': ['a b c']*500,
        'col2': ['d e']*500
    })
    # col1: 3w,5c per row; col2: 2w,3c per row
    # total words: (3+2)*500=2500; chars: (5+3)*500=4000
    codeflash_output = calculate_text_metrics(df, ['col1', 'col2'])

def test_large_dataframe_with_missing_and_nonexistent_columns():
    # 1000 rows, 2 columns, one missing
    df = pd.DataFrame({'col1': ['x y']*1000})
    # col1: 2w,3c per row; col2 missing
    # total: 2000 words, 3000 chars
    codeflash_output = calculate_text_metrics(df, ['col1', 'col2'])

def test_large_dataframe_with_varied_content():
    # 1000 rows, alternating between 'a b', '', None, 'c'
    data = ['a b', '', None, 'c'] * 250  # 1000 rows
    df = pd.DataFrame({'text': data})
    # 'a b' -> 2w,3c; ''->0w,0c; None->0w,0c; 'c'->1w,1c
    # 250 of each: (2*250)+(0*250)+(0*250)+(1*250)=750 words
    # (3*250)+(0*250)+(0*250)+(1*250)=1000 chars
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_large_dataframe_all_empty():
    # 1000 rows, all empty strings
    df = pd.DataFrame({'text': ['']*1000})
    codeflash_output = calculate_text_metrics(df, ['text'])

def test_large_dataframe_all_none():
    # 1000 rows, all None
    df = pd.DataFrame({'text': [None]*1000})
    codeflash_output = calculate_text_metrics(df, ['text'])
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr9088-2025-08-01T19.42.14` and push.

coderabbitai · 2025-08-01T19:42:31Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

sonarqubecloud · 2025-08-01T19:48:04Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codeflash-ai · 2025-08-14T07:42:42Z

This PR has been automatically closed because the original PR #9388 by zhangsichu was closed.

deon-sanchez and others added 30 commits July 16, 2025 14:15

[autofix.ci] apply automated fixes

941bc81

Create knowledgebase_utils.py

c32d451

Push initial ingest component

75409c1

[autofix.ci] apply automated fixes

1c9a2aa

Create initial KB Ingestion component

de3ade8

[autofix.ci] apply automated fixes

5ea7224

Fix ruff check on utility functions

c22e59b

[autofix.ci] apply automated fixes

ccd0f79

Some quick fixes

b9f9e01

Update kb_ingest.py

c00f486

Merge branch 'main' into feat-knowledge-bases

4ada462

[autofix.ci] apply automated fixes

cabf676

First version of retrieval component

350461e

[autofix.ci] apply automated fixes

b0b62a3

Update icon

7dad9d6

Update kb_retrieval.py

6a0f187

[autofix.ci] apply automated fixes

8da44b2

Merge branch 'lfoss-1813' into feat-knowledge-bases

0d25004

Add knowledge bases feature with API integration and UI components

1247bed

[autofix.ci] apply automated fixes

66da30e

[autofix.ci] apply automated fixes (attempt 2/3)

5951200

Refactor imports and update routing paths for assets and main page co…

d9c9cb9

…mponents. Adjust tab handling in the assets page to reflect URL changes and improve user navigation experience.

Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai…

75189e8

…/langflow into feat-knowledge-bases

Merge branch 'main' into feat-knowledge-bases

81367fb

[autofix.ci] apply automated fixes

d7940af

Add CreateKnowledgeBaseButton, KnowledgeBaseEmptyState, and Knowledge…

db49a96

…BaseSelectionOverlay components. Refactor KnowledgeBasesTab to utilize new components and improve UI for knowledge base management. Introduce utility functions for formatting numbers and average chunk sizes.

Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai…

5503c78

…/langflow into feat-knowledge-bases

[autofix.ci] apply automated fixes

845f0a7

deon-sanchez and others added 21 commits July 29, 2025 15:06

Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai…

b780edd

…/langflow into feat-knowledge-bases

Merge branch 'main' of https://github.com/langflow-ai/langflow into f…

8c40cf7

…eat-knowledge-bases

[autofix.ci] apply automated fixes

5536a3d

feat: Add unit tests for KBIngestionComponent (#9246)

2a4dba8

Merge branch 'main' of https://github.com/langflow-ai/langflow into f…

de843c8

…eat-knowledge-bases

[autofix.ci] apply automated fixes

fb45847

fix: remove unnecessary drawer open state change in KnowledgePage

c053983

[autofix.ci] apply automated fixes

3f24571

[autofix.ci] apply automated fixes (attempt 2/3)

62a1023

Remove kb_info output from KBIngestionComponent (#9275)

e80a68e

[autofix.ci] apply automated fixes

663b819

Update Knowledge Bases.json

414a7b9

Use settings service for knowledge base directory

6498a83

Replaces the hardcoded knowledge base directory path with a value from the settings service. This improves configurability and centralizes directory management.

Merge branch 'main' of https://github.com/langflow-ai/langflow into f…

60c6da5

…eat-knowledge-bases

Fix knowledge bases mypy issue

4516cca

test: Update expected title in file upload component test for accuracy

9a9717a

- Changed expected title text from "My Files" to "Files" to reflect the correct page title.

Merge branch 'feat-knowledge-bases' of https://github.com/langflow-ai…

1871c1d

…/langflow into feat-knowledge-bases

[autofix.ci] apply automated fixes

d8f3d0f

codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Aug 1, 2025

codeflash-ai Bot mentioned this pull request Aug 1, 2025

feat: Add support for Ingestion and Retrieval of Knowledge Bases #9088

Merged

[autofix.ci] apply automated fixes

10bbfa6

Base automatically changed from feat-knowledge-bases to main August 13, 2025 20:39

codeflash-ai Bot closed this Aug 14, 2025

codeflash-ai Bot deleted the codeflash/optimize-pr9088-2025-08-01T19.42.14 branch August 14, 2025 07:42

codeflash-ai[bot] wants to merge 167 commits into main from codeflash/optimize-pr9088-2025-08-01T19.42.14

3 participants
