⚡️ Speed up function `sanitize_filename` by 11% in PR #10819 (`s3-file-size-and-associations-to-flows`) by codeflash-ai[bot] · Pull Request #10823 · langflow-ai/langflow

codeflash-ai · 2025-12-01T20:19:17Z

⚡️ This pull request contains optimizations for PR #10819

If you approve this dependent PR, these changes will be merged into the original PR branch s3-file-size-and-associations-to-flows.

This PR will be automatically closed if the original PR is merged.

📄 11% (0.11x) speedup for `sanitize_filename` in `src/backend/base/langflow/api/v2/files.py`

⏱️ Runtime : 1.69 milliseconds → 1.52 milliseconds (best of 102 runs)
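
The 11% figure and the runtimes above can be reconciled with a little arithmetic. The sketch below assumes the speedup is computed relative to the optimized runtime, which matches the reported "0.11x"; the variable names are illustrative only.

```python
# Runtimes reported above, in milliseconds.
original_ms = 1.69
optimized_ms = 1.52

# Speedup is measured relative to the optimized runtime (assumed convention).
speedup = original_ms / optimized_ms - 1
# Runtime reduction is measured relative to the original runtime.
reduction = 1 - optimized_ms / original_ms

print(f"speedup: {speedup:.1%}")      # speedup: 11.2%
print(f"reduction: {reduction:.1%}")  # reduction: 10.1%
```

So "11% faster" and "about 10% less runtime" describe the same measurement from two different baselines.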

📝 Explanation and details

The optimized code achieves an 11% speedup (roughly a 10% reduction in runtime) through two optimizations that target the most expensive operations in the function:

Primary Optimization - Precompiled Regex Pattern:
The regex pattern `r"[^\w.\- ()]"` is now compiled once as a module-level constant, `_DANGEROUS_CHARS_RE`. The line profiler shows the substitution step dropping from 1.87 ms (23.8% of runtime) to 0.54 ms (8.7% of runtime), a 71% reduction in that operation's cost. Because Python's `re` module caches compiled patterns, most of the saving likely comes from skipping the per-call cache lookup inside `re.sub` rather than from avoiding a full recompilation; either way, a function that sanitizes filenames repeatedly pays that overhead on every call unless the pattern is precompiled.
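
A minimal sketch of the before/after shape of this change, assuming the substitution replaces each disallowed character with an underscore (as the test comments suggest). The two function names are hypothetical and are not taken from the langflow source; only the pattern and `_DANGEROUS_CHARS_RE` come from the description above.

```python
import re

# Hypothetical "before": the pattern string is handed to re.sub on every call.
def replace_dangerous_inline(name: str) -> str:
    return re.sub(r"[^\w.\- ()]", "_", name)

# Hypothetical "after": compiled once at import time, reused on every call.
_DANGEROUS_CHARS_RE = re.compile(r"[^\w.\- ()]")

def replace_dangerous_precompiled(name: str) -> str:
    return _DANGEROUS_CHARS_RE.sub("_", name)

sample = "bad<file>|name?.txt"
# Both variants must agree; only the per-call overhead differs.
assert replace_dangerous_inline(sample) == replace_dangerous_precompiled(sample)
print(replace_dangerous_precompiled(sample))  # bad_file__name_.txt
```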

Secondary Optimization - PurePath vs Path:
Replacing `Path(filename).name` with `PurePath(filename).name` provides a modest improvement. Neither expression touches the filesystem, since reading `.name` is pure string parsing, but `PurePath` skips the concrete-path setup that `Path` carries for its I/O methods, so it is the lighter choice when only the filename component is needed.
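
As an illustration of what `.name` extraction does, the sketch below uses the explicit `PurePosixPath` and `PureWindowsPath` flavours so the result does not depend on the machine running it. Note the assumption this exposes: plain `PurePath` picks the flavour of the host OS, so backslash-separated input is only split into components on Windows.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# PurePath only parses the string; nothing here touches the filesystem.
print(PurePath("../../etc/passwd").name)  # passwd

windows_style = "C:\\Users\\user\\secret.txt"
# Windows flavour: backslashes are separators, so only the last part is kept.
print(PureWindowsPath(windows_style).name)  # secret.txt
# POSIX flavour: backslashes are ordinary characters, so the whole string is the name.
print(PurePosixPath(windows_style).name)  # C:\Users\user\secret.txt
```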

Performance Impact Analysis:
Based on the test cases, these optimizations are particularly effective for:

- High-frequency filename processing: the precompiled regex removes per-call pattern lookup overhead
- Large-scale operations: tests show consistent benefits across various filename lengths and character types
- Mixed content scenarios: both simple filenames and complex cases with dangerous characters benefit from the regex optimization

The optimizations maintain identical security and functionality while reducing computational overhead, making this especially valuable if the function is called frequently in file upload or processing workflows.

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 178 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import re
import string  # used for generating large filenames
from pathlib import Path

# imports
import pytest  # used for our unit tests
from langflow.api.v2.files import sanitize_filename

MAX_FILENAME_LENGTH = 255
# Maximum reasonable extension length
MAX_EXTENSION_LENGTH = 20

# unit tests

# -------------------------
# Basic Test Cases
# -------------------------

def test_basic_alphanumeric_filename():
    # Should remain unchanged
    codeflash_output = sanitize_filename("file123.txt")

def test_basic_spaces_and_hyphens():
    # Spaces and hyphens are allowed
    codeflash_output = sanitize_filename("my file-name.txt")

def test_basic_underscores_and_parentheses():
    # Underscores and parentheses are allowed
    codeflash_output = sanitize_filename("data_set (v2).csv")

def test_basic_multiple_dots():
    # Multiple dots are allowed
    codeflash_output = sanitize_filename("archive.tar.gz")

def test_basic_extension_only():
    # Leading dot is stripped (see the ".hidden" test below), so this likely yields "gitignore" rather than "unnamed"
    codeflash_output = sanitize_filename(".gitignore")

# -------------------------
# Edge Test Cases
# -------------------------

def test_empty_string():
    # Should return "unnamed" for empty input
    codeflash_output = sanitize_filename("")

def test_none_string():
    # Should return "unnamed" for None input
    codeflash_output = sanitize_filename(None)

def test_path_traversal_attempt():
    # Should strip path traversal and sanitize dangerous chars
    codeflash_output = sanitize_filename("../../etc/passwd")

def test_windows_path_separator():
    # Backslashes are path separators only on Windows; elsewhere they and the colon are replaced as dangerous chars
    codeflash_output = sanitize_filename("C:\\Users\\user\\secret.txt")

def test_leading_trailing_spaces_and_dots():
    # Should strip leading/trailing spaces and dots
    codeflash_output = sanitize_filename("  .hiddenfile. ")

def test_only_dangerous_chars():
    # "////" has no final path component, so it falls back to "unnamed"; "<<>>" likely becomes underscores, which are not stripped
    codeflash_output = sanitize_filename("////")
    codeflash_output = sanitize_filename("<<>>")

def test_control_characters():
    # Control chars should be replaced by underscores
    codeflash_output = sanitize_filename("bad\x00file.txt")

def test_quotes_and_semicolons():
    # Quotes and semicolons become underscores
    codeflash_output = sanitize_filename("weird'file;name.txt")

def test_unicode_characters():
    # Note: \w matches Unicode letters unless re.ASCII is set, so these may pass through unchanged
    codeflash_output = sanitize_filename("файл.txt")
    codeflash_output = sanitize_filename("résumé.pdf")

def test_filename_with_multiple_extensions():
    # Should preserve all dots, sanitize dangerous chars
    codeflash_output = sanitize_filename("my.file.backup.tar.gz")

def test_filename_with_only_spaces():
    # Should strip spaces and fallback to "unnamed"
    codeflash_output = sanitize_filename("    ")

def test_filename_with_leading_trailing_underscores_and_dots():
    # Should only strip dots, not underscores
    codeflash_output = sanitize_filename(".__myfile__.txt.")

def test_filename_with_reserved_windows_names():
    # Should sanitize reserved names but not change them unless dangerous
    codeflash_output = sanitize_filename("CON.txt")
    codeflash_output = sanitize_filename("AUX")

def test_filename_with_no_extension():
    # Should sanitize and preserve filename
    codeflash_output = sanitize_filename("just_a_file")

def test_filename_with_long_extension():
    # Extension longer than MAX_EXTENSION_LENGTH should be truncated
    ext = "a" * (MAX_EXTENSION_LENGTH + 5)
    name = "file"
    codeflash_output = sanitize_filename(f"{name}.{ext}"); result = codeflash_output

def test_filename_with_hidden_file_dot():
    # Should strip leading dot
    codeflash_output = sanitize_filename(".hidden")

def test_filename_with_dot_only():
    # Should fallback to "unnamed"
    codeflash_output = sanitize_filename(".")

def test_filename_with_multiple_path_separators():
    # Should collapse all separators and sanitize
    codeflash_output = sanitize_filename("folder////file.txt")

def test_filename_with_non_ascii_and_spaces():
    # Spaces are preserved; non-ASCII letters are also kept unless the pattern uses re.ASCII
    codeflash_output = sanitize_filename("résumé 2024.pdf")

# -------------------------
# Large Scale Test Cases
# -------------------------

def test_large_filename_max_length():
    # Create a filename exactly at MAX_FILENAME_LENGTH
    base = "a" * (MAX_FILENAME_LENGTH - 4)  # leave room for ".txt"
    filename = f"{base}.txt"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_large_filename_over_max_length_with_short_extension():
    # Filename over max length, extension short enough to preserve
    base = "b" * (MAX_FILENAME_LENGTH + 50 - 4)
    filename = f"{base}.txt"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_large_filename_over_max_length_with_long_extension():
    # Extension too long, should truncate whole filename
    ext = "c" * (MAX_EXTENSION_LENGTH + 10)
    base = "d" * (MAX_FILENAME_LENGTH + 50 - len(ext) - 1)
    filename = f"{base}.{ext}"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_large_filename_with_dangerous_chars():
    # Large filename with many dangerous chars
    base = "e" * (MAX_FILENAME_LENGTH - 10)
    dangerous = "<>:\"/\\|?*" * 10
    filename = f"{base}{dangerous}.log"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_large_filename_all_dangerous_chars():
    # Filename with only dangerous chars, long length
    filename = "<>:\"/\\|?*" * 40
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_large_filename_with_unicode():
    # Large filename with unicode characters
    base = "файл" * 50
    filename = f"{base}.txt"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_large_filename_with_spaces_and_parentheses():
    # Large filename with spaces and parentheses
    base = ("(abc) " * 40).strip()
    filename = f"{base}.csv"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_many_unique_large_filenames():
    # Test sanitizing many unique large filenames for determinism and performance
    for i in range(100):
        base = f"file_{i}_" + "x" * (MAX_FILENAME_LENGTH - 10)
        filename = f"{base}.dat"
        codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_filename_with_mixed_whitespace_and_dangerous_chars():
    # Mixed whitespace and dangerous chars, should sanitize correctly
    filename = "   bad<file>|name?.txt   "
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output

def test_filename_with_newlines_and_tabs():
    # Newlines and tabs should be replaced with underscores
    filename = "file\nname\t2024.doc"
    codeflash_output = sanitize_filename(filename); sanitized = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
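
The length tests above describe a truncation rule: names longer than `MAX_FILENAME_LENGTH` are cut, and a final extension no longer than `MAX_EXTENSION_LENGTH` is preserved. The helper below is a hypothetical sketch of such a rule, written from the test comments alone; it is not the langflow implementation, and whether the real function also shortens an over-long extension on an otherwise short name is not shown on this page.

```python
MAX_FILENAME_LENGTH = 255
MAX_EXTENSION_LENGTH = 20

def truncate_filename(name: str) -> str:
    """Hypothetical helper: cap the length, keeping a short final extension."""
    if len(name) <= MAX_FILENAME_LENGTH:
        return name
    stem, dot, ext = name.rpartition(".")
    # Keep the extension only if there is one and it is short enough.
    if dot and stem and len(ext) <= MAX_EXTENSION_LENGTH:
        keep = MAX_FILENAME_LENGTH - len(ext) - 1  # room left for the stem
        return f"{stem[:keep]}.{ext}"
    # Otherwise truncate the whole name, extension included.
    return name[:MAX_FILENAME_LENGTH]
```

Using `rpartition` means only the last dot counts, which agrees with the "only last extension is considered" comments in the second test module.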
#------------------------------------------------
import re
import string  # used for generating test strings
from pathlib import Path

# imports
import pytest  # used for our unit tests
from langflow.api.v2.files import sanitize_filename

MAX_FILENAME_LENGTH = 255
MAX_EXTENSION_LENGTH = 20

# unit tests

# --------------------------
# Basic Test Cases
# --------------------------

def test_basic_alphanumeric_filename():
    # Should remain unchanged
    codeflash_output = sanitize_filename("myfile.txt")

def test_basic_filename_with_spaces():
    # Spaces are allowed
    codeflash_output = sanitize_filename("my file.txt")

def test_basic_filename_with_underscore_and_dash():
    # Underscores and dashes are allowed
    codeflash_output = sanitize_filename("my_file-name.txt")

def test_basic_filename_with_parentheses():
    # Parentheses are allowed
    codeflash_output = sanitize_filename("report (final).pdf")

def test_basic_filename_with_multiple_dots():
    # Multiple dots are allowed
    codeflash_output = sanitize_filename("archive.tar.gz")

def test_basic_filename_with_uppercase():
    # Case should be preserved
    codeflash_output = sanitize_filename("Photo.JPG")

# --------------------------
# Edge Test Cases
# --------------------------

def test_empty_string_returns_unnamed():
    # Empty string should return "unnamed"
    codeflash_output = sanitize_filename("")

def test_none_string_returns_unnamed():
    # None should return "unnamed"
    codeflash_output = sanitize_filename(None)

def test_filename_with_only_invalid_characters():
    # Invalid chars are replaced with underscores; since only dots and spaces are stripped, this likely yields "__" rather than "unnamed"
    codeflash_output = sanitize_filename("$$")

def test_filename_with_leading_and_trailing_spaces_and_dots():
    # Leading/trailing spaces and dots are stripped
    codeflash_output = sanitize_filename("   .hiddenfile.   ")

def test_filename_with_path_traversal():
    # Path traversal should be stripped
    codeflash_output = sanitize_filename("../../etc/passwd")

def test_filename_with_windows_path():
    # Backslashes are path separators only on Windows; elsewhere they and the colon are replaced as dangerous chars
    codeflash_output = sanitize_filename("C:\\Users\\user\\Desktop\\file.txt")

def test_filename_with_mixed_separators():
    # The forward-slash component is dropped; the backslash is a separator only on Windows, otherwise it is replaced
    codeflash_output = sanitize_filename("folder/subfolder\\file.doc")

def test_filename_with_control_characters():
    # Control characters should be replaced with underscores
    codeflash_output = sanitize_filename("my\x00file.txt")

def test_filename_with_quotes_and_semicolon():
    # Quotes and semicolons replaced with underscores
    codeflash_output = sanitize_filename('my"file;name.txt')

def test_filename_with_unicode_characters():
    # Note: \w matches Unicode letters unless re.ASCII is set, so this may pass through unchanged
    codeflash_output = sanitize_filename("résumé.pdf")

def test_filename_with_only_dots():
    # Only dots should fallback to "unnamed"
    codeflash_output = sanitize_filename("...")

def test_filename_with_only_spaces():
    # Only spaces should fallback to "unnamed"
    codeflash_output = sanitize_filename("     ")

def test_filename_with_long_extension():
    # Extension longer than MAX_EXTENSION_LENGTH is truncated
    ext = "a" * (MAX_EXTENSION_LENGTH + 5)
    fname = f"file.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_filename_with_short_extension_and_long_name():
    # Extension preserved, name truncated
    ext = "txt"
    name = "a" * (MAX_FILENAME_LENGTH + 10)
    fname = f"{name}.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_filename_with_no_extension_and_long_name():
    # No extension, just truncate to max length
    name = "b" * (MAX_FILENAME_LENGTH + 50)
    codeflash_output = sanitize_filename(name); result = codeflash_output

def test_filename_with_dot_and_extension_length_exact_max():
    # Extension exactly MAX_EXTENSION_LENGTH, should be preserved
    ext = "e" * MAX_EXTENSION_LENGTH
    name = "x" * (MAX_FILENAME_LENGTH - MAX_EXTENSION_LENGTH - 1)
    fname = f"{name}.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_filename_with_dot_and_extension_length_above_max():
    # Extension longer than MAX_EXTENSION_LENGTH, truncate whole filename
    ext = "e" * (MAX_EXTENSION_LENGTH + 1)
    name = "y" * (MAX_FILENAME_LENGTH - MAX_EXTENSION_LENGTH - 1)
    fname = f"{name}.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_filename_with_multiple_dots_and_long_name():
    # Only last extension is considered
    ext = "ext"
    name = "x" * (MAX_FILENAME_LENGTH + 40)
    fname = f"{name}.tar.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_filename_with_leading_dot_hidden_file():
    # Leading dot is stripped
    codeflash_output = sanitize_filename(".hidden")

def test_filename_with_trailing_dot():
    # Trailing dot is stripped
    codeflash_output = sanitize_filename("file.")

def test_filename_with_trailing_whitespace():
    # Trailing whitespace is stripped
    codeflash_output = sanitize_filename("file.txt   ")

def test_filename_with_leading_whitespace():
    # Leading whitespace is stripped
    codeflash_output = sanitize_filename("   file.txt")

def test_filename_with_multiple_consecutive_invalid_chars():
    # Multiple invalid chars replaced with underscores
    codeflash_output = sanitize_filename("file<>|*?.txt")

def test_filename_with_mixed_valid_and_invalid_chars():
    # Invalid chars replaced, valid preserved
    codeflash_output = sanitize_filename("my*file@name#2024!.txt")

def test_filename_with_newlines():
    # Newlines replaced with underscores
    codeflash_output = sanitize_filename("my\nfile.txt")

def test_filename_with_tab_character():
    # Tabs replaced with underscores
    codeflash_output = sanitize_filename("my\tfile.txt")

# --------------------------
# Large Scale Test Cases
# --------------------------

def test_large_filename_with_valid_chars():
    # Should be truncated to MAX_FILENAME_LENGTH
    fname = "a" * (MAX_FILENAME_LENGTH * 2)
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_invalid_chars():
    # Should be replaced and truncated
    fname = "$" * (MAX_FILENAME_LENGTH * 2)
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_long_extension():
    # Extension longer than MAX_EXTENSION_LENGTH, whole filename truncated
    ext = "z" * (MAX_EXTENSION_LENGTH + 50)
    name = "y" * (MAX_FILENAME_LENGTH * 2)
    fname = f"{name}.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_short_extension():
    # Name truncated, extension preserved
    ext = "abc"
    name = "n" * (MAX_FILENAME_LENGTH * 2)
    fname = f"{name}.{ext}"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_mixed_valid_invalid_chars():
    # Mix of valid/invalid, replaced and truncated
    valid = "abcDEF123"
    invalid = "<>|*?/"
    fname = (valid + invalid) * 100
    codeflash_output = sanitize_filename(fname); result = codeflash_output
    # Should only contain allowed chars and underscores
    allowed = set(string.ascii_letters + string.digits + " .-_()")

def test_large_filename_with_path_components():
    # Path components stripped, only last part sanitized
    fname = "/".join([f"folder{i}" for i in range(50)]) + "/final_file.txt"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_spaces_and_dots():
    # Spaces and dots allowed, but string truncated
    fname = (" . " * 1000) + "end.txt"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_unicode_characters():
    # Truncated to the max length; Cyrillic letters are kept unless the pattern uses re.ASCII
    fname = "файл" * 300
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_multiple_dots_and_extensions():
    # Only last extension considered
    fname = "data." * 500 + "csv"
    codeflash_output = sanitize_filename(fname); result = codeflash_output

def test_large_filename_with_parentheses_and_valid_chars():
    # Parentheses preserved, string truncated
    fname = ("(abc)" * 200) + ".pdf"
    codeflash_output = sanitize_filename(fname); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
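
Several test comments above say non-ASCII characters are replaced with underscores. That depends on how the pattern is compiled, which this page does not show. The sketch below applies the pattern quoted in the explanation both ways; the `re.ASCII` variant is an assumption included for contrast, not something the PR states.

```python
import re

pattern = r"[^\w.\- ()]"
resume = "r\u00e9sum\u00e9.pdf"  # "résumé.pdf" with precomposed é

# Default str patterns: \w matches Unicode word characters, so letters survive.
print(re.sub(pattern, "_", resume))      # résumé.pdf
print(re.sub(pattern, "_", "файл.txt"))  # файл.txt

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so non-ASCII letters are replaced.
print(re.sub(pattern, "_", resume, flags=re.ASCII))      # r_sum_.pdf
print(re.sub(pattern, "_", "файл.txt", flags=re.ASCII))  # ____.txt
```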

To edit these changes, run `git checkout codeflash/optimize-pr10819-2025-12-01T20.19.10` and push.

coderabbitai · 2025-12-01T20:19:22Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment `@coderabbitai help` to get the list of available commands and usage tips.

codecov · 2025-12-01T20:22:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.38%. Comparing base (ef63f8d) to head (a5111e6).

❌ Your project status has failed because the head coverage (40.04%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                            Coverage Diff                             @@
##           s3-file-size-and-associations-to-flows   #10823      +/-   ##
==========================================================================
- Coverage                                   32.39%   32.38%   -0.01%     
==========================================================================
  Files                                        1367     1367              
  Lines                                       63235    63225      -10     
  Branches                                     9358     9357       -1     
==========================================================================
- Hits                                        20482    20478       -4     
+ Misses                                      41720    41714       -6     
  Partials                                     1033     1033

| Flag | Coverage Δ | |
|---|---|---|
| frontend | `14.13% <ø> (ø)` | |
| lfx | `40.04% <ø> (+<0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/backend/base/langflow/api/v2/files.py | `59.10% <ø> (-0.15%)` | ⬇️ |

... and 4 files with indirect coverage changes

github-actions · 2025-12-01T20:27:43Z

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| | 15.29% (4188/27381) | 8.49% (1778/20935) | 9.6% (579/6031) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1638 | 0 💤 | 0 ❌ | 0 🔥 | 20.069s ⏱️ |

ogabrielluiz · 2026-03-03T18:14:43Z

Closing automated codeflash PR.

codeflash-ai bot added the `⚡️ codeflash` label (Optimization PR opened by Codeflash AI) on Dec 1, 2025

github-actions bot added the `community` label (Pull Request from an external contributor) on Dec 1, 2025

[autofix.ci] apply automated fixes

a5111e6

ogabrielluiz closed this Mar 3, 2026

codeflash-ai Bot deleted the codeflash/optimize-pr10819-2025-12-01T20.19.10 branch March 3, 2026 18:14

codeflash-ai[bot] wants to merge 2 commits into `s3-file-size-and-associations-to-flows` from `codeflash/optimize-pr10819-2025-12-01T20.19.10`
