⚡️ Speed up function `sanitize_content_disposition` by 15% in PR #10819 (`s3-file-size-and-associations-to-flows`) by codeflash-ai[bot] · Pull Request #10824 · langflow-ai/langflow

codeflash-ai · 2025-12-01T20:28:17Z

⚡️ This pull request contains optimizations for PR #10819

If you approve this dependent PR, these changes will be merged into the original PR branch s3-file-size-and-associations-to-flows.

This PR will be automatically closed if the original PR is merged.

📄 15% (0.15x) speedup for `sanitize_content_disposition` in `src/backend/base/langflow/api/v2/files.py`

⏱️ Runtime : 8.03 milliseconds → 6.95 milliseconds (best of 63 runs)
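As a worked check of the headline figure, assuming Codeflash measures speedup relative to the optimized runtime (which matches the "0.15x" notation), the two runtimes above give:

$$
\text{speedup} = \frac{t_{\text{old}} - t_{\text{new}}}{t_{\text{new}}} = \frac{8.03\,\text{ms} - 6.95\,\text{ms}}{6.95\,\text{ms}} = \frac{1.08}{6.95} \approx 0.155
$$

Measured instead as a reduction in runtime, the same numbers give $1.08 / 8.03 \approx 13.4\%$.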

📝 Explanation and details

The optimized code achieves a 15% speedup through two key performance optimizations:

What was optimized:

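Precompiled regex pattern: Moved re.compile(r"[^\w.\- ()]") to module scope as _SANITIZE_FILENAME_RE, so each call invokes the compiled pattern's .sub() directly. Python's re module already caches compiled patterns, so the original code was not fully recompiling the regex on every call; the saving comes from skipping the per-call cache lookup and argument handling inside re.sub().
Faster path extraction: Replaced Path(filename).name with PurePath(filename).name. Neither class touches the filesystem when constructed, but PurePath provides only the pure, string-based path operations, which is all that extracting the final filename component requires, and it was measurably cheaper to construct in this profile.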
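A minimal sketch of the two changes, assuming the helper only needs to strip directories, replace disallowed characters, trim spaces and dots, and fall back to "unnamed" as the generated tests describe. The regex and the name _SANITIZE_FILENAME_RE come from the PR text; sanitize_filename_sketch and its exact trimming rules are hypothetical and are not the langflow implementation.

```python
import re
from pathlib import PurePath

# Compiled once at import time (pattern taken from the PR description)
_SANITIZE_FILENAME_RE = re.compile(r"[^\w.\- ()]")


def sanitize_filename_sketch(filename: str) -> str:
    """Hypothetical helper illustrating the optimized steps; not langflow's code."""
    # PurePath only parses the string; it never touches the filesystem.
    # On POSIX it splits on "/" only, so backslashes are left for the regex.
    name = PurePath(filename).name
    # Replace every character other than word chars, ".", "-", " ", "(", ")" with "_"
    name = _SANITIZE_FILENAME_RE.sub("_", name)
    # Trim leading/trailing spaces and dots (e.g. ".bashrc" -> "bashrc")
    name = name.strip(" .")
    return name or "unnamed"


if __name__ == "__main__":
    print(sanitize_filename_sketch("../../etc/passwd"))   # passwd
    print(sanitize_filename_sketch("my*file?name|.txt"))  # my_file_name_.txt
    print(sanitize_filename_sketch(".bashrc"))            # bashrc
    print(sanitize_filename_sketch("////"))               # unnamed
```

One behavior worth checking against the real function: on POSIX, PurePath resolves to PurePosixPath, so a Windows-style path such as "C:\\Windows\\cmd.exe" is a single component whose backslashes get replaced by the regex rather than treated as separators. Path behaves the same way on POSIX, so the swap itself does not change this.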

Why this leads to speedup:

Regex call overhead: The line profiler shows the original re.sub() call took 7.5ms (14.1% of total time). With the precompiled pattern, this drops to 1.8ms (4.1% of total time), a 76% reduction in regex processing time.
Path object overhead: Path is the concrete subclass of PurePath and adds I/O methods that are unnecessary when we only need the basename. Switching to PurePath reduces path processing time from 41.5ms to 37.8ms, a 9% improvement.
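A rough way to observe the two effects in isolation is a timeit micro-benchmark like the one below. The sample string is made up, absolute timings depend on the machine and Python version, and this does not reproduce Codeflash's line-profiler setup.

```python
import re
import timeit
from pathlib import Path, PurePath

PATTERN = r"[^\w.\- ()]"
_COMPILED = re.compile(PATTERN)
# Made-up sample input; real upload filenames will differ
SAMPLE = "../../uploads/my*file?name|(v2).txt"


def regex_uncompiled(s: str) -> str:
    # re.sub() resolves the string pattern through re's internal cache on each call
    return re.sub(PATTERN, "_", s)


def regex_precompiled(s: str) -> str:
    # Calls .sub() on a pattern object compiled once at import time
    return _COMPILED.sub("_", s)


def name_via_path(s: str) -> str:
    return Path(s).name


def name_via_purepath(s: str) -> str:
    return PurePath(s).name


def bench(number: int = 100_000) -> dict:
    """Return total seconds for `number` calls of each variant on SAMPLE."""
    funcs = {
        "re.sub(str)": regex_uncompiled,
        "compiled.sub": regex_precompiled,
        "Path.name": name_via_path,
        "PurePath.name": name_via_purepath,
    }
    return {
        label: timeit.timeit(lambda f=func: f(SAMPLE), number=number)
        for label, func in funcs.items()
    }


if __name__ == "__main__":
    # Each pair must agree before comparing their timings is meaningful
    assert regex_uncompiled(SAMPLE) == regex_precompiled(SAMPLE)
    assert name_via_path(SAMPLE) == name_via_purepath(SAMPLE)
    for label, seconds in bench().items():
        print(f"{label:>14}: {seconds:.3f}s")
```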

Impact on workloads:

The optimizations are most beneficial for:

High-frequency filename sanitization (the profiled test suite called the function 1,663 times)
Batch file processing scenarios where the same sanitization logic runs repeatedly
Web upload handlers processing multiple files simultaneously

Test case performance:

The annotated tests show consistent improvements across all scenarios, from simple ASCII filenames to complex Unicode cases with path traversal attempts. The optimization keeps the output identical while reducing CPU overhead, which matters most for file upload endpoints that may process hundreds of filenames per request.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 1654 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests and Runtime

import random
import re
import string
from pathlib import Path
from urllib.parse import quote

# imports
import pytest
from langflow.api.v2.files import sanitize_content_disposition

# Function under test: sanitize_content_disposition, imported above.

MAX_FILENAME_LENGTH = 255

# unit tests

# ------------------ BASIC TEST CASES ------------------

def test_ascii_filename_simple():
    """Test with a simple ASCII filename."""
    codeflash_output = sanitize_content_disposition("file.txt"); result = codeflash_output

def test_ascii_filename_with_spaces():
    """Test with spaces in filename (should be preserved and quoted)."""
    codeflash_output = sanitize_content_disposition("my file.txt"); result = codeflash_output

def test_ascii_filename_with_special_chars():
    """Test with allowed special chars (hyphen, underscore, parentheses, dot)."""
    codeflash_output = sanitize_content_disposition("my-file_(v2).txt"); result = codeflash_output

def test_ascii_filename_with_quote():
    """Test with a quote in the filename (should be escaped)."""
    codeflash_output = sanitize_content_disposition('my"file.txt'); result = codeflash_output

def test_ascii_filename_with_backslash():
    """Test with a backslash in the filename (should be escaped)."""
    codeflash_output = sanitize_content_disposition('my\\file.txt'); result = codeflash_output

def test_ascii_filename_with_multiple_dots():
    """Test with multiple dots in the filename."""
    codeflash_output = sanitize_content_disposition("v1.2.3.final.txt"); result = codeflash_output

# ------------------ EDGE TEST CASES ------------------

def test_empty_filename():
    """Test with an empty filename (should return unnamed)."""
    codeflash_output = sanitize_content_disposition(""); result = codeflash_output


def test_filename_only_path_separators():
    """Test with only path separators (should return unnamed)."""
    codeflash_output = sanitize_content_disposition("////"); result = codeflash_output

def test_filename_only_dots():
    """Test with only dots (should return unnamed)."""
    codeflash_output = sanitize_content_disposition("..."); result = codeflash_output

def test_filename_with_path_traversal():
    """Test with path traversal (should strip path)."""
    codeflash_output = sanitize_content_disposition("../../etc/passwd"); result = codeflash_output

def test_filename_with_windows_path():
    """Test with Windows-style path (should strip path)."""
    codeflash_output = sanitize_content_disposition("C:\\Windows\\system32\\cmd.exe"); result = codeflash_output

def test_filename_with_disallowed_chars():
    """Test with disallowed chars (should be replaced with underscores)."""
    codeflash_output = sanitize_content_disposition("my*file?name|.txt"); result = codeflash_output

def test_filename_with_leading_trailing_spaces_and_dots():
    """Test with leading/trailing spaces and dots (should be stripped)."""
    codeflash_output = sanitize_content_disposition("  .myfile.txt.  "); result = codeflash_output

def test_filename_with_unicode():
    """Test with a Unicode filename (should use RFC 5987 encoding)."""
    codeflash_output = sanitize_content_disposition("résumé.pdf"); result = codeflash_output

def test_filename_with_unicode_and_spaces():
    """Test with Unicode and spaces (should encode spaces as %20)."""
    codeflash_output = sanitize_content_disposition("привет мир.txt"); result = codeflash_output

def test_filename_with_emoji():
    """Test with emoji in filename."""
    codeflash_output = sanitize_content_disposition("file😀.txt"); result = codeflash_output

def test_filename_with_only_unicode():
    """Test with only non-ASCII characters."""
    codeflash_output = sanitize_content_disposition("数据.csv"); result = codeflash_output

def test_filename_with_leading_trailing_underscore():
    """Test with leading/trailing underscores (should be preserved)."""
    codeflash_output = sanitize_content_disposition("__file__.txt"); result = codeflash_output

def test_filename_with_no_extension():
    """Test with no extension."""
    codeflash_output = sanitize_content_disposition("myfile"); result = codeflash_output

def test_filename_with_dotfile():
    """Test with a dotfile (should strip leading dot)."""
    codeflash_output = sanitize_content_disposition(".hiddenfile"); result = codeflash_output

def test_filename_with_long_extension():
    """Test with a very long extension (should truncate correctly)."""
    long_ext = "a" * 30
    filename = f"file.{long_ext}"
    codeflash_output = sanitize_content_disposition(filename); result = codeflash_output

def test_filename_with_max_length_and_extension():
    """Test with a filename at the max length and a short extension."""
    name = "a" * (MAX_FILENAME_LENGTH - 4)
    ext = "txt"
    filename = f"{name}.{ext}"
    codeflash_output = sanitize_content_disposition(filename); result = codeflash_output
    # Should not be truncated

def test_filename_too_long_with_extension():
    """Test with a filename exceeding max length, with a short extension."""
    name = "a" * (MAX_FILENAME_LENGTH + 10)
    ext = "txt"
    filename = f"{name}.{ext}"
    codeflash_output = sanitize_content_disposition(filename); result = codeflash_output
    # Extension preserved, name truncated
    expected_name = name[:MAX_FILENAME_LENGTH - len(ext) - 1]

def test_filename_too_long_with_long_extension():
    """Test with a filename exceeding max length, with a long extension."""
    name = "a" * (MAX_FILENAME_LENGTH + 10)
    ext = "x" * 25  # longer than MAX_EXTENSION_LENGTH
    filename = f"{name}.{ext}"
    codeflash_output = sanitize_content_disposition(filename); result = codeflash_output

def test_filename_with_only_extension():
    """Test with a filename that's just an extension (e.g., '.txt')."""
    codeflash_output = sanitize_content_disposition(".txt"); result = codeflash_output

def test_filename_with_multiple_path_separators():
    """Test with multiple path separators in filename."""
    codeflash_output = sanitize_content_disposition("////foo.txt"); result = codeflash_output

# ------------------ LARGE SCALE TEST CASES ------------------

def test_large_ascii_filename():
    """Test with a large ASCII filename (max allowed)."""
    name = "a" * (MAX_FILENAME_LENGTH - 4)
    ext = "txt"
    filename = f"{name}.{ext}"
    codeflash_output = sanitize_content_disposition(filename); result = codeflash_output


def test_many_random_filenames():
    """Test many random filenames for performance and robustness."""
    allowed = string.ascii_letters + string.digits + " .-_()"
    for _ in range(100):
        # Randomly mix in some disallowed chars
        raw = "".join(random.choice(allowed + "!@#$%^&*[]{};:'\",/?\\|") for _ in range(random.randint(1, 255)))
        codeflash_output = sanitize_content_disposition(raw); result = codeflash_output
        # Should never contain path separators in the filename part
        if 'filename="' in result:
            fname = result.split('filename="', 1)[1].rstrip('"')

def test_all_ascii_printable_chars():
    """Test with all ASCII printable chars (should escape/disallow as necessary)."""
    all_printable = "".join(chr(i) for i in range(32, 127))
    codeflash_output = sanitize_content_disposition(all_printable); result = codeflash_output
    # Only allowed chars are preserved, others replaced with '_'
    expected = re.sub(r"[^\w.\- ()]", "_", all_printable).strip().strip(".")

def test_very_large_batch_of_filenames():
    """Test a batch of 1000 filenames for consistent results."""
    for i in range(1000):
        fn = f"file_{i}.txt"
        codeflash_output = sanitize_content_disposition(fn); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
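The truncation tests above expect the extension to survive when an over-long name is shortened (for example, expected_name = name[:MAX_FILENAME_LENGTH - len(ext) - 1]). A minimal sketch of that rule follows. The value of MAX_EXTENSION_LENGTH and the fallback of hard-truncating names whose extension is too long are assumptions, since the real constant and code are not shown in this PR; truncate_filename_sketch is a hypothetical name.

```python
MAX_FILENAME_LENGTH = 255
# Hypothetical value: MAX_EXTENSION_LENGTH is referenced in the tests but not shown
MAX_EXTENSION_LENGTH = 20


def truncate_filename_sketch(
    name: str,
    max_len: int = MAX_FILENAME_LENGTH,
    max_ext: int = MAX_EXTENSION_LENGTH,
) -> str:
    """Shorten name to at most max_len characters, keeping a short extension."""
    if len(name) <= max_len:
        return name
    stem, dot, ext = name.rpartition(".")
    if dot and stem and len(ext) <= max_ext and len(ext) + 1 < max_len:
        # Keep ".ext" and trim the stem so the total is exactly max_len
        return stem[: max_len - len(ext) - 1] + dot + ext
    # No usable extension (none, or too long): cut the whole name
    return name[:max_len]
```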
#------------------------------------------------
import pytest
from langflow.api.v2.files import sanitize_content_disposition

# Function under test: sanitize_content_disposition, imported above.

# --------------------------
# Basic Test Cases
# --------------------------

def test_ascii_filename_simple():
    # Simple ASCII filename, should be quoted and unchanged
    codeflash_output = sanitize_content_disposition("report.txt"); header = codeflash_output

def test_ascii_filename_with_spaces():
    # Spaces are allowed, should be preserved and quoted
    codeflash_output = sanitize_content_disposition("my report 2024.txt"); header = codeflash_output

def test_ascii_filename_with_safe_symbols():
    # Allowed symbols: _, -, ., (, )
    codeflash_output = sanitize_content_disposition("data-set_(v1.0).csv"); header = codeflash_output

def test_ascii_filename_with_dangerous_chars():
    # Dangerous chars replaced by underscores
    codeflash_output = sanitize_content_disposition("evil/\\:*?\"<>|.txt"); header = codeflash_output

def test_ascii_filename_with_path_traversal():
    # Path traversal should be stripped
    codeflash_output = sanitize_content_disposition("../../etc/passwd"); header = codeflash_output

def test_ascii_filename_leading_trailing_whitespace_and_dots():
    # Leading/trailing whitespace and dots removed
    codeflash_output = sanitize_content_disposition("  .hiddenfile.  "); header = codeflash_output

def test_ascii_filename_empty_string():
    # Empty filename returns "unnamed"
    codeflash_output = sanitize_content_disposition(""); header = codeflash_output

def test_ascii_filename_only_dangerous_chars():
    # Only dangerous chars replaced, then fallback to "unnamed"
    codeflash_output = sanitize_content_disposition("///////"); header = codeflash_output

def test_ascii_filename_with_quotes_and_backslash():
    # Quotes and backslashes are escaped in the header value
    codeflash_output = sanitize_content_disposition('my"file\\name.txt'); header = codeflash_output

# --------------------------
# Unicode/Non-ASCII Test Cases
# --------------------------

def test_unicode_filename_simple():
    # Non-ASCII: triggers RFC 5987 encoding
    codeflash_output = sanitize_content_disposition("résumé.pdf"); header = codeflash_output

def test_unicode_filename_with_spaces():
    # Spaces are encoded as %20
    codeflash_output = sanitize_content_disposition("données 2024.xlsx"); header = codeflash_output

def test_unicode_filename_with_dangerous_chars():
    # Dangerous chars replaced, then encoded
    codeflash_output = sanitize_content_disposition("测试/文档?.txt"); header = codeflash_output

def test_unicode_filename_with_emoji():
    # Emoji triggers RFC 5987 encoding
    codeflash_output = sanitize_content_disposition("report_😀.pdf"); header = codeflash_output

def test_unicode_filename_only_dangerous_chars():
    # All dangerous, non-ASCII chars replaced, fallback to "unnamed"
    codeflash_output = sanitize_content_disposition("测试/\\:*?\"<>|"); header = codeflash_output

# --------------------------
# Edge Test Cases
# --------------------------

def test_filename_max_length_ascii():
    # Filename at exactly the max length (255)
    base = "a" * (255 - 4) + ".txt"  # 251 'a's + ".txt"
    codeflash_output = sanitize_content_disposition(base); header = codeflash_output




def test_filename_with_multiple_dots():
    # Only the last dot is considered the extension separator
    codeflash_output = sanitize_content_disposition("archive.tar.gz"); header = codeflash_output

def test_filename_with_leading_dot():
    # Leading dot is stripped (hidden file protection)
    codeflash_output = sanitize_content_disposition(".bashrc"); header = codeflash_output

def test_filename_with_only_dot():
    # Only dot, should fallback to "unnamed"
    codeflash_output = sanitize_content_disposition("."); header = codeflash_output

def test_filename_with_only_spaces():
    # Only spaces, should fallback to "unnamed"
    codeflash_output = sanitize_content_disposition("   "); header = codeflash_output

def test_filename_with_non_ascii_and_ascii_mix():
    # Mix of ASCII and non-ASCII, triggers encoding
    codeflash_output = sanitize_content_disposition("file_数据.csv"); header = codeflash_output

def test_filename_with_trailing_dot():
    # Trailing dot is stripped
    codeflash_output = sanitize_content_disposition("myfile."); header = codeflash_output

def test_filename_with_trailing_spaces():
    # Trailing spaces are stripped
    codeflash_output = sanitize_content_disposition("myfile.txt   "); header = codeflash_output


def test_filename_with_reserved_windows_names():
    # Reserved Windows names should be sanitized but not replaced
    codeflash_output = sanitize_content_disposition("CON.txt"); header = codeflash_output

# --------------------------
# Large Scale Test Cases
# --------------------------

def test_many_ascii_filenames():
    # Test 500 different ASCII filenames
    for i in range(500):
        fname = f"file_{i}.txt"
        codeflash_output = sanitize_content_disposition(fname); header = codeflash_output




def test_determinism():
    # The same input always produces the same output
    fname = "My File.txt"
    codeflash_output = sanitize_content_disposition(fname); header1 = codeflash_output
    codeflash_output = sanitize_content_disposition(fname); header2 = codeflash_output

# --------------------------
# Case Sensitivity Test
# --------------------------

def test_case_sensitivity():
    # Case is preserved in the output
    fname = "MyFile.TXT"
    codeflash_output = sanitize_content_disposition(fname); header = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
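Several tests describe the expected header format: ASCII names are quoted with " and \ escaped, while non-ASCII names switch to RFC 5987 encoding with spaces as %20. The sketch below shows one common way to build such a header with urllib.parse.quote (which the first test module imports). The "attachment" disposition type and the helper name content_disposition_sketch are assumptions, not langflow's implementation.

```python
from urllib.parse import quote


def content_disposition_sketch(filename: str) -> str:
    """Hypothetical Content-Disposition builder in RFC 6266 / RFC 5987 style."""
    if not filename.isascii():
        # Percent-encode the UTF-8 bytes; safe="" also encodes "/".
        # quote() encodes a space as %20 (quote_plus would use "+").
        return f"attachment; filename*=UTF-8''{quote(filename, safe='')}"
    # ASCII: quoted-string form, escaping backslashes first, then double quotes
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return f'attachment; filename="{escaped}"'


if __name__ == "__main__":
    print(content_disposition_sketch("report.txt"))  # attachment; filename="report.txt"
    print(content_disposition_sketch("résumé.pdf"))  # attachment; filename*=UTF-8''r%C3%A9sum%C3%A9.pdf
```

Many servers also send a plain filename= fallback alongside filename* for older clients; whether the real function does so is not shown here.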

To edit these changes, run `git checkout codeflash/optimize-pr10819-2025-12-01T20.28.11` and push.


coderabbitai · 2025-12-01T20:28:25Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2025-12-01T20:33:16Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	15.29% (4188/27381)	8.49% (1778/20935)	9.6% (579/6031)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
1638	0 💤	0 ❌	0 🔥	20.733s ⏱️

codecov · 2025-12-01T20:33:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.39%. Comparing base (ef63f8d) to head (ea24333).

❌ Your project status has failed because the head coverage (40.04%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                           Coverage Diff                           @@
##           s3-file-size-and-associations-to-flows   #10824   +/-   ##
=======================================================================
  Coverage                                   32.39%   32.39%           
=======================================================================
  Files                                        1367     1367           
  Lines                                       63235    63225   -10     
  Branches                                     9358     9357    -1     
=======================================================================
- Hits                                        20482    20479    -3     
+ Misses                                      41720    41714    -6     
+ Partials                                     1033     1032    -1
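The diff is internally consistent if coverage is computed as hits divided by all tracked lines (hits + misses + partials), which is how Codecov computes its coverage ratio, with partials counted as not covered. Both sides round to the same percentage, which is why losing 3 hits and 10 lines shows no net coverage change:

$$
\frac{20482}{20482 + 41720 + 1033} = \frac{20482}{63235} \approx 32.39\%,
\qquad
\frac{20479}{20479 + 41714 + 1032} = \frac{20479}{63225} \approx 32.39\%
$$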

Flag	Coverage Δ
frontend	`14.13% <ø> (ø)`
lfx	`40.04% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/backend/base/langflow/api/v2/files.py	`59.10% <ø> (-0.15%)`	⬇️

... and 3 files with indirect coverage changes


ogabrielluiz · 2026-03-03T18:14:40Z

Closing automated codeflash PR.

codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 1, 2025

github-actions Bot added the community Pull Request from an external contributor label Dec 1, 2025

[autofix.ci] apply automated fixes

ea24333

ogabrielluiz closed this Mar 3, 2026

codeflash-ai Bot deleted the codeflash/optimize-pr10819-2025-12-01T20.28.11 branch March 3, 2026 18:16

codeflash-ai[bot] wants to merge 2 commits into s3-file-size-and-associations-to-flows from codeflash/optimize-pr10819-2025-12-01T20.28.11
