
⚡️ Speed up method LocalStorageService.parse_file_path by 64% in PR #10929 (fix-image-s3) #10930

Closed
codeflash-ai[bot] wants to merge 2 commits into release-1.7.0 from codeflash/optimize-pr10929-2025-12-08T17.16.19



codeflash-ai Bot commented Dec 8, 2025

⚡️ This pull request contains optimizations for PR #10929

If you approve this dependent PR, these changes will be merged into the original PR branch fix-image-s3.

This PR will be automatically closed if the original PR is merged.


📄 64% speedup (about 1.64× as fast) for LocalStorageService.parse_file_path in src/backend/base/langflow/services/storage/local.py

⏱️ Runtime: 1.18 milliseconds → 714 microseconds (best of 110 runs)
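The two headline figures fit together if the speedup is read as a relative throughput gain, (original - optimized) / optimized, rather than as a cut in runtime. The short sketch below assumes that definition (it is inferred, not taken from Codeflash's documentation) and plugs in the rounded timings reported above.

```python
def relative_gain(original_us: float, optimized_us: float) -> float:
    """Speedup as extra throughput: (original - optimized) / optimized."""
    return (original_us - optimized_us) / optimized_us


def runtime_reduction(original_us: float, optimized_us: float) -> float:
    """Fraction of the original runtime that was eliminated."""
    return 1 - optimized_us / original_us


original_us = 1180.0  # 1.18 ms as displayed (already rounded)
optimized_us = 714.0  # 714 µs as displayed

print(f"relative gain:     {relative_gain(original_us, optimized_us):.1%}")
print(f"runtime reduction: {runtime_reduction(original_us, optimized_us):.1%}")
```

With these inputs it prints a relative gain of 65.3% and a runtime reduction of 39.5%, so the optimized method takes roughly 60% of its original time. The small gap to the reported 64% presumably comes from rounding in the displayed timings.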

📝 Explanation and details

The optimized code achieves a 64% speedup through two key optimizations, plus one minor control-flow change, that eliminate expensive repeated operations:

1. Precomputed string conversion (the largest saving)
The original code called str(self.data_dir) on every call; in the line profile this single step accounted for 56.8% of the function's execution time. The optimized version computes the string once during initialization and stores it as self._data_dir_str, so each call performs only an attribute access (10.1% of the optimized function's execution time).

2. Optimized path splitting (8% time savings)
The original code used rsplit("/", 1), which builds a two-element list that must then be unpacked. The optimized version uses rfind("/") to locate the last slash, then slices the string directly ([:slash_index] and [slash_index+1:]), producing the same two substrings without the intermediate list.

3. Minor control-flow improvement
The original code always assigned the path to a variable and then reassigned it when the data directory prefix matched. The optimized version uses an if/else, so the variable is assigned exactly once on either branch.
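The three changes above can be sketched side by side. This is a reconstruction from the description, not the code in src/backend/base/langflow/services/storage/local.py: OriginalParser and OptimizedParser are hypothetical names, the prefix-stripping and no-slash behavior are inferred from the expected values in the generated tests, and pathlib.PurePosixPath stands in for the anyio.Path that the test stubs use for data_dir.

```python
from pathlib import PurePosixPath


class OriginalParser:
    """Hypothetical reconstruction of the pre-optimization logic."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = PurePosixPath(data_dir)

    def parse_file_path(self, full_path: str) -> tuple[str, str]:
        data_dir_str = str(self.data_dir)  # converted on every call
        path = full_path  # always assigned...
        if full_path.startswith(data_dir_str):
            # ...then reassigned when the prefix matches
            path = full_path[len(data_dir_str):].lstrip("/")
        if "/" not in path:
            return "", path
        flow_id, file_name = path.rsplit("/", 1)  # builds a 2-element list
        return flow_id, file_name


class OptimizedParser:
    """Hypothetical sketch of the optimized variant."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = PurePosixPath(data_dir)
        self._data_dir_str = str(self.data_dir)  # converted once, at init

    def parse_file_path(self, full_path: str) -> tuple[str, str]:
        prefix = self._data_dir_str
        if full_path.startswith(prefix):  # single assignment per branch
            path = full_path[len(prefix):].lstrip("/")
        else:
            path = full_path
        slash_index = path.rfind("/")  # -1 when there is no slash
        if slash_index == -1:
            return "", path
        # Slice directly instead of building and unpacking a list
        return path[:slash_index], path[slash_index + 1:]
```

One design consequence: _data_dir_str is computed once in __init__, so the cache would go stale if data_dir were reassigned after construction. The optimization implicitly assumes data_dir stays fixed for the lifetime of the service.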

Performance impact on test cases:

  • Basic cases (paths with/without data_dir): Benefit significantly from cached string conversion
  • Edge cases (empty paths, trailing slashes): Maintain correctness while gaining speed
  • Large scale cases (many nested folders, long paths): Double benefit from both optimizations since they avoid repeated expensive operations

The optimizations are reported to preserve all original behavior and edge-case handling while removing the most expensive operations from the hot path. The profiled run recorded 1,116 hits on parse_file_path; if the method is comparably hot in real workloads, these micro-optimizations compound into meaningful performance gains.
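The trade-off in item 2 can be checked locally with a micro-benchmark. The sketch below is hypothetical and not part of the PR: it times an rsplit-based split against rfind plus slicing on one sample path. Both helpers return the same (flow_id, file_name) pair, including the no-slash case. Timings vary by interpreter and machine, so no expected numbers are given.

```python
import timeit

SAMPLE = "user_123/subfolder/image.png"  # hypothetical input


def split_rsplit(path: str) -> tuple[str, str]:
    """rsplit approach: builds a 2-element list, then unpacks it."""
    if "/" not in path:  # rsplit would return a 1-element list here
        return "", path
    flow_id, file_name = path.rsplit("/", 1)
    return flow_id, file_name


def split_rfind(path: str) -> tuple[str, str]:
    """rfind approach: locate the last slash once, then slice."""
    slash_index = path.rfind("/")
    if slash_index == -1:
        return "", path
    return path[:slash_index], path[slash_index + 1:]


def bench(number: int = 200_000) -> dict[str, float]:
    """Total seconds for `number` calls of each strategy on SAMPLE."""
    return {
        fn.__name__: timeit.timeit(lambda fn=fn: fn(SAMPLE), number=number)
        for fn in (split_rsplit, split_rfind)
    }


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name}: {seconds:.3f} s")
```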

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1179 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# imports
import pytest
from langflow.services.storage.local import LocalStorageService

# --- Minimal stubs for dependencies (to allow instantiation) ---

class DummySettings:
    def __init__(self, config_dir):
        self.config_dir = config_dir

class DummySettingsService:
    def __init__(self, config_dir):
        self.settings = DummySettings(config_dir)

class DummySessionService:
    pass

# --- Fixtures for test setup ---

@pytest.fixture
def service_slash():
    # config_dir ends with slash
    return LocalStorageService(
        DummySessionService(),
        DummySettingsService("/data/")
    )

@pytest.fixture
def service_noslash():
    # config_dir does not end with slash
    return LocalStorageService(
        DummySessionService(),
        DummySettingsService("/data")
    )

@pytest.fixture
def service_customdir():
    # config_dir is a custom path
    return LocalStorageService(
        DummySessionService(),
        DummySettingsService("/custom/path")
    )

# --- Basic Test Cases ---

def test_basic_with_data_dir(service_noslash):
    # Path starts with data_dir, normal case
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/image.png")

def test_basic_without_data_dir(service_noslash):
    # Path does not start with data_dir, normal case
    flow_id, file_name = service_noslash.parse_file_path("user_123/image.png")

def test_basic_nested_flow_id(service_noslash):
    # Nested flow_id (subfolders)
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/subfolder/image.png")

def test_basic_filename_only(service_noslash):
    # Only filename, no flow_id
    flow_id, file_name = service_noslash.parse_file_path("image.png")

def test_basic_filename_only_with_data_dir(service_noslash):
    # Only filename, with data_dir
    flow_id, file_name = service_noslash.parse_file_path("/data/image.png")

# --- Edge Test Cases ---

def test_edge_empty_string(service_noslash):
    # Empty string as input
    flow_id, file_name = service_noslash.parse_file_path("")

def test_edge_only_slash(service_noslash):
    # Only a slash
    flow_id, file_name = service_noslash.parse_file_path("/")

def test_edge_multiple_slashes(service_noslash):
    # Path with multiple consecutive slashes
    flow_id, file_name = service_noslash.parse_file_path("/data//user_123///image.png")

def test_edge_trailing_slash(service_noslash):
    # Trailing slash (should treat as folder, not file)
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/")

def test_edge_dot_in_filename(service_noslash):
    # Filename with multiple dots
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/my.photo.png")

def test_edge_dotfile(service_noslash):
    # Hidden file (starts with dot)
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/.env")

def test_edge_data_dir_with_trailing_slash(service_slash):
    # data_dir ends with slash, path starts with same
    flow_id, file_name = service_slash.parse_file_path("/data/user_123/image.png")

def test_edge_data_dir_without_leading_slash(service_customdir):
    # data_dir is custom, path does not start with slash
    flow_id, file_name = service_customdir.parse_file_path("custom/path/user_123/image.png")

def test_edge_path_is_data_dir(service_noslash):
    # Path is exactly the data_dir
    flow_id, file_name = service_noslash.parse_file_path("/data")

def test_edge_path_is_data_dir_with_slash(service_slash):
    # Path is exactly the data_dir with trailing slash
    flow_id, file_name = service_slash.parse_file_path("/data/")

def test_edge_file_in_root_of_data_dir(service_noslash):
    # File directly in data_dir
    flow_id, file_name = service_noslash.parse_file_path("/data/file.txt")

def test_edge_file_with_unicode(service_noslash):
    # Unicode characters in filename and flow_id
    flow_id, file_name = service_noslash.parse_file_path("/data/用户_测试/文件🌟.txt")

def test_edge_file_with_spaces(service_noslash):
    # Spaces in path
    flow_id, file_name = service_noslash.parse_file_path("/data/user 123/my file.txt")

def test_edge_file_with_special_chars(service_noslash):
    # Special characters in filename
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/file-@#$.txt")

def test_edge_path_with_backslashes(service_noslash):
    # Backslashes in path (should not be split)
    flow_id, file_name = service_noslash.parse_file_path(r"/data/user_123\subfolder\image.png")

def test_edge_path_with_leading_and_trailing_spaces(service_noslash):
    # Leading/trailing spaces in path
    flow_id, file_name = service_noslash.parse_file_path("   /data/user_123/image.png   ")

def test_edge_path_with_dot_and_dotdot(service_noslash):
    # Path contains . or ..
    flow_id, file_name = service_noslash.parse_file_path("/data/user_123/../image.png")

def test_edge_path_with_multiple_nested_subfolders(service_noslash):
    # Deeply nested subfolders
    path = "/data/user_123/a/b/c/d/e/f/g/image.png"
    flow_id, file_name = service_noslash.parse_file_path(path)

# --- Large Scale Test Cases ---

def test_large_number_of_subfolders(service_noslash):
    # Many subfolders (but <1000)
    subfolders = "/".join(f"folder{i}" for i in range(100))
    path = f"/data/{subfolders}/file.txt"
    flow_id, file_name = service_noslash.parse_file_path(path)

def test_large_filename(service_noslash):
    # Very long filename
    filename = "a" * 255 + ".txt"
    path = f"/data/user_123/{filename}"
    flow_id, file_name = service_noslash.parse_file_path(path)

def test_large_path(service_noslash):
    # Very long path with many folders and a long filename
    subfolders = "/".join(f"folder{i}" for i in range(500))
    filename = "x" * 200 + ".dat"
    path = f"/data/{subfolders}/{filename}"
    flow_id, file_name = service_noslash.parse_file_path(path)

def test_many_calls_performance(service_noslash):
    # Call the function many times to check for consistent performance
    for i in range(500):
        path = f"/data/user_{i}/file_{i}.txt"
        flow_id, file_name = service_noslash.parse_file_path(path)

def test_large_scale_mixed_inputs(service_noslash):
    # Mix of valid and edge-case paths in a batch
    paths = [
        "/data/user_1/file1.txt",
        "user_2/file2.txt",
        "/data/user_3/sub/file3.txt",
        "file4.txt",
        "/data/file5.txt",
        "",
        "/data/",
        "/data/user_4/",
        "/data/user_5/.hidden",
        "/data/user_6/../file6.txt",
    ]
    expected = [
        ("user_1", "file1.txt"),
        ("user_2", "file2.txt"),
        ("user_3/sub", "file3.txt"),
        ("", "file4.txt"),
        ("", "file5.txt"),
        ("", ""),
        ("", ""),
        ("user_4", ""),
        ("user_5", ".hidden"),
        ("user_6/..", "file6.txt"),
    ]
    for path, exp in zip(paths, expected):
        flow_id, file_name = service_noslash.parse_file_path(path)
        assert (flow_id, file_name) == exp
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from langflow.services.storage.local import LocalStorageService


# Minimal stubs for required classes to allow instantiation
class Settings:
    def __init__(self, config_dir):
        self.config_dir = config_dir

class SettingsService:
    def __init__(self, config_dir="/data"):
        self.settings = Settings(config_dir)

class SessionService:
    pass



# Fixtures for the service
@pytest.fixture
def storage_service_default():
    # Default config_dir is "/data"
    return LocalStorageService(SessionService(), SettingsService("/data"))

@pytest.fixture
def storage_service_custom():
    # Custom config_dir for edge cases
    return LocalStorageService(SessionService(), SettingsService("/custom_dir"))

# 1. Basic Test Cases

def test_basic_with_data_dir(storage_service_default):
    # Path includes data_dir
    flow_id, file_name = storage_service_default.parse_file_path("/data/user_123/image.png")

def test_basic_without_data_dir(storage_service_default):
    # Path does not include data_dir
    flow_id, file_name = storage_service_default.parse_file_path("user_123/image.png")

def test_basic_nested_flow_id(storage_service_default):
    # Path with nested flow_id
    flow_id, file_name = storage_service_default.parse_file_path("/data/group/subgroup/user_123/image.png")

def test_basic_no_flow_id(storage_service_default):
    # Path with only filename
    flow_id, file_name = storage_service_default.parse_file_path("image.png")

def test_basic_no_flow_id_with_data_dir(storage_service_default):
    # Path with only filename, includes data_dir
    flow_id, file_name = storage_service_default.parse_file_path("/data/image.png")

# 2. Edge Test Cases

def test_edge_empty_path(storage_service_default):
    # Empty string path
    flow_id, file_name = storage_service_default.parse_file_path("")

def test_edge_slash_only(storage_service_default):
    # Path is only a slash
    flow_id, file_name = storage_service_default.parse_file_path("/")

def test_edge_trailing_slash(storage_service_default):
    # Path ends with a slash (should treat as empty filename)
    flow_id, file_name = storage_service_default.parse_file_path("/data/user_123/")

def test_edge_leading_and_trailing_slashes(storage_service_default):
    # Path with leading and trailing slashes
    flow_id, file_name = storage_service_default.parse_file_path("/data//user_123//image.png/")

def test_edge_double_slash_in_flow_id(storage_service_default):
    # Path with double slashes in flow_id
    flow_id, file_name = storage_service_default.parse_file_path("/data/group//user_123/image.png")

def test_edge_filename_with_slash(storage_service_default):
    # Extra segment after a filename-like segment; the last segment is taken as the file name
    flow_id, file_name = storage_service_default.parse_file_path("/data/user_123/image.png/extra")

def test_edge_filename_is_empty_string(storage_service_default):
    # Path ending with slash, no filename
    flow_id, file_name = storage_service_default.parse_file_path("/data/user_123/")

def test_edge_custom_data_dir(storage_service_custom):
    # Custom data_dir, path includes custom dir
    flow_id, file_name = storage_service_custom.parse_file_path("/custom_dir/user_456/file.txt")

def test_edge_custom_data_dir_not_in_path(storage_service_custom):
    # Custom data_dir, path does not include custom dir
    flow_id, file_name = storage_service_custom.parse_file_path("user_456/file.txt")

def test_edge_data_dir_as_substring(storage_service_default):
    # Path where data_dir is a substring but not a prefix
    flow_id, file_name = storage_service_default.parse_file_path("not_data/user_123/image.png")

def test_edge_data_dir_prefix_but_partial_match(storage_service_default):
    # Path starts with /data but not exactly data_dir
    flow_id, file_name = storage_service_default.parse_file_path("/data_extra/user_123/image.png")

def test_edge_filename_with_multiple_periods(storage_service_default):
    # Filename has multiple periods
    flow_id, file_name = storage_service_default.parse_file_path("/data/user_123/image.backup.png")

def test_edge_flow_id_with_period(storage_service_default):
    # Flow_id contains a period
    flow_id, file_name = storage_service_default.parse_file_path("/data/user.123/image.png")

def test_edge_path_with_spaces(storage_service_default):
    # Path contains spaces
    flow_id, file_name = storage_service_default.parse_file_path("/data/user 123/image 1.png")

def test_edge_path_with_unicode(storage_service_default):
    # Path contains unicode characters
    flow_id, file_name = storage_service_default.parse_file_path("/data/用户/image文件.png")

def test_edge_path_with_special_chars(storage_service_default):
    # Path contains special characters
    flow_id, file_name = storage_service_default.parse_file_path("/data/user_!@#/file_#$.txt")

def test_edge_path_is_data_dir_only(storage_service_default):
    # Path is exactly data_dir
    flow_id, file_name = storage_service_default.parse_file_path("/data")

def test_edge_path_is_data_dir_with_trailing_slash(storage_service_default):
    # Path is data_dir with trailing slash
    flow_id, file_name = storage_service_default.parse_file_path("/data/")

def test_edge_path_is_dot(storage_service_default):
    # Path is "."
    flow_id, file_name = storage_service_default.parse_file_path(".")

def test_edge_path_is_dot_slash_filename(storage_service_default):
    # Path is "./filename.txt"
    flow_id, file_name = storage_service_default.parse_file_path("./filename.txt")

def test_edge_path_is_relative(storage_service_default):
    # Relative path
    flow_id, file_name = storage_service_default.parse_file_path("relative/path/to/file.txt")

def test_edge_path_is_absolute(storage_service_default):
    # Absolute path not starting with data_dir
    flow_id, file_name = storage_service_default.parse_file_path("/absolute/path/to/file.txt")

# 3. Large Scale Test Cases

def test_large_scale_many_segments(storage_service_default):
    # Path with many segments in flow_id
    segments = [f"segment{i}" for i in range(100)]
    flow_id_str = "/".join(segments)
    path = f"/data/{flow_id_str}/file.txt"
    flow_id, file_name = storage_service_default.parse_file_path(path)

def test_large_scale_long_filename(storage_service_default):
    # Very long filename
    long_filename = "a" * 500 + ".txt"
    path = f"/data/user_123/{long_filename}"
    flow_id, file_name = storage_service_default.parse_file_path(path)

def test_large_scale_long_flow_id(storage_service_default):
    # Very long flow_id
    long_flow_id = "user_" + "x" * 500
    path = f"/data/{long_flow_id}/file.txt"
    flow_id, file_name = storage_service_default.parse_file_path(path)

def test_large_scale_many_files(storage_service_default):
    # Test many different file paths in a loop
    for i in range(100):
        flow_id = f"user_{i}"
        filename = f"file_{i}.dat"
        path = f"/data/{flow_id}/{filename}"
        result_flow_id, result_file_name = storage_service_default.parse_file_path(path)

def test_large_scale_no_data_dir(storage_service_default):
    # Many files without data_dir prefix
    for i in range(100):
        flow_id = f"user_{i}"
        filename = f"file_{i}.dat"
        path = f"{flow_id}/{filename}"
        result_flow_id, result_file_name = storage_service_default.parse_file_path(path)

def test_large_scale_mixed_paths(storage_service_default):
    # Mix of paths with and without data_dir, nested and flat
    for i in range(50):
        flow_id = f"group_{i}/user_{i}"
        filename = f"file_{i}.dat"
        # With data_dir
        path1 = f"/data/{flow_id}/{filename}"
        rfid1, rfn1 = storage_service_default.parse_file_path(path1)
        # Without data_dir
        path2 = f"{flow_id}/{filename}"
        rfid2, rfn2 = storage_service_default.parse_file_path(path2)

def test_large_scale_filename_with_slashes(storage_service_default):
    # Many paths where filename is empty due to trailing slash
    for i in range(100):
        flow_id = f"user_{i}"
        path = f"/data/{flow_id}/"
        result_flow_id, result_file_name = storage_service_default.parse_file_path(path)

def test_large_scale_edge_cases(storage_service_default):
    # Mix of edge cases in large scale
    for i in range(50):
        # Empty filename
        path1 = f"/data/user_{i}/"
        flow_id, file_name = storage_service_default.parse_file_path(path1)
        # Only filename
        path2 = f"file_{i}.txt"
        flow_id, file_name = storage_service_default.parse_file_path(path2)
        # Deep nesting
        path3 = f"/data/a/b/c/d/e/f/g/user_{i}/file_{i}.txt"
        flow_id, file_name = storage_service_default.parse_file_path(path3)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-pr10929-2025-12-08T17.16.19` and push.

Codeflash

jordanrfrazier and others added 2 commits December 8, 2025 12:05
* Fix image pathing to operate with s3 storage

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* add test

* [autofix.ci] apply automated fixes

* ruff

* Add abstract method annotation

* [autofix.ci] apply automated fixes

* fix: use parse_file_path in get_files for S3 storage compatibility

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: himavarshagoutham <himavarshajan17@gmail.com>
codeflash-ai Bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Dec 8, 2025
github-actions Bot added the community label (Pull Request from an external contributor) Dec 8, 2025

coderabbitai Bot commented Dec 8, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.


codecov Bot commented Dec 8, 2025

Codecov Report

❌ Patch coverage is 61.70213% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.72%. Comparing base (1174a6a) to head (f77cc18).
⚠️ Report is 53 commits behind head on release-1.7.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lfx/src/lfx/services/storage/local.py | 11.11% | 8 Missing ⚠️ |
| src/lfx/src/lfx/utils/image.py | 33.33% | 7 Missing and 1 partial ⚠️ |
| .../backend/base/langflow/services/storage/service.py | 80.00% | 1 Missing ⚠️ |
| src/lfx/src/lfx/schema/image.py | 50.00% | 1 Missing ⚠️ |

❌ Your project status has failed because the head coverage (40.02%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files


```diff
@@                Coverage Diff                @@
##           release-1.7.0   #10930      +/-   ##
=================================================
+ Coverage          32.43%   32.72%   +0.29%
=================================================
  Files               1367     1368       +1
  Lines              63315    63497     +182
  Branches            9357     9379      +22
=================================================
+ Hits               20538    20782     +244
+ Misses             41744    41674      -70
- Partials            1033     1041       +8
```
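The percentages in this report can be re-derived from the raw counts in the coverage diff. The sketch assumes Codecov's usual definition, coverage = hits / (hits + misses + partials), with project figures rounded down to two decimals (Codecov's default precision and rounding). The 29 covered patch lines are inferred from the 18 missing lines (17 missed plus 1 partial) and the 61.70213% figure; the report does not state them directly.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage percentage: hits / (hits + misses + partials) * 100."""
    return 100 * hits / (hits + misses + partials)


def round_down(value: float, places: int = 2) -> float:
    """Truncate toward zero at the given number of decimal places."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Project totals from the coverage diff (base vs head).
base = coverage_pct(hits=20538, misses=41744, partials=1033)
head = coverage_pct(hits=20782, misses=41674, partials=1041)

# Patch: 18 lines missing (17 missed + 1 partial); 29 hits is inferred.
patch = coverage_pct(hits=29, misses=17, partials=1)

print(f"base  {round_down(base):.2f}%")  # report: 32.43%
print(f"head  {round_down(head):.2f}%")  # report: 32.72%
print(f"delta {round_down(head) - round_down(base):+.2f}%")  # report: +0.29%
print(f"patch {patch:.5f}%")  # report: 61.70213%
```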
| Flag | Coverage Δ |
|---|---|
| backend | 51.90% <95.83%> (+0.66%) ⬆️ |
| lfx | 40.02% <26.08%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...rc/backend/base/langflow/services/storage/local.py | 87.50% <100.00%> (+1.61%) ⬆️ |
| src/backend/base/langflow/services/storage/s3.py | 25.49% <100.00%> (+13.07%) ⬆️ |
| src/lfx/src/lfx/base/data/base_file.py | 35.45% <ø> (-0.65%) ⬇️ |
| src/lfx/src/lfx/services/interfaces.py | 100.00% <ø> (ø) |
| src/lfx/src/lfx/services/storage/service.py | 57.89% <ø> (-4.02%) ⬇️ |
| .../backend/base/langflow/services/storage/service.py | 78.04% <80.00%> (-0.74%) ⬇️ |
| src/lfx/src/lfx/schema/image.py | 55.75% <50.00%> (+12.05%) ⬆️ |
| src/lfx/src/lfx/services/storage/local.py | 20.73% <11.11%> (-1.19%) ⬇️ |
| src/lfx/src/lfx/utils/image.py | 73.52% <33.33%> (-4.25%) ⬇️ |

... and 31 files with indirect coverage changes


Base automatically changed from fix-image-s3 to release-1.7.0 December 8, 2025 18:04
ogabrielluiz (Contributor) commented:

Closing automated codeflash PR.

codeflash-ai Bot deleted the codeflash/optimize-pr10929-2025-12-08T17.16.19 branch March 3, 2026 18:14