⚡️ Speed up function _classify_dependency by 7,582% in PR #9192 (add-deps-metadata) #9193

Merged
ogabrielluiz merged 1 commit into add-deps-metadata from codeflash/optimize-pr9192-2025-07-25T17.43.34 on Jul 25, 2025

@codeflash-ai (bot, Contributor) commented Jul 25, 2025

⚡️ This pull request contains optimizations for PR #9192

If you approve this dependent PR, these changes will be merged into the original PR branch add-deps-metadata.

This PR will be automatically closed if the original PR is merged.


📄 7,582% (75.82x) speedup for _classify_dependency in src/backend/base/langflow/custom/dependency_analyzer.py

⏱️ Runtime: 4.77 milliseconds → 62.1 microseconds (best of 123 runs)
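The two headline numbers are consistent if the percentage is defined as (old − new) / new. That convention is an assumption about how Codeflash reports gains, not something this page states; a quick check against the rounded timings shown:

```python
# Check the reported speedup against the displayed (rounded) timings.
# Assumption: the percentage is defined as (old - new) / new * 100.
original_s = 4.77e-3    # 4.77 milliseconds
optimized_s = 62.1e-6   # 62.1 microseconds

ratio = original_s / optimized_s                          # plain "times faster"
gain_pct = (original_s - optimized_s) / optimized_s * 100

print(f"ratio: {ratio:.2f}x")      # → ratio: 76.81x
print(f"gain:  {gain_pct:,.0f}%")  # → gain:  7,581%
```

This reproduces 7,581% from the rounded timings, matching the reported 7,582% up to rounding. Under the same convention the title's 75.82x is simply the percentage divided by 100, while the plain runtime ratio is about 76.8x.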

📝 Explanation and details

Here is a faster version of your _classify_dependency function. Profiling shows that the main bottlenecks are the call to md.distribution(dep.name) and the dist.version access, both of which trigger expensive package metadata resolution.

Optimizations:

  • Cache package version lookups with an LRU cache (functools.lru_cache). md.distribution does not cache its results, so every call repeats the metadata search; caching removes that cost for package names that appear more than once.
  • Bind the importlib.metadata objects the function uses (md.distribution and its exception classes) once at module level instead of inside the function.
  • Return immediately, without any lookup, when dep.is_local is true or dep.name is falsy (None or an empty string).

Notes:

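  • The function signature and return values are unchanged, with one caveat: a resolved version is reused for the rest of the process, so a package installed or upgraded after its first lookup is not reflected until the cache is cleared.
  • Existing comments are unchanged except where the code they describe changed.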

Optimized code.

Why this is faster:

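  • Repeated lookups of the same package name are served from the LRU cache instead of re-resolving package metadata.
  • Module-level bindings avoid repeating imports and attribute lookups on every call.
  • Local and unnamed dependencies return immediately, so package metadata is touched only when a version lookup is actually needed.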
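The cache-hit claim can be observed directly, because functools.lru_cache exposes hit and miss counters through cache_info(). A self-contained sketch, using a hypothetical `lookup_version` helper and a package name assumed not to be installed:

```python
from __future__ import annotations

import importlib.metadata as md
from functools import lru_cache


@lru_cache(maxsize=None)
def lookup_version(name: str) -> str | None:
    # Hypothetical helper: resolve an installed distribution's version once per name.
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return None


names = ["not_a_real_pkg_xyz"] * 5  # the same dependency seen five times
versions = [lookup_version(n) for n in names]

info = lookup_version.cache_info()
print(info.misses, info.hits)  # → 1 4 (only the first call touches package metadata)

lookup_version.cache_clear()  # drop cached results, e.g. after installing packages
print(lookup_version.cache_info().currsize)  # → 0
```

Calling cache_clear() is also the way to pick up packages installed after their first lookup, since the cache otherwise keeps returning the earlier result.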

Correctness verification report:

Test Status
⏪ Replay Tests 🔘 None Found
⚙️ Existing Unit Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
🌀 Generated Regression Tests 29 Passed
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import importlib.metadata as md
from dataclasses import dataclass
from typing import Optional

# imports
import pytest  # used for our unit tests
from langflow.custom.dependency_analyzer import _classify_dependency


@dataclass(frozen=True)
class DependencyInfo:
    name: Optional[str]
    version: Optional[str]
    is_local: bool

# unit tests

# ----------- Basic Test Cases -----------

def test_external_dependency_installed():
    # Test with a common installed package (pytest itself)
    dep = DependencyInfo(name="pytest", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_not_installed():
    # Test with a non-existent package
    dep = DependencyInfo(name="nonexistent_package_abcdefg", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_local_dependency():
    # Local dependencies should not be resolved, version remains None
    dep = DependencyInfo(name="my_local_lib", version=None, is_local=True)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_version_preserves_name_and_local():
    # Should preserve name and is_local fields
    dep = DependencyInfo(name="pytest", version="1.2.3", is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_empty_name():
    # Empty name should not attempt to resolve version
    dep = DependencyInfo(name="", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_none_name():
    # None name should not attempt to resolve version
    dep = DependencyInfo(name=None, version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

# ----------- Edge Test Cases -----------

def test_local_dependency_with_none_name():
    # Local dependency with None name should not error
    dep = DependencyInfo(name=None, version=None, is_local=True)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_local_dependency_with_empty_name():
    # Local dependency with empty name should not error
    dep = DependencyInfo(name="", version=None, is_local=True)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_case_sensitivity():
    # Some systems are case-sensitive; test with capitalized package name
    dep = DependencyInfo(name="PyTest", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_special_characters():
    # Unlikely to be a valid package, but should not crash
    dep = DependencyInfo(name="pytest!@#$%", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_long_name():
    # Very long package name, should not crash
    long_name = "a" * 256
    dep = DependencyInfo(name=long_name, version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_unicode_name():
    # Unicode in package name, should not crash
    dep = DependencyInfo(name="pytesté", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_leading_trailing_spaces():
    # Spaces should be preserved, but package won't be found
    dep = DependencyInfo(name=" pytest ", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_dependencyinfo_immutable():
    # Ensure DependencyInfo is immutable (frozen dataclass)
    dep = DependencyInfo(name="pytest", version=None, is_local=True)
    with pytest.raises(Exception):
        dep.name = "foo"

# ----------- Large Scale Test Cases -----------






from __future__ import annotations

import importlib.metadata as md
from dataclasses import dataclass
from typing import Optional

# imports
import pytest  # used for our unit tests
from langflow.custom.dependency_analyzer import _classify_dependency


@dataclass(frozen=True)
class DependencyInfo:
    name: Optional[str]
    version: Optional[str]
    is_local: bool

# unit tests

# ----------- BASIC TEST CASES ------------

def test_external_dependency_installed_package():
    """Test with a common installed external package (pytest itself)."""
    dep = DependencyInfo(name="pytest", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_not_installed():
    """Test with a non-existent external package name."""
    dep = DependencyInfo(name="definitely_not_installed_package_12345", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_local_dependency_with_name():
    """Test with a local dependency, name given."""
    dep = DependencyInfo(name="my_local_package", version=None, is_local=True)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_local_dependency_no_name():
    """Test with a local dependency, name is None."""
    dep = DependencyInfo(name=None, version=None, is_local=True)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_no_name():
    """Test with an external dependency, name is None."""
    dep = DependencyInfo(name=None, version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

# ----------- EDGE TEST CASES ------------

def test_external_dependency_empty_string_name():
    """Test with external dependency, name is empty string."""
    dep = DependencyInfo(name="", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_local_dependency_empty_string_name():
    """Test with local dependency, name is empty string."""
    dep = DependencyInfo(name="", version=None, is_local=True)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_case_sensitivity():
    """Test with external dependency, name with different case."""
    # 'PyTeSt' should resolve to the same package as 'pytest'
    dep = DependencyInfo(name="PyTeSt", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_dependency_with_existing_version():
    """Test that existing version in DependencyInfo is ignored and replaced."""
    dep = DependencyInfo(name="pytest", version="0.0.1", is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_special_characters():
    """Test with a package name containing special characters (invalid)."""
    dep = DependencyInfo(name="!nv@lid_n@me", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_external_dependency_with_hyphens_and_underscores():
    """Test with a package name that uses hyphens and underscores."""
    # 'importlib_metadata' is installed as 'importlib-metadata'
    dep = DependencyInfo(name="importlib_metadata", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

def test_dependencyinfo_immutability():
    """Test that returned DependencyInfo is a new object and not mutated in place."""
    dep = DependencyInfo(name="pytest", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output

# ----------- LARGE SCALE TEST CASES ------------




def test_determinism_for_same_input():
    """Test that repeated calls with the same input yield the same output."""
    dep = DependencyInfo(name="pytest", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result1 = codeflash_output
    codeflash_output = _classify_dependency(dep); result2 = codeflash_output

# ----------- TYPE AND ATTRIBUTE TESTS ------------

def test_return_type_and_fields():
    """Test that the returned object is of type DependencyInfo and has correct fields."""
    dep = DependencyInfo(name="pytest", version=None, is_local=False)
    codeflash_output = _classify_dependency(dep); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
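Several generated tests probe spelling variants of a package name (`PyTest`, `PyTeSt`, `importlib_metadata`). Distribution names are compared using the PEP 503 normalization rule, sketched below; the `packaging` library ships the same rule as `packaging.utils.canonicalize_name`. Normalizing before a cached lookup is a hypothetical refinement, not part of this PR, that would let such variants share one cache entry instead of one entry per spelling.

```python
import re


def normalize_dist_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Name variants exercised by the generated tests above.
print(normalize_dist_name("PyTeSt"))              # → pytest
print(normalize_dist_name("importlib_metadata"))  # → importlib-metadata
```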

To edit these changes, run git checkout codeflash/optimize-pr9192-2025-07-25T17.43.34 and push.


@codeflash-ai codeflash-ai Bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Jul 25, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Jul 25, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Join our Discord community for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@misrasaurabh1
Contributor

This looks good. I suspect there are repeated dependencies, so caching could be useful, especially since the gains are considerable.

@github-actions github-actions Bot added the lgtm label (This PR has been approved by a maintainer) on Jul 25, 2025
@ogabrielluiz ogabrielluiz merged commit 07d3bf4 into add-deps-metadata Jul 25, 2025
15 of 17 checks passed
@ogabrielluiz ogabrielluiz deleted the codeflash/optimize-pr9192-2025-07-25T17.43.34 branch July 25, 2025 17:49
github-merge-queue Bot pushed a commit that referenced this pull request Aug 25, 2025
…ing (#9192)

* feat: add dependency analysis utilities for custom components

- Introduced `dependency_analyzer.py` to analyze and classify dependencies in Python code.
- Implemented functions to extract import information and categorize dependencies as standard library, local, or external.
- Enhanced `build_component_metadata` to include dependency analysis results in component metadata.
- Added unit tests to validate the functionality of the dependency analysis features.

* refactor: streamline dependency analysis by filtering out stdlib and local imports

- Updated `dependency_analyzer.py` to focus on external dependencies only, removing standard library and local imports from analysis results.
- Simplified the `DependencyInfo` class by eliminating unnecessary attributes and adjusting the deduplication logic.
- Modified `build_component_metadata` to reflect changes in dependency structure, removing counts for stdlib and local dependencies.
- Enhanced unit tests to validate the new filtering behavior and ensure no duplicates in external dependencies.

* feat: update starter project metadata with dependency information

- Added dependency sections to multiple starter project JSON files, specifying required packages and their versions.
- Included `langflow` version `1.5.0.post1` and other relevant dependencies such as `orjson`, `fastapi`, and `pydantic` across various projects.
- Enhanced project metadata to improve clarity on external dependencies for better maintainability and user guidance.

* ⚡️ Speed up function `_classify_dependency` by 7,582% in PR #9192 (`add-deps-metadata`) (#9193)

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>

* fix: ensure distribution version is returned correctly in `_get_distribution_version`

- Updated `_get_distribution_version` function to return the distribution version after successfully retrieving it, addressing a potential issue where `None` could be returned prematurely.

* fix: improve distribution version lookup in `_get_distribution_version`

* fix: handle distribution version lookup exceptions more gracefully

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* fix(apply_tweaks): skip tweaks to code field and log warning (#9467)

* fix: add security warning for overriding code field in tweaks

* test: add tests for preventing code field overrides in tweaks

* ref: Refactor vectorstore components structure (#9486)

* Refactor vectorstore components structure

Moved vectorstore components for Chroma, ClickHouse, Couchbase, DataStax, Elastic, Milvus, MongoDB, Pinecone, Qdrant, Supabase, Upstash, Vectara, and Weaviate into dedicated subfolders with __init__.py files for each. Updated Redis vectorstore implementation to reside in redis.py and removed the old vectorstores/redis.py. Adjusted starter project JSONs and frontend constants to reflect new module paths and sidebar entries for these vectorstores.

* Refactor vectorstore components and add lazy imports

Moved Datastax-related files from vectorstores to a dedicated datastax directory. Added lazy import logic to __init__.py files for chroma, clickhouse, couchbase, elastic, milvus, mongodb, pinecone, qdrant, supabase, upstash, vectara, and weaviate components. Cleaned up vectorstores/__init__.py to only include local and faiss components, improving modularity and import efficiency.

* [autofix.ci] apply automated fixes

* Refactor vectorstore components structure

Moved FAISS, Cassandra, and pgvector components to dedicated subdirectories with lazy-loading __init__.py files. Updated imports and references throughout the backend and frontend to reflect new locations. Removed obsolete datastax Cassandra component. Added new sidebar bundle entries for FAISS, Cassandra, and pgvector in frontend constants and style utilities.

* Add lazy imports and Redis chat memory component

Refactored the Redis module to support lazy imports for RedisIndexChatMemory and RedisVectorStoreComponent, improving import efficiency. Added a new redis_chat.py file implementing RedisIndexChatMemory for chat message storage and retrieval using Redis.

* Fix vector store astra imports

* Revert package lock changes

* More test fixes

* Update test_vector_store_rag.py

* Update test_dynamic_imports.py

* Update vector_store_rag.py

* Update test_dynamic_imports.py

* Refactor the cassandra chat component

* Fix frontend tests for bundle

* Mark Local DB as legacy

* Update inputComponent.spec.ts

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>

* feat: add dependencies metadata to starter projects

* feat: add caching for packages_distributions to improve performance

* refactor: update test descriptions and remove unused imports in metadata tests

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>

Labels

⚡️ codeflash (Optimization PR opened by Codeflash AI), lgtm (This PR has been approved by a maintainer)


2 participants