⚡️ Speed up function `_classify_dependency` by 7,582% in PR #9192 (`add-deps-metadata`) #9193
Merged
ogabrielluiz merged 1 commit on Jul 25, 2025
…dd-deps-metadata`)

Here is a faster version of your `_classify_dependency` function. The profiling shows that the real bottlenecks are `md.distribution(dep.name)` and accessing `dist.version`, both of which trigger expensive package metadata resolution.

**Optimizations:**
- Use an **LRU cache** for package version lookup. Since `md.distribution` does not cache results, this can greatly reduce overhead for repeated package names.
- Only import `importlib.metadata` module objects once (move `md.distribution` and exceptions out to top-level).
- If `dep.is_local` is true or `dep.name` is falsy, skip lookups immediately.

**Notes:**
- This keeps function signature and behavior identical.
- Comments are unchanged unless the relevant code changes.

Optimized code.

**Why this is faster:**
- Repeated queries for the same package name are almost instant due to the LRU cache.
- Fewer redundant imports or lookups.
- No extra overhead unless package version lookup is actually needed.
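The optimized code itself did not survive on this page, so the following is only a minimal sketch of the technique the description names: an LRU-cached version lookup plus an early exit for local or unnamed dependencies. The `Dep` dataclass, `get_version`, and `classify` are hypothetical stand-ins, not the PR's actual code; only `dep.name`, `dep.is_local`, and `md.distribution` come from the description.

```python
from dataclasses import dataclass
from functools import lru_cache
from importlib import metadata as md
from typing import Optional


@dataclass
class Dep:
    """Hypothetical stand-in for the dependency record used in the PR."""
    name: str
    is_local: bool = False


@lru_cache(maxsize=128)
def get_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        # md.distribution() resolves package metadata on every call,
        # so the result is memoized per package name.
        return md.distribution(name).version
    except md.PackageNotFoundError:
        return None


def classify(dep: Dep) -> dict:
    """Attach a version to a dependency, skipping the lookup when it cannot apply."""
    # Early exit: local or unnamed dependencies never reach the metadata lookup.
    if dep.is_local or not dep.name:
        return {"name": dep.name, "version": None}
    return {"name": dep.name, "version": get_version(dep.name)}
```

With this shape, a second `classify` call for the same package name is served from the cache, which `get_version.cache_info()` makes visible. One trade-off of caching is that a package installed or upgraded while the process is running is not picked up until `get_version.cache_clear()` is called.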
This looks good. I suspect there are repeated dependencies, so caching could be useful, especially because the gains are considerable.
ogabrielluiz approved these changes on Jul 25, 2025
github-merge-queue Bot pushed a commit that referenced this pull request on Aug 25, 2025
…ing (#9192)

* feat: add dependency analysis utilities for custom components
  - Introduced `dependency_analyzer.py` to analyze and classify dependencies in Python code.
  - Implemented functions to extract import information and categorize dependencies as standard library, local, or external.
  - Enhanced `build_component_metadata` to include dependency analysis results in component metadata.
  - Added unit tests to validate the functionality of the dependency analysis features.
* refactor: streamline dependency analysis by filtering out stdlib and local imports
  - Updated `dependency_analyzer.py` to focus on external dependencies only, removing standard library and local imports from analysis results.
  - Simplified the `DependencyInfo` class by eliminating unnecessary attributes and adjusting the deduplication logic.
  - Modified `build_component_metadata` to reflect changes in dependency structure, removing counts for stdlib and local dependencies.
  - Enhanced unit tests to validate the new filtering behavior and ensure no duplicates in external dependencies.
* feat: update starter project metadata with dependency information
  - Added dependency sections to multiple starter project JSON files, specifying required packages and their versions.
  - Included `langflow` version `1.5.0.post1` and other relevant dependencies such as `orjson`, `fastapi`, and `pydantic` across various projects.
  - Enhanced project metadata to improve clarity on external dependencies for better maintainability and user guidance.
* ⚡️ Speed up function `_classify_dependency` by 7,582% in PR #9192 (`add-deps-metadata`) (#9193)
  Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
* fix: ensure distribution version is returned correctly in `_get_distribution_version`
  - Updated `_get_distribution_version` function to return the distribution version after successfully retrieving it, addressing a potential issue where `None` could be returned prematurely.
* fix: improve distribution version lookup in `_get_distribution_version`
* fix: handle distribution version lookup exceptions more gracefully
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* fix(apply_tweaks): skip tweaks to code field and log warning (#9467)
* fix: add security warning for overriding code field in tweaks
* test: add tests for preventing code field overrides in tweaks
* ref: Refactor vectorstore components structure (#9486)
* Refactor vectorstore components structure
  Moved vectorstore components for Chroma, ClickHouse, Couchbase, DataStax, Elastic, Milvus, MongoDB, Pinecone, Qdrant, Supabase, Upstash, Vectara, and Weaviate into dedicated subfolders with __init__.py files for each. Updated Redis vectorstore implementation to reside in redis.py and removed the old vectorstores/redis.py. Adjusted starter project JSONs and frontend constants to reflect new module paths and sidebar entries for these vectorstores.
* Refactor vectorstore components and add lazy imports
  Moved Datastax-related files from vectorstores to a dedicated datastax directory. Added lazy import logic to __init__.py files for chroma, clickhouse, couchbase, elastic, milvus, mongodb, pinecone, qdrant, supabase, upstash, vectara, and weaviate components. Cleaned up vectorstores/__init__.py to only include local and faiss components, improving modularity and import efficiency.
* [autofix.ci] apply automated fixes
* Refactor vectorstore components structure
  Moved FAISS, Cassandra, and pgvector components to dedicated subdirectories with lazy-loading __init__.py files. Updated imports and references throughout the backend and frontend to reflect new locations. Removed obsolete datastax Cassandra component. Added new sidebar bundle entries for FAISS, Cassandra, and pgvector in frontend constants and style utilities.
* Add lazy imports and Redis chat memory component
  Refactored the Redis module to support lazy imports for RedisIndexChatMemory and RedisVectorStoreComponent, improving import efficiency. Added a new redis_chat.py file implementing RedisIndexChatMemory for chat message storage and retrieval using Redis.
* Fix vector store astra imports
* Revert package lock changes
* More test fixes
* Update test_vector_store_rag.py
* Update test_dynamic_imports.py
* Update vector_store_rag.py
* Update test_dynamic_imports.py
* Refactor the cassandra chat component
* Fix frontend tests for bundle
* Mark Local DB as legacy
* Update inputComponent.spec.ts
* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>

* feat: add dependencies metadata to starter projects
* feat: add caching for packages_distributions to improve performance
* refactor: update test descriptions and remove unused imports in metadata tests

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>
⚡️ This pull request contains optimizations for PR #9192

If you approve this dependent PR, these changes will be merged into the original PR branch `add-deps-metadata`.

📄 7,582% (75.82x) speedup for `_classify_dependency` in `src/backend/base/langflow/custom/dependency_analyzer.py`

⏱️ Runtime: 4.77 milliseconds → 62.1 microseconds (best of 123 runs)

📝 Explanation and details
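Here is a faster version of your `_classify_dependency` function. The profiling shows that the real bottlenecks are `md.distribution(dep.name)` and accessing `dist.version`, both of which trigger expensive package metadata resolution.

**Optimizations:**
- Use an **LRU cache** for package version lookup. Since `md.distribution` does not cache results, this can greatly reduce overhead for repeated package names.
- Only import `importlib.metadata` module objects once (move `md.distribution` and exceptions out to top-level).
- If `dep.is_local` is true or `dep.name` is falsy, skip lookups immediately.

**Notes:**
- This keeps function signature and behavior identical.
- Comments are unchanged unless the relevant code changes.

Optimized code.

**Why this is faster:**
- Repeated queries for the same package name are almost instant due to the LRU cache.
- Fewer redundant imports or lookups.
- No extra overhead unless package version lookup is actually needed.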
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr9192-2025-07-25T17.43.34` and push.
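As a sanity check on the headline figures, the quoted runtimes can be plugged into a speedup formula. The sketch below assumes the percentage is computed as (old − new) / new, which is an assumption that happens to be consistent with the reported numbers, not something the PR states.

```python
# Runtimes quoted in the PR description (rounded as reported).
old_runtime_us = 4.77 * 1000  # 4.77 milliseconds, expressed in microseconds
new_runtime_us = 62.1         # 62.1 microseconds

# Assumed definition: relative speedup = (old - new) / new.
speedup = (old_runtime_us - new_runtime_us) / new_runtime_us

# Plain runtime ratio, which is always exactly 1 higher than the speedup above.
ratio = old_runtime_us / new_runtime_us

print(f"speedup: {speedup:.2f}x ({speedup * 100:,.0f}%)")
print(f"runtime ratio: {ratio:.2f}x")
```

From the rounded inputs this gives a speedup of roughly 75.8x, in line with the reported 75.82x (7,582%); the small gap is presumably due to rounding of the published runtimes.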