feat: custom worker count for IBM models in OpenSearch#10866
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds conditional concurrency control for embeddings by detecting IBM Watsonx models and restricting parallelism to a single worker when identified. Otherwise, standard parallelism calculation min(max(len(texts), 1), 8) is used. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touchesImportant Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces concurrency control for IBM/Watsonx embedding models to mitigate rate limiting issues during document ingestion in OpenSearch. The change restricts parallel embedding requests to a single worker for IBM models while maintaining the existing multi-threaded approach (up to 8 workers) for other embedding providers.
Key Changes:
- Added detection logic to identify IBM/Watsonx models based on model name or embedding class name
- Set
max_workers=1for IBM/Watsonx models to prevent rate limit errors during parallel embedding generation - Preserved existing concurrency logic for non-IBM models
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| is_ibm = (embedding_model and "ibm" in str(embedding_model).lower()) or ( | ||
| selected_embedding and "watsonx" in type(selected_embedding).__name__.lower() | ||
| ) |
There was a problem hiding this comment.
The detection logic for IBM/Watsonx models relies on substring matching which could be fragile. Consider checking the exact class type instead:
from langchain_ibm import WatsonxEmbeddings
is_ibm = isinstance(selected_embedding, WatsonxEmbeddings)This would be more robust than checking for "watsonx" in the class name, as it would work correctly even if the class is renamed or extended. If you need to support multiple IBM embedding classes, you could check against a tuple of types.
There was a problem hiding this comment.
could be a future use case but currently only for ingestion hence selected embeddings is fine
| # Restrict concurrency for IBM/Watsonx models to avoid rate limits | ||
| is_ibm = (embedding_model and "ibm" in str(embedding_model).lower()) or ( | ||
| selected_embedding and "watsonx" in type(selected_embedding).__name__.lower() | ||
| ) | ||
| max_workers = 1 if is_ibm else min(max(len(texts), 1), 8) |
There was a problem hiding this comment.
The new concurrency restriction logic for IBM/Watsonx models lacks test coverage. Consider adding tests to verify:
- When an IBM/Watsonx embedding is used,
max_workersis set to 1 - When a non-IBM embedding is used,
max_workersfollows the original logic (min(max(len(texts), 1), 8)) - The detection logic correctly identifies IBM models by checking the embedding_model string
- The detection logic correctly identifies Watsonx models by checking the class name
This would help prevent regressions if the detection logic or concurrency settings change in the future.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py (1)
878-887: IBM/Watsonx concurrency guard looks correct; consider small readability tweaksThe
is_ibmcheck andmax_workers = 1branch achieve the PR goal without changing behavior for other providers. A couple of minor, optional nits:
is_ibmandmax_workersdon’t depend on the retry attempt; you could compute them once before thewhileloop to avoid repeating the same logic and make intent clearer.- If you expect more IBM/Watsonx-specific handling later, extracting the detection into a tiny helper (e.g.
_is_ibm_embedding(embedding_model, selected_embedding)) would centralize the heuristic and make it easier to extend (e.g. other IBM model name patterns).No functional changes are required here; these would just tighten clarity and future maintainability.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/lfx/src/lfx/components/elastic/opensearch_multimodal.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Agent
- GitHub Check: Update Component Index
- GitHub Check: Update Starter Projects
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #10866 +/- ##
==========================================
- Coverage 32.82% 32.54% -0.29%
==========================================
Files 1367 1370 +3
Lines 64017 63518 -499
Branches 9553 9391 -162
==========================================
- Hits 21014 20672 -342
+ Misses 41943 41806 -137
+ Partials 1060 1040 -20
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
* Update opensearch_multimodal.py * [autofix.ci] apply automated fixes * Update opensearch_multimodal.py * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
This pull request introduces a targeted update to the embedding concurrency logic in
opensearch_multimodal.py. The main change is the restriction of parallel embedding requests for IBM/Watsonx models to avoid hitting rate limits.Concurrency and rate limit handling:
embed_chunk, themax_workersvalue is set to1for IBM/Watsonx models, ensuring only one embedding request is processed at a time and thus reducing the risk of rate limit errors. For other models, the previous logic of up to 8 workers is retained.Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.