feat: opensearch multimodal: support filters, adjust defaults #12319

Merged: HimavarshaVS merged 16 commits into release-1.9.0 from fix-openserach on Apr 2, 2026
@edwinjosechittilappilly (Collaborator) commented Mar 25, 2026

This pull request updates the OpenSearchVectorStoreComponentMultimodalMultiEmbedding class in opensearch_multimodal.py to improve how authentication and filtering are handled. The main changes include stricter input types, defaulting to JWT authentication, altering the default bearer prefix setting, and enhancing the raw_search method to support filter expressions in a more robust way.

Authentication and Input Handling:

  • Changed the default authentication mode from "basic" to "jwt" in the auth_mode dropdown input, making JWT the default for new configurations.
  • Changed the default value of the bearer_prefix boolean input from True to False, so the "Bearer " prefix is not included by default in authentication headers.
  • Limited the accepted input_types for a specific input from ["Data", "JSON"] to just ["Data"], enforcing stricter input validation.
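As a rough illustration of what the bearer_prefix default change means for the Authorization header, here is a minimal sketch. The function name `build_authorization_header` is hypothetical, and the way the component actually assembles its headers is not shown in this thread, so treat the header layout as an assumption.

```python
from typing import Dict


def build_authorization_header(token: str, bearer_prefix: bool = False) -> Dict[str, str]:
    """Hypothetical helper (not the component's code): build an Authorization header.

    With bearer_prefix=False (the new default described above) the token is
    sent as-is; with bearer_prefix=True it is prefixed with "Bearer ".
    """
    value = f"Bearer {token}" if bearer_prefix else token
    return {"Authorization": value}


new_default = build_authorization_header("abc123")
old_default = build_authorization_header("abc123", bearer_prefix=True)
```

Flows that relied on the old default would need to set the flag explicitly to keep sending the prefixed form.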

Filtering and Query Improvements:

  • Enhanced the raw_search method to support filter_expression parsing and application, including:

    • Parsing JSON filter expressions and raising clear errors for invalid JSON.
    • Integrating filter clauses into the query body using the same logic as the search() method.
    • Automatically applying limit and score_threshold/scoreThreshold from the filter expression to the query if not already set.

Testing

[image]

Update OpenSearch multimodal vector store component to parse and apply filter_expression JSON to search queries, wrapping existing queries in a bool with filter clauses and applying limit (size) and min_score from the filter object when present. Also validate filter_expression JSON and raise a clear ValueError on parse errors. Adjust component inputs and defaults: remove "JSON" from input_types, change default auth_mode to "jwt", and set bearer_prefix default to false. Uses existing _coerce_filter_clauses helper to build filter clauses.
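The bool-wrapping step described above can be sketched as follows. This is a standalone approximation, not the component's code: `wrap_with_filters` is a hypothetical name, and the example clause on an `owner` field is an assumed shape for what `_coerce_filter_clauses` might return.

```python
from typing import Any, Dict, List


def wrap_with_filters(
    query_body: Dict[str, Any], filter_clauses: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new body whose query is wrapped in a bool with filter clauses.

    Falls back to match_all when the body has no "query" key, as the PR
    description says raw_search() does.
    """
    wrapped = dict(query_body)  # shallow copy of the top level only
    if not filter_clauses:
        return wrapped
    original_query = query_body.get("query", {"match_all": {}})
    wrapped["query"] = {
        "bool": {
            "must": [original_query],
            "filter": list(filter_clauses),
        }
    }
    return wrapped


raw = {"query": {"match": {"text": "invoice"}}, "size": 5}
clauses = [{"term": {"owner": "alice"}}]  # assumed clause shape
result = wrap_with_filters(raw, clauses)
no_query = wrap_with_filters({}, clauses)
```

Keeping the original query under `must` preserves its scoring, while the clauses under `filter` only restrict which documents match.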
Copilot AI review requested due to automatic review settings March 25, 2026 14:32
@coderabbitai (Bot, Contributor) commented Mar 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@edwinjosechittilappilly (Collaborator, Author) commented:

Reference PR: langflow-ai/openrag#1247

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Mar 25, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the OpenSearch multimodal vector store component to tighten input validation, change authentication defaults, and enhance raw_search() so it can apply the same filter_expression semantics as search().

Changes:

  • Restricts docs_metadata input types to ["Data"] only.
  • Switches auth defaults to JWT and disables the Bearer prefix by default.
  • Extends raw_search() to parse/apply filter_expression, including propagating limit and score threshold into the query body.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | Adjusts input/auth defaults and adds filter-expression support to raw_search(). |
| src/lfx/src/lfx/_assets/component_index.json | Updates the component registry snapshot (defaults/code hash) to reflect the new component behavior. |

Comment on lines +417 to +427
# Apply filter_expression if configured (same parsing as search())
filter_obj = None
if getattr(self, "filter_expression", "") and self.filter_expression.strip():
    try:
        filter_obj = json.loads(self.filter_expression)
    except json.JSONDecodeError as e:
        msg = f"Invalid filter_expression JSON: {e}"
        raise ValueError(msg) from e

filter_clauses = self._coerce_filter_clauses(filter_obj)

Copilot AI commented Mar 25, 2026

filter_expression is parsed with json.loads(...), but the result isn’t validated to be an object/dict before using .get(...). Valid JSON like [], "foo", or 123 will raise an AttributeError here (and may also break inside _coerce_filter_clauses). Recommendation: after json.loads, enforce isinstance(filter_obj, dict) (otherwise raise a ValueError explaining the expected shape), and consider coercing limit to an int (or rejecting non-integers) before assigning it to query_body["size"].
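The shape check recommended here could look roughly like the sketch below. `parse_filter_expression` is a hypothetical standalone function, not a method from the PR; only the error message for invalid JSON is taken from the reviewed code, and the second message is an assumed wording.

```python
import json
from typing import Any, Dict, Optional


def parse_filter_expression(raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse a filter expression and insist that it is a JSON object."""
    if raw is None or not raw.strip():
        return None
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        msg = f"Invalid filter_expression JSON: {e}"
        raise ValueError(msg) from e
    # Valid JSON such as [], "foo", or 123 has no .get(), so reject it early.
    if not isinstance(parsed, dict):
        msg = f"filter_expression must be a JSON object, got {type(parsed).__name__}"
        raise ValueError(msg)
    return parsed
```

With this in place, later `.get(...)` calls on the result cannot hit an AttributeError for list, string, or number inputs.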

Comment on lines +445 to +456
if filter_obj:
    # Apply limit if not already set in the raw query
    if "size" not in query_body:
        limit = filter_obj.get("limit")
        if limit is not None:
            query_body["size"] = limit

    # Apply score_threshold / scoreThreshold as min_score if not already set
    if "min_score" not in query_body:
        score_threshold = filter_obj.get("score_threshold") or filter_obj.get("scoreThreshold")
        if isinstance(score_threshold, (int, float)) and score_threshold > 0:
            query_body["min_score"] = score_threshold

Copilot AI commented Mar 25, 2026

filter_expression is parsed with json.loads(...), but the result isn’t validated to be an object/dict before using .get(...). Valid JSON like [], "foo", or 123 will raise an AttributeError here (and may also break inside _coerce_filter_clauses). Recommendation: after json.loads, enforce isinstance(filter_obj, dict) (otherwise raise a ValueError explaining the expected shape), and consider coercing limit to an int (or rejecting non-integers) before assigning it to query_body["size"].
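For the second half of the recommendation (coercing or rejecting the limit before it becomes `size`), one possible approach is sketched below. `coerce_limit` is a hypothetical helper, and the choice to accept integral floats while rejecting strings, booleans, and negatives is an assumption rather than anything decided in this thread.

```python
from typing import Any, Optional


def coerce_limit(value: Any) -> Optional[int]:
    """Return value as a non-negative int, None when absent, else raise ValueError."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        raise ValueError(f"limit must be an integer, got {value!r}")
    if isinstance(value, int):
        limit = value
    elif isinstance(value, float) and value.is_integer():
        limit = int(value)
    else:
        raise ValueError(f"limit must be an integer, got {value!r}")
    if limit < 0:
        raise ValueError(f"limit must not be negative, got {limit}")
    return limit
```

A caller would then assign `query_body["size"] = coerce_limit(filter_obj.get("limit"))` only when the result is not None.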

Comment on lines +428 to +443
if filter_clauses:
    if "query" in query_body:
        original_query = query_body["query"]
        query_body["query"] = {
            "bool": {
                "must": [original_query],
                "filter": filter_clauses,
            }
        }
    else:
        query_body["query"] = {
            "bool": {
                "must": [{"match_all": {}}],
                "filter": filter_clauses,
            }
        }

Copilot AI commented Mar 25, 2026

This mutates query_body in-place. When raw_query is a dict (or when self.search_query is a dict), query_body likely aliases the caller-owned object; applying filters then permanently changes that dict for subsequent calls and can cause surprising side effects. Recommendation: copy the raw dict before modification (e.g., deep copy if nested structures are expected) so raw_search() operates on an isolated query body.
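The aliasing concern can be reproduced in a few lines. The two functions below are illustrative stand-ins (their names are hypothetical): the first mirrors the in-place pattern under review, the second applies the suggested deep copy.

```python
import copy
from typing import Any, Dict, List


def add_filters_in_place(query_body: Dict[str, Any], clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Mirror of the reviewed pattern: rewrites the caller's dict."""
    query_body["query"] = {"bool": {"must": [query_body["query"]], "filter": clauses}}
    return query_body


def add_filters_isolated(raw_query: Dict[str, Any], clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Same wrapping, but applied to a deep copy of the caller's dict."""
    return add_filters_in_place(copy.deepcopy(raw_query), clauses)


clauses = [{"term": {"owner": "alice"}}]  # assumed clause shape

shared = {"query": {"match_all": {}}}
add_filters_in_place(shared, clauses)
add_filters_in_place(shared, clauses)  # second call wraps the already wrapped query
nested_twice = "bool" in shared["query"]["bool"]["must"][0]

safe = {"query": {"match_all": {}}}
add_filters_isolated(safe, clauses)
add_filters_isolated(safe, clauses)
```

After the two in-place calls the shared dict holds a bool nested inside a bool, whereas the dict passed to the isolated variant is unchanged.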

Comment on lines +417 to +426
# Apply filter_expression if configured (same parsing as search())
filter_obj = None
if getattr(self, "filter_expression", "") and self.filter_expression.strip():
    try:
        filter_obj = json.loads(self.filter_expression)
    except json.JSONDecodeError as e:
        msg = f"Invalid filter_expression JSON: {e}"
        raise ValueError(msg) from e

filter_clauses = self._coerce_filter_clauses(filter_obj)

Copilot AI commented Mar 25, 2026

filter_expression parsing/validation logic is now duplicated between search() and raw_search(). To reduce drift (e.g., one path adding support for new keys like scoreThreshold while the other doesn’t), consider extracting a shared helper (e.g., _parse_filter_expression() returning (filter_obj, filter_clauses)), and reuse it in both methods.
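A shared helper along the lines suggested might be shaped like this sketch. The class name `FilterExpressionMixin` is hypothetical, and the `_coerce_filter_clauses` body is a stand-in: the real helper's implementation is not shown in this thread, and the `"filter"` key it reads here is an assumed expression layout.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


class FilterExpressionMixin:
    """Hypothetical mixin; both search() and raw_search() could call the helper."""

    filter_expression: str = ""

    def _coerce_filter_clauses(self, filter_obj: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Stand-in only: assumes clauses live under a "filter" key.
        if not isinstance(filter_obj, dict):
            return []
        return list(filter_obj.get("filter", []))

    def _parse_filter_expression(self) -> Tuple[Optional[Dict[str, Any]], List[Dict[str, Any]]]:
        """Parse once and return (filter_obj, filter_clauses)."""
        raw = getattr(self, "filter_expression", "") or ""
        if not raw.strip():
            return None, []
        try:
            filter_obj = json.loads(raw)
        except json.JSONDecodeError as e:
            msg = f"Invalid filter_expression JSON: {e}"
            raise ValueError(msg) from e
        return filter_obj, self._coerce_filter_clauses(filter_obj)


component = FilterExpressionMixin()
component.filter_expression = '{"filter": [{"term": {"owner": "alice"}}], "limit": 3}'
filter_obj, filter_clauses = component._parse_filter_expression()
```

With one parsing path, a new key such as scoreThreshold only needs to be handled in one place.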

@github-actions (Bot, Contributor) commented Mar 25, 2026

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| Coverage: 28% | 28.01% (29184/104184) | 64.76% (3717/5739) | 30.03% (689/2294) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 3014 | 0 💤 | 0 ❌ | 0 🔥 | 4m 7s ⏱️ |

@codecov (Bot) commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.88%. Comparing base (389f702) to head (b6ef3e2).
⚠️ Report is 5 commits behind head on release-1.9.0.

❌ Your project status has failed because the head coverage (45.54%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                Coverage Diff                @@
##           release-1.9.0   #12319      +/-   ##
=================================================
+ Coverage          48.86%   48.88%   +0.01%     
=================================================
  Files               1897     1897              
  Lines             167656   167656              
  Branches           23193    23188       -5     
=================================================
+ Hits               81928    81957      +29     
+ Misses             84817    84789      -28     
+ Partials             911      910       -1     
| Flag | Coverage Δ |
| --- | --- |
| backend | 54.78% <ø> (-0.12%) ⬇️ |
| frontend | 48.26% <ø> (+0.04%) ⬆️ |
| lfx | 45.54% <ø> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.
see 23 files with indirect coverage changes


@Cristhianzl (Member) left a comment

Summary

This PR makes two types of changes to OpenSearchVectorStoreComponentMultimodalMultiEmbedding:

  1. Default changes: auth_mode default "basic" → "jwt", bearer_prefix default True → False, input_types restricted from ["Data", "JSON"] to ["Data"].
  2. New feature: raw_search() now parses and applies filter_expression (JSON filters, limit, score_threshold/scoreThreshold), reusing _coerce_filter_clauses() and mirroring search() logic.

Files Changed

| File | Change | Lines |
| --- | --- | --- |
| src/lfx/src/lfx/components/elastic/opensearch_multimodal.py | Filter support in raw_search, default changes | +44 / -3 |
| src/lfx/src/lfx/_assets/component_index.json | Reflects new defaults and hash (ignored) | +6 / -7 |

Review by Category


🔴 CRITICAL: Security & PII

| Item | Notes |
| --- | --- |
| No PII in logs | No new logging added; existing logger.info(f"query: {query_body}") at line ~462 logs the full query body including filter clauses (pre-existing, not introduced by this PR) |
| No secrets/credentials in code | No secrets introduced |
| No hardcoded API keys | N/A |
| No internal details in error messages to end users | ValueError for invalid JSON includes the parse error; acceptable for developer-facing component |
| JSON injection via filter_expression | json.loads() parses the string, then _coerce_filter_clauses() validates structure (only accepts known term/terms patterns). No raw string interpolation into queries |

Result: ✅ PASS


🔴 CRITICAL: DRY Principle

| Item | Notes |
| --- | --- |
| No duplicate type definitions | No new types added |
| No duplicate logic | ⚠️ The filter-parsing block (lines ~417-425) is a copy-paste of the identical block in search() (lines ~1533-1539): same getattr guard, same json.loads, same ValueError message. This is introduced duplication |
| No duplicate constants | N/A |
| Shared module reuse | _coerce_filter_clauses() is correctly reused (good) |

Observation: The filter-parsing + _coerce_filter_clauses + bool-wrapping + limit/score_threshold extraction pattern appears in both search() and raw_search(). The parsing and _coerce_filter_clauses call are short (~10 lines) but the full pattern with limit/score_threshold is ~25 lines duplicated. This could be extracted to a helper like _parse_and_apply_filters(query_body).

Result: ⚠️ CONCERN — new duplication introduced. Not blocking, but should be flagged.


🔴 CRITICAL: File Structure Limits

| Item | Notes |
| --- | --- |
| opensearch_multimodal.py line count | 2021 lines, far exceeding the 500-line limit. This is pre-existing, but the PR adds 41 more lines to an already oversized file |
| Functions per file | ⚠️ 33 methods/functions: very high, pre-existing |
| Main classes per file | 1 class + 2 module-level helpers |
| Mixed responsibilities | ⚠️ Pre-existing: auth, embedding, search, indexing, filtering all in one class |

Result: ⚠️ PRE-EXISTING VIOLATION — file was already 1980 lines with 33 functions. PR adds to it but doesn't introduce the violation.


🟠 IMPORTANT: Architecture & Structure

| Item | Notes |
| --- | --- |
| Single Responsibility | Changes are scoped to raw_search filter support and default adjustments |
| Layer Separation | No layer violations |
| No business logic in handlers | N/A |

Observation: The raw_search filter integration mirrors search() correctly — same parse → coerce → wrap-in-bool flow. The approach is consistent.

Result: ✅ PASS


🟠 IMPORTANT: Code Quality

| Item | Notes |
| --- | --- |
| Strong typing | Types maintained |
| No any/object types | N/A |
| Immutability | query_body is mutated in-place but is locally created; acceptable |
| Clean naming | filter_obj, filter_clauses, score_threshold: consistent with search() |

Observation (default change risk): Changing auth_mode default from "basic" to "jwt" and bearer_prefix from True to False is a breaking change for users who rely on the current defaults. Existing configurations saved in flows/JSON will keep their stored values, but new component instances will default to JWT. This should be called out in release notes.

Result: ✅ PASS (code quality is fine; default change is a product decision)


🟠 IMPORTANT: Error Handling

| Item | Notes |
| --- | --- |
| No silent failures | Invalid JSON raises ValueError with context |
| Explicit error handling | json.JSONDecodeError caught and re-raised as ValueError with message |
| Meaningful error context | f"Invalid filter_expression JSON: {e}" (same pattern as search()) |

Concern: If filter_expression attribute doesn't exist, getattr(self, "filter_expression", "") returns "" safely. However, if it exists but is None, .strip() will raise AttributeError. The getattr guard with "" default plus the truthy check should prevent this — the and self.filter_expression.strip() short-circuits. This is safe.
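The short-circuit reasoning above can be checked with a small standalone sketch. `has_filter_expression` and the four tiny classes are hypothetical illustrations that reuse the same `getattr(...) and ....strip()` shape as the reviewed guard.

```python
def has_filter_expression(component: object) -> bool:
    """Same guard shape as the reviewed code, written as a standalone function."""
    value = getattr(component, "filter_expression", "")
    # `and` short-circuits: .strip() is never called when value is "" or None.
    return bool(value and value.strip())


class Missing:
    pass


class IsNone:
    filter_expression = None


class Blank:
    filter_expression = "   "


class Present:
    filter_expression = '{"limit": 5}'


results = [has_filter_expression(cls()) for cls in (Missing, IsNone, Blank, Present)]
```

It is the truthiness check, rather than the `""` default, that protects the None case, since `getattr` returns None when the attribute exists with that value.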

Result: ✅ PASS


🟡 RECOMMENDED: Observability

| Item | Notes |
| --- | --- |
| Logging at key points | The existing logger.info(f"query: {query_body}") at line ~462 will now include filter clauses; sufficient |
| No sensitive data in logs | ⚠️ Pre-existing: logger.info(f"query: {query_body}") logs the full query body. If filter clauses contain user-identifying data (e.g., owner field values), these go to logs. Not introduced by this PR |

Result: ✅ PASS


🟡 RECOMMENDED: Comments

| Item | Notes |
| --- | --- |
| No comments explaining WHAT | |
| Only WHY comments | # Apply filter_expression if configured (same parsing as search()): explains intent and cross-references |
| No TODO without ticket | No TODOs added |

Result: ✅ PASS


🟢 TESTING

| Item | Notes |
| --- | --- |
| Unit tests for core logic | No tests provided |
| Happy path covered | Missing: raw_search with valid filter_expression |
| Adversarial/error cases | Missing: raw_search with invalid JSON filter_expression |
| Edge cases | Missing: raw_search with empty filter, filter without "query" key in body |
| Default value changes tested | Missing: verify auth_mode defaults to "jwt", bearer_prefix defaults to False |

Missing Test Scenarios (required)

| Scenario | Severity |
| --- | --- |
| raw_search with valid filter_expression JSON: verify clauses are injected into query body | 🔴 CRITICAL |
| raw_search with invalid JSON in filter_expression: verify ValueError raised | 🔴 CRITICAL |
| raw_search with filter_expression containing limit: verify size is set | 🟡 RECOMMENDED |
| raw_search with filter_expression containing score_threshold: verify min_score is set | 🟡 RECOMMENDED |
| raw_search with query body that already has "query" key vs one that doesn't: verify both bool-wrap paths | 🟡 RECOMMENDED |
| raw_search with filter_expression but size already in query_body: verify no override | 🟡 RECOMMENDED |
| Default values for auth_mode and bearer_prefix on new component instance | 🟡 RECOMMENDED |

Result: ❌ FAIL — no tests provided for new functionality
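Some of the missing scenarios could be covered with tests shaped like the sketch below. Because the component itself is not reproduced in this thread, the tests target `apply_limit_and_threshold`, a hypothetical stand-in that copies the limit/score-threshold snippet quoted in the Copilot review; real tests would call raw_search() on the component with a mocked client instead.

```python
from typing import Any, Dict


def apply_limit_and_threshold(query_body: Dict[str, Any], filter_obj: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in mirroring the reviewed snippet (lines +445 to +456)."""
    if "size" not in query_body:
        limit = filter_obj.get("limit")
        if limit is not None:
            query_body["size"] = limit
    if "min_score" not in query_body:
        score_threshold = filter_obj.get("score_threshold") or filter_obj.get("scoreThreshold")
        if isinstance(score_threshold, (int, float)) and score_threshold > 0:
            query_body["min_score"] = score_threshold
    return query_body


def test_limit_sets_size() -> None:
    assert apply_limit_and_threshold({}, {"limit": 7}) == {"size": 7}


def test_existing_size_is_not_overridden() -> None:
    assert apply_limit_and_threshold({"size": 3}, {"limit": 7}) == {"size": 3}


def test_score_threshold_sets_min_score() -> None:
    assert apply_limit_and_threshold({}, {"score_threshold": 0.5}) == {"min_score": 0.5}


def test_camel_case_threshold_is_accepted() -> None:
    assert apply_limit_and_threshold({}, {"scoreThreshold": 0.8}) == {"min_score": 0.8}


def test_non_positive_threshold_is_ignored() -> None:
    assert apply_limit_and_threshold({}, {"score_threshold": 0}) == {}
```

The functions follow the pytest naming convention, so a test runner would collect them without extra configuration.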


🟢 LEGACY CODE AWARENESS

| Item | Notes |
| --- | --- |
| Not prolonging bad patterns | ⚠️ Adds 41 lines to a 2021-line file. The duplication with search() could be avoided with a shared helper |
| No copy-paste from legacy | ⚠️ The filter-parsing block IS copy-pasted from search() |

Result: ⚠️ CONCERN — copy-paste detected


Overall Assessment

Score: ⚠️ REQUEST CHANGES

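| Category | Result |
| --- | --- |
| 🔴 Security & PII | ✅ PASS |
| 🔴 DRY | ⚠️ CONCERN |
| 🔴 File Structure | ⚠️ PRE-EXISTING VIOLATION |
| 🟠 Architecture | ✅ PASS |
| 🟠 Code Quality | ✅ PASS |
| 🟠 Error Handling | ✅ PASS |
| 🟡 Observability | ✅ PASS |
| 🟡 Comments | ✅ PASS |
| 🟢 Testing | ❌ FAIL |
| 🟢 Legacy Awareness | ⚠️ CONCERN |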

Blocking Issues

  1. No tests — The PR adds ~41 lines of new logic (filter parsing, bool query wrapping, limit/score_threshold extraction) with zero test coverage. At minimum, tests for the happy path and invalid JSON error path are required.

Recommendations (non-blocking)

  1. Extract shared filter logic — The filter-parsing + apply pattern (~25 lines) is duplicated between search() and raw_search(). Extract to a private method like _apply_filter_expression(query_body: dict) -> dict to keep DRY.
  2. Default change documentation: The auth_mode (basic → jwt) and bearer_prefix (True → False) changes are breaking for new instances. Document in release notes.
  3. input_types restriction — Removing "JSON" from accepted input types for the table input could break existing flows that feed JSON data. Verify backwards compatibility.
  4. Pre-existing: File is 2021 lines with 33 methods — consider splitting in a future refactor.

# Conflicts:
#	src/lfx/src/lfx/_assets/component_index.json
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 1, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Apr 1, 2026
@HimavarshaVS HimavarshaVS requested a review from Cristhianzl April 1, 2026 10:49

@Cristhianzl (Member) left a comment

lgtm

@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Apr 1, 2026
@HimavarshaVS HimavarshaVS enabled auto-merge April 1, 2026 13:31
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 1, 2026
@HimavarshaVS HimavarshaVS added lgtm This PR has been approved by a maintainer and removed lgtm This PR has been approved by a maintainer labels Apr 1, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 1, 2026
@HimavarshaVS HimavarshaVS added this pull request to the merge queue Apr 1, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Apr 1, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 1, 2026
@HimavarshaVS HimavarshaVS added this pull request to the merge queue Apr 2, 2026
Merged via the queue into release-1.9.0 with commit 8c08e1b Apr 2, 2026
102 of 103 checks passed
@HimavarshaVS HimavarshaVS deleted the fix-openserach branch April 2, 2026 13:29
Adam-Aghili pushed a commit that referenced this pull request Apr 15, 2026
* opensearch multimodal: support filters, adjust defaults

Update OpenSearch multimodal vector store component to parse and apply filter_expression JSON to search queries, wrapping existing queries in a bool with filter clauses and applying limit (size) and min_score from the filter object when present. Also validate filter_expression JSON and raise a clear ValueError on parse errors. Adjust component inputs and defaults: remove "JSON" from input_types, change default auth_mode to "jwt", and set bearer_prefix default to false. Uses existing _coerce_filter_clauses helper to build filter clauses.

* Update component_index.json

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

* resolve review comments

* [autofix.ci] apply automated fixes

* fix ruff errors

* [autofix.ci] apply automated fixes

* fix ruff errors

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: himavarshagoutham <himavarshajan17@gmail.com>
Co-authored-by: Himavarsha <40851462+HimavarshaVS@users.noreply.github.com>
Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer

4 participants