fix: Knowledge Filter Does Not Restrict Document Retrieval Scope #1245
Wallgau wants to merge 6 commits into
Scope provider credentials and MCP global vars to the selected embedding provider, prune stale provider headers, and retry URL ingestion once after targeted stale-state reconciliation errors. Made-with: Cursor
Include all configured provider credentials in Langflow global vars and add tests for single-provider and multi-provider ingestion scenarios to prevent regressions when documents are embedded with different models. Made-with: Cursor
…not import from openrag
| Used by the Langflow OpenSearch ``raw_search`` path so document scope matches |
@Wallgau May I know why we need this file if the changes are already merged into the component file?
That file reverts changes from the last PR and is used for the unit tests, but we can remove it since you mention those changes are in another PR.
| @@ -0,0 +1,150 @@ | |||
| from types import SimpleNamespace | |||
| from unittest.mock import AsyncMock, MagicMock, patch | |||
Let's remove these unit tests.
I am not sure they are required; if possible, let's add a new integration test in a follow-up PR.
| ) | ||
| @staticmethod | ||
| def _should_reconcile_url_ingestion_error(error_text: str, selected_provider: str) -> bool: |
Could you explain the use of these functions?
_reconcile_url_ingestion_runtime_state(selected_provider)
When should_reconcile… is true, this runs once before one retry of the same URL ingest request. It tries to reset and realign Langflow with OpenRAG settings:
reset_langflow_flow("url_ingest") (if flows_service supports it) — reload/reset the URL ingest flow so stale graph state is cleared.
change_langflow_model_value(...) — push the current selected_provider, embedding_model, and force_embedding_update=True into the url_ingest flow so globals match the backend config (e.g. stop calling Ollama when the user chose OpenAI).
Both steps are best-effort (exceptions are logged, not fatal); then the code refreshes the flow id and re-POSTs the run. If the retry still fails, _raise_retry_provider_error turns the body into a clearer, provider-oriented error where applicable.
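The retry-once behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `IngestError`, `run_with_one_reconcile_retry`, and the callables passed in are hypothetical names standing in for the real run call, the reset/realign steps, and `_should_reconcile_url_ingestion_error`.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Sequence

logger = logging.getLogger(__name__)

AsyncRun = Callable[[], Awaitable[dict]]
AsyncStep = Callable[[str], Awaitable[None]]


class IngestError(RuntimeError):
    """Hypothetical error type standing in for a failed Langflow run."""


async def run_with_one_reconcile_retry(
    run: AsyncRun,
    reconcile_steps: Sequence[AsyncStep],
    should_reconcile: Callable[[str, str], bool],
    selected_provider: str,
) -> dict:
    try:
        return await run()
    except IngestError as exc:
        # Only targeted stale-state errors trigger reconciliation.
        if not should_reconcile(str(exc), selected_provider):
            raise
        logger.warning("Reconciling before retry: %s", exc)

    # Best-effort: a failing step is logged and the remaining steps still run.
    for step in reconcile_steps:
        try:
            await step(selected_provider)
        except Exception:
            logger.exception("Reconcile step %r failed; continuing", step)

    # Exactly one retry; a second failure propagates to the caller.
    return await run()
```

In this sketch the second failure is simply re-raised; the PR instead routes it through `_raise_retry_provider_error` to produce a provider-oriented message.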
| # Remove stale provider headers not present in the desired set. | ||
| # Keep non-provider headers (JWT/OWNER/etc.) untouched. | ||
| pruned_args: List[str] = [] |
Can't we just check whether the keys are present, and not pass them if the values are None or empty?
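For reference, the pruning idea in the diff and the suggestion above could look something like the sketch below. It assumes, purely for illustration, that the MCP args are a flat list in which each header appears as a `--headers NAME VALUE` triple; the function name and the header names in the usage example are hypothetical and not taken from the PR.

```python
from typing import Dict, List, Set


def prune_provider_headers(
    args: List[str],
    desired: Dict[str, str],
    provider_header_names: Set[str],
) -> List[str]:
    """Drop provider headers that are absent or empty in `desired`.

    Non-provider headers (JWT, owner, etc.) and all other args pass through
    unchanged and keep their original order.
    """
    pruned_args: List[str] = []
    i = 0
    while i < len(args):
        if args[i] == "--headers" and i + 2 < len(args):
            name = args[i + 1]
            if name in provider_header_names:
                value = desired.get(name)
                if value:  # skip None and empty strings
                    pruned_args.extend(["--headers", name, value])
            else:
                pruned_args.extend(args[i : i + 3])
            i += 3
            continue
        pruned_args.append(args[i])
        i += 1
    return pruned_args
```

With `desired = {"X-OPENAI": "new", "X-OLLAMA": ""}`, an existing `X-OPENAI` header is kept with the new value, an `X-OLLAMA` header is dropped, and a header such as `X-JWT` that is not in `provider_header_names` is left untouched.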
@Wallgau May I know if the changes in the Langflow MCP service and headers are required for fixing the filter changes?
Fixed in another PR.
issue: #1130 (check there for before behavior)
Now:

Summary
Applies chat knowledge filters to the Langflow OpenSearch component’s raw_search path so behavior matches search_documents (filter clauses, optional limit / score_threshold, top-level knn scoping). The merge logic lives entirely inside the component (and in exported flow JSON). No imports from OpenRAG src (e.g. no utils.opensearch_filter_merge), so the Langflow image does not need COPY src or PYTHONPATH for this feature.
Motivation
Custom Langflow components run in the Langflow process with imports resolved like normal Python. OpenRAG’s utils package is not part of the Langflow dependency set. Importing it implies putting OpenRAG src on the image and bloating the Langflow Dockerfile. Review feedback: keep helper logic in the component source until a proper upstream Langflow story exists.
What changed
flows/components/opensearch_multimodal.py: Inlined helpers (coerce_filter_clauses_from_filter_obj, merge_filter_clauses_into_search_body, apply_chat_filter_limits_to_body, apply_chat_filter_expression_to_search_body, etc.). raw_search now merges filter_expression into the request body and raises validation errors for bad JSON. _coerce_filter_clauses delegates to the shared module-level helper, which includes the connector_types → connector_type mapping and empty terms handling.
flows/components/opensearch_filter_merge_standalone.py: Stdlib-only copy of the same logic for unit tests and as the reference to keep in sync with the inline block.
tests/unit/test_opensearch_filter_merge.py: TDD coverage for coerce / merge / limits / end-to-end behavior (37 tests).
flows/ingestion_flow.json, openrag_agent.json, openrag_url_mcp.json, openrag_nudges.json: Embedded component source synced from opensearch_multimodal.py.
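As a rough illustration of the merge described in the Summary, the stdlib-only sketch below parses a filter expression, scopes a top-level knn query, and applies the optional limits. It is not the helper code from this PR: the function names, the `filter` / `limit` / `score_threshold` key layout, and the mapping of `limit` to `size` and `score_threshold` to `min_score` are assumptions made for the example.

```python
import copy
import json
from typing import Any, Dict, List


def parse_filter_expression(raw: str) -> Dict[str, Any]:
    """Parse the filter expression, raising ValueError on bad input."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"filter_expression is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("filter_expression must be a JSON object")
    return parsed


def _as_list(value: Any) -> List[Any]:
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def merge_filter_into_body(body: Dict[str, Any], filter_obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `body` with filter clauses and limits merged in."""
    merged = copy.deepcopy(body)
    clauses = [c for c in _as_list(filter_obj.get("filter")) if isinstance(c, dict)]
    query = merged.get("query", {"match_all": {}})

    if clauses:
        if "knn" in query:
            # Top-level knn: scope each vector field with a filter.
            for field_spec in query["knn"].values():
                existing = _as_list(field_spec.get("filter"))
                field_spec["filter"] = {"bool": {"filter": existing + clauses}}
        elif "bool" in query:
            existing = _as_list(query["bool"].get("filter"))
            query["bool"]["filter"] = existing + clauses
        else:
            query = {"bool": {"must": [query], "filter": clauses}}
    merged["query"] = query

    # Assumed mapping onto standard search body fields.
    if filter_obj.get("limit") is not None:
        merged["size"] = int(filter_obj["limit"])
    if filter_obj.get("score_threshold") is not None:
        merged["min_score"] = float(filter_obj["score_threshold"])
    return merged
```

The input body is deep-copied, so a caller can reuse the unfiltered body for other requests.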