fix: remove mcp url ingest code path #1489
Pull request overview
This PR removes the MCP/URL-based ingestion code path and related configuration, shifting default OpenRAG docs ingestion to be strictly file-based and updating prompts/flows/docs to reflect that URL ingestion is disabled.
Changes:
- Remove URL-ingestion processors/services/config/env/helm wiring (including Langflow MCP service and URL ingest flow IDs).
- Simplify default docs ingestion/refresh logic to only support packaged file ingestion (manual refresh remains force-based).
- Centralize and update the default system prompt to explicitly disable URL ingestion; update agent flow JSON and documentation accordingly.
Reviewed changes
Copilot reviewed 31 out of 32 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit/test_settings_refresh_endpoint.py | Updates tests to satisfy new models_service dependency for refresh endpoint. |
| tests/unit/test_settings_async_post_save.py | Updates async post-save test expectations after MCP server update removal. |
| tests/unit/test_main_docs_signature.py | Removes tests for deleted remote-docs signature logic. |
| src/utils/telemetry/message_id.py | Removes telemetry IDs specific to default URL docs ingestion. |
| src/utils/langflow_headers.py | Updates docstring and removes MCP global-vars builder helper. |
| src/tui/managers/env_manager.py | Removes URL ingest flow ID from TUI env config and .env generation. |
| src/services/task_service.py | Removes Langflow URL upload task factory. |
| src/services/langflow_mcp_service.py | Deletes Langflow MCP server patch/update implementation. |
| src/services/langflow_file_service.py | Removes Langflow URL ingestion flow runner and URL flow auto-heal logic. |
| src/services/flows_service.py | Removes URL ingest flow from backup/reset/ensure/check/model-value update paths. |
| src/services/auth_service.py | Removes login-time best-effort MCP server updates and related task tracking. |
| src/models/url.py | Deletes URL ingestion task processor implementation. |
| src/models/processors.py | Removes leftover import of the deleted URL processor. |
| src/main.py | Removes URL-based default docs ingestion/refresh/signature code; standardizes default docs connector type to openrag_docs. |
| src/config/settings.py | Removes DEFAULT_DOCS_URL/CRAWL_DEPTH/startup-fetch envs; defaults docs source to files. |
| src/config/config_manager.py | Introduces DEFAULT_SYSTEM_PROMPT constant; swaps AgentConfig default prompt and adds stale prompt rewrite logic. |
| src/api/settings.py | Removes MCP server update hooks and URL-signature bookkeeping during onboarding/settings updates. |
| src/agent.py | Uses backend DEFAULT_SYSTEM_PROMPT constant for conversation thread system message. |
| kubernetes/helm/openrag/values.yaml | Removes URL ingest flow JSON and URL-specific defaults from chart values. |
| kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml | Removes CONNECTOR_TYPE_URL from Langflow dotenv secret template. |
| kubernetes/helm/openrag/templates/configmaps/flow-ids-configmap.yaml | Removes url-ingest flow id entry. |
| kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml | Removes LANGFLOW_URL_INGEST_FLOW_ID wiring from backend dotenv secret template. |
| frontend/lib/constants.ts | Adds frontend DEFAULT_SYSTEM_PROMPT constant and references it in default settings. |
| flows/openrag_agent.json | Removes MCP URL ingestion tool component/edge and updates README/prompt text to reflect URL ingestion disabled. |
| docs/docs/reference/configuration.mdx | Removes LANGFLOW_URL_INGEST_FLOW_ID from documented config variables. |
| docs/docs/core-components/ingestion.mdx | Updates ingestion docs to state URL ingestion is disabled and removes URL-ingestion instructions. |
| docs/docs/core-components/chat.mdx | Updates chat docs to remove URL-fetch/MCP-tool narrative and reflect current toolset. |
| docs/docs/core-components/agents.mdx | Removes references to the URL ingestion flow from built-in flows list. |
| docker-compose.yml | Removes URL ingest flow ID and CONNECTOR_TYPE_URL env vars from compose setup. |
| .env.example | Removes URL-ingestion-related env vars and documents file-based default docs ingestion. |
Comments suppressed due to low confidence (1)
src/main.py:540
- _delete_existing_default_docs filters deletion by a single connector_type value. This PR changes default-doc ingestion to use connector_type="openrag_docs" (previously "local"), so upgrade reingestion may leave older default docs behind and create duplicates. Consider deleting by owner_email + is_sample_data, or matching both legacy and new connector types (e.g., terms query for ["local","openrag_docs"]).
# Default docs may be owned by the anonymous onboarding user.
{
"bool": {
"must": [
{"term": {"connector_type": connector_type}},
{"term": {"owner_email": anonymous_user.email}},
]
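The second option in the comment above (matching both the legacy and the new connector type) could be sketched roughly as follows. The helper name `build_default_docs_delete_filter` and the constant `DEFAULT_DOCS_CONNECTOR_TYPES` are hypothetical and do not appear in the PR; only the field names and the two connector type values come from the comment and the snippet.

```python
from typing import Any, Dict, List, Optional

# Hypothetical constant: the legacy ("local") and new ("openrag_docs")
# connector types named in the review comment.
DEFAULT_DOCS_CONNECTOR_TYPES: List[str] = ["local", "openrag_docs"]


def build_default_docs_delete_filter(
    owner_email: str,
    connector_types: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a bool clause that matches default docs under any listed connector type.

    Illustrative helper only; it mirrors the shape of the clause quoted in the
    review, swapping the single-value `term` for a multi-value `terms` match.
    """
    types = list(connector_types or DEFAULT_DOCS_CONNECTOR_TYPES)
    return {
        "bool": {
            "must": [
                # `terms` matches documents whose field equals any listed value.
                {"terms": {"connector_type": types}},
                {"term": {"owner_email": owner_email}},
            ]
        }
    }
```

Whether this or the `owner_email` + `is_sample_data` variant is preferable depends on how reliably older documents carry those fields, which this thread does not show.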
# Default OpenRAG docs sample ingestion source.
# URL ingestion is disabled; use packaged files from the openrag-documents directory.
DEFAULT_DOCS_INGEST_SOURCE = os.getenv("DEFAULT_DOCS_INGEST_SOURCE", "files").lower()
DEFAULT_DOCS_INGEST_SOURCE still accepts arbitrary env values (including "url") even though URL ingestion is now disabled elsewhere. That can leave the app logging ingest_source="url" while behaving as file-based ingestion, which is confusing operationally. Consider clamping/validating to "files" (and logging a warning when another value is provided).
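One way to clamp the value as suggested above is sketched here. The function name `resolve_docs_ingest_source` and the constant `SUPPORTED_DOCS_INGEST_SOURCE` are hypothetical; only the environment variable name and the `"files"` default come from the diff.

```python
import logging
import os
from typing import Mapping

logger = logging.getLogger(__name__)

# Hypothetical constant: the only source still supported once URL ingestion is removed.
SUPPORTED_DOCS_INGEST_SOURCE = "files"


def resolve_docs_ingest_source(env: Mapping[str, str] = os.environ) -> str:
    """Return the effective ingest source, warning when the env asks for anything else."""
    raw = env.get("DEFAULT_DOCS_INGEST_SOURCE", SUPPORTED_DOCS_INGEST_SOURCE)
    value = raw.strip().lower()
    if value != SUPPORTED_DOCS_INGEST_SOURCE:
        # Log the ignored value so the reported source never disagrees with behavior.
        logger.warning(
            "Ignoring unsupported DEFAULT_DOCS_INGEST_SOURCE=%r; using %r",
            raw,
            SUPPORTED_DOCS_INGEST_SOURCE,
        )
    return SUPPORTED_DOCS_INGEST_SOURCE


DEFAULT_DOCS_INGEST_SOURCE = resolve_docs_ingest_source()
```

With this shape, a leftover `DEFAULT_DOCS_INGEST_SOURCE=url` in an old `.env` produces one warning at startup instead of a misleading `ingest_source="url"` in later logs.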
def __post_init__(self):
    stale_url_markers = ("URL " + "Ingestion Tool", "URL " + "Ingestion Rules")
    if any(marker in self.system_prompt for marker in stale_url_markers):
AgentConfig.__post_init__ overwrites any user-provided system_prompt that happens to mention "URL Ingestion Tool" or "URL Ingestion Rules". This can unintentionally discard legitimate custom prompts (e.g., docs or guardrails that reference URLs) and makes prompt changes hard to persist. Prefer migrating only when the prompt exactly matches (or is a known hash of) the previous default prompt, or gate the rewrite behind an explicit config version/migration flag.
Suggested change:
- def __post_init__(self):
-     stale_url_markers = ("URL " + "Ingestion Tool", "URL " + "Ingestion Rules")
-     if any(marker in self.system_prompt for marker in stale_url_markers):
+ migrate_legacy_system_prompt: bool = False
+ def __post_init__(self):
+     stale_url_markers = ("URL " + "Ingestion Tool", "URL " + "Ingestion Rules")
+     if self.migrate_legacy_system_prompt and any(
+         marker in self.system_prompt for marker in stale_url_markers
+     ):
probably close in favor of #1474
Build successful! ✅
@phact I think we can close this one |
No description provided.