fix: remove mcp url ingest code path #1489

Open
phact wants to merge 2 commits into release-saas-0.1 from disable-url-ingest-release-saas-0.1

phact (Collaborator) commented Apr 28, 2026

No description provided.

github-actions Bot added the documentation 📘, frontend 🟨, backend 🔷, docker, and tests labels Apr 28, 2026
phact changed the title from "remove mcp url ingest code path" to "fix: remove mcp url ingest code path" Apr 28, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR removes the MCP/URL-based ingestion code path and related configuration, shifting default OpenRAG docs ingestion to be strictly file-based and updating prompts/flows/docs to reflect that URL ingestion is disabled.

Changes:

  • Remove URL-ingestion processors/services/config/env/helm wiring (including Langflow MCP service and URL ingest flow IDs).
  • Simplify default docs ingestion/refresh logic to only support packaged file ingestion (manual refresh remains force-based).
  • Centralize and update the default system prompt to explicitly disable URL ingestion; update agent flow JSON and documentation accordingly.

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/unit/test_settings_refresh_endpoint.py | Updates tests to satisfy new models_service dependency for refresh endpoint. |
| tests/unit/test_settings_async_post_save.py | Updates async post-save test expectations after MCP server update removal. |
| tests/unit/test_main_docs_signature.py | Removes tests for deleted remote-docs signature logic. |
| src/utils/telemetry/message_id.py | Removes telemetry IDs specific to default URL docs ingestion. |
| src/utils/langflow_headers.py | Updates docstring and removes MCP global-vars builder helper. |
| src/tui/managers/env_manager.py | Removes URL ingest flow ID from TUI env config and .env generation. |
| src/services/task_service.py | Removes Langflow URL upload task factory. |
| src/services/langflow_mcp_service.py | Deletes Langflow MCP server patch/update implementation. |
| src/services/langflow_file_service.py | Removes Langflow URL ingestion flow runner and URL flow auto-heal logic. |
| src/services/flows_service.py | Removes URL ingest flow from backup/reset/ensure/check/model-value update paths. |
| src/services/auth_service.py | Removes login-time best-effort MCP server updates and related task tracking. |
| src/models/url.py | Deletes URL ingestion task processor implementation. |
| src/models/processors.py | Removes leftover import of the deleted URL processor. |
| src/main.py | Removes URL-based default docs ingestion/refresh/signature code; standardizes default docs connector type to openrag_docs. |
| src/config/settings.py | Removes DEFAULT_DOCS_URL/CRAWL_DEPTH/startup-fetch envs; defaults docs source to files. |
| src/config/config_manager.py | Introduces DEFAULT_SYSTEM_PROMPT constant; swaps AgentConfig default prompt and adds stale prompt rewrite logic. |
| src/api/settings.py | Removes MCP server update hooks and URL-signature bookkeeping during onboarding/settings updates. |
| src/agent.py | Uses backend DEFAULT_SYSTEM_PROMPT constant for conversation thread system message. |
| kubernetes/helm/openrag/values.yaml | Removes URL ingest flow JSON and URL-specific defaults from chart values. |
| kubernetes/helm/openrag/templates/langflow/langflow-dotenv.yaml | Removes CONNECTOR_TYPE_URL from Langflow dotenv secret template. |
| kubernetes/helm/openrag/templates/configmaps/flow-ids-configmap.yaml | Removes url-ingest flow id entry. |
| kubernetes/helm/openrag/templates/backend/backend-dotenv.yaml | Removes LANGFLOW_URL_INGEST_FLOW_ID wiring from backend dotenv secret template. |
| frontend/lib/constants.ts | Adds frontend DEFAULT_SYSTEM_PROMPT constant and references it in default settings. |
| flows/openrag_agent.json | Removes MCP URL ingestion tool component/edge and updates README/prompt text to reflect URL ingestion disabled. |
| docs/docs/reference/configuration.mdx | Removes LANGFLOW_URL_INGEST_FLOW_ID from documented config variables. |
| docs/docs/core-components/ingestion.mdx | Updates ingestion docs to state URL ingestion is disabled and removes URL-ingestion instructions. |
| docs/docs/core-components/chat.mdx | Updates chat docs to remove URL-fetch/MCP-tool narrative and reflect current toolset. |
| docs/docs/core-components/agents.mdx | Removes references to the URL ingestion flow from built-in flows list. |
| docker-compose.yml | Removes URL ingest flow ID and CONNECTOR_TYPE_URL env vars from compose setup. |
| .env.example | Removes URL-ingestion-related env vars and documents file-based default docs ingestion. |

Comments suppressed due to low confidence (1)

src/main.py:540

  • _delete_existing_default_docs filters deletion by a single connector_type value. This PR changes default-doc ingestion to use connector_type="openrag_docs" (previously "local"), so upgrade reingestion may leave older default docs behind and create duplicates. Consider deleting by owner_email + is_sample_data, or matching both legacy and new connector types (e.g., terms query for ["local","openrag_docs"]).
                    # Default docs may be owned by the anonymous onboarding user.
                    {
                        "bool": {
                            "must": [
                                {"term": {"connector_type": connector_type}},
                                {"term": {"owner_email": anonymous_user.email}},
                            ]
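
The suggestion above (matching both legacy and new connector types) could look roughly like the following. This is a minimal sketch, not the code in src/main.py: the helper name `build_default_docs_delete_query` and the constant `DEFAULT_DOCS_CONNECTOR_TYPES` are hypothetical, and the surrounding query structure in the real function may differ from what is shown.

```python
from typing import Any

# Hypothetical constant: the legacy ("local") and new ("openrag_docs")
# connector types named in the review comment.
DEFAULT_DOCS_CONNECTOR_TYPES = ["local", "openrag_docs"]


def build_default_docs_delete_query(owner_email: str) -> dict[str, Any]:
    """Build a query body matching default docs under either connector type.

    Illustrative helper only; it is not part of the PR.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # A `terms` clause matches when the field equals ANY of
                    # the listed values, unlike `term`, which takes one value.
                    {"terms": {"connector_type": DEFAULT_DOCS_CONNECTOR_TYPES}},
                    {"term": {"owner_email": owner_email}},
                ]
            }
        }
    }
```

With a body like this, a single delete-by-query pass would cover documents ingested before and after the connector type rename, assuming `connector_type` is indexed as a keyword field.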

Comment thread src/config/settings.py
Comment on lines +80 to 83
# Default OpenRAG docs sample ingestion source.
# URL ingestion is disabled; use packaged files from the openrag-documents directory.
DEFAULT_DOCS_INGEST_SOURCE = os.getenv("DEFAULT_DOCS_INGEST_SOURCE", "files").lower()

Copilot AI Apr 28, 2026

DEFAULT_DOCS_INGEST_SOURCE still accepts arbitrary env values (including "url") even though URL ingestion is now disabled elsewhere. That can leave the app logging ingest_source="url" while behaving as file-based ingestion, which is confusing operationally. Consider clamping/validating to "files" (and logging a warning when another value is provided).
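
One way to clamp the value as suggested, shown as a sketch under assumptions: `resolve_docs_ingest_source` and `SUPPORTED_DOCS_INGEST_SOURCES` are hypothetical names, and the project may use a different logger than the standard-library one used here.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical allow-list: only file-based ingestion remains supported.
SUPPORTED_DOCS_INGEST_SOURCES = frozenset({"files"})


def resolve_docs_ingest_source(raw: Optional[str]) -> str:
    """Normalize the env value and fall back to "files" for anything else."""
    value = (raw or "files").strip().lower()
    if value not in SUPPORTED_DOCS_INGEST_SOURCES:
        # Warn so operators notice a stale setting such as "url".
        logger.warning(
            "Unsupported DEFAULT_DOCS_INGEST_SOURCE=%r; falling back to 'files'",
            value,
        )
        return "files"
    return value


DEFAULT_DOCS_INGEST_SOURCE = resolve_docs_ingest_source(
    os.getenv("DEFAULT_DOCS_INGEST_SOURCE")
)
```

Keeping the fallback in one function means the logged `ingest_source` and the actual behavior cannot drift apart.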

Comment thread src/config/config_manager.py Outdated
Comment on lines +139 to +142

def __post_init__(self):
    stale_url_markers = ("URL " + "Ingestion Tool", "URL " + "Ingestion Rules")
    if any(marker in self.system_prompt for marker in stale_url_markers):
Copilot AI Apr 28, 2026

AgentConfig.__post_init__ overwrites any user-provided system_prompt that happens to mention "URL Ingestion Tool"/"URL Ingestion Rules". This can unintentionally discard legitimate custom prompts (e.g., docs or guardrails that reference URLs) and makes prompt changes hard to persist. Prefer migrating only when the prompt exactly matches (or is a known hash of) the previous default prompt, or gate the rewrite behind an explicit config version/migration flag.

Suggested change
- def __post_init__(self):
-     stale_url_markers = ("URL " + "Ingestion Tool", "URL " + "Ingestion Rules")
-     if any(marker in self.system_prompt for marker in stale_url_markers):
+ migrate_legacy_system_prompt: bool = False
+ def __post_init__(self):
+     stale_url_markers = ("URL " + "Ingestion Tool", "URL " + "Ingestion Rules")
+     if self.migrate_legacy_system_prompt and any(
+         marker in self.system_prompt for marker in stale_url_markers
+     ):

phact (Collaborator, Author) commented Apr 28, 2026

probably close in favor of #1474

phact closed this Apr 28, 2026
phact reopened this Apr 28, 2026

github-actions Bot (Contributor) commented Apr 28, 2026

Build successful! ✅
Deploying docs draft.
Deploy successful! View draft

lucaseduoli (Collaborator) commented

@phact I think we can close this one

mpawlow removed their request for review May 5, 2026 15:49
Labels

  • backend 🔷: Issues related to backend services (OpenSearch, Langflow, APIs)
  • docker
  • documentation 📘: Improvements or additions to documentation
  • frontend 🟨: Issues related to the UI/UX
  • tests

3 participants