From be6d3bc7adda27f5f3b5e6d3f696c0d655055c4c Mon Sep 17 00:00:00 2001 From: Superset Dev Date: Thu, 16 Apr 2026 19:36:15 -0700 Subject: [PATCH 1/5] docs(mcp): update MCP server docs for 6.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add new tools to user guide: list_databases, get_database_info, create_virtual_dataset, save_sql_query, get_chart_type_schema - Add preview-first workflow tip with 3-step iterative pattern - Add Browse Databases and Create Virtual Datasets capability sections - Add RBAC enforcement table mapping tools to FAB permissions - Add Audit Log section (Settings → Action Log) - Add Middleware Pipeline table (7 layers with purpose/default) - Add Error Sanitization section (what gets redacted and why) - Add Performance section: connection pooling math, response caching guide - Add Tool Search config section with 85% token savings context - Add MCP_RBAC_ENABLED, MCP_PARSE_REQUEST_ENABLED, MCP_USER_RESOLVER to config reference - Add MCP_TOOL_SEARCH_CONFIG to config reference Co-Authored-By: Claude Sonnet 4.6 --- docs/admin_docs/configuration/mcp-server.mdx | 132 ++++++++++++++++++ .../using-superset/using-ai-with-superset.mdx | 37 ++++- 2 files changed, 168 insertions(+), 1 deletion(-) diff --git a/docs/admin_docs/configuration/mcp-server.mdx b/docs/admin_docs/configuration/mcp-server.mdx index 1475f3d6468e..7c243724b677 100644 --- a/docs/admin_docs/configuration/mcp-server.mdx +++ b/docs/admin_docs/configuration/mcp-server.mdx @@ -501,6 +501,7 @@ All MCP settings go in `superset_config.py`. Defaults are defined in `superset/m | `MCP_SERVICE_URL` | `None` | Public base URL for MCP-generated links (set this when behind a reverse proxy) | | `MCP_DEBUG` | `False` | Enable debug logging | | `MCP_DEV_USERNAME` | -- | Superset username for development mode (no auth) | +| `MCP_RBAC_ENABLED` | `True` | Enforce Superset's role-based access control on MCP tool calls. 
When `True`, each tool checks that the authenticated user has the required FAB permission before executing. Disable only for testing or trusted-network deployments. | | `MCP_PARSE_REQUEST_ENABLED` | `True` | Pre-parse MCP tool inputs from JSON strings into objects. Set to `False` for clients (Claude Desktop, LangChain) that do not double-serialize arguments — this produces cleaner tool schemas for those clients | ### Authentication @@ -517,6 +518,7 @@ All MCP settings go in `superset_config.py`. Defaults are defined in `superset/m | `MCP_REQUIRED_SCOPES` | `[]` | Required JWT scopes | | `MCP_JWT_DEBUG_ERRORS` | `False` | Log detailed JWT errors server-side (never exposed in HTTP responses per RFC 6750) | | `MCP_AUTH_FACTORY` | `None` | Custom auth provider factory `(flask_app) -> auth_provider`. Takes precedence over built-in JWT | +| `MCP_USER_RESOLVER` | `None` | Custom function `(app, access_token) -> username` that extracts a Superset username from a validated JWT. When `None`, the default resolver checks the `preferred_username`, `username`, `email`, and `sub` claims in that order. | ### Response Size Guard @@ -600,6 +602,41 @@ MCP_STORE_CONFIG = { | `event_store_max_events` | `100` | Maximum events retained per session | | `event_store_ttl` | `3600` | Event TTL in seconds | +### Tool Search + +By default, the MCP server exposes a lightweight tool-search interface instead of advertising every tool at once. This reduces the initial context sent to the LLM by ~70%, which lowers cost and latency. The AI client discovers tools on demand by calling `search_tools` and then invokes them via `call_tool`.
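The following minimal, self-contained Python sketch makes the two-step flow concrete. It ranks tool names and descriptions with BM25, then dispatches the chosen tool by name. Everything in it is hypothetical: the `CATALOG` descriptions, the `search_tools` and `call_tool` Python functions, the tokenizer, and the BM25 parameters (`k1=1.5`, `b=0.75`, common textbook defaults) are illustrative stand-ins. They are not Superset's implementation or the MCP wire protocol.

```python
import math
import re
from collections import Counter

# Hypothetical catalog: names mirror tools mentioned in this doc, but the
# descriptions are illustrative, not Superset's real tool docstrings.
CATALOG = {
    "list_databases": "List database connections the current user can access",
    "get_database_info": "Get details about a database connection, including backend type",
    "generate_chart": "Create a chart from a dataset using natural language",
    "execute_sql": "Run a SQL query against a database with RBAC enforcement",
    "list_dashboards": "List dashboards the current user can access",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit (so '_' splits too)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score each tool (name + description) against the query with Okapi BM25."""
    corpus = {name: tokenize(f"{name} {desc}") for name, desc in docs.items()}
    n_docs = len(corpus)
    avg_len = sum(len(terms) for terms in corpus.values()) / n_docs
    # Document frequency: in how many tool texts each term appears.
    df = Counter(term for terms in corpus.values() for term in set(terms))
    scores = {}
    for name, terms in corpus.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(terms) / avg_len)  # length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[name] = score
    return scores


def search_tools(query: str, max_results: int = 5) -> list[str]:
    """Step 1: return matching tool names, highest score first (ties by name)."""
    scores = bm25_scores(query, CATALOG)
    matches = [name for name, score in scores.items() if score > 0]
    return sorted(matches, key=lambda name: (-scores[name], name))[:max_results]


def _make_handler(tool_name: str):
    # Stand-in for a real tool implementation: echoes what it received.
    def handler(**arguments):
        return {"tool": tool_name, "arguments": arguments}
    return handler


HANDLERS = {name: _make_handler(name) for name in CATALOG}


def call_tool(name: str, arguments: dict) -> dict:
    """Step 2: invoke a tool discovered via search_tools, by name."""
    if name not in HANDLERS:
        raise KeyError(f"Unknown tool: {name}")
    return HANDLERS[name](**arguments)


if __name__ == "__main__":
    hits = search_tools("database connections")
    print(hits)
    print(call_tool(hits[0], {}))
```

For the query "database connections", the sketch ranks `list_databases` first because it is the only tool whose text contains the rarer term "connections". A real server would also apply pinned `always_visible` tools and permission filtering, which the sketch omits.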
+ +```python +MCP_TOOL_SEARCH_CONFIG = { + "enabled": True, + "strategy": "bm25", # "bm25" (natural language) or "regex" + "max_results": 5, + "always_visible": [ # Tools always listed (pinned) + "health_check", + "get_instance_info", + ], + "search_tool_name": "search_tools", + "call_tool_name": "call_tool", + "compact_schemas": True, # Strip $defs in search results to save tokens + "max_description_length": 300, +} +``` + +| Key | Default | Description | +|-----|---------|-------------| +| `enabled` | `True` | Enable tool search. When `False`, all tools are listed upfront | +| `strategy` | `"bm25"` | Search ranking algorithm. `"bm25"` supports natural language; `"regex"` supports pattern matching | +| `max_results` | `5` | Maximum tools returned per search query | +| `always_visible` | See above | Tools that always appear in `list_tools`, regardless of search | +| `compact_schemas` | `True` | Strip `$defs` from search results to reduce token cost. Full schemas are used when the tool is actually called | +| `max_description_length` | `300` | Truncate tool descriptions in search results (0 = no truncation) | + +:::tip +Set `enabled: False` to revert to the traditional "show all tools at once" behavior, which some clients or workflows may prefer. +::: + +Tool search reduces the initial token cost from ~15–20K tokens (full catalog) down to ~4–5K tokens (pinned tools + search interface) — roughly 85% savings at the start of each conversation. + ### Session & CSRF These values are flat-merged into the Flask app config used by the MCP server process: @@ -621,6 +658,101 @@ MCP_CSRF_CONFIG = { --- +## Access Control + +### RBAC Enforcement + +The MCP server respects Superset's full role-based access control (RBAC). Every authenticated user can only access the data and operations their Superset roles permit — the same rules that apply in the Superset UI apply through MCP. + +Each tool declares one or more required FAB permissions. 
The table below maps tool groups to their permission requirements: | Tool group | Required FAB permission | |------------|------------------------| | `list_charts`, `get_chart_info`, `get_chart_data`, `get_chart_preview`, `generate_chart`, `update_chart` | `can_read` on `Chart` (read), `can_write` on `Chart` (mutate) | | `list_dashboards`, `get_dashboard_info`, `generate_dashboard`, `add_chart_to_existing_dashboard` | `can_read` on `Dashboard` (read), `can_write` on `Dashboard` (mutate) | | `list_datasets`, `get_dataset_info`, `create_virtual_dataset` | `can_read` on `Dataset` (read), `can_write` on `Dataset` (mutate) | | `list_databases`, `get_database_info` | `can_read` on `Database` | | `execute_sql` | `can_execute_sql_query` on `SQLLab` | | `open_sql_lab_with_context`, `save_sql_query` | `can_read` on `SQLLab` | | `health_check` | None (public) | + To disable RBAC checking globally (for trusted-network deployments or testing), set: + +```python +# superset_config.py +MCP_RBAC_ENABLED = False +``` + +:::warning +Disabling RBAC removes all permission checks from MCP tool calls. Only do this on isolated, internal deployments where all MCP users are trusted admins. +::: + +### Audit Log + +All MCP tool calls are recorded in Superset's action log. You can view them at **Settings → Action Log** (admin only). Each log entry records: + +- The tool name (e.g., `mcp.generate_chart.db_write`) +- The authenticated user +- A timestamp + +This makes MCP activity auditable alongside regular Superset activity. Because the action log uses the same event logger as the rest of Superset, log ingestion pipelines already built on that logger (e.g., ones shipping logs to Elasticsearch or a SIEM) capture MCP events without extra configuration. + +### Middleware Pipeline + +Every MCP request passes through a fixed middleware stack before reaching the tool function.
The layers run in this order: | Middleware | Purpose | Default | |------------|---------|---------| | `RateLimitMiddleware` | Sliding-window rate limiting (Redis or in-memory) | Disabled | | `ResponseSizeGuardMiddleware` | Estimates token count, warns at 80% of limit, blocks at limit | Enabled | | `FieldPermissionsMiddleware` | Strips response fields the authenticated user lacks access to | Enabled | | `LoggingMiddleware` | Logs each tool call with user, parameters, and duration | Enabled | | `GlobalErrorHandlerMiddleware` | Catches unhandled exceptions and sanitizes sensitive data before it reaches the client | Enabled | | `PrivateToolMiddleware` | Blocks invocation of tools tagged as private | Enabled | | `StructuredContentStripperMiddleware` | Strips `structuredContent` from responses for Claude.ai bridge compatibility | Enabled | +### Error Sanitization + +The `GlobalErrorHandlerMiddleware` automatically redacts sensitive information from all error messages before they reach the LLM client. The following are redacted or replaced with generic messages: + +- **Database connection strings** — replaced with a generic connection error message +- **API keys and tokens** — redacted from error traces +- **File system paths** — stripped to prevent information disclosure +- **IP addresses** — removed from error context + +This is designed to keep a misconfigured database connection or an unexpected exception from leaking credentials or internal topology to the LLM or its users. All regex patterns used for redaction are bounded to prevent ReDoS attacks. + +--- + +## Performance + +### Connection Pooling + +Each MCP server process maintains its own SQLAlchemy connection pool to the database. For multi-worker deployments, the maximum number of open connections is **workers × (pool size + max overflow)**.
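The worst-case connection count can be computed as below. This sketch assumes that each worker process owns exactly one SQLAlchemy pool. It also assumes that a pool's ceiling is `pool_size + max_overflow`, which is how SQLAlchemy's `QueuePool` caps connections. The function names and the `reserved` parameter are hypothetical. `reserved` stands for the connections your other Superset processes (web, Celery) already use.

```python
def max_mcp_db_connections(pods: int, workers_per_pod: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case open connections from MCP server processes.

    Assumes one SQLAlchemy pool per worker process; a QueuePool holds at most
    pool_size persistent connections plus max_overflow temporary ones.
    """
    return pods * workers_per_pod * (pool_size + max_overflow)


def fits_database_limit(total: int, max_connections: int, reserved: int = 0) -> bool:
    """True if the MCP bound fits under the server limit minus connections reserved for other clients."""
    return total <= max_connections - reserved


if __name__ == "__main__":
    total = max_mcp_db_connections(pods=3, workers_per_pod=1, pool_size=5, max_overflow=10)
    print(total)  # 45
    print(fits_database_limit(total, max_connections=100, reserved=60))  # False
```

With 3 pods, one worker each, `pool_size=5`, and `max_overflow=10`, the bound is 45. That does not fit under a `max_connections` of 100 when 60 connections are already reserved for other processes.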
+ +```python +# superset_config.py +SQLALCHEMY_POOL_SIZE = 5 +SQLALCHEMY_MAX_OVERFLOW = 10 +SQLALCHEMY_POOL_TIMEOUT = 30 +SQLALCHEMY_POOL_RECYCLE = 3600 # Recycle connections after 1 hour +``` + +For a 3-pod Kubernetes deployment with the defaults above, expect up to 3 × (5 + 10) = 45 connections. Size your database's `max_connections` accordingly. + +### Response Caching + +Enable response caching for read-heavy workloads (dashboards/datasets that don't change frequently). With the in-memory backend (default when `MCP_STORE_CONFIG` is disabled), caching is per-process. Use Redis-backed caching for consistent cache hits across multiple pods: + +```python +MCP_CACHE_CONFIG = {"enabled": True, "call_tool_ttl": 3600} +MCP_STORE_CONFIG = {"enabled": True, "CACHE_REDIS_URL": "redis://redis:6379/0"} +``` + +Mutating tools (`generate_chart`, `update_chart`, `execute_sql`, `generate_dashboard`) are always excluded from caching regardless of this setting. + +--- + ## Troubleshooting ### Server won't start diff --git a/docs/docs/using-superset/using-ai-with-superset.mdx b/docs/docs/using-superset/using-ai-with-superset.mdx index 67becfd9482e..b419ccc92691 100644 --- a/docs/docs/using-superset/using-ai-with-superset.mdx +++ b/docs/docs/using-superset/using-ai-with-superset.mdx @@ -55,9 +55,10 @@ Ask your AI assistant to browse what's available in your Superset instance: Describe the visualization you want and AI creates it for you: +- **Preview-first workflow** -- by default AI generates an Explore link so you can review the chart before it is saved. Say "save it" to commit permanently - **Create charts from natural language** -- describe what you want to see and AI picks the right chart type, metrics, and dimensions - **Preview before saving** -- `generate_chart` defaults to `save_chart=False`, showing the chart in Explore before it's committed. Ask AI to save once you're satisfied. 
-- **Modify existing charts** -- `update_chart` also supports preview mode so you can review changes before saving +- **Modify existing charts** -- `update_chart` also supports preview mode so you can review changes before saving (update filters, change chart types, add metrics) - **Get Explore links** -- open any chart in Superset's Explore view for further refinement **Example prompts:** @@ -65,6 +66,16 @@ Describe the visualization you want and AI creates it for you: > "Update chart 42 to use a line chart instead" > "Give me a link to explore this chart further" +:::tip Preview-first workflow +Charts are **not saved by default**. The workflow is intentionally iterative: + +1. **Explore** — AI generates an Explore link so you can see the chart before it exists in Superset +2. **Iterate** — ask the AI to adjust the chart; changes are previewed without touching the database +3. **Save** — when you're happy, say "save it" and the chart is permanently stored + +To skip the preview and save immediately, include "and save it" in your prompt. +::: + ### Create Dashboards Build dashboards from a collection of charts: @@ -76,16 +87,40 @@ Build dashboards from a collection of charts: > "Create a dashboard called 'Q4 Sales Overview' with charts 10, 15, and 22" > "Add the revenue trend chart to the executive dashboard" +### Browse Databases + +Discover what database connections are configured in your Superset instance: + +- **List databases** -- see all database connections you have access to +- **Get database details** -- name, backend type (PostgreSQL, Snowflake, etc.), and connection status + +**Example prompts:** +> "What databases are connected to Superset?" 
+> "Show me details about the data warehouse connection" + +### Create Virtual Datasets + +Build ad-hoc SQL datasets that can be used as the basis for charts: + +- **Create virtual datasets** -- write a SQL query and save it as a reusable dataset +- **Use immediately in charts** -- the returned dataset ID can be passed directly to chart creation + +**Example prompts:** +> "Create a dataset from: SELECT region, SUM(revenue) as total_revenue FROM orders GROUP BY region" +> "Make a virtual dataset called 'monthly_signups' from the users table filtered to last 12 months" + ### Run SQL Queries Execute SQL directly through your AI assistant: - **Run queries** -- execute SQL with full Superset RBAC enforcement (you can only query data your roles allow) - **Open SQL Lab** -- get a link to SQL Lab pre-populated with a query, ready to run and explore +- **Save queries** -- save a SQL query to SQL Lab's Saved Queries for later reuse **Example prompts:** > "Run this query: SELECT region, SUM(revenue) FROM sales GROUP BY region" > "Open SQL Lab with a query to show the top 10 customers by order count" +> "Save this query as 'Weekly Revenue Report'" ### Analyze Chart Data From 76e3fb39cb53adecfc9173d0f79834336800e271 Mon Sep 17 00:00:00 2001 From: Evan Rusackas Date: Thu, 23 Apr 2026 01:20:19 -0700 Subject: [PATCH 2/5] address review: remove unsupported MCP_PARSE_REQUEST_ENABLED config row --- docs/admin_docs/configuration/mcp-server.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/admin_docs/configuration/mcp-server.mdx b/docs/admin_docs/configuration/mcp-server.mdx index 7c243724b677..d287f8b09a10 100644 --- a/docs/admin_docs/configuration/mcp-server.mdx +++ b/docs/admin_docs/configuration/mcp-server.mdx @@ -502,7 +502,6 @@ All MCP settings go in `superset_config.py`. 
Defaults are defined in `superset/m | `MCP_DEBUG` | `False` | Enable debug logging | | `MCP_DEV_USERNAME` | -- | Superset username for development mode (no auth) | | `MCP_RBAC_ENABLED` | `True` | Enforce Superset's role-based access control on MCP tool calls. When `True`, each tool checks that the authenticated user has the required FAB permission before executing. Disable only for testing or trusted-network deployments. | -| `MCP_PARSE_REQUEST_ENABLED` | `True` | Pre-parse MCP tool inputs from JSON strings into objects. Set to `False` for clients (Claude Desktop, LangChain) that do not double-serialize arguments — this produces cleaner tool schemas for those clients | ### Authentication From f374571813e49ddcf43e0cebb54b7d85d915490b Mon Sep 17 00:00:00 2001 From: Evan Rusackas Date: Thu, 23 Apr 2026 01:20:47 -0700 Subject: [PATCH 3/5] address review: align middleware pipeline docs with actual default stack --- docs/admin_docs/configuration/mcp-server.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/admin_docs/configuration/mcp-server.mdx b/docs/admin_docs/configuration/mcp-server.mdx index d287f8b09a10..27e74cff0d56 100644 --- a/docs/admin_docs/configuration/mcp-server.mdx +++ b/docs/admin_docs/configuration/mcp-server.mdx @@ -698,17 +698,17 @@ This makes MCP activity fully auditable alongside regular Superset activity. The ### Middleware Pipeline -Every MCP request passes through a fixed middleware stack before reaching the tool function. The layers run in this order: +Every MCP request passes through a middleware stack before reaching the tool function. 
The default stack (assembled in `build_middleware_list()` in `server.py`) is: | Middleware | Purpose | Default | |------------|---------|---------| -| `RateLimitMiddleware` | Sliding-window rate limiting (Redis or in-memory) | Disabled | -| `ResponseSizeGuardMiddleware` | Estimates token count, warns at 80% of limit, blocks at limit | Enabled | -| `FieldPermissionsMiddleware` | Strips response fields the authenticated user lacks access to | Enabled | +| `StructuredContentStripperMiddleware` | Strips `structuredContent` from responses for Claude.ai bridge compatibility | Enabled | | `LoggingMiddleware` | Logs each tool call with user, parameters, and duration | Enabled | | `GlobalErrorHandlerMiddleware` | Catches unhandled exceptions and sanitizes sensitive data before it reaches the client | Enabled | -| `PrivateToolMiddleware` | Blocks invocation of tools tagged as private | Enabled | -| `StructuredContentStripperMiddleware` | Strips `structuredContent` from responses for Claude.ai bridge compatibility | Enabled | +| `ResponseSizeGuardMiddleware` | Estimates token count, warns at 80% of limit, blocks at limit | Enabled (configurable via `MCP_RESPONSE_SIZE_CONFIG`) | +| `ResponseCachingMiddleware` | Caches read-heavy tool responses (in-memory or Redis) | Disabled (enable via `MCP_CACHE_CONFIG`) | + +Additional middleware classes (`RateLimitMiddleware`, `FieldPermissionsMiddleware`, `PrivateToolMiddleware`) are implemented in `superset/mcp_service/middleware.py` but are not added to the default pipeline. They are available for operators who want to layer them in via a custom startup path. 
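Ordering matters because each layer wraps the ones after it. The sketch below shows the general "onion" pattern with hypothetical plain-Python layers. These are not Superset's middleware classes or the FastMCP middleware API. The sketch assumes the first entry in the list is the outermost layer. That is a common convention, but this page does not confirm it for `build_middleware_list()`. Because the error handler sits outside the size guard here, an oversized response comes back as a sanitized error rather than as a raw exception.

```python
from typing import Any, Callable

# A handler receives (tool_name, arguments); a middleware also receives the next handler.
Handler = Callable[[str, dict], Any]
Middleware = Callable[[str, dict, Handler], Any]

MAX_RESPONSE_CHARS = 50  # hypothetical limit, measured in characters for simplicity


def _wrap(middleware: Middleware, next_handler: Handler) -> Handler:
    def handler(name: str, arguments: dict) -> Any:
        return middleware(name, arguments, next_handler)
    return handler


def build_pipeline(middlewares: list[Middleware], tool: Handler) -> Handler:
    """Wrap `tool` so that middlewares[0] is the outermost layer."""
    handler = tool
    for middleware in reversed(middlewares):
        handler = _wrap(middleware, handler)
    return handler


calls: list[str] = []


def logging_layer(name: str, arguments: dict, next_handler: Handler) -> Any:
    calls.append(name)  # record every call, including ones that fail later
    return next_handler(name, arguments)


def error_handler_layer(name: str, arguments: dict, next_handler: Handler) -> Any:
    try:
        return next_handler(name, arguments)
    except Exception:
        # Return a generic message instead of the raw exception text.
        return {"error": f"Tool '{name}' failed"}


def size_guard_layer(name: str, arguments: dict, next_handler: Handler) -> Any:
    result = next_handler(name, arguments)
    if len(str(result)) > MAX_RESPONSE_CHARS:
        raise ValueError("response too large")
    return result


def echo_tool(name: str, arguments: dict) -> Any:
    # Stand-in tool: returns whatever payload it was given.
    return arguments.get("payload")


pipeline = build_pipeline([logging_layer, error_handler_layer, size_guard_layer], echo_tool)

if __name__ == "__main__":
    print(pipeline("echo", {"payload": "ok"}))
    print(pipeline("echo", {"payload": "x" * 100}))
```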
### Error Sanitization From 748f3e8abf37a1d9ccdf7225441f626e06329092 Mon Sep 17 00:00:00 2001 From: Evan Rusackas Date: Thu, 23 Apr 2026 01:21:10 -0700 Subject: [PATCH 4/5] address review: correct save_sql_query RBAC requirement to SavedQuery write --- docs/admin_docs/configuration/mcp-server.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/admin_docs/configuration/mcp-server.mdx b/docs/admin_docs/configuration/mcp-server.mdx index 27e74cff0d56..626a202e9e89 100644 --- a/docs/admin_docs/configuration/mcp-server.mdx +++ b/docs/admin_docs/configuration/mcp-server.mdx @@ -672,7 +672,8 @@ Each tool declares one or more required FAB permissions. The table below maps to | `list_datasets`, `get_dataset_info`, `create_virtual_dataset` | `can_read` on `Dataset` (read), `can_write` on `Dataset` (mutate) | | `list_databases`, `get_database_info` | `can_read` on `Database` | | `execute_sql` | `can_execute_sql_query` on `SQLLab` | -| `open_sql_lab_with_context`, `save_sql_query` | `can_read` on `SQLLab` | +| `open_sql_lab_with_context` | `can_read` on `SQLLab` | +| `save_sql_query` | `can_write` on `SavedQuery` | | `health_check` | None (public) | To disable RBAC checking globally (for trusted-network deployments or testing), set: From 6be8aa53801b1016757f3600ead0dd44daa9af61 Mon Sep 17 00:00:00 2001 From: Evan Rusackas Date: Thu, 23 Apr 2026 01:21:32 -0700 Subject: [PATCH 5/5] address review: document include_schemas default and interaction with compact_schemas --- docs/admin_docs/configuration/mcp-server.mdx | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/admin_docs/configuration/mcp-server.mdx b/docs/admin_docs/configuration/mcp-server.mdx index 626a202e9e89..df299acaf8da 100644 --- a/docs/admin_docs/configuration/mcp-server.mdx +++ b/docs/admin_docs/configuration/mcp-server.mdx @@ -616,7 +616,8 @@ MCP_TOOL_SEARCH_CONFIG = { ], "search_tool_name": "search_tools", "call_tool_name": "call_tool", - "compact_schemas": 
True, # Strip $defs in search results to save tokens + "include_schemas": False, # False=summary mode (name + parameters_hint) + "compact_schemas": True, # Strip $defs (only applies when include_schemas=True) "max_description_length": 300, } ``` @@ -627,8 +628,9 @@ MCP_TOOL_SEARCH_CONFIG = { | `strategy` | `"bm25"` | Search ranking algorithm. `"bm25"` supports natural language; `"regex"` supports pattern matching | | `max_results` | `5` | Maximum tools returned per search query | | `always_visible` | See above | Tools that always appear in `list_tools`, regardless of search | -| `compact_schemas` | `True` | Strip `$defs` from search results to reduce token cost. Full schemas are used when the tool is actually called | -| `max_description_length` | `300` | Truncate tool descriptions in search results (0 = no truncation) | +| `include_schemas` | `False` | When `False` (default, "summary mode"), search results omit `inputSchema` entirely and include a lightweight `parameters_hint` listing top-level parameter names. Set to `True` to include the full `inputSchema` in search results. Full schemas are always used when a tool is actually invoked via `call_tool`. | +| `compact_schemas` | `True` | Strip `$defs` / `$ref` and replace with `{"type": "object"}` in search results to reduce token cost. Only takes effect when `include_schemas=True` — ignored in summary mode. | +| `max_description_length` | `300` | Truncate tool descriptions in search results (0 = no truncation). Applies in both summary and full-schema modes. | :::tip Set `enabled: False` to revert to the traditional "show all tools at once" behavior, which some clients or workflows may prefer.