fix(embedded): prevent global guest RLS from being applied to virtual dataset inner tables#38601
fix(embedded): prevent global guest RLS from being applied to virtual dataset inner tables#38601kgabryje wants to merge 2 commits into
Conversation
… dataset inner tables Global guest RLS rules (without a dataset ID) were being applied to each physical table inside a virtual dataset's SQL, in addition to the outer query. When the RLS clause referenced a column that didn't exist on all inner tables (e.g., a JOIN), this caused SQL errors. The fix adds an `include_guest_rls` parameter (default True) threaded through `get_sqla_row_level_filters()` → `get_predicates_for_table()` → `apply_rls()`. In `get_from_clause()`, `include_guest_rls=False` is passed so inner tables skip guest RLS. Guest RLS continues to be applied once at the outer virtual dataset query level. Fixes apache#37359 Fixes apache#37551 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review Agent Run #a64bc4Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Sequence DiagramThis PR changes RLS handling so global guest rules are not injected into inner physical tables of a virtual dataset. Guest RLS is now applied only once at the outer virtual dataset query level, preventing missing column SQL errors. sequenceDiagram
participant Guest
participant QueryBuilder
participant RLS
participant Database
Guest->>QueryBuilder: Request chart query on virtual dataset
QueryBuilder->>RLS: Apply RLS to inner tables with guest rules disabled
RLS->>Database: Fetch table predicates using role based rules only
QueryBuilder->>RLS: Build outer dataset query and request full RLS
RLS->>Database: Add guest and role filters to outer query
QueryBuilder-->>Guest: Return successful filtered results
Generated by CodeAnt AI |
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
… dataset inner tables Global guest RLS rules (without a dataset ID) were being applied to each physical table inside a virtual dataset's SQL, in addition to the outer query. When the RLS clause referenced a column that didn't exist on all inner tables (e.g., a JOIN), this caused SQL errors. The fix adds an `include_global_guest_rls` parameter to `get_sqla_row_level_filters()` that, when False, skips only global (unscoped) guest RLS rules while preserving dataset-scoped guest rules. `get_predicates_for_table()` passes `include_global_guest_rls=False` so that inner physical tables skip global guest RLS. Global guest RLS continues to be applied once at the outer virtual dataset query level. Dataset-scoped guest rules (those with a specific dataset ID) are always applied regardless, ensuring they reach the targeted physical table even inside virtual datasets. Fixes apache#37359 Fixes apache#37551 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| table_name="virtual_birth_names_rls_test", | ||
| database=birth_names.database, | ||
| schema=birth_names.schema, | ||
| sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608 |
There was a problem hiding this comment.
Suggestion: The new regression test does not actually create a virtual dataset with multiple inner tables, so it cannot catch the original bug scenario where a global guest filter is incorrectly pushed down to joined inner tables that lack the filtered column. Build the virtual dataset with a JOIN so the test exercises the intended failure mode. [logic error]
Severity Level: Major ⚠️
- ⚠️ Embedded guest JOIN regressions escape integration test coverage.
- ⚠️ CI may pass despite broken virtual-dataset RLS behavior.| sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608 | |
| sql=f"SELECT b.name, e.value FROM {birth_names.table_name} b JOIN energy_usage e ON 1=1", # noqa: S608 |
Steps of Reproduction ✅
1. Run
`GuestTokenRowLevelSecurityTests.test_global_rls_not_applied_to_virtual_dataset_inner_tables`
in `tests/integration_tests/security/row_level_security_tests.py:851-891`; it builds
virtual SQL at line 865 as `SELECT * FROM birth_names` (single inner table).
2. The test calls `virtual_dataset.get_query_str(query_obj)` at
`row_level_security_tests.py:885`, which flows into
`superset/models/helpers.py:get_query_str` (line 2186) and then virtual SQL parsing in
`get_from_clause` (line 2029).
3. Inner-table RLS application loops `parsed_statement.tables` in
`superset/utils/rls.py:53-63`; with current test SQL there is only one table
(`birth_names`), so no multi-table pushdown case is exercised.
4. The documented failure mode needs joined inner tables with different schemas;
`energy_usage` fixture columns are `source/target/value`
(`tests/integration_tests/fixtures/energy_dashboard.py:49`), so adding JOIN coverage is
required to validate "column missing on some inner tables" regressions.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 865:865
**Comment:**
*Logic Error: The new regression test does not actually create a virtual dataset with multiple inner tables, so it cannot catch the original bug scenario where a global guest filter is incorrectly pushed down to joined inner tables that lack the filtered column. Build the virtual dataset with a JOIN so the test exercises the intended failure mode.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # Guest RLS should appear in the outer WHERE clause | ||
| assert re.search(RLS_ALICE_REGEX, sql) |
There was a problem hiding this comment.
Suggestion: The assertion only checks that the guest RLS clause exists somewhere in the SQL, which still passes if the clause is applied both to inner tables and the outer query. Assert that the clause appears exactly once to verify the fix truly prevents duplicate inner-table application. [logic error]
Severity Level: Major ⚠️
- ⚠️ Duplicate guest RLS injection not detected by assertion.
- ⚠️ Regression can ship despite failing outer-only guarantee.| # Guest RLS should appear in the outer WHERE clause | |
| assert re.search(RLS_ALICE_REGEX, sql) | |
| # Guest RLS should appear exactly once (outer query only) | |
| assert len(re.findall(RLS_ALICE_REGEX, sql)) == 1 |
Steps of Reproduction ✅
1. In `tests/integration_tests/security/row_level_security_tests.py:887-888`, the check
uses `re.search`, which validates only "at least one match".
2. Query generation path for this test is `virtual_dataset.get_query_str()`
(`row_level_security_tests.py:885`) → `superset/models/helpers.py:get_sqla_query` where
outer RLS is appended at line 3215.
3. Guest RLS clauses are produced in `superset/connectors/sqla/models.py:770-772`; if
guest RLS is mistakenly also included for inner virtual-dataset tables, SQL can contain
the same clause multiple times (inner + outer).
4. With duplicated predicates, `re.search(RLS_ALICE_REGEX, sql)` still passes;
`len(re.findall(...)) == 1` is needed to assert the fix intent ("outer query only") and
catch duplicate injection regressions.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 887:888
**Comment:**
*Logic Error: The assertion only checks that the guest RLS clause exists somewhere in the SQL, which still passes if the clause is applied both to inner tables and the outer query. Assert that the clause appears exactly once to verify the fix truly prevents duplicate inner-table application.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
Closed in favor of #37395 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #38601 +/- ##
==========================================
- Coverage 65.01% 64.42% -0.60%
==========================================
Files 1817 2529 +712
Lines 72317 128891 +56574
Branches 23031 29708 +6677
==========================================
+ Hits 47020 83034 +36014
- Misses 25297 44413 +19116
- Partials 0 1444 +1444
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code Review Agent Run #cfd844Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
Fixes a bug where global guest RLS rules (without a
datasetID) for embedded users were being incorrectly applied to the inner physical tables of virtual datasets, in addition to the outer query. When the RLS clause referenced a column that didn't exist on all inner tables (e.g., a virtual dataset with JOINs), this caused SQL errors likecolumn "channel_name" does not exist.Root cause:
get_sqla_row_level_filters()unconditionally includes guest RLS on every call. For virtual datasets, it's called twice:get_from_clause()→apply_rls()→get_predicates_for_table()get_sqla_query()(line 3213 in helpers.py)Global guest RLS matches ALL datasets, so it was applied to both inner tables and the outer query — causing errors when inner tables didn't have the filtered column.
Fix: Add an
include_guest_rlsparameter (defaultTrue, fully backward-compatible) threaded throughget_sqla_row_level_filters()→get_predicates_for_table()→apply_rls(). Inget_from_clause(), passinclude_guest_rls=Falseso inner tables only get role-based RLS. Guest RLS is applied once at the outer virtual dataset query level.Fixes #37359
Fixes #37551
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Test dashboard:
Applied guest token RLS:
"channel_name = 'general'"Before:
After:
TESTING INSTRUCTIONS
Slack Members and Channelsfrom examples)datasetID):{"rls": [{"clause": "channel_name = 'general'"}]}ADDITIONAL INFORMATION
EMBEDDED_SUPERSET