Skip to content

fix(embedded): prevent global guest RLS from being applied to virtual dataset inner tables#38601

Closed
kgabryje wants to merge 2 commits into
apache:masterfrom
kgabryje:global-rls-embedded
Closed

fix(embedded): prevent global guest RLS from being applied to virtual dataset inner tables#38601
kgabryje wants to merge 2 commits into
apache:masterfrom
kgabryje:global-rls-embedded

Conversation

@kgabryje
Copy link
Copy Markdown
Member

@kgabryje kgabryje commented Mar 12, 2026

SUMMARY

Fixes a bug where global guest RLS rules (without a dataset ID) for embedded users were being incorrectly applied to the inner physical tables of virtual datasets, in addition to the outer query. When the RLS clause referenced a column that didn't exist on all inner tables (e.g., a virtual dataset with JOINs), this caused SQL errors like column "channel_name" does not exist.

Root cause: get_sqla_row_level_filters() unconditionally includes guest RLS on every call. For virtual datasets, it's called twice:

  1. Inner tables via get_from_clause()apply_rls()get_predicates_for_table()
  2. Outer query via get_sqla_query() (line 3213 in helpers.py)

Global guest RLS matches ALL datasets, so it was applied to both inner tables and the outer query — causing errors when inner tables didn't have the filtered column.

Fix: Add an include_guest_rls parameter (default True, fully backward-compatible) threaded through get_sqla_row_level_filters()get_predicates_for_table()apply_rls(). In get_from_clause(), pass include_guest_rls=False so inner tables only get role-based RLS. Guest RLS is applied once at the outer virtual dataset query level.

Fixes #37359
Fixes #37551

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Test dashboard:

image

Applied guest token RLS: "channel_name = 'general'"

Before:

image

After:

image

TESTING INSTRUCTIONS

  1. Create a virtual dataset with a JOIN (e.g., Slack Members and Channels from examples)
  2. Add a chart using this dataset to a dashboard, enable embedded mode
  3. Generate a guest token with a global RLS rule (no dataset ID): {"rls": [{"clause": "channel_name = 'general'"}]}
  4. Access the dashboard with the guest token
  5. Verify the chart renders without SQL errors

ADDITIONAL INFORMATION

… dataset inner tables

Global guest RLS rules (without a dataset ID) were being applied to each
physical table inside a virtual dataset's SQL, in addition to the outer
query. When the RLS clause referenced a column that didn't exist on all
inner tables (e.g., a JOIN), this caused SQL errors.

The fix adds an `include_guest_rls` parameter (default True) threaded
through `get_sqla_row_level_filters()` → `get_predicates_for_table()` →
`apply_rls()`. In `get_from_clause()`, `include_guest_rls=False` is
passed so inner tables skip guest RLS. Guest RLS continues to be applied
once at the outer virtual dataset query level.

Fixes apache#37359
Fixes apache#37551

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Mar 12, 2026

Code Review Agent Run #a64bc4

Actionable Suggestions - 0
Review Details
  • Files reviewed - 5 · Commit Range: ede872d..ede872d
    • superset/connectors/sqla/models.py
    • superset/models/helpers.py
    • superset/utils/rls.py
    • tests/integration_tests/security/row_level_security_tests.py
    • tests/unit_tests/sql_lab_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@codeant-ai-for-open-source codeant-ai-for-open-source Bot added the size:L This PR changes 100-499 lines, ignoring generated files label Mar 12, 2026
@dosubot dosubot Bot added authentication:row-level-security Related to Row Level Security change:backend Requires changing the backend labels Mar 12, 2026
@codeant-ai-for-open-source
Copy link
Copy Markdown
Contributor

Sequence Diagram

This PR changes RLS handling so global guest rules are not injected into inner physical tables of a virtual dataset. Guest RLS is now applied only once at the outer virtual dataset query level, preventing missing column SQL errors.

sequenceDiagram
    participant Guest
    participant QueryBuilder
    participant RLS
    participant Database

    Guest->>QueryBuilder: Request chart query on virtual dataset
    QueryBuilder->>RLS: Apply RLS to inner tables with guest rules disabled
    RLS->>Database: Fetch table predicates using role based rules only
    QueryBuilder->>RLS: Build outer dataset query and request full RLS
    RLS->>Database: Add guest and role filters to outer query
    QueryBuilder-->>Guest: Return successful filtered results
Loading

Generated by CodeAnt AI

@netlify
Copy link
Copy Markdown

netlify Bot commented Mar 12, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit ede872d
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/69b2a9e55521a7000822e3e8
😎 Deploy Preview https://deploy-preview-38601--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

… dataset inner tables

Global guest RLS rules (without a dataset ID) were being applied to each
physical table inside a virtual dataset's SQL, in addition to the outer
query. When the RLS clause referenced a column that didn't exist on all
inner tables (e.g., a JOIN), this caused SQL errors.

The fix adds an `include_global_guest_rls` parameter to
`get_sqla_row_level_filters()` that, when False, skips only global
(unscoped) guest RLS rules while preserving dataset-scoped guest rules.
`get_predicates_for_table()` passes `include_global_guest_rls=False` so
that inner physical tables skip global guest RLS. Global guest RLS
continues to be applied once at the outer virtual dataset query level.

Dataset-scoped guest rules (those with a specific dataset ID) are always
applied regardless, ensuring they reach the targeted physical table even
inside virtual datasets.

Fixes apache#37359
Fixes apache#37551

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pull-request-size pull-request-size Bot added size/M and removed size/L labels Mar 12, 2026
table_name="virtual_birth_names_rls_test",
database=birth_names.database,
schema=birth_names.schema,
sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The new regression test does not actually create a virtual dataset with multiple inner tables, so it cannot catch the original bug scenario where a global guest filter is incorrectly pushed down to joined inner tables that lack the filtered column. Build the virtual dataset with a JOIN so the test exercises the intended failure mode. [logic error]

Severity Level: Major ⚠️
- ⚠️ Embedded guest JOIN regressions escape integration test coverage.
- ⚠️ CI may pass despite broken virtual-dataset RLS behavior.
Suggested change
sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608
sql=f"SELECT b.name, e.value FROM {birth_names.table_name} b JOIN energy_usage e ON 1=1", # noqa: S608
Steps of Reproduction ✅
1. Run
`GuestTokenRowLevelSecurityTests.test_global_rls_not_applied_to_virtual_dataset_inner_tables`
in `tests/integration_tests/security/row_level_security_tests.py:851-891`; it builds
virtual SQL at line 865 as `SELECT * FROM birth_names` (single inner table).

2. The test calls `virtual_dataset.get_query_str(query_obj)` at
`row_level_security_tests.py:885`, which flows into
`superset/models/helpers.py:get_query_str` (line 2186) and then virtual SQL parsing in
`get_from_clause` (line 2029).

3. Inner-table RLS application loops `parsed_statement.tables` in
`superset/utils/rls.py:53-63`; with current test SQL there is only one table
(`birth_names`), so no multi-table pushdown case is exercised.

4. The documented failure mode needs joined inner tables with different schemas;
`energy_usage` fixture columns are `source/target/value`
(`tests/integration_tests/fixtures/energy_dashboard.py:49`), so adding JOIN coverage is
required to validate "column missing on some inner tables" regressions.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 865:865
**Comment:**
	*Logic Error: The new regression test does not actually create a virtual dataset with multiple inner tables, so it cannot catch the original bug scenario where a global guest filter is incorrectly pushed down to joined inner tables that lack the filtered column. Build the virtual dataset with a JOIN so the test exercises the intended failure mode.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Comment on lines +887 to +888
# Guest RLS should appear in the outer WHERE clause
assert re.search(RLS_ALICE_REGEX, sql)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The assertion only checks that the guest RLS clause exists somewhere in the SQL, which still passes if the clause is applied both to inner tables and the outer query. Assert that the clause appears exactly once to verify the fix truly prevents duplicate inner-table application. [logic error]

Severity Level: Major ⚠️
- ⚠️ Duplicate guest RLS injection not detected by assertion.
- ⚠️ Regression can ship despite failing outer-only guarantee.
Suggested change
# Guest RLS should appear in the outer WHERE clause
assert re.search(RLS_ALICE_REGEX, sql)
# Guest RLS should appear exactly once (outer query only)
assert len(re.findall(RLS_ALICE_REGEX, sql)) == 1
Steps of Reproduction ✅
1. In `tests/integration_tests/security/row_level_security_tests.py:887-888`, the check
uses `re.search`, which validates only "at least one match".

2. Query generation path for this test is `virtual_dataset.get_query_str()`
(`row_level_security_tests.py:885`) → `superset/models/helpers.py:get_sqla_query` where
outer RLS is appended at line 3215.

3. Guest RLS clauses are produced in `superset/connectors/sqla/models.py:770-772`; if
guest RLS is mistakenly also included for inner virtual-dataset tables, SQL can contain
the same clause multiple times (inner + outer).

4. With duplicated predicates, `re.search(RLS_ALICE_REGEX, sql)` still passes;
`len(re.findall(...)) == 1` is needed to assert the fix intent ("outer query only") and
catch duplicate injection regressions.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 887:888
**Comment:**
	*Logic Error: The assertion only checks that the guest RLS clause exists somewhere in the SQL, which still passes if the clause is applied both to inner tables and the outer query. Assert that the clause appears exactly once to verify the fix truly prevents duplicate inner-table application.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

@kgabryje kgabryje closed this Mar 12, 2026
@kgabryje
Copy link
Copy Markdown
Member Author

Closed in favor of #37395

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.42%. Comparing base (9215eb5) to head (b995e53).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #38601      +/-   ##
==========================================
- Coverage   65.01%   64.42%   -0.60%     
==========================================
  Files        1817     2529     +712     
  Lines       72317   128891   +56574     
  Branches    23031    29708    +6677     
==========================================
+ Hits        47020    83034   +36014     
- Misses      25297    44413   +19116     
- Partials        0     1444    +1444     
Flag Coverage Δ
hive 40.77% <0.00%> (?)
mysql 61.93% <100.00%> (?)
postgres 62.00% <100.00%> (?)
presto 40.79% <0.00%> (?)
python 63.63% <100.00%> (?)
sqlite 61.63% <100.00%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Mar 12, 2026

Code Review Agent Run #cfd844

Actionable Suggestions - 0
Review Details
  • Files reviewed - 4 · Commit Range: ede872d..b995e53
    • superset/connectors/sqla/models.py
    • superset/models/helpers.py
    • superset/utils/rls.py
    • tests/unit_tests/sql_lab_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

authentication:row-level-security Related to Row Level Security change:backend Requires changing the backend size:L This PR changes 100-499 lines, ignoring generated files size/M

Projects

None yet

1 participant