fix(embedded): prevent global guest RLS from being applied to virtual dataset inner tables #38601
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
@@ -845,3 +845,47 @@ def test_dataset_id_can_be_string(self): | |||||||||
| sql = dataset.get_query_str(self.query_obj) | ||||||||||
| | ||||||||||
| assert re.search(RLS_ALICE_REGEX, sql) | ||||||||||
| | ||||||||||
| @pytest.mark.usefixtures("load_birth_names_dashboard_with_slices") | ||||||||||
| @pytest.mark.usefixtures("load_energy_table_with_slice") | ||||||||||
| def test_global_rls_not_applied_to_virtual_dataset_inner_tables(self): | ||||||||||
| """ | ||||||||||
| Global guest RLS (no dataset ID) should only be applied to the outer | ||||||||||
| query of a virtual dataset, not to individual inner tables. This | ||||||||||
| prevents SQL errors when the RLS column doesn't exist on all inner | ||||||||||
| tables (e.g., a virtual dataset JOINing tables where only one has the | ||||||||||
| filtered column). | ||||||||||
| """ | ||||||||||
| birth_names = self.get_table(name="birth_names") | ||||||||||
| | ||||||||||
| virtual_dataset = SqlaTable( | ||||||||||
| table_name="virtual_birth_names_rls_test", | ||||||||||
| database=birth_names.database, | ||||||||||
| schema=birth_names.schema, | ||||||||||
| sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608 | ||||||||||
| ) | ||||||||||
| db.session.add(virtual_dataset) | ||||||||||
| db.session.commit() | ||||||||||
| | ||||||||||
| try: | ||||||||||
| # Global RLS rule (no dataset ID) — should only apply to outer | ||||||||||
| # query, not to the inner birth_names table reference | ||||||||||
| g.user = self.guest_user_with_rls(rules=[{"clause": "name = 'Alice'"}]) | ||||||||||
| query_obj = { | ||||||||||
| "groupby": [], | ||||||||||
| "metrics": None, | ||||||||||
| "filter": [], | ||||||||||
| "is_timeseries": False, | ||||||||||
| "columns": ["name"], | ||||||||||
| "granularity": None, | ||||||||||
| "from_dttm": None, | ||||||||||
| "to_dttm": None, | ||||||||||
| "extras": {}, | ||||||||||
| } | ||||||||||
| sql = virtual_dataset.get_query_str(query_obj) | ||||||||||
| | ||||||||||
| # Guest RLS should appear in the outer WHERE clause | ||||||||||
| assert re.search(RLS_ALICE_REGEX, sql) | ||||||||||
Comment on lines +887 to +888
Contributor
Suggestion: The assertion only checks that the guest RLS clause appears somewhere in the SQL, so it still passes if the clause is applied to both the inner tables and the outer query. Assert that the clause appears exactly once to verify that the fix truly prevents duplicate inner-table application. [logic error]
Severity Level: Major
|
||||||||||
| # Guest RLS should appear in the outer WHERE clause | |
| assert re.search(RLS_ALICE_REGEX, sql) | |
| # Guest RLS should appear exactly once (outer query only) | |
| assert len(re.findall(RLS_ALICE_REGEX, sql)) == 1 |
Steps of Reproduction ✅
1. In `tests/integration_tests/security/row_level_security_tests.py:887-888`, the check uses `re.search`, which validates only "at least one match".
2. The query generation path for this test is `virtual_dataset.get_query_str()` (`row_level_security_tests.py:885`) → `superset/models/helpers.py:get_sqla_query`, where the outer RLS is appended at line 3215.
3. Guest RLS clauses are produced in `superset/connectors/sqla/models.py:770-772`; if guest RLS is mistakenly also included for inner virtual-dataset tables, the SQL can contain the same clause multiple times (inner + outer).
4. With duplicated predicates, `re.search(RLS_ALICE_REGEX, sql)` still passes; `len(re.findall(...)) == 1` is needed to assert the fix intent ("outer query only") and catch duplicate injection regressions.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 887:888
**Comment:**
Logic Error: The assertion only checks that the guest RLS clause exists somewhere in the SQL, which still passes if the clause is applied both to inner tables and the outer query. Assert that the clause appears exactly once to verify the fix truly prevents duplicate inner-table application.
Validate the correctness of the flagged issue. If it is correct, how can I resolve this? If you propose a fix, implement it and please keep it concise.
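To illustrate the point of this comment, here is a minimal standalone sketch of why `re.search` cannot tell "outer only" apart from "inner and outer". The pattern `ALICE_PATTERN` is a hypothetical stand-in for `RLS_ALICE_REGEX` (whose real definition lives in the test module), and both SQL strings are hand-written illustrations, not actual Superset output.

```python
import re

# Hypothetical stand-in for RLS_ALICE_REGEX; the real pattern is defined
# in the test module and may differ.
ALICE_PATTERN = re.compile(r"name = 'Alice'")

# Illustrative SQL only (assumed shapes, not generated by Superset).
outer_only = (
    "SELECT name FROM (SELECT * FROM birth_names) AS virtual_table "
    "WHERE name = 'Alice'"
)
inner_and_outer = (
    "SELECT name FROM (SELECT * FROM birth_names WHERE name = 'Alice') "
    "AS virtual_table WHERE name = 'Alice'"
)


def count_rls_matches(sql: str) -> int:
    """Return how many times the RLS predicate occurs in the SQL text."""
    return len(ALICE_PATTERN.findall(sql))


# re.search succeeds for both strings, so it cannot detect the regression.
both_pass_search = all(
    ALICE_PATTERN.search(sql) is not None
    for sql in (outer_only, inner_and_outer)
)

# Counting occurrences distinguishes the two cases.
counts = (count_rls_matches(outer_only), count_rls_matches(inner_and_outer))
```

A count-based assertion is still a text-level check. It would also fail if the same predicate legitimately appeared twice for another reason, so it fits best when the test controls the whole query.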
Suggestion: The new regression test does not actually create a virtual dataset with multiple inner tables, so it cannot catch the original bug scenario where a global guest filter is incorrectly pushed down to joined inner tables that lack the filtered column. Build the virtual dataset with a JOIN so the test exercises the intended failure mode. [logic error]
Severity Level: Major ⚠️
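The failure mode this comment describes can be sketched outside Superset with an in-memory SQLite database. Everything here is an assumption for illustration: the `energy_usage` table and its `source`, `target`, and `value` columns, the sample rows, and the hand-written SQL, which only mimics what pushing the guest clause down to an inner table would look like. It is not the query Superset generates.

```python
import sqlite3

RLS_CLAUSE = "name = 'Alice'"

# Filter applied once, on the outer query wrapping the JOINed virtual dataset.
OUTER_ONLY_SQL = (
    "SELECT name FROM ("
    "SELECT b.name, e.value FROM birth_names b "
    "JOIN energy_usage e ON e.source = b.name"
    f") AS virtual_table WHERE {RLS_CLAUSE}"
)

# Filter pushed down to an inner table that has no `name` column.
PUSHED_DOWN_SQL = (
    "SELECT b.name, e.value FROM birth_names b "
    f"JOIN (SELECT * FROM energy_usage WHERE {RLS_CLAUSE}) e "
    "ON e.source = b.name"
)


def make_connection() -> sqlite3.Connection:
    """Create a throwaway database with two tables; only one has `name`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE birth_names (name TEXT, num INTEGER);
        CREATE TABLE energy_usage (source TEXT, target TEXT, value REAL);
        INSERT INTO birth_names VALUES ('Alice', 1), ('Bob', 2);
        INSERT INTO energy_usage VALUES ('Alice', 'Grid', 1.0), ('Bob', 'Grid', 2.0);
        """
    )
    return conn


def run(sql: str) -> list:
    """Execute the SQL against a fresh database and return all rows."""
    conn = make_connection()
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Running `run(OUTER_ONLY_SQL)` succeeds, while `run(PUSHED_DOWN_SQL)` raises `sqlite3.OperationalError` because `energy_usage` has no `name` column. A single-table virtual dataset such as `SELECT * FROM birth_names` cannot surface that error, which is why a JOIN is suggested for the regression test.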
Steps of Reproduction ✅
Prompt for AI Agent 🤖