fix(embedded): prevent double RLS application in virtual datasets #37395
Fixes apache#37359: Guest users in embedded dashboards experienced double RLS application when using virtual datasets, causing SQL errors.

Problem:
- get_sqla_row_level_filters() includes guest RLS for ALL calls
- For virtual datasets, it was called twice:
  1. For underlying tables via get_predicates_for_table()
  2. For the virtual dataset itself via get_sqla_query()
- Global guest RLS rules were applied to BOTH, causing double filtering

Solution:
- Refactored get_sqla_row_level_filters() using Separate Method pattern
- Created _get_sqla_row_level_filters_internal() with include_guest_rls param
- Public API unchanged (backwards compatible)
- In get_predicates_for_table(), call internal method with include_guest_rls=False
- Guest RLS now applied only to outer query, not underlying tables

Security:
- Regular (non-guest) RLS still applied to underlying tables
- Guest RLS still applied to outer query
- No RLS bypass possible

Testing:
- Added 6 unit tests covering all scenarios
- All tests pass
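The Separate Method pattern from this commit message can be sketched roughly as follows. This is a toy model, not Superset's code: the `ToyDataset` class and its string-based filters are assumptions made for illustration, and only the method names and the `include_guest_rls` parameter are taken from this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset; filters are plain strings here."""

    regular_filters: list = field(default_factory=list)
    guest_filters: list = field(default_factory=list)

    def get_sqla_row_level_filters(self):
        # Public entry point: behaviour unchanged, guest RLS is included.
        return self._get_sqla_row_level_filters_internal(include_guest_rls=True)

    def _get_sqla_row_level_filters_internal(self, include_guest_rls=True):
        # Regular RLS is always returned; guest RLS only when requested.
        filters = list(self.regular_filters)
        if include_guest_rls:
            filters.extend(self.guest_filters)
        return filters


def get_predicates_for_table(dataset):
    # Underlying tables of a virtual dataset: skip guest RLS so that it is
    # applied only once, on the outer query.
    return dataset._get_sqla_row_level_filters_internal(include_guest_rls=False)


dataset = ToyDataset(
    regular_filters=["region = 'EU'"],
    guest_filters=["tenant_id = 123"],
)
outer_filters = dataset.get_sqla_row_level_filters()
inner_filters = get_predicates_for_table(dataset)
```

With this split, existing callers of the public method see no change, while the underlying-table path opts out of guest RLS explicitly.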
Code Review Agent Run #a80900
Actionable Suggestions - 0
Additional Suggestions - 1
Renamed tests to follow Superset's test naming convention:
- test_public_api_includes_guest_rls → test_rls_filters_include_guest_when_enabled
- test_internal_api_excludes_guest_rls_when_requested → test_rls_filters_exclude_guest_when_requested
- test_internal_api_includes_guest_rls_by_default → test_rls_filters_include_guest_by_default

Superset pattern: test_<functionality>_<scenario>
No "public_api" or "internal_api" in test names.

Ref: bito-code-review suggestion on PR apache#37395
Response to bito-code-review suggestion
Analysis of the suggestion
The bot noted that
Investigation of Superset test naming patterns:
# Examples from codebase:
test_csv_reader_cast_column_types_function # tests _cast_column_types
test_get_sqla_engine_user_impersonation # tests internal behavior
test_query_context_modified_tampered # describes scenario
Superset pattern:
Changes made
Renamed tests to follow Superset naming convention:
Method naming verification
The internal method name
All 6 tests pass after renaming.
Code Review Agent Run #cc567f
Actionable Suggestions - 0
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #37395 +/- ##
==========================================
- Coverage 65.02% 64.42% -0.61%
==========================================
Files 1817 2529 +712
Lines 72268 128842 +56574
Branches 23017 29694 +6677
==========================================
+ Hits 46990 83002 +36012
- Misses 25278 44395 +19117
- Partials 0 1445 +1445
Update test_get_predicates_for_table to mock _get_sqla_row_level_filters_internal instead of get_sqla_row_level_filters, matching the implementation change in get_predicates_for_table that uses the internal method with include_guest_rls=False.
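A minimal sketch of the mocking change described in this commit, using `unittest.mock.patch.object`. The `FakeDataset` class and the simplified `get_predicates_for_table()` below are hypothetical stand-ins, not the actual test or production code; the point is only that the patch target has to be the method the code under test really calls.

```python
from unittest.mock import patch


class FakeDataset:
    """Hypothetical dataset exposing both the public and the internal method."""

    def get_sqla_row_level_filters(self):
        return ["from public method"]

    def _get_sqla_row_level_filters_internal(self, include_guest_rls=True):
        return ["from internal method"]


def get_predicates_for_table(dataset):
    # Simplified stand-in: after the fix, the internal method is called.
    return dataset._get_sqla_row_level_filters_internal(include_guest_rls=False)


dataset = FakeDataset()

# Patch the method that is actually invoked. Patching the public method
# instead would leave the real internal method running, so the mocked
# return value would never be seen.
with patch.object(
    FakeDataset,
    "_get_sqla_row_level_filters_internal",
    return_value=["region = 'EU'"],
) as mock_internal:
    predicates = get_predicates_for_table(dataset)

mock_internal.assert_called_once_with(include_guest_rls=False)
```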
CI Fix
Fixed the failing
Root cause: The test was mocking
Note on pre-commit failure: The
Response to codeant-ai bot suggestions
1. rls.py:114 - "Logic/security regression"
The bot's concern is incorrect. The change is intentional and does NOT disable guest RLS globally.
Architecture:
Flow:
This prevents double application of guest RLS (the bug described in #37359), while ensuring guest RLS is still applied exactly once at the correct level.
2-4. test_double_rls_virtual_dataset.py - "with (patch(...),) tuple syntax"
The bot is incorrect. The syntax
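On the parenthesized `with` form raised in points 2-4: a small sketch, assuming Python 3.10 or newer, where parenthesized context managers are part of the documented grammar. The `tracked()` helper is hypothetical and only records enter/exit order. In this grammar the parentheses group the `with` items, so a trailing comma does not turn them into a tuple.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    """Hypothetical context manager that records when it is entered and exited."""
    events.append(f"enter {name}")
    try:
        yield name
    finally:
        events.append(f"exit {name}")


# Several context managers grouped in parentheses, with a trailing comma.
with (
    tracked("a"),
    tracked("b") as b,
):
    events.append("body")

# A single item with a trailing comma is still a with-item, not a tuple.
with (tracked("c"),):
    events.append("single")
```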
Code Review Agent Run #b25117
Actionable Suggestions - 0
up, please!
+1
Rebased on latest upstream/master. Did an additional code review and found room for improvement:
Production code diff: +2 lines in models.py, +2 lines in rls.py, +1 line in helpers.py. |
Code Review Agent Run #377fa6
Actionable Suggestions - 0
Additional Suggestions - 1
Replace binary include_guest_rls flag with include_global_guest_rls parameter that skips only global (unscoped) guest rules while preserving dataset-scoped guest rules that would otherwise be silently lost. Add integration tests for guest RLS scoping through virtual datasets covering both scoped and global rules via get_predicates_for_table().
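The difference between the binary flag and `include_global_guest_rls` can be sketched as below. This is an illustration under assumptions: `GuestRule` and `guest_clauses()` are hypothetical names, and a rule is treated as global when its `dataset` field is unset. Only the parameter name and the scoped-versus-global distinction come from this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GuestRule:
    """Hypothetical guest RLS rule; dataset=None means a global (unscoped) rule."""

    clause: str
    dataset: Optional[int] = None


def guest_clauses(rules, dataset_id, include_global_guest_rls=True):
    """Return the guest clauses that apply to one dataset."""
    clauses = []
    for rule in rules:
        if rule.dataset is None:
            # Global rule: skipped for underlying tables, kept for the outer query.
            if include_global_guest_rls:
                clauses.append(rule.clause)
        elif rule.dataset == dataset_id:
            # Dataset-scoped rule: always kept for its own dataset, so it is
            # not silently lost when global rules are skipped.
            clauses.append(rule.clause)
    return clauses


rules = [
    GuestRule("tenant_id = 123"),            # global
    GuestRule("region = 'EU'", dataset=7),   # scoped to dataset 7
]

underlying = guest_clauses(rules, dataset_id=7, include_global_guest_rls=False)
outer = guest_clauses(rules, dataset_id=42, include_global_guest_rls=True)
```

Here dataset 7 plays the role of a physical table and dataset 42 the virtual dataset: the scoped rule stays on the table it targets, and the global rule is applied only on the outer query.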
ef320ba to af1136e
Sequence Diagram
This PR changes how row level security is applied for guest users querying virtual datasets, ensuring global guest RLS rules are applied only once on the outer virtual dataset query instead of being duplicated on both underlying tables and the outer query.
sequenceDiagram
participant GuestUser
participant EmbeddedDashboard
participant Backend
participant RLSEngine
participant Database
GuestUser->>EmbeddedDashboard: Open embedded virtual dataset chart
EmbeddedDashboard->>Backend: Request virtual dataset query
Backend->>RLSEngine: Build predicates for physical tables (exclude global guest RLS)
RLSEngine-->>Backend: Predicates with regular and scoped guest RLS
Backend->>RLSEngine: Get RLS for virtual dataset (include global guest RLS)
RLSEngine-->>Backend: Combined regular and global guest RLS filters
Backend->>Database: Execute virtual dataset SQL with single guest RLS application
Database-->>Backend: Filtered rows
Backend-->>EmbeddedDashboard: Chart data for guest user
Generated by CodeAnt AI
The sequence diagram is accurate. One nuance worth clarifying: in step 3, dataset-scoped guest RLS rules are included in the physical table predicates only when |
Code Review Agent Run #02c86b
Actionable Suggestions - 0
thanks for the contribution @YuriyKrasilnikov!
User description
Summary
Fixes #37359: Guest users in embedded dashboards can now use virtual datasets without SQL errors.
Before: Guest RLS applied twice → SQL errors like Unknown expression or function identifier 'my_table.tenant_id'
After: Guest RLS applied once → Virtual datasets work correctly
Problem Analysis
Root Cause (Verified by Code Trace)
get_sqla_row_level_filters() includes guest RLS for every call, but for virtual datasets it's called twice:
Key Code Locations
- superset/models/helpers.py: apply_rls() call in get_from_clause()
- superset/utils/rls.py: dataset.get_sqla_row_level_filters() call
- superset/connectors/sqla/models.py
- superset/models/helpers.py
Why This Happens
Global guest RLS rules (without dataset ID) match both:
Both get guest RLS added → double application.
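To make the double application concrete, here is a toy sketch of how a global guest clause ends up in both the inner and the outer `WHERE`. It is string-based and purely illustrative: `apply_predicates()` and `build_virtual_query()` are hypothetical helpers, and Superset's real SQL rewriting is more involved than this.

```python
def apply_predicates(table, predicates):
    """Toy version: wrap a table reference in a filtered subquery."""
    if not predicates:
        return table
    where = " AND ".join(f"({p})" for p in predicates)
    return f"(SELECT * FROM {table} WHERE {where}) AS {table}"


def build_virtual_query(inner_predicates, outer_predicates):
    """Toy version: filter the underlying table, then the virtual dataset."""
    inner = f"SELECT * FROM {apply_predicates('physical_table', inner_predicates)}"
    sql = f"SELECT * FROM ({inner}) AS virtual_table"
    if outer_predicates:
        sql += " WHERE " + " AND ".join(f"({p})" for p in outer_predicates)
    return sql


GLOBAL_GUEST_RULE = "tenant_id = 123"

# Before the fix: the same global rule matches the underlying table and the
# virtual dataset, so it is injected at both levels.
before = build_virtual_query([GLOBAL_GUEST_RULE], [GLOBAL_GUEST_RULE])

# After the fix: the rule is applied only on the outer query.
after = build_virtual_query([], [GLOBAL_GUEST_RULE])
```

Printing `before` and `after` side by side shows the duplicated predicate in the first query and a single occurrence in the second.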
Solution: Separate Method Pattern
Architecture Decision
Evaluated 3 options:
Implementation
include_guest_rls parameter:
get_predicates_for_table():
Why Separate Method Pattern?
Security Analysis
Mathematical model unchanged:
RLS(query) = RLS(underlying) AND RLS(outer)
Related Issues/PRs
Testing
New Tests (6 tests, all pass)
- test_public_api_includes_guest_rls - Backwards compatibility
- test_internal_api_excludes_guest_rls_when_requested - Core fix
- test_internal_api_includes_guest_rls_by_default - Default behavior
- test_regular_rls_always_included - Security
- test_guest_rls_skipped_when_feature_disabled - Feature flag
- test_filter_grouping_preserved - Edge case
Files Changed
- superset/connectors/sqla/models.py - Refactor method
- superset/utils/rls.py - Call internal method
- tests/unit_tests/models/test_double_rls_virtual_dataset.py - New tests
How To Test Manually
SELECT * FROM physical_table
{"clause": "tenant_id = 123"}
Checklist
CodeAnt-AI Description
Prevent double application of guest RLS filters for virtual datasets
What Changed
Impact
✅ Fewer SQL errors for guest users in embedded dashboards
✅ Fewer double-applied guest RLS filters in virtual datasets
✅ More accurate filtering for virtual-dataset queries