perf: cache dashboard bootstrap data#11234
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11234 +/- ##
==========================================
- Coverage 61.47% 59.05% -2.42%
==========================================
Files 832 797 -35
Lines 39445 38232 -1213
Branches 3598 3396 -202
==========================================
- Hits 24248 22579 -1669
- Misses 15015 15471 +456
Partials 182 182
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Tried to add this to SQLAlchemy events, just like what we did with slices, but it was somehow triggered multiple times (4 or 5), which could have performance risks when a datasource is used in multiple dashboards.
So I moved it to this BaseDatasource API instead. The drawback is it may not trigger if the datasource is updated from FAB CRUD view (an unlikely use case).
There was a problem hiding this comment.
Remove quotes for consistency.
There was a problem hiding this comment.
__repr__ is used in cache key.
There was a problem hiding this comment.
Can't use before_delete event for Slice because the association record in the relationship table datasource_slice is deleted before the Slice object is deleted.
There was a problem hiding this comment.
This will throw 500 errors when chart don't have uuid, which could happen when chart is deleted but the dashboard cache is not purged for some reason.
There was a problem hiding this comment.
So we can override the configs from SUPERSET_CONFIG_PATH (I've set it to ~/.superset/config.py).
There was a problem hiding this comment.
Does this do anything? FEATURE_FLAGS is empty by default in the config
There was a problem hiding this comment.
But SUPERSET_CONFIG_PATH (a config file) is loaded before SUPERSET_CONFIG (a Python module). This allows users to override the feature flags in SUPERSET_CONFIG_PATH (a file not tracked in Git) when starting superset like this:
SUPERSET_CONFIG=tests.superset_test_config superset run
ebe41c8 to
ee36591
Compare
There was a problem hiding this comment.
Cleaned up to break circular imports, because I want to import DruidMetric and DruidColumn from dashboard.py.
There was a problem hiding this comment.
I think this is a duplicate of https://github.com/apache/incubator-superset/blob/93fdf1d64465b3378de65fb1b058e1a04742a70f/superset/models/dashboard.py#L573
cc @suddjian
Removed so that it's possible to import models.core from dashboard.py. It makes more sense to let Dashboard depends on core (if needed) than vice versa.
There was a problem hiding this comment.
Did some refactor for consistency.
4ba6fc2 to
1183dde
Compare
There was a problem hiding this comment.
This fixes a 500 error when duplicating a dashboard with deleted slices.
96590c2 to
19c6395
Compare
etr2460
left a comment
There was a problem hiding this comment.
could we add unit tests for the new decorators?
d78ab7c to
325e5b8
Compare
325e5b8 to
7939b0a
Compare
Added some basic tests. |
ca38c89 to
985fdb0
Compare

SUMMARY
Large and complex dashboards can be very slow to load because they need to read all the related slices and datasources and they all have go through SQL queries with no caching enabled.
A previous attempt to cache the rendered dashboard page had to be reverted because it did not consider user-specific data on that page. Why don't we cache the non-user-specific bootstrap data instead?
This PR extracts the data building logics from the
/dashboard/:id_or_slugFlask view to theDashboardmodel and uses Flack caching decorator to manage the cache. Cache will be cleaned up when any one of the dashboard, the associated slices, datasources, or table columns/metrics is updated.When tested with a dashboard of 300+ slices in our staging environment, this change can save up to more than 3 seconds of page load time.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TEST PLAN
Make sure the dashboard page still work as expected.
ADDITIONAL INFORMATION