perf: cache dashboard bootstrap data by ktmud · Pull Request #11234 · apache/superset

ktmud · 2020-10-12T07:34:58Z

SUMMARY

Large and complex dashboards can be very slow to load because they need to read all the related slices and datasources, and each of these reads has to go through SQL queries with no caching enabled.

A previous attempt to cache the rendered dashboard page had to be reverted because it did not consider user-specific data on that page. Why don't we cache the non-user-specific bootstrap data instead?

This PR extracts the data-building logic from the /dashboard/:id_or_slug Flask view into the Dashboard model and uses a Flask-Caching decorator to manage the cache. The cache is purged whenever the dashboard or any of its associated slices, datasources, or table columns/metrics is updated.
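The caching and invalidation idea can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the PR's code: it does not use Flask-Caching, and the names `BootstrapCache`, `slow_build`, and `on_slice_updated` are hypothetical.

```python
from typing import Any, Callable, Dict, Set


class BootstrapCache:
    """Memoize per-dashboard payloads and drop them when related objects change."""

    def __init__(self, build_payload: Callable[[int], Dict[str, Any]]) -> None:
        self._build_payload = build_payload  # the slow part (SQL queries, etc.)
        self._payloads: Dict[int, Dict[str, Any]] = {}
        self._slice_to_dashboards: Dict[int, Set[int]] = {}

    def register(self, dashboard_id: int, slice_ids: Set[int]) -> None:
        # Remember which dashboards each slice appears in.
        for slice_id in slice_ids:
            self._slice_to_dashboards.setdefault(slice_id, set()).add(dashboard_id)

    def get(self, dashboard_id: int) -> Dict[str, Any]:
        # Build the payload only on a cache miss.
        if dashboard_id not in self._payloads:
            self._payloads[dashboard_id] = self._build_payload(dashboard_id)
        return self._payloads[dashboard_id]

    def on_dashboard_updated(self, dashboard_id: int) -> None:
        self._payloads.pop(dashboard_id, None)

    def on_slice_updated(self, slice_id: int) -> None:
        # A slice update invalidates every dashboard that uses the slice.
        for dashboard_id in self._slice_to_dashboards.get(slice_id, set()):
            self._payloads.pop(dashboard_id, None)


calls = []


def slow_build(dashboard_id: int) -> Dict[str, Any]:
    # Hypothetical stand-in for reading slices and datasources from the database.
    calls.append(dashboard_id)
    return {"dashboard_id": dashboard_id, "version": len(calls)}


cache = BootstrapCache(slow_build)
cache.register(1, {10, 11})
first = cache.get(1)        # miss: payload is built
second = cache.get(1)       # hit: same object returned, no rebuild
cache.on_slice_updated(10)  # slice 10 belongs to dashboard 1
third = cache.get(1)        # miss again: payload is rebuilt
```

Only the non-user-specific payload is stored here; anything that depends on the current user would still be computed per request.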

When tested with a dashboard of 300+ slices in our staging environment, this change saved more than 3 seconds of page load time.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

Make sure the dashboard page still works as expected.

Your second visit to a very large dashboard page should be much faster.
Update a slice or datasource and refresh the dashboard page; you should see the updates.
User access, etc., should still work.

ADDITIONAL INFORMATION

Has associated issue:
feat: enable ETag header for dashboard GET requests #10963
fix: enable consistent etag across workers and force no-cache for dashboards #11137
fix: revert eTag cache feature for dashboard #11203
Changes UI
Requires DB Migration.
Confirm DB Migration upgrade and downgrade tested.
Introduces new feature or API
Removes existing feature or API

codecov-io · 2020-10-12T08:08:29Z

Codecov Report

Merging #11234 into master will decrease coverage by 2.41%.
The diff coverage is 66.08%.

@@            Coverage Diff             @@
##           master   #11234      +/-   ##
==========================================
- Coverage   61.47%   59.05%   -2.42%     
==========================================
  Files         832      797      -35     
  Lines       39445    38232    -1213     
  Branches     3598     3396     -202     
==========================================
- Hits        24248    22579    -1669     
- Misses      15015    15471     +456     
  Partials      182      182

Flag	Coverage Δ
#cypress	`55.94% <ø> (?)`
#javascript	`?`
#python	`60.73% <66.08%> (-0.03%)`	⬇️

Impacted Files	Coverage Δ
superset/models/dashboard.py	`80.63% <49.20%> (-7.41%)`	⬇️
superset/views/core.py	`74.51% <61.11%> (-0.29%)`	⬇️
superset/charts/commands/delete.py	`91.42% <100.00%> (+0.51%)`	⬆️
superset/config.py	`90.11% <100.00%> (+0.03%)`	⬆️
superset/dashboards/dao.py	`94.38% <100.00%> (ø)`
superset/models/core.py	`89.10% <100.00%> (+0.70%)`	⬆️
superset/utils/core.py	`89.73% <100.00%> (+0.08%)`	⬆️
superset/utils/decorators.py	`65.82% <100.00%> (+10.82%)`	⬆️
superset/views/dashboard/views.py	`70.58% <100.00%> (ø)`
...uperset-frontend/src/dashboard/util/dnd-reorder.js	`0.00% <0.00%> (-100.00%)`	⬇️
... and 343 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16f7b2b...985fdb0.

ktmud · 2020-10-12T07:40:21Z

I tried to add this to SQLAlchemy events, just as we did with slices, but the event was somehow triggered multiple times (4 or 5) per update, which could pose a performance risk when a datasource is used in multiple dashboards.

So I moved it to this BaseDatasource API instead. The drawback is that it may not trigger if the datasource is updated from the FAB CRUD view (an unlikely use case).
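One general way to tolerate a handler that fires 4 or 5 times in quick succession is a leading-edge debounce: the first call runs, and repeats with the same arguments inside a time window are skipped. The sketch below is an assumption-laden illustration and not the decorator added in this PR; `debounce`, `wait`, and `purge_dashboard_cache` are hypothetical names.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def debounce(wait: float = 0.1) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Skip repeated calls with the same arguments made within `wait` seconds."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Maps an argument key to (time of last real call, its result).
        last_call: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Arguments must be hashable for this simple key to work.
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            previous = last_call.get(key)
            if previous is not None and now - previous[0] < wait:
                return previous[1]  # too soon: reuse the earlier result
            result = func(*args, **kwargs)
            last_call[key] = (now, result)
            return result

        return wrapper

    return decorator


purged = []


@debounce(wait=60.0)
def purge_dashboard_cache(dashboard_id: int) -> int:
    # Hypothetical stand-in for the real cache purge.
    purged.append(dashboard_id)
    return dashboard_id


for _ in range(5):          # simulate the event firing five times
    purge_dashboard_cache(7)
purge_dashboard_cache(8)    # different argument, so it is not skipped
```

With this approach the purge for a given dashboard runs once per window, no matter how many times the underlying event fires.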

ktmud · 2020-10-12T07:41:07Z

Remove quotes for consistency.

ktmud · 2020-10-12T07:42:04Z

__repr__ is used in cache key.

After this change, when adding a dashboard schedule, dashboard names appear as "Dashboard <1>" etc., with no name at all, which makes it hard to select a dashboard for a schedule by its ID. Isn't there a better option than this?
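To illustrate why `__repr__` matters when it feeds a cache key: if the key is derived from `repr(obj)`, the representation has to stay stable across edits, or a renamed object will no longer map to its old cache entry. The following is a minimal sketch under that assumption; the `cache_key` helper and the method name `full_data` are hypothetical and are not taken from the PR.

```python
class Dashboard:
    def __init__(self, dashboard_id: int, title: str) -> None:
        self.id = dashboard_id
        self.title = title

    def __repr__(self) -> str:
        # Depends only on the id, so the representation survives a rename.
        return f"Dashboard<{self.id}>"


def cache_key(method_name: str, obj: object) -> str:
    # Hypothetical key scheme: method name plus the object's repr.
    return f"{method_name}:{obj!r}"


dash = Dashboard(1, "Sales")
key_before = cache_key("full_data", dash)
dash.title = "Sales (renamed)"
key_after = cache_key("full_data", dash)
```

The trade-off raised in the comment above follows directly: any UI that labels dashboards with `repr()` now shows the id instead of the title, so such a UI would need to use the title explicitly.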

ktmud · 2020-10-12T08:18:32Z

Can't use before_delete event for Slice because the association record in the relationship table datasource_slice is deleted before the Slice object is deleted.

ktmud · 2020-10-12T08:21:06Z

This will throw 500 errors when a chart doesn't have a uuid, which could happen when a chart is deleted but the dashboard cache is not purged for some reason.

ktmud · 2020-10-12T08:23:11Z

So we can override the configs from SUPERSET_CONFIG_PATH (I've set it to ~/.superset/config.py).

Does this do anything? FEATURE_FLAGS is empty by default in the config

But SUPERSET_CONFIG_PATH (a config file) is loaded before SUPERSET_CONFIG (a Python module). This allows users to override the feature flags in SUPERSET_CONFIG_PATH (a file not tracked in Git) when starting superset like this:

SUPERSET_CONFIG=tests.superset_test_config superset run
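The effect of the load order can be sketched with plain dictionaries. This is an assumption about what the exchange above implies (a config loaded later should merge into the existing flags instead of replacing them), and the flag names are hypothetical.

```python
from typing import Any, Dict

# Step 1: defaults. FEATURE_FLAGS is empty by default, per the discussion above.
default_flags: Dict[str, Any] = {}

# Step 2: hypothetical overrides from the untracked file at SUPERSET_CONFIG_PATH,
# which is loaded first.
local_override_flags: Dict[str, Any] = {"SOME_LOCAL_FLAG": True}
flags_after_config_path = {**default_flags, **local_override_flags}

# Step 3a: a module loaded later that assigns a fresh dict discards the overrides.
replaced: Dict[str, Any] = {"TEST_ONLY_FLAG": True}

# Step 3b: a module loaded later that merges keeps them.
merged: Dict[str, Any] = {**flags_after_config_path, "TEST_ONLY_FLAG": True}
```

With the default empty dictionary alone, 3a and 3b give the same result, which is why the merge only matters once a local override file exists.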

ktmud · 2020-10-12T19:13:07Z

Cleaned up to break circular imports, because I want to import DruidMetric and DruidColumn from dashboard.py.

ktmud · 2020-10-12T19:14:58Z

I think this is a duplicate of https://github.com/apache/incubator-superset/blob/93fdf1d64465b3378de65fb1b058e1a04742a70f/superset/models/dashboard.py#L573

cc @suddjian

Removed so that it's possible to import models.core from dashboard.py. It makes more sense to let Dashboard depend on core (if needed) than vice versa.

ktmud · 2020-10-12T19:18:56Z

Did some refactoring for consistency.

ktmud · 2020-10-12T21:17:10Z

This fixes a 500 error when duplicating a dashboard with deleted slices.

etr2460

could we add unit tests for the new decorators?

graceguo-supercat

LGTM

ktmud · 2020-10-13T23:26:18Z

could we add unit tests for the new decorators?

7939b0a

Added some basic tests.

pull-request-size Bot added the size/L label Oct 12, 2020

ktmud force-pushed the dashboard-cache branch 6 times, most recently from ebe41c8 to ee36591 Compare October 12, 2020 20:25

ktmud force-pushed the dashboard-cache branch from 4ba6fc2 to 1183dde Compare October 12, 2020 21:07

ktmud mentioned this pull request Oct 13, 2020

chore: deprecate REDUCE_DASHBOARD_BOOTSTRAP_PAYLOAD #11244

Merged

ktmud force-pushed the dashboard-cache branch 3 times, most recently from 96590c2 to 19c6395 Compare October 13, 2020 18:07

etr2460 reviewed Oct 13, 2020

Comment thread superset/models/dashboard.py Outdated

graceguo-supercat approved these changes Oct 13, 2020

ktmud added 2 commits October 13, 2020 16:02

perf: cache dashboard bootstrap data

93b270c

Make it clear returned values are metadata

7a1ac18

ktmud force-pushed the dashboard-cache branch from d78ab7c to 325e5b8 Compare October 13, 2020 23:23

Add basic test case for debounce

7939b0a

ktmud force-pushed the dashboard-cache branch from 325e5b8 to 7939b0a Compare October 13, 2020 23:24

fix linting

985fdb0

ktmud force-pushed the dashboard-cache branch from ca38c89 to 985fdb0 Compare October 13, 2020 23:56

ktmud merged commit 2c649ac into apache:master Oct 14, 2020

ktmud added the hacktoberfest-accepted label Oct 14, 2020

This was referenced Oct 14, 2020

fix: delete the correct dashboard cache key #11273

Merged

fix: use dashboard id for stable cache key #11293

Merged

ktmud mentioned this pull request Oct 21, 2020

fix: dashboard cache invalid join query #11369

Merged

auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020

perf: cache dashboard bootstrap data (apache#11234)

ff63083

mfyuce mentioned this pull request May 10, 2021

After this change, when adding a dashboard schedule, dashboard names are seen as "Dashboard <1>" etc... with no name at all and hard to select a dashboard for a schedule from its ID.... Is not there a better option other than this? #14559

Closed

mistercrunch added the 🏷️ bot and 🚢 1.0.0 labels Mar 12, 2024

qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026

perf: cache dashboard bootstrap data (apache#11234)

3ec9784

6 participants
