perf: memoize db_engine_spec in database by villebro · Pull Request #14638 · apache/superset

villebro · 2021-05-14T07:37:32Z

SUMMARY

A recent PR, #14547, introduced a performance regression that made dataset metadata fetching very slow for datasets with many columns. I originally thought the type regexes were the problem, but on closer inspection it turned out that merely referencing self.table.database.db_engine_spec in a TableColumn instance cost ~6 ms on my local machine. Multiplied by 1000 columns, that comes to ~6000 ms. To get around this I added memoization to the semi-expensive regex, and also memoized Database.db_engine_spec. This should also speed up query rendering a bit, as there was similar logic there.
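A minimal sketch of the idea, not the code from this PR: the class names, the `ENGINE_SPECS` registry, and the `get_engine_spec` helper below are hypothetical stand-ins, and `functools.lru_cache` is used here only as one common way to memoize. It shows how caching the lookup turns one resolution per column into one resolution per backend.

```python
from functools import lru_cache
from typing import Dict, Type


class BaseEngineSpec:
    """Hypothetical stand-in for an engine spec class."""

    engine = "base"


class PostgresEngineSpec(BaseEngineSpec):
    engine = "postgresql"


# Hypothetical registry mapping backend names to spec classes.
ENGINE_SPECS: Dict[str, Type[BaseEngineSpec]] = {"postgresql": PostgresEngineSpec}

# Counts how often the uncached lookup body actually runs.
lookup_count = 0


@lru_cache(maxsize=None)
def get_engine_spec(backend: str) -> Type[BaseEngineSpec]:
    """Resolve the spec for a backend; the body runs once per distinct backend."""
    global lookup_count
    lookup_count += 1  # stands in for the per-lookup cost described above
    return ENGINE_SPECS.get(backend, BaseEngineSpec)


class Database:
    def __init__(self, backend: str) -> None:
        self.backend = backend

    @property
    def db_engine_spec(self) -> Type[BaseEngineSpec]:
        # Delegates to the memoized helper instead of resolving every time.
        return get_engine_spec(self.backend)


class Column:
    def __init__(self, database: Database, name: str) -> None:
        self.database = database
        self.name = name

    @property
    def db_engine_spec(self) -> Type[BaseEngineSpec]:
        return self.database.db_engine_spec


db = Database("postgresql")
columns = [Column(db, f"col_{i}") for i in range(1000)]
specs = [column.db_engine_spec for column in columns]
```

With 1000 columns sharing one database, the expensive body runs once rather than 1000 times, which is the effect the ~6 ms per access estimate above is about.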

BEFORE #14547 (pre-regression)

For the World Bank dataset (328 cols), fetching the data took slightly less than 180ms before on my local machine (including the unnecessary 20 ms redirect):

CURRENT (master)

For the same dataset, retrieval of data now takes ~10s!

AFTER

Retrieval is now slightly quicker than originally (with no redirect):

TEST PLAN

ADDITIONAL INFORMATION

Has associated issue:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2021-05-14T07:58:12Z

Codecov Report

Merging #14638 (06f90f6) into master (e4d2424) will decrease coverage by 0.09%.
The diff coverage is 93.10%.

@@            Coverage Diff             @@
##           master   #14638      +/-   ##
==========================================
- Coverage   77.47%   77.38%   -0.10%     
==========================================
  Files         958      958              
  Lines       48486    48480       -6     
  Branches     5679     5683       +4     
==========================================
- Hits        37565    37514      -51     
- Misses      10721    10766      +45     
  Partials      200      200

Flag	Coverage Δ
hive	`80.94% <92.85%> (-0.03%)`	⬇️
javascript	`72.52% <100.00%> (+<0.01%)`	⬆️
mysql	`81.21% <92.85%> (-0.03%)`	⬇️
postgres	`81.23% <92.85%> (-0.03%)`	⬇️
presto	`?`
python	`81.60% <92.85%> (-0.19%)`	⬇️
sqlite	`80.85% <92.85%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
superset/connectors/sqla/models.py	`88.64% <92.00%> (-1.62%)`	⬇️
...-frontend/src/datasource/ChangeDatasourceModal.tsx	`90.90% <100.00%> (+3.89%)`	⬆️
superset/db_engine_specs/base.py	`88.45% <100.00%> (+0.02%)`	⬆️
superset/models/core.py	`89.40% <100.00%> (+0.27%)`	⬆️
superset/db_engine_specs/presto.py	`84.42% <0.00%> (-5.90%)`	⬇️
superset/connectors/base/models.py	`88.03% <0.00%> (-2.66%)`	⬇️
superset-frontend/src/components/Tabs/Tabs.tsx	`96.55% <0.00%> (-0.33%)`	⬇️
...components/DashboardBuilder/DashboardContainer.tsx	`100.00% <0.00%> (ø)`
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

kgabryje · 2021-05-14T08:19:44Z

Can we remove the cypress timeout overrides introduced in the other PR?

villebro · 2021-05-14T08:28:44Z

> Can we remove the cypress timeout overrides introduced in the other PR?

Good idea; I'll do that. I've also been looking into introducing perf tests on the backend to identify these more easily (will be a follow-up PR).

dpgaspar · 2021-05-14T09:15:04Z

+        return self.table.db_engine_spec
+
+    @property
+    def type_generic(self) -> Optional[utils.GenericDataType]:


Consider renaming this to get_generic_type; that gives a heads-up that it does some computation.

This is added to the column payload in the dataset request to complement the existing type field, so I think we need to keep it as a property.
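To illustrate the reply above, here is a hedged sketch of why a derived field is convenient as a property when a payload is built by attribute name. The `Column` class, the `payload_attrs` tuple, and the simplified type mapping are assumptions for illustration, not Superset's actual implementation.

```python
from typing import Any, Dict, Optional


class Column:
    # Hypothetical list of attribute names copied into the payload.
    payload_attrs = ("column_name", "type", "type_generic")

    def __init__(self, column_name: str, type_: Optional[str]) -> None:
        self.column_name = column_name
        self.type = type_

    @property
    def type_generic(self) -> Optional[str]:
        """Derive a coarse type from the native type string (simplified mapping)."""
        if self.type is None:
            return None
        native = self.type.upper()
        if native.startswith(("INT", "BIGINT", "FLOAT", "NUMERIC")):
            return "NUMERIC"
        if native.startswith(("DATE", "TIME")):
            return "TEMPORAL"
        return "STRING"

    @property
    def data(self) -> Dict[str, Any]:
        # getattr reads plain attributes and properties the same way, so the
        # derived field needs no special casing in the payload builder.
        return {attr: getattr(self, attr) for attr in self.payload_attrs}


payload = Column("year", "BIGINT").data
```

If `type_generic` were a method such as `get_generic_type()`, a name-based payload builder like this one would need a separate code path to call it.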

* perf: memoize db_engine_spec in sqla table classes
* remove extended cypress timeouts

superset-github-bot Bot added the preset-io label May 14, 2021

pull-request-size Bot added the size/M label May 14, 2021

villebro force-pushed the villebro/memoize-db-engine-spec branch from 083d3b5 to 1ff8ab7 Compare May 14, 2021 07:46

villebro force-pushed the villebro/memoize-db-engine-spec branch from 1ff8ab7 to 5e28bb6 Compare May 14, 2021 08:08

villebro requested a review from dpgaspar May 14, 2021 08:14

perf: memoize db_engine_spec in sqla table classes

2f38ef5

villebro force-pushed the villebro/memoize-db-engine-spec branch from 5e28bb6 to 2f38ef5 Compare May 14, 2021 08:19

villebro changed the title ~~perf: memoize db_engine_spec in sqla table classes~~ perf: memoize db_engine_spec in database May 14, 2021

remove extended cypress timeouts

06f90f6

dpgaspar approved these changes May 14, 2021

villebro merged commit 97c9e37 into apache:master May 14, 2021

villebro deleted the villebro/memoize-db-engine-spec branch May 14, 2021 09:49

cccs-RyanS pushed a commit to CybercentreCanada/superset that referenced this pull request Dec 17, 2021

perf: memoize db_engine_spec in database (apache#14638)

cdd98f5

QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021

perf: memoize db_engine_spec in database (apache#14638)

8079bf3

cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024

perf: memoize db_engine_spec in database (apache#14638)

d2d9e9e

mistercrunch added the 🏷️ bot (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) and 🚢 1.3.0 (first shipped in 1.3.0) labels Mar 12, 2024

qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026

perf: memoize db_engine_spec in database (apache#14638)

5a0d45b

villebro merged 2 commits into apache:master from preset-io:villebro/memoize-db-engine-spec
