Skip to content

perf: speed up uuid column generation#11209

Merged
ktmud merged 5 commits into
apache:masterfrom
ktmud:faster-uuid-migration
Oct 13, 2020
Merged

perf: speed up uuid column generation#11209
ktmud merged 5 commits into
apache:masterfrom
ktmud:faster-uuid-migration

Conversation

@ktmud
Copy link
Copy Markdown
Member

@ktmud ktmud commented Oct 9, 2020

SUMMARY

We have tens of thousands of dashboards, more than 200k slices, and 1.3 million table columns in our Superset deployment. It takes forever to run the db migration for #11098 . This PR tries to speed up the migration process by

  1. adding pagination to db queries, and
  2. using database built-in SQL functions to generate the UUIDs when possible.

I also tried to utilize ThreadPoolExecutor to parallelize the db operations and Python uuid generation, but it doesn't seem to help much.

Also fixes certain API errors while downgrading in MySQL and added the option to adjust batch size from command line.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Snip20201009_4

TEST PLAN

Make a copy of your very large database, then try:

superset db downgrade
superset db upgrade

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@ktmud ktmud requested a review from betodealmeida October 9, 2020 05:51
@ktmud ktmud force-pushed the faster-uuid-migration branch from 5c1155b to dcd4c79 Compare October 9, 2020 05:58
@codecov-io
Copy link
Copy Markdown

codecov-io commented Oct 9, 2020

Codecov Report

Merging #11209 into master will decrease coverage by 4.17%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11209      +/-   ##
==========================================
- Coverage   65.65%   61.47%   -4.18%     
==========================================
  Files         829      829              
  Lines       39213    39244      +31     
  Branches     3593     3593              
==========================================
- Hits        25744    24126    -1618     
- Misses      13357    14937    +1580     
- Partials      112      181      +69     
Flag Coverage Δ
#cypress ?
#javascript 62.38% <ø> (ø)
#python 60.93% <0.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ns/b56500de1855_add_uuid_column_to_import_mixin.py 0.00% <0.00%> (ø)
superset-frontend/src/SqlLab/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/setup/setupColors.js 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/chart/ChartContainer.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/setup/setupFormatters.js 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/reducers/index.js 0.00% <0.00%> (-100.00%) ⬇️
... and 166 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6fc3d2...a2a86e2. Read the comment docs.

@ktmud ktmud force-pushed the faster-uuid-migration branch from 50428bc to d76af43 Compare October 9, 2020 06:13
Copy link
Copy Markdown
Member Author

@ktmud ktmud Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the bigger the better. I tested many different numbers (100, 200, 500, 1000, 2000) and 200 seems to be working the best (but obviously this will depend on the machine so providing a way to override it via env variables).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL will throw an error if type_ is not specified.

@ktmud ktmud force-pushed the faster-uuid-migration branch 4 times, most recently from c63c1af to 96e3637 Compare October 9, 2020 20:40
@pull-request-size pull-request-size Bot added size/L and removed size/M labels Oct 9, 2020
@ktmud ktmud changed the title perf: add db pagination to speed up uuid column migration perf: speed up uuid column generation Oct 9, 2020
@ktmud ktmud force-pushed the faster-uuid-migration branch 6 times, most recently from 5090b41 to 9162f28 Compare October 9, 2020 23:55
@ktmud ktmud mentioned this pull request Oct 10, 2020
6 tasks
@ktmud ktmud force-pushed the faster-uuid-migration branch from 9162f28 to a2a86e2 Compare October 10, 2020 00:21
Copy link
Copy Markdown
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, Jessie! Thanks for improving the perf and making the migration more resilient!

@ktmud ktmud merged commit d7eb1d4 into apache:master Oct 13, 2020
@bkyryliuk
Copy link
Copy Markdown
Member

got a bit too late to this pr, thanks @ktmud for the fix!

@ktmud ktmud deleted the faster-uuid-migration branch October 13, 2020 20:31
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 First shipped in 1.0.0 labels Mar 12, 2024
qfcwell pushed a commit to qfcwell/superset that referenced this pull request May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.0.0 First shipped in 1.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants