perf: memoize db_engine_spec in database#14638
Conversation
083d3b5 to
1ff8ab7
Compare
Codecov Report
@@ Coverage Diff @@
## master #14638 +/- ##
==========================================
- Coverage 77.47% 77.38% -0.10%
==========================================
Files 958 958
Lines 48486 48480 -6
Branches 5679 5683 +4
==========================================
- Hits 37565 37514 -51
- Misses 10721 10766 +45
Partials 200 200
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
1ff8ab7 to
5e28bb6
Compare
5e28bb6 to
2f38ef5
Compare
|
Can we remove the cypress timeout overrides introduced in the other PR? |
Good idea; I'll do that. I've also been looking into introducing perf tests on the backend to identify these easier (will be a follow-up PR). |
| return self.table.db_engine_spec | ||
|
|
||
| @property | ||
| def type_generic(self) -> Optional[utils.GenericDataType]: |
There was a problem hiding this comment.
consider renaming to get_generic_type gives an head warning that it will do some computation around it
There was a problem hiding this comment.
This is added to the column payload in the dataset request to complement the existing type field, so I think we need to keep it as a property.
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
* perf: memoize db_engine_spec in sqla table classes * remove extended cypress timeouts
SUMMARY
A recent PR #14547 introduced a performance regression causing dataset metadata fetching to become very slow for datasets with large numbers of columns. I originally thought the type regexes were the problem, but when researching the problem more closely it turns out that just referencing
self.table.database.db_engine_specin aTableColumninstance cost ~6ms on my local machine. Multiply that by 1000 columns ~= 6000 ms. To get around this I added memoization to the semi-expensive regex, but also added memoizing forDatabase.db_engine_spec. This should also speed up query rendering a bit, as there was similar logic there.BEFORE #14547 (pre-regression)
For the World Bank dataset (328 cols), fetching the data took slightly less than 180ms before on my local machine (including the unnecessary 20 ms redirect):

CURRENT (master)
For the same dataset, retrieval of data now takes ~10s!

AFTER
Retrieval is now slightly quicker than originally (including no redirect):

TEST PLAN
ADDITIONAL INFORMATION