chore(native-filters): Fetch only the required dataset fields#23303
Conversation
Codecov Report
@@ Coverage Diff @@
## master #23303 +/- ##
==========================================
+ Coverage 56.27% 65.93% +9.66%
==========================================
Files 1907 1907
Lines 73495 73590 +95
Branches 7977 7982 +5
==========================================
+ Hits 41356 48524 +7168
+ Misses 30091 23018 -7073
Partials 2048 2048
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 341 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
31a91aa to
1b53e5d
Compare
|
Closing (for now) in favor of #23314. |
419f484 to
cce1165
Compare
b602228 to
01ffa3b
Compare
01ffa3b to
8a64476
Compare
michael-s-molina
left a comment
There was a problem hiding this comment.
Thank you for the PR @john-bodley. I left some comments.
| if (datasetId != null) { | ||
| cachedSupersetGet({ | ||
| endpoint: `/api/v1/dataset/${datasetId}`, | ||
| endpoint: `/api/v1/dataset/${datasetId}?q=${rison.encode({ |
There was a problem hiding this comment.
The columns state is later used to get currentColumn which is then passed to the filterValues function. If you check filterValues references, you will see that is_dttm and type_generic attributes are also used.
| cachedSupersetGet({ | ||
| endpoint: `/api/v1/dataset/${datasetId}`, | ||
| endpoint: `/api/v1/dataset/${datasetId}?q=${rison.encode({ | ||
| columns: [ |
There was a problem hiding this comment.
I found other attributes that need to be queried when analyzing the following files:
database.id
datasource_name
schema
is_sqllab_view
columns.type
columns.is_dttm
columns.name
main_dttm_col
results
sql
There was a problem hiding this comment.
@michael-s-molina it seem like results isn't an attribute returned per here.
There was a problem hiding this comment.
@john-bodley You're right. Given that results is not listed in show_select_columns, I'm assuming that it's populated elsewhere and you projections won't affect this behavior. Tagging @zhaoyongjie and @villebro in case they know something different.
There was a problem hiding this comment.
Agreed, I think this is needed in the Explore control panel only, hence shouldn't affect dashboards/native filters.
8a64476 to
61dd061
Compare
|
@villebro @zhaoyongjie Can you double check if there are no missing attributes or invalid projection names? |
villebro
left a comment
There was a problem hiding this comment.
Tested to work as expected and LGTM after @michael-s-molina 's database_name comment has been addressed.
| 'columns.is_dttm', | ||
| 'columns.type_generic', |
There was a problem hiding this comment.
I'm not 100 % sure on this (more like 90 %), but I feel like is_dttm is no longer necessary, and type_generic === GenericDataType.TEMPORAL should always be true for any col that has is_dttm === true, and is the preferred way of checking for temporal columns in the frontend. So if we want to reduce payload size, this is kind of a low hanging fruit.
There was a problem hiding this comment.
@villebro thanks for the review. If your hypothesis is correct then it seems like a follow up PR would be to deprecate the is_dttm column everywhere.
| cachedSupersetGet({ | ||
| endpoint: `/api/v1/dataset/${datasetId}`, | ||
| endpoint: `/api/v1/dataset/${datasetId}?q=${rison.encode({ | ||
| columns: [ |
There was a problem hiding this comment.
Agreed, I think this is needed in the Explore control panel only, hence shouldn't affect dashboards/native filters.
…rsConfigModal/FiltersConfigForm/FiltersConfigForm.tsx Co-authored-by: Michael S. Molina <70410625+michael-s-molina@users.noreply.github.com>
| 'columns.verbose_name', | ||
| 'database.id', | ||
| 'database_name', | ||
| 'database.database_name', |
There was a problem hiding this comment.
It's somewhat weird how FAB doesn't barf when you give it invalid columns.
…#23303) Co-authored-by: Michael S. Molina <70410625+michael-s-molina@users.noreply.github.com>
SUMMARY
This PR partially addresses recommendation #1 in #14383 by requesting—via the FAB API—only the required fields (columns) from the
/api/v1/dataset/{pk}RESTful GET endpoint.Fetching the entirety of dataset (especially a very large datasets) can be rather expensive which. #22413 made significant performance improvements (reducing the response time from > 300 seconds to ~ 5 seconds) however later #23113 was introduced to address some performance tradeoffs (in other contexts) by disabling eager loading (increasing the response time from ~ 5 seconds to ~ 30 seconds). The bulk of this increased cost was from the repeated lazy loading of the back referenced table i.e.,
self.table, for every dataset column (even though this is always the same table; possibly a SQLAlchemy quirk) for a subset of theTableColumnfields including the advanced data type.This PR updates—in the context of the native filters modal—the
/api/v1/dataset/{pk}requests to fetch only the subset of columns required in the response. This helps to ensure the we're:For the dataset in question said change reduced the response time from ~ 30 seconds to ~ 3 seconds which significantly improves the UX when interfacing with the modal.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
CI.
ADDITIONAL INFORMATION