Fetch charts with GET to benefit from browser cache and conditional requests #7032
|
you sure GET request can handle why not use Redis cache query results, at Airbnb the round trip to fetch data from Redis is only 600ms~800ms. I think we should use ETags for dashboard metadata (like dashboard layout, huge JSON blob...) |
Thanks, I'm aware of that problem. Here the
The decorator is using Redis for the server-side caching, but it's caching the HTTP response instead of the dataframe (saving time that's spent in serialization). But I'm also using the native browser cache (through the
Everywhere! :) |
|
|
Ah, you're right. The filter box works when changed, since it does a
Yes, but it has only the |
|
@graceguo-supercat I tested the interaction with filter boxes and it's not working. I'll work on fixing it. |
|
@betodealmeida this seems pretty complicated on top of data requests that are already complicated. My first question to gauge whether it's worthwhile is: what are the speedup times you are seeing for the existing approach vs your new approach? (ideally for multiple dashboards of varying size) |
|
@williaster the speedup will greatly depend on how often the data changes, how big the payload is (bignum vs deck.gl varies significantly), the duration of the cache, and how slow the network is. I can (and have) run tests against the example dashboards, but I don't think they would be representative, since they don't cover all the real-life use cases. I think the question we should ask here is: "given that this is clearly an improvement, how can we make it bug free?" |
|
@betodealmeida just to clarify step 4 (and per your diagram) you have:
yet in the code it still seems like we're caching the result set from the database, so I wonder whether the diagram, and thus the After phase, should mention that there's an additional entry in the server cache, i.e., after step (4) the server-side cache would contain:
Note I'm not saying this is wrong, as I strongly believe that the database response should be cached given that it represents the bulk of the compute; I just wanted to get clarity on the logic. |
|
@betodealmeida this is a big change that has the potential to impact many many users. If you're unable to provide numbers that indicate that this is strictly an improvement (or at a minimum no regressions), I'm a bit reluctant to introduce this additional complexity. I think it's part of the expected work of a feature like this to demonstrate the effects with real-life examples. It seems like you should be able to use real Lyft dashboards, dashboards from the "example datasets", etc., and you can throttle network speed using dev tools. Another concern I have is the impact on the # of requests. We've needed to introduce domain sharding because of the large number of simultaneous requests made by larger dashboards, and this potentially DOUBLES that number, so I would want to see perf numbers that demonstrate no regressions for that case as well. |
|
Not directly related, but on the topic of optimization around caching, it would be an extra win to make the caching call |
|
@mistercrunch |
|
@williaster I'm thinking about something else: on the server side, in python, making the |
Sure, I will give the numbers of the example dashboards and some of the Lyft dashboards. My point was that, considering that this is a strict improvement, defining a threshold to accept the changes seems arbitrary to me. Of course I agree that if this causes a regression or no significant improvement we should not do it. Maybe it's not clear, but with this PR the client will always issue a smaller number of requests, and a percentage of those requests will receive body-less responses. Combined with the fact that we move the server cache closer to the user, there shouldn't be any regressions with this PR, unless I'm doing something stupid (which I've done in the past). Also, keep in mind that this PR is against the
Sorry, why do you think this would double the number of requests? The number of requests should be strictly equal or smaller: resources within the lifetime of the cache are no longer requested because the browser reads directly from its cache, and conditional requests are still a single request. The server will either return a normal response ( |
|
@john-bodley not sure if I understand your question. Currently we cache the dataframe in
Eventually we can remove the dataframe caching, but I'd rather do that in a separate PR. |
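The request-count argument above can be illustrated with a small client-side model. This is only a sketch under stated assumptions: `CachedResponse` and `plan_request` are hypothetical names, not part of `SupersetClient` or the browser, and the model ignores everything except the `Expires` and `ETag` headers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CachedResponse:
    """Hypothetical record of a response the client already holds."""
    body: bytes
    etag: str
    expires_at: float  # UNIX timestamp derived from the Expires header


def plan_request(cached: Optional[CachedResponse], now: float) -> Tuple[bool, dict]:
    """Return (send_request, extra_headers) for a chart fetch."""
    if cached is None:
        # Nothing cached yet: one normal GET.
        return True, {}
    if now < cached.expires_at:
        # Still fresh: read from the local cache, no request at all.
        return False, {}
    # Stale: still a single request, made conditional on the ETag.
    return True, {"If-None-Match": cached.etag}
```

In every branch the client sends at most one request per chart, which is why the number of requests should never exceed what the current flow sends.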
|
@graceguo-supercat I changed the code so that in the
This way the |
|
@betodealmeida my point was that I felt that the picture and description don't accurately reflect what your code is actually doing, i.e., that there are actually two server-side cache keys associated with each chart. They're both using the same underlying cache but the diagram and steps don't accurately reflect this. I agree with your logic but thought maybe the steps should be more explicit, e.g., After
|
|
@john-bodley you're right. I ended up describing in "after" the workflow once we remove the dataframe cache. |
|
thanks for the benchmarks! will let @john-bodley sign off since he had the last requested change. |
|
I was working on this tricky bug and ended up here as the cause for it. The issue is around the fact that the The bug or symptom is the "Genders by State" example started showing an extra (3rd) metric ( There's a lot going on related to this, but a key point here is that we have control-related logic like
Now another thing that compounds here is that
Now it's pretty clear that addressing any of these 3 things would fix my symptom, but point 2 on its own is worrisome. It can lead to intricate issues over time. Say I save a chart, and after that I add a new control to that vizType, in the context of the Good news is I'm working on a refactor #7350 to help and clean up all of the control / formData processing logic. It grew out of control over time. Now the assumption is that all requests would make it through this logic, and I'm realizing that's not the case, at least since this PR. Ideas? |
    @wraps(f)
    def wrapper(*args, **kwargs):
        # check if the user can access the resource
        check_perms(*args, **kwargs)
I was digging around to try and figure out where the datasource access permission is done nowadays, and found it here in the etag_cache decorator. I feel like it's not the right place for it.
I understand that this needs to happen prior to reading from cache, but maybe it should be done as a prior decorator, or maybe both of these routines should be done inside a method instead of decorators, to avoid calling get_viz twice.
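One way to picture the "prior decorator" option is the sketch below. It is not the code in this PR: `requires_access`, `cached`, `key_fn`, and the toy `check_perms` and `explore_json` are hypothetical names chosen for illustration. It only shows the ordering (permission check outside, cache inside) and does not address the duplicated `get_viz` call.

```python
from functools import wraps


def requires_access(check_perms):
    """Outer decorator: run the permission check on every call."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            check_perms(*args, **kwargs)  # raises if access is denied
            return f(*args, **kwargs)
        return wrapper
    return decorator


def cached(store, key_fn):
    """Inner decorator: serve from `store` when the key is present."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            if key not in store:
                store[key] = f(*args, **kwargs)
            return store[key]
        return wrapper
    return decorator


def check_perms(user, chart_id):
    # Toy rule for the example only.
    if user != "admin":
        raise PermissionError(f"{user} cannot read chart {chart_id}")


store = {}


@requires_access(check_perms)  # outer: always evaluated first
@cached(store, key_fn=lambda user, chart_id: chart_id)  # inner: may skip the view
def explore_json(user, chart_id):
    return f"payload for chart {chart_id}"
```

Because the permission decorator wraps the caching one, a user without access is rejected even when the payload is already cached, and the caching decorator no longer needs to know about permissions.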
    form_data, slc = get_form_data(slice_id, use_slice_data=True)
    datasource_type = slc.datasource.type
    datasource_id = slc.datasource.id
    viz_obj = get_viz(
I think get_viz gets called at least two times now (here and in the view itself)
|
Found some other issues here that I wanted to raise ^^^ Also I noticed that the big "merge" on In the future, |
|
@mistercrunch agreed, and after that |
…equests (apache#7032)
* Sparkline dates aren't formatting in Time Series Table (apache#6976)
* Exclude venv for python linter to ignore
* Fix NaN error
* Fix the white background shown in SQL editor on drag (apache#7021) This PR sets the background-color css property on `.ace_scroller` instead of `.ace_content` to prevent the white background shown during resizing of the SQL editor before drag ends.
* Show tooltip with time frame (apache#6979)
* Fix time filter control (apache#6978)
* Enhancement of query context and object. (apache#6962)
* added more functionalities for query context and object.
* fixed cache logic
* added default value for groupby
* updated comments and removed print (cherry picked from commit 326708e)
* [fix] /superset/slice/id url is too long (apache#6989) (cherry picked from commit e0fea40)
* [WIP] fix user specified JSON metadata not updating dashboard on refresh (apache#7027) (cherry picked from commit 77d114a)
* feat: add ability to change font size in big number (apache#7003)
* Add ability to change font sizes in Big Number
* rename big number to header
* Add comment to clarify font size values
* Allow LIMIT to be specified in parameters (apache#7052)
* [fix] Cursor jumping when editing chart and dashboard titles (apache#7038) (cherry picked from commit c697955)
* Changing time table viz to pass formatTime a date (apache#7020) (cherry picked from commit ada7626)
* [db-engine-spec] Aligning Hive/Presto partition logic (apache#7007) (cherry picked from commit fafff47)
* [fix] explore chart from dashboard missed slice title (apache#7046) (cherry picked from commit 908c608)
* fix inaccurate data calculation with adata rolling and contribution (apache#7035) (cherry picked from commit 7a30ad4)
* Adding warning message for sqllab save query (apache#7028) (cherry picked from commit ea2ea16)
* [datasource] Ensuring consistent behavior of datasource editing/saving. (apache#7037)
* Update datasource.py
* Update datasource.py (cherry picked from commit 65a6e40)
* [csv-upload] Fixing message encoding (apache#6971) (cherry picked from commit 574e213)
* [sql-parse] Fixing LIMIT exceptions (apache#6963) (cherry picked from commit bf90829)
* Adding custom control overrides (apache#6956)
* Adding extraOverrides to line chart
* Updating extraOverrides to fit with more cases
* Moving extraOverrides to index.js
* Removing webpack-merge in package.json
* Fixing metrics control clearing metric (cherry picked from commit eb603c7)
* [sqlparse] Fixing table name extraction for ill-defined query (apache#7029) (cherry picked from commit d0d9cba)
* [missing values] Removing replacing missing values (apache#4905) (cherry picked from commit ed93b4c)
* [SQL Lab] Improved query and results tabs rendering reliability (apache#7082) closes apache#7080 (cherry picked from commit 7ee4b18)
* Fix filter_box migration PR apache#6523 (apache#7066)
* Fix filter_box migration PR apache#6523
* Fix druid-related bug (cherry picked from commit 7063e6c)
* SQL editor layout makeover (apache#7102) This PR includes the following layout and css tweaks:
  - Using flex to layout the north and south sub panes of query pane so resizing works properly in both Chrome and Firefox
  - Removal of necessary wrapper divs and tweaking of css in sql lab so we can scroll to the bottom of both the table list and the results pane
  - Make sql lab's content not overflow vertically and layout the query result area to eliminate double scroll bars
  - css tweaks on the basic.html page so the loading animation appears in the center of the page across the board
  (cherry picked from commit 62c1a8d)
* [forms] Fix handling of NULLs (cherry picked from commit e83a07d)
* handle null column_name in sqla and druid models (cherry picked from commit 2ff721a)
* Use metric name instead of metric in filter box (apache#7106) (cherry picked from commit 542125b)
* Bump python lib croniter to an existing version (apache#7132) Package maintainers should really never delete packages, but it appears this happened with croniter and resulted in breaking our builds. This PR bumps to a more recent existing version of the library (cherry picked from commit d7b90c4)
* Revert PR apache#6933 (apache#7162)
* Add decorator for etag cache
* Fetch charts with GET
* Small fixes
* Fix typo
* Compute correct cache key; fix logging
* Check perms on cached response
* Revert change
* If perms fail, return naked response
* Fix lint
* Compute cache key from all form data
* Pass extra_filters in GET request
* Fix pylint
* Fix flake8
* Use ETags even if no cache is set
* Handle adhoc filters
* Raise in debug mode
* Rename actions
* Fix integration tests
* Do POST request on new charts
* Set extra/adhoc filters only in GET requests
* Raise if check_perms fails
* Refactor auth
* Fix flake8
* Fix js unit tests
* Fix js unit tests that fail in lyftga
* Fix js
* Fix bad merge
* Use far future when max_age=0
This is a small PR that does a lot. It changes the initial request for charts (in explore or dashboards) to be done through a `GET` request, greatly improving the loading speed of dashboards. It also moves the caching to the HTTP layer, allowing us to benefit from `Expires` and `ETag` headers for conditional requests.
The problem
This diagram compares the current flow ("before") with the one implemented by this PR ("after"):
Before
Let's assume Superset is configured with a 1 hour cache, and also that the data changes on a longer period (daily, eg):
`POST` request with the payload.
There are a few inefficiencies here:
`POST` requests.
After
`GET` request with the chart id.
`Expires` header of 1 hour, and an `ETag` header which is a hash of the payload.
`SupersetClient` caches it also in the Cache interface.
`Expires` header and the use of `GET` the data is read directly from the native browser cache.
`Expires` is now in the past.
`SupersetClient` looks for a cached response in the Cache interface, and if one is found, extracts its `ETag`.
`If-None-Match` header, containing the hash of the cached response (its `ETag`).
`ETag` matches the `If-None-Match` header, returning a `304 Not Modified` response.
Notes
The `GET` request is done only the first time the chart is mounted. Forcing refresh on dashboards and clicking "Run Query" in the Explore views perform `POST` requests, which bypass the cache, and cache the new response. I tested the Explore view and dashboards with filters, and all further interactions are done with `POST`s.
Since we're caching the HTTP response, we need to verify that the user has permission to read the cached response. This is done by passing a `check_perms` function to the decorator that caches the responses.
The fetch API has no support for conditional responses with ETags. We need to add explicit support in `SupersetClient`. I have a separate PR for that (see feat: add support for conditional requests apache-superset/superset-ui#119).
There is one small downside to this approach. During the time while `Expires` is still valid, the browser will not perform any requests for cached charts unless the user explicitly refreshes a dashboard or clicks "Run Query" in the Explore view. If the data is bad, they will see bad data until it expires or they purposefully refresh the chart. In the current workflow, in theory we can purge the cache in this case, since it lives only on the server-side. This is a hypothetical scenario, and we could work around it by sending a notification to dashboards that one or more charts have bad data and should be refreshed.
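To make the server side of the flow concrete, here is a framework-free sketch of a response-caching decorator that takes a `check_perms` function. It is an illustration under assumptions, not the decorator added by this PR: the `Response` class, the in-memory `_CACHE` dict (standing in for the real server-side cache), the cache key, and the `request_headers` argument are all hypothetical, and SHA-256 is just one possible payload hash.

```python
import hashlib
import time
from dataclasses import dataclass, field
from email.utils import formatdate
from functools import wraps


@dataclass
class Response:
    """Minimal stand-in for a framework response object (hypothetical)."""
    status: int
    body: bytes = b""
    headers: dict = field(default_factory=dict)


# Hypothetical in-memory substitute for the server-side cache.
_CACHE = {}


def etag_cache(max_age, check_perms):
    """Cache whole responses and answer conditional requests with 304."""
    def decorator(f):
        @wraps(f)
        def wrapper(request_headers, *args, **kwargs):
            # Check access first so a cached response is never served to a
            # user who may not read it; check_perms raises on failure.
            check_perms(*args, **kwargs)

            key = (f.__name__, args, tuple(sorted(kwargs.items())))
            entry = _CACHE.get(key)
            now = time.time()
            if entry is None or entry[0] <= now:
                # Cache miss or expired entry: rebuild the payload.
                body = f(*args, **kwargs)
                headers = {
                    "ETag": '"%s"' % hashlib.sha256(body).hexdigest(),
                    "Expires": formatdate(now + max_age, usegmt=True),
                }
                entry = (now + max_age, Response(200, body, headers))
                _CACHE[key] = entry

            response = entry[1]
            if request_headers.get("If-None-Match") == response.headers["ETag"]:
                # The client already holds this payload: send headers only.
                return Response(304, b"", dict(response.headers))
            return response
        return wrapper
    return decorator
```

A view wrapped with `@etag_cache(max_age=3600, check_perms=...)` would return a full `200` response on the first call and a body-less `304` when the caller passes the previously received `ETag` in `If-None-Match`.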