feat: [FC-0092] Optimize Course Info Blocks API #37122
Conversation
Thanks for the pull request, @Serj-N! This repository is currently maintained by … Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval: If you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide context: To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can.
🔘 Get a green build: If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information? If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources.

When can I expect my changes to be merged? Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

💡 As a result, it may take up to several weeks or months to complete a review and merge your PR.
```python
# store transformed blocks in the current request to be reused where possible for optimization
if current_request := get_current_request():
    setattr(current_request, "reusable_transformed_blocks", blocks)  # pylint: disable=literal-used-as-attribute
```
Please use a RequestCache here, instead of setting this on the request object.
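For readers unfamiliar with the suggestion: edx-django-utils provides a `RequestCache` that gives each app a namespaced, per-request key-value store, which avoids hanging ad-hoc attributes off the WSGI request object. Below is a minimal self-contained stand-in (not the real class) that mimics only the interface used later in this PR (`set` / `get_cached_response`); in edx-platform the real `RequestCache` is also cleared between requests by middleware.

```python
class CachedResponse:
    """Result of a cache lookup: records whether the key was found."""
    def __init__(self, is_found, value):
        self.is_found = is_found
        self.value = value


class RequestCache:
    """Stand-in for edx_django_utils.cache.RequestCache: a namespaced
    key-value store. Keys are (namespace, key) tuples, so different apps
    cannot collide even if they pick the same key name."""
    _data = {}

    def __init__(self, namespace):
        self.namespace = namespace

    def set(self, key, value):
        self._data[(self.namespace, key)] = value

    def get_cached_response(self, key):
        sentinel = object()
        value = self._data.get((self.namespace, key), sentinel)
        if value is sentinel:
            return CachedResponse(False, None)
        return CachedResponse(True, value)


# Usage mirroring the suggestion: store the transformed blocks in a
# namespaced cache instead of setattr() on the request object.
cache = RequestCache("course_api")
cache.set("reusable_transformed_blocks", {"block-v1:demo": {"display_name": "Intro"}})
hit = cache.get_cached_response("reusable_transformed_blocks")
```

The `CachedResponse` wrapper distinguishes a cache miss from a legitimately cached `None`, which a plain `dict.get()` cannot do.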
@ormsbee reminded me of a presentation he had given some time ago that's still relevant :)
Awesome. @Serj-N, I notice there are some failing pylint checks; can you have a look?
Sure, I fixed the pylint checks. But now I see some failing tests; I'll take a look at those, too.
@ormsbee Looks like a must-watch! Many thanks!
I meant to ask if you have checked that the performance numbers look the same.
Indeed, I have: because of some trade-offs we had to make, the net gain in performance is less than before (compared to the original solution). I added a few details on that in the PR description.
@e0d Hi! I finally got around to revising the current solution on this one to get a more tangible improvement. I updated the code as well as the description. Please take a look when you have a chance. Thanks!
```python
    block_types_filter=None,
    hide_access_denials=False,
    allow_start_dates_in_future=False,
    for_blocks_view=False,
```
The name of this argument doesn't really express what it does -- to me at least. I also question the API design:
- Why wouldn't caching blocks be the default? Wouldn't all callers benefit?
- Could the caching code be abstracted from the body of the function so it's not inline with the business logic?
- I realize you are working around old code, but the future start date stuff feels like it could be the caller's responsibility, and that complexity could be removed from the getter.
As for the naming, I guess we could come up with something better: `cache_with_future_dates` seems to reflect the intent more clearly.
As for the other questions:
- Making this caching the default behavior would not benefit the other callers, because they either:
  - only call `get_blocks()` once and never need to reuse the collected course structure; or
  - do call `get_blocks()` more than once, but they either pass the same arguments (benefiting from `@request_cached`) or they pass a different `user` (the cached structure has to be recollected from scratch).
- Yes, this abstraction seems reasonable, and I think it can be nicely paired with the other refactoring you suggest: creating variables for cache key names.
- The logic here was indeed influenced by the existing codebase in a lot of ways, that's for sure. And while some of the previous design choices might seem questionable, this particular decision seems to make sense: we want to get blocks, and we specify a filtering criterion (whether to include future start dates or not). Filtering after the fact in each caller would basically mean trying to reproduce what the existing transformers are already designed to do. This doesn't seem like a very clean approach, and the only reason we resort to it here is the constraints (in terms of scope and performance) of this particular API view.
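The `@request_cached` behavior mentioned in the first bullet (same arguments hit the cache; a different `user` recomputes) can be illustrated with a generic memoizer. This is a simplified stand-in, not the actual edx-platform decorator:

```python
from functools import wraps


def request_cached(func):
    """Simplified stand-in for the edx @request_cached decorator:
    memoizes results per (positional-args) tuple for the lifetime of
    one request (here, the lifetime of the process)."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


calls = []  # records actual (non-cached) invocations


@request_cached
def get_blocks(username, course_key):
    calls.append((username, course_key))
    return f"blocks for {username} in {course_key}"


get_blocks("alice", "course-v1:Demo")  # computed
get_blocks("alice", "course-v1:Demo")  # cache hit: identical arguments
get_blocks("bob", "course-v1:Demo")    # recomputed: different user
```

After the three calls above, only two real invocations have happened, which is exactly why arg-keyed caching helps repeat callers but not callers that vary the `user`.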
> The name of this argument doesn't really express what it does -- to me at least.

Not only for you - I've also had this question.
- Renamed the flag to `cache_with_future_dates`
- Moved the logic for getting blocks from cache to `utils.py`
lms/djangoapps/grades/course_data.py (outdated)
```python
    # unfiltered, course blocks from RequestCache and thus reducing the number of times that
    # the get_course_blocks function is called.

    request_cache = RequestCache("unfiltered_course_structure")
```
As the names are used multiple times, let's create variables at the appropriate scope for these.
Done, constants added to utils.py
e0d
left a comment
Looks good with the last round of changes.
@ormsbee This has been revised since your original input. The current version looks good to me. There are some parts of the pre-existing API that I'm not crazy about, but I don't think that now is the time to fix them.
ormsbee
left a comment
Some minor requests, but mostly questions to check my understanding of edge case behavior.
```python
    # Store a copy of the transformed, but still unfiltered, course blocks in RequestCache to be reused
    # wherever possible for optimization. Copying is required to make sure the cached structure is not mutated
    # by the filtering below.
    request_cache = RequestCache(UNFILTERED_STRUCTURE_CACHE_KEY)
```
Nit: The naming of this is confusing because UNFILTERED_STRUCTURE_CACHE_KEY implies that this is a cache key as part of a key-value pairing, but UNFILTERED_STRUCTURE_CACHE_KEY is really a namespace for this particular RequestCache as a whole. The namespaces just ensure that there's no chance of collision with other apps that need RequestCache functionality, so it would be more common to make the namespace be the module or app name.
Good point, changed it to `COURSE_API_REQUEST_CACHE_NAMESPACE = "course_api"`.
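The collision-avoidance point can be shown with a toy namespaced store: keys are (namespace, key) tuples, so two apps using the same key name never clobber each other. The second namespace below is purely hypothetical:

```python
# Toy namespaced store, mimicking how RequestCache namespaces isolate apps.
store = {}

COURSE_API_REQUEST_CACHE_NAMESPACE = "course_api"
OTHER_APP_NAMESPACE = "grades"  # hypothetical second consumer

# Same key name, different namespaces: two distinct entries.
store[(COURSE_API_REQUEST_CACHE_NAMESPACE, "blocks")] = "course_api blocks"
store[(OTHER_APP_NAMESPACE, "blocks")] = "grades blocks"
```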
```python
    - also removes references to them from parents' 'children' lists
    - removes 'start' key from all blocks if it wasn't requested
    """
    from datetime import datetime, timezone
```
There's no need for this to be a function-local import, is there?
There isn't, indeed; moved the import to module level.
```python
    for block in course_blocks.values():
        children = block.get("children")
        if children:
            block["children"] = [cid for cid in children if cid not in to_remove]
```
Just to check my understanding: Is it the case that it's okay to do this simple child removal (and not go down into further descendants) because all the inheritance has already been pre-computed, and the start attribute has been denormalized and put on all the nodes?
Yes, this is correct: by this point, `StartDateTransformer` has traversed the structure and resolved the start dates, and `BlockSerializer` has put the `start` key on all the returned blocks.
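A minimal sketch of the one-pass pruning this exchange describes, over a flat dict of already-denormalized blocks. Names and data shapes here are illustrative, not the PR's exact code:

```python
from datetime import datetime, timezone


def filter_future_blocks(course_blocks):
    """Remove blocks whose (already-denormalized) 'start' date is in the
    future, and drop references to them from parents' 'children' lists.

    A single flat pass suffices because inheritance has been pre-computed:
    every block already carries its own effective 'start' value, so there
    is no need to recurse into descendants of a removed block.
    """
    now = datetime.now(timezone.utc)
    to_remove = {
        block_id for block_id, block in course_blocks.items()
        if block.get("start") and block["start"] > now
    }
    for block_id in to_remove:
        del course_blocks[block_id]
    for block in course_blocks.values():
        children = block.get("children")
        if children:
            block["children"] = [cid for cid in children if cid not in to_remove]
    return course_blocks


blocks = {
    "course": {"start": None, "children": ["open", "future"]},
    "open": {"start": datetime(2020, 1, 1, tzinfo=timezone.utc), "children": []},
    "future": {"start": datetime(2999, 1, 1, tzinfo=timezone.utc), "children": []},
}
filter_future_blocks(blocks)
```

After the call, the `future` block is gone and the course's `children` list no longer references it.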
```python
    if start and start > now:
        to_remove.add(block_id)
```
It seems like this function is ignoring `days_early_for_beta` for content beta testers. Is that accounted for somewhere else?
At this point, we are dealing with the start dates that have been computed by `StartDateTransformer`, which is where the logic concerning `days_early_for_beta` lives.
```python
    # wherever possible for optimization. Copying is required to make sure the cached structure is not mutated
    # by the filtering below.
    request_cache = RequestCache(UNFILTERED_STRUCTURE_CACHE_KEY)
    request_cache.set(REUSABLE_BLOCKS_CACHE_KEY, blocks.copy())
```
We're doing a copy on the way in here, but we also have to worry about people mutating what they get back from get_cached_transformed_blocks(), right? Will that be a problem?
That is a reasonable concern, but once again, we need to take into account whether this affects performance. And unfortunately it does: `BlockStructureBlockData.copy()` is a heavy operation, and calling it on cache retrieval would add several seconds to response time, negating the optimization we're looking for.
Luckily, in our current flow the retrieved structure is only used for reading, so there is no actual need to make a copy on the way out of the cache.
If it's going to severely impact performance, then it's okay to have those kinds of side effects, though we should put a note in the docstring of `get_cached_transformed_blocks()` that makes it clear that people should not mutate what they get back. This whole optimization is a trade-off between performance and long-term maintainability, and we will need to fix this at a more fundamental level.
Yes, I added a word of caution to the function's docstring.
| """ | ||
| request_cache = RequestCache(COURSE_API_REQUEST_CACHE_NAMESPACE) | ||
| cached_response = request_cache.get_cached_response(REUSABLE_BLOCKS_CACHE_KEY) | ||
| reusable_transformed_blocks = cached_response.value.copy() if cached_response.is_found else None |
If you added the copy() here because of my comment, it's okay to just return what you had before. Please just make sure to note it in the docstring.
Done: `.copy()` reverted, docstring updated.
ormsbee
left a comment
Minor request to add something to the docstring. You don't have to do the extra-defensive copy if it's going to have a negative impact on performance. I can merge this in the morning once those changes are done (or @e0d can if he gets to it before me). Thank you.
@ormsbee I implemented the requested changes (reverted …).
The Course Info Blocks API endpoint has been known to be rather slow to return the response. Previous investigation showed that the major time sink was the `get_course_blocks` function, which is called three times in a single request. This commit aims to improve the response times by reducing the number of times that this function is called.

Solution Summary

The first time the function `get_course_blocks` is called, the result (transformed course blocks) is stored in the current WSGI request object. Later in the same request, before the second `get_course_blocks` call is triggered, the already transformed course blocks are taken from the request object; if they are available, `get_course_blocks` is not called (if not, it is called as a fallback). Later in the request, the function is called again as before (see Optimization Strategy and Difficulties).

Optimization Strategy and Difficulties

The original idea was to fetch and transform the course blocks once and reuse them in all three cases, which would reduce the `get_course_blocks` call count to 1. However, this did not turn out to be a viable solution because of the arguments passed to `get_course_blocks`. Notably, the `allow_start_dates_in_future` boolean flag affects the behavior of `StartDateTransformer`, which is a filtering transformer modifying the block structure returned. The first two times `allow_start_dates_in_future` is `False`; the third time it is `True`. Setting it to `True` in all three cases would mean that some blocks would be incorrectly included in the response.

This left us with one option: optimize the first two calls. The difference between the first two calls is the non-filtering transformers; however, the second call applies a subset of transformers from the first call, so it was safe to apply the superset of transformers in both cases. This reduced the number of function calls to 2. However, the cached structure may be further mutated by filters downstream, which means we need to cache a copy of the course structure (not the structure itself). The `copy` method itself is quite heavy (it calls `deepcopy` three times), making the benefits of this solution much less tangible. In fact, another potential optimization that was considered was to reuse the collected block structure (pre-transformation), but since calling `copy` on a collected structure proved to be more time-consuming than calling `get_collected`, this change was discarded, considering that the goal is to improve performance.

Revised Solution

To achieve a more tangible performance improvement, it was decided to modify the previous strategy as follows:
* Pass a `for_blocks_view` parameter to the `get_blocks` function to make sure the new caching logic only affects the blocks view.
* Collect and cache course blocks with future dates included.
* Include the `start` key in requested fields.
* Reuse the cached blocks in the third call, which is in `get_course_assignments`.
* Before returning the response, filter out any blocks with a future start date, and also remove the `start` key if it was not in requested fields.
This reverts commit 7cd4170.
FYI, we're encountering issues with persistent grades and we've opened #37661 to revert this to buy some time to figure out what's going on.
… (openedx#37661) This reverts commit 7cd4170.
PR Rationale

The Course Info Blocks API endpoint has been known to be rather slow to return the response. Previous investigation showed that the major time sink was the `get_course_blocks` function, which is called three times in a single request. This PR aims to improve the response times by reducing the number of times that this function is called.

Solution Summary

The first time the function `get_course_blocks` is called, the result (transformed course blocks) is stored in the current WSGI request object. Later in the same request, before the second `get_course_blocks` call is triggered, the already transformed course blocks are taken from the request object; if they are available, `get_course_blocks` is not called (if not, it is called as a fallback). Later in the request, the function is called again as before (see Optimization Strategy and Difficulties).

Optimization Strategy and Difficulties

The original idea was to fetch and transform the course blocks once and reuse them in all three cases, which would reduce the `get_course_blocks` call count to 1. However, this did not turn out to be a viable solution because of the arguments passed to `get_course_blocks`. Notably, the `allow_start_dates_in_future` boolean flag affects the behavior of `StartDateTransformer`, which is a filtering transformer modifying the block structure returned.

The first two times `allow_start_dates_in_future` is `False`; the third time it is `True`. Setting it to `True` in all three cases would mean that some blocks would be incorrectly included in the response.

This left us with one option: optimize the first two calls. The difference between the first two calls is the non-filtering transformers; however, the second call applies a subset of transformers from the first call, so it was safe to apply the superset of transformers in both cases. This reduced the number of function calls to 2. However, the cached structure may be further mutated by filters downstream, which means we need to cache a copy of the course structure (not the structure itself). The `copy` method itself is quite heavy (it calls `deepcopy` three times), making the benefits of this solution much less tangible. In fact, another potential optimization that was considered was to reuse the collected block structure (pre-transformation), but since calling `copy` on a collected structure proved to be more time-consuming than calling `get_collected`, this change was discarded, considering that the goal is to improve performance.

UPD: Revised Solution

To achieve a more tangible performance improvement, it was decided to modify the previous strategy as follows:
* Pass a `for_blocks_view` parameter to the `get_blocks` function to make sure the new caching logic only affects the blocks view
* Collect and cache course blocks with future dates included
* Include the `start` key in requested fields
* Reuse the cached blocks in the third call, which is in `get_course_assignments`
* Before returning the response, filter out any blocks with a future start date, and also remove the `start` key if it was not in requested fields

Response Times Measured
Here is a sample of the response times (in seconds).
The measurements were taken on a course with ~3000 blocks.
With the revised solution:
Current time: ~12 seconds
New time: ~8 seconds
Improvement: ~4 seconds (~30%)