feat: [FC-0092] Optimize Course Info Blocks API #37122
Conversation
Thanks for the pull request, @Serj-N! This repository is currently maintained by … Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval: If you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide context: To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can.
🔘 Get a green build: If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information? If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources.

When can I expect my changes to be merged? Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

💡 As a result, it may take up to several weeks or months to complete a review and merge your PR.
```python
# store transformed blocks in the current request to be reused where possible for optimization
if current_request := get_current_request():
    setattr(current_request, "reusable_transformed_blocks", blocks)  # pylint: disable=literal-used-as-attribute
```
Please use a RequestCache here, instead of setting this on the request object.
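For readers unfamiliar with the suggestion: edx-django-utils provides a `RequestCache` that gives each app a namespaced, per-request key-value store, which avoids hanging ad-hoc attributes off the WSGI request object. Below is a minimal self-contained stand-in (not the real class) that mimics only the interface used later in this PR (`set` / `get_cached_response`); in edx-platform the real `RequestCache` is also cleared between requests by middleware.

```python
class CachedResponse:
    """Result of a cache lookup: records whether the key was found."""
    def __init__(self, is_found, value):
        self.is_found = is_found
        self.value = value


class RequestCache:
    """Stand-in for edx_django_utils.cache.RequestCache: a namespaced
    key-value store. Keys are (namespace, key) tuples, so different apps
    cannot collide even if they pick the same key name."""
    _data = {}

    def __init__(self, namespace):
        self.namespace = namespace

    def set(self, key, value):
        self._data[(self.namespace, key)] = value

    def get_cached_response(self, key):
        sentinel = object()
        value = self._data.get((self.namespace, key), sentinel)
        if value is sentinel:
            return CachedResponse(False, None)
        return CachedResponse(True, value)


# Usage mirroring the suggestion: store the transformed blocks in a
# namespaced cache instead of setattr() on the request object.
cache = RequestCache("course_api")
cache.set("reusable_transformed_blocks", {"block-v1:demo": {"display_name": "Intro"}})
hit = cache.get_cached_response("reusable_transformed_blocks")
```

The `CachedResponse` wrapper distinguishes a cache miss from a legitimately cached `None`, which a plain `dict.get()` cannot do.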
@ormsbee reminded me of a presentation he had given some time ago that's still relevant :)
Awesome. @Serj-N, I notice there are some failing pylint checks; can you have a look?
Sure, I fixed the pylint checks. But now I see some failing tests; I'll take a look at those, too.
@ormsbee Looks like a must-watch! Many thanks!
I meant to ask if you have checked that the performance numbers look the same.
Indeed, I have: because of some trade-offs we had to make, the net gain in performance is less than before (compared to the original solution). I added a few details on that in the PR description.
@e0d Hi! I finally got around to revising the current solution on this one to get a more tangible improvement. I updated the code as well as the description. Please take a look when you have a chance. Thanks!
```python
    block_types_filter=None,
    hide_access_denials=False,
    allow_start_dates_in_future=False,
    for_blocks_view=False,
```
The name of this argument doesn't really express what it does -- to me at least. I also question the API design:
- Why wouldn't caching blocks be the default? Wouldn't all callers benefit?
- Could the caching code be abstracted from the body of the function so it's not inline with the business logic?
- I realize you are working around old code, but the future start date stuff feels like it could be the caller's responsibility, and that complexity could be removed from the getter.
As for the naming, I guess we could come up with something better: `cache_with_future_dates` seems to reflect the intent more clearly.
As for the other questions:
- Making this caching the default behavior would not benefit the other callers, because they either:
  - only call `get_blocks()` once and never need to reuse the collected course structure; or
  - do call `get_blocks()` more than once, but they either pass the same arguments (benefiting from `@request_cached`) or they pass a different `user` (the cached structure has to be recollected from scratch).
- Yes, this abstraction seems reasonable, and I think it can be nicely paired with the other refactoring you suggest: creating variables for cache key names.
- The logic here was indeed influenced by the existing codebase in a lot of ways, that's for sure. And while some of the previous design choices might seem questionable, this particular decision seems to make sense: we want to get blocks, and we specify a filtering criterion (whether to include future start dates or not). Filtering after the fact in each caller would basically mean trying to reproduce what the existing transformers are already designed to do. This doesn't seem like a very clean approach, and the only reason we resort to it here is the constraints (in terms of scope and performance) of this particular API view.
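The `@request_cached` behavior mentioned in the first bullet (same arguments hit the cache; a different `user` recomputes) can be illustrated with a generic memoizer. This is a simplified stand-in, not the actual edx-platform decorator:

```python
from functools import wraps


def request_cached(func):
    """Simplified stand-in for the edx @request_cached decorator:
    memoizes results per (positional-args) tuple for the lifetime of
    one request (here, the lifetime of the process)."""
    cache = {}

    @wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper


calls = []  # records actual (non-cached) invocations


@request_cached
def get_blocks(username, course_key):
    calls.append((username, course_key))
    return f"blocks for {username} in {course_key}"


get_blocks("alice", "course-v1:Demo")  # computed
get_blocks("alice", "course-v1:Demo")  # cache hit: identical arguments
get_blocks("bob", "course-v1:Demo")    # recomputed: different user
```

After the three calls above, only two real invocations have happened, which is exactly why arg-keyed caching helps repeat callers but not callers that vary the `user`.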
> The name of this argument doesn't really express what it does -- to me at least.

Not only for you - I've also had this question.
- Renamed the flag to `cache_with_future_dates`
- Moved the logic for getting blocks from cache to `utils.py`
lms/djangoapps/grades/course_data.py (outdated)
```python
    # unfiltered, course blocks from RequestCache and thus reducing the number of times that
    # the get_course_blocks function is called.

    request_cache = RequestCache("unfiltered_course_structure")
```
As the names are used multiple times, let's create variables at the appropriate scope for these.
Done, constants added to utils.py
e0d
left a comment
Looks good with the last round of changes.
@ormsbee This has been revised since your original input. The current version looks good to me. There are some parts of the pre-existing API that I'm not crazy about, but I don't think that now is the time to fix them.
ormsbee
left a comment
Some minor requests, but mostly questions to check my understanding of edge case behavior.
```python
    # Store a copy of the transformed, but still unfiltered, course blocks in RequestCache to be reused
    # wherever possible for optimization. Copying is required to make sure the cached structure is not mutated
    # by the filtering below.
    request_cache = RequestCache(UNFILTERED_STRUCTURE_CACHE_KEY)
```
Nit: The naming of this is confusing because UNFILTERED_STRUCTURE_CACHE_KEY implies that this is a cache key as part of a key-value pairing, but UNFILTERED_STRUCTURE_CACHE_KEY is really a namespace for this particular RequestCache as a whole. The namespaces just ensure that there's no chance of collision with other apps that need RequestCache functionality, so it would be more common to make the namespace be the module or app name.
Good point, changed it to `COURSE_API_REQUEST_CACHE_NAMESPACE = "course_api"`.
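The collision-avoidance point can be shown with a toy namespaced store: keys are (namespace, key) tuples, so two apps using the same key name never clobber each other. The second namespace below is purely hypothetical:

```python
# Toy namespaced store, mimicking how RequestCache namespaces isolate apps.
store = {}

COURSE_API_REQUEST_CACHE_NAMESPACE = "course_api"
OTHER_APP_NAMESPACE = "grades"  # hypothetical second consumer

# Same key name, different namespaces: two distinct entries.
store[(COURSE_API_REQUEST_CACHE_NAMESPACE, "blocks")] = "course_api blocks"
store[(OTHER_APP_NAMESPACE, "blocks")] = "grades blocks"
```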
```python
    - also removes references to them from parents' 'children' lists
    - removes 'start' key from all blocks if it wasn't requested
    """
    from datetime import datetime, timezone
```
There's no need for this to be a function-local import, is there?
There isn't, indeed; moved the import to module level.
```python
    for block in course_blocks.values():
        children = block.get("children")
        if children:
            block["children"] = [cid for cid in children if cid not in to_remove]
```
Just to check my understanding: Is it the case that it's okay to do this simple child removal (and not go down into further descendants) because all the inheritance has already been pre-computed, and the start attribute has been denormalized and put on all the nodes?
Yes, this is correct: by this point, `StartDateTransformer` has traversed the structure and resolved the start dates, and `BlockSerializer` has put the `start` key on all the returned blocks.
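A minimal sketch of the one-pass pruning this exchange describes, over a flat dict of already-denormalized blocks. Names and data shapes here are illustrative, not the PR's exact code:

```python
from datetime import datetime, timezone


def filter_future_blocks(course_blocks):
    """Remove blocks whose (already-denormalized) 'start' date is in the
    future, and drop references to them from parents' 'children' lists.

    A single flat pass suffices because inheritance has been pre-computed:
    every block already carries its own effective 'start' value, so there
    is no need to recurse into descendants of a removed block.
    """
    now = datetime.now(timezone.utc)
    to_remove = {
        block_id for block_id, block in course_blocks.items()
        if block.get("start") and block["start"] > now
    }
    for block_id in to_remove:
        del course_blocks[block_id]
    for block in course_blocks.values():
        children = block.get("children")
        if children:
            block["children"] = [cid for cid in children if cid not in to_remove]
    return course_blocks


blocks = {
    "course": {"start": None, "children": ["open", "future"]},
    "open": {"start": datetime(2020, 1, 1, tzinfo=timezone.utc), "children": []},
    "future": {"start": datetime(2999, 1, 1, tzinfo=timezone.utc), "children": []},
}
filter_future_blocks(blocks)
```

After the call, the `future` block is gone and the course's `children` list no longer references it.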
```python
    if start and start > now:
        to_remove.add(block_id)
```
It seems like this function is ignoring `days_early_for_beta` for content beta testers. Is that accounted for somewhere else?
At this point, we are dealing with the start dates that have been computed by `StartDateTransformer`, which is where the logic concerning `days_early_for_beta` lives.
```python
    # wherever possible for optimization. Copying is required to make sure the cached structure is not mutated
    # by the filtering below.
    request_cache = RequestCache(UNFILTERED_STRUCTURE_CACHE_KEY)
    request_cache.set(REUSABLE_BLOCKS_CACHE_KEY, blocks.copy())
```
We're doing a copy on the way in here, but we also have to worry about people mutating what they get back from get_cached_transformed_blocks(), right? Will that be a problem?
That is a reasonable concern, but once again, we need to take into account whether this affects performance. And unfortunately it does: `BlockStructureBlockData.copy()` is a heavy operation, and calling it on cache retrieval would add several seconds to response time, negating the optimization we're looking for.
Luckily, in our current flow the retrieved structure is only used for reading, so there is no actual need to make a copy on the way out of the cache.
If it's going to severely impact performance, then it's okay to have those kinds of side effects, though we should put a note in the docstring of `get_cached_transformed_blocks()` that makes it clear that people should not mutate what they get back. This whole optimization is a trade-off between performance and long-term maintainability, and we will need to fix this at a more fundamental level.
Yes, I added a word of caution to the function's docstring.
| """ | ||
| request_cache = RequestCache(COURSE_API_REQUEST_CACHE_NAMESPACE) | ||
| cached_response = request_cache.get_cached_response(REUSABLE_BLOCKS_CACHE_KEY) | ||
| reusable_transformed_blocks = cached_response.value.copy() if cached_response.is_found else None |
If you added the copy() here because of my comment, it's okay to just return what you had before. Please just make sure to note it in the docstring.
Done: `.copy()` reverted, docstring updated.
ormsbee
left a comment
Minor request to add something to the docstring. You don't have to do the extra-defensive copy if it's going to have a negative impact on performance. I can merge this in the morning once those changes are done (or @e0d can if he gets to it before me). Thank you.
@ormsbee I implemented the requested changes (reverted …).
The Course Info Blocks API endpoint has been known to be rather slow to return the response. Previous investigation showed that the major time sink was the `get_course_blocks` function, which is called three times in a single request. This commit aims to improve the response times by reducing the number of times that this function is called.

Solution Summary

The first time the function `get_course_blocks` is called, the result (transformed course blocks) is stored in the current WSGI request object. Later in the same request, before the second `get_course_blocks` call is triggered, the already transformed course blocks are taken from the request object; if they are available, `get_course_blocks` is not called (if not, it is called as a fallback). Later in the request, the function is called again as before (see Optimization Strategy and Difficulties).

Optimization Strategy and Difficulties

The original idea was to fetch and transform the course blocks once and reuse them in all three cases, which would reduce the `get_course_blocks` call count to 1. However, this did not turn out to be a viable solution because of the arguments passed to `get_course_blocks`. Notably, the `allow_start_dates_in_future` boolean flag affects the behavior of `StartDateTransformer`, which is a filtering transformer modifying the block structure returned. The first two times `allow_start_dates_in_future` is `False`; the third time it is `True`. Setting it to `True` in all three cases would mean that some blocks would be incorrectly included in the response.

This left us with one option: optimize the first two calls. The difference between the first two calls is the non-filtering transformers; however, the second call applies a subset of transformers from the first call, so it was safe to apply the superset of transformers in both cases. This reduced the number of function calls to 2. However, the cached structure may be further mutated by filters downstream, which means we need to cache a copy of the course structure (not the structure itself). The `copy` method itself is quite heavy (it calls `deepcopy` three times), making the benefits of this solution much less tangible. In fact, another potential optimization that was considered was to reuse the collected block structure (pre-transformation), but since calling `copy` on a collected structure proved to be more time-consuming than calling `get_collected`, this change was discarded, considering that the goal is to improve performance.

Revised Solution

To achieve a more tangible performance improvement, it was decided to modify the previous strategy as follows:
* Pass a `for_blocks_view` parameter to the `get_blocks` function to make sure the new caching logic only affects the blocks view.
* Collect and cache course blocks with future dates included.
* Include the `start` key in requested fields.
* Reuse the cached blocks in the third call, which is in `get_course_assignments`.
* Before returning the response, filter out any blocks with a future start date, and also remove the `start` key if it was not in requested fields.
This reverts commit 7cd4170.
FYI, we're encountering issues with persistent grades and we've opened #37661 to revert this to buy some time to figure out what's going on.
… (openedx#37661) This reverts commit 7cd4170.
PR Rationale

The Course Info Blocks API endpoint has been known to be rather slow to return the response. Previous investigation showed that the major time sink was the `get_course_blocks` function, which is called three times in a single request. This PR aims to improve the response times by reducing the number of times that this function is called.

Solution Summary

The first time the function `get_course_blocks` is called, the result (transformed course blocks) is stored in the current WSGI request object. Later in the same request, before the second `get_course_blocks` call is triggered, the already transformed course blocks are taken from the request object; if they are available, `get_course_blocks` is not called (if not, it is called as a fallback). Later in the request, the function is called again as before (see Optimization Strategy and Difficulties).

Optimization Strategy and Difficulties

The original idea was to fetch and transform the course blocks once and reuse them in all three cases, which would reduce the `get_course_blocks` call count to 1. However, this did not turn out to be a viable solution because of the arguments passed to `get_course_blocks`. Notably, the `allow_start_dates_in_future` boolean flag affects the behavior of `StartDateTransformer`, which is a filtering transformer modifying the block structure returned.

The first two times `allow_start_dates_in_future` is `False`; the third time it is `True`. Setting it to `True` in all three cases would mean that some blocks would be incorrectly included in the response.

This left us with one option: optimize the first two calls. The difference between the first two calls is the non-filtering transformers; however, the second call applies a subset of transformers from the first call, so it was safe to apply the superset of transformers in both cases. This reduced the number of function calls to 2. However, the cached structure may be further mutated by filters downstream, which means we need to cache a copy of the course structure (not the structure itself). The `copy` method itself is quite heavy (it calls `deepcopy` three times), making the benefits of this solution much less tangible. In fact, another potential optimization that was considered was to reuse the collected block structure (pre-transformation), but since calling `copy` on a collected structure proved to be more time-consuming than calling `get_collected`, this change was discarded, considering that the goal is to improve performance.

UPD: Revised Solution

To achieve a more tangible performance improvement, it was decided to modify the previous strategy as follows:
* Pass a `for_blocks_view` parameter to the `get_blocks` function to make sure the new caching logic only affects the blocks view
* Collect and cache course blocks with future dates included
* Include the `start` key in requested fields
* Reuse the cached blocks in the third call, which is in `get_course_assignments`
* Before returning the response, filter out any blocks with a future start date, and also remove the `start` key if it was not in requested fields

Response Times Measured
Here is a sample of the response times (in seconds).
The measurements were taken on a course with ~3000 blocks.
With the revised solution:
Current time: ~12 seconds
New time: ~8 seconds
Improvement: ~4 seconds (~30%)