Skip to content

Conversation

@ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Apr 18, 2025

Create the PublishSideEffect model as the publishing analog for
the DraftSideEffect model, and create these side effects for
parent containers when one of the children is published. This
allows us to efficiently query when any subtree of content was
affected by a publish.

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Apr 18, 2025

What is the use case for this?

We did have one related bug:

  • Edit a Component in a Unit:

    • Component + Unit show "unpublished changes" ✔️
  • Publish the Component only.

    • Component no longer shows "unpublished changes" ✔️
    • Unit no longer shows "unpublished changes" ❌ (in the search index)

Which was fixed with:

https://github.com/openedx/edx-platform/blob/3a9b4367e60ebbefdc2813b5587947abd02f01c0/openedx/core/djangoapps/content_libraries/api/blocks.py#L292-L301

So I guess having access to the list of all parent containers affected when publishing a component would let us rewrite this, and better handle multiple layers of containers, but it's still all just a wrapper around the same get_containers_with_entity code. Though with the changes in this PR, is it "more correct" because it's searching for published containers affected by the child's publish?

Let me see:

  1. You have a draft container and you add component A to it. The container has never been published.
  2. You publish component A. This should not include the container in the PublishLog, because the published version of the container doesn't even exist and definitely isn't affected. On the platform side, we don't need to update the search index entry for the container - it still has unpublished changes (to itself) and one child.

yeah OK - that makes sense to me.

@bradenmacdonald
Copy link
Contributor

I'm wondering if we could we wire up our platform events like LIBRARY_CONTAINER_UPDATED and LIBRARY_BLOCK_UPDATED directly to become side effects of the draft/publish actions, so we can guarantee those are always called, no matter at which level the APIs are used? Though we try to be smart about when those events are synchronous vs. asynchronous (e.g. publishing a unit will synchronously update its children because they're usually all visible in the UI at the time and we need to refresh the UI data as soon as the publish is completed, but when publishing a component we asynchronously update all the parents because you're usually not viewing them at the same time and a bit of a delay in updating their publish status is fine).

@ormsbee
Copy link
Contributor Author

ormsbee commented Apr 19, 2025

I suspect that we'll want to emit signals at the Learning Core layer that more or less correspond to the entire DraftPublishLog and PublishLog creation, along with side effects, containers, etc.--so one synchronous signal with everything changed or published, and let the listeners extract what they find interesting.

@ormsbee ormsbee marked this pull request as ready for review April 29, 2025 15:49
@ormsbee ormsbee force-pushed the publish-side-effects-2 branch 2 times, most recently from 63970c2 to a5076e5 Compare April 30, 2025 17:33
@ormsbee
Copy link
Contributor Author

ormsbee commented Apr 30, 2025

Note to self: Need to rebase this also to take into account the existing 0008 migration.

@ormsbee ormsbee force-pushed the publish-side-effects-2 branch 3 times, most recently from 8cfe370 to a493077 Compare May 12, 2025 16:13
Create the PublishSideEffect model as the publishing analog for
the DraftSideEffect model, and create these side effects for
parent containers when one of the children is published. This
allows us to efficiently query when any subtree of content was
affected by a publish.
@ormsbee ormsbee force-pushed the publish-side-effects-2 branch from a493077 to f504a6a Compare May 12, 2025 16:19
@ormsbee
Copy link
Contributor Author

ormsbee commented May 12, 2025

Ready for review.

@kdmccormick kdmccormick self-requested a review May 13, 2025 18:35
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_id=publishable_entity_pk, # pylint: disable=line-too-long # noqa: E501
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_version_id=None, # pylint: disable=line-too-long # noqa: E501
)
# TODO: Do we need to run distinct() on this? Will fix in https://github.com/openedx/openedx-learning/issues/303
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some rebase issues here - the comment above was removed in https://github.com/openedx/openedx-learning/pull/304/files (and I do see .distinct() below), and same with the large comment section "Could alternately do this query...." which I also removed.

Comment on lines +624 to +631
A container is considered to have been "published" if any of its descendents
were published. So publishing a Component implicitly publishes any Units it
was in, which publishes the Subsections those Units were in, etc. If the
Container is not already in the PublishLog (i.e. its own metadata did not
change), then we create a PublishLogRecord where the old_version is equal
to the new_version. We also create a PublishSideEffect for every child
change that affects a container, regardless of whether that container also
changed.
Copy link
Contributor

@bradenmacdonald bradenmacdonald May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs to be clarified, because I am confused by it.

A container is considered to have been "published" if any of its descendents were published.

Publishing a component does not automatically nor necessarily "publish" any containers that use it, and we even have test cases that verify that - which still seem to be passing after this change. So I think either this is wrong, or I misunderstood something here.

publishing a Component implicitly publishes any Units itwas in, which publishes the Subsections those Units were in

Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."

If the Container is not already in the PublishLog (i.e. its own metadata did not change), then we create a PublishLogRecord where the old_version is equal to the new_version. We also create a PublishSideEffect for every child change that affects a container, regardless of whether that container also changed.

Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. If the container or an ancestor was changed as part of this publish, it will be included directly in the PublishLog. I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list. (misunderstanding - see below)


I'm also wondering if you could open a related PR to show how https://github.com/openedx/edx-platform/blob/ba47f1fae5094d9608d7d2299f778afe023533e1/openedx/core/djangoapps/content_libraries/tasks.py#L99-L104 can be updated once this PR is merged?

Copy link
Member

@kdmccormick kdmccormick May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."

+1, I have the same question as @bradenmacdonald .

Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. ... I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list.

This confuses me @bradenmacdonald , because PublishSideEffect joins together two PublishLogRecord rows (cause and effect). So, wouldn't the container necessarily have to have a PublishLogRecord so it could appear as the effect of the PublishSideEffect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@ormsbee For the never-published container K and its child C, to capture the PublishSideEffect of C -> K, would we need effect to be a new PublishLogRecord with entity=K, old_version==null and new_version==null?)

Copy link
Contributor

@bradenmacdonald bradenmacdonald May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, I definitely misunderstood that part. Please ignore that part of my comments. I was mostly reading the description and got confused by assuming that PublishSideEffect just listed any container implicitly affected by a publish, in a way completely analogous to PublishLogEntry.

Now that I (hopefully) understand it, I'm wondering if we [will] actually have a use case for tracking which children caused each particular parent to be included in the PublishLog that otherwise wouldn't be? i.e. when would we read PublishSideEffect? But either way, the comment about how it works at least makes sense to me.

Copy link
Member

@kdmccormick kdmccormick May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A container is considered to have been "published" if any of its descendents were published. So publishing a Component implicitly publishes any Units it was in, which publishes the Subsections those Units were in, etc.

I'm still not sure about this. I feel like we're overloading the term "published" to mean two different things:

  1. I am at draft version N. My published version is N-M (i.e., M edits have occurred since last publish). Either me or my ancestor is published by the user, so I get a PublishLogRecord for (N-M)->N, and my published version becomes N.
  2. I am at draft version N. My published version is N-M. My descendent is published, as a side effect of that, I get a PublishLogRecord for (N-M)->(N-M). My published version remains at N-M.

IMO, scenario (1) makes sense to call "publishing". Scenario (2) does not. I don't know what to call it exactly, but does that distinction make sense @bradenmacdonald @ormsbee ?

Copy link
Contributor Author

@ormsbee ormsbee May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to this, per this Slack thread, @jmakowski1123 confirms that a container is considered to have been "last published" at whatever time that container or any of its children/descendants are published. Copying the exact text of what was agreed to for context in case anyone is looking at this a few years down the line:

Product clarification request...

Setup: I have a Unit U1 with Components C1 and C2 in it. All changes were published at 9:00. Then I make a change to C2. That marks both C2 and U1 as having unpublished changes. Then I go and publish C2 at 10:00. Now neither U1 nor C2 show the "unpublished changes" status.

Question: Is the "last published" time for U1 9:00 or 10:00?

My current understanding is that it should be 10:00. Just like when we say, "the course was most recently published at X time", we mean that, "the published state of the course changed at X time because some part of it was published", not "everything in the course was published at X time". Similarly, my understanding is that a Unit would be considered "last modified" whenever one of its child Components was modified.

I'll try to rewrite the docstring to be clearer on how this works and why.

Copy link
Contributor

@pomegranited pomegranited Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee Ok, I think I understand the justification for why publishing a child propagates up to publishing the parent container.

But where does this up-propagation end? E.g

  1. If I have an unpublished Library that contains a Section -> Subsection -> Unit -> Component, and I publish Component, then from what's said above, the Unit also gets published. But does that mean the Subsection, Section and Library parents also need to be published?
  2. If I have Library -> Section -> Subsection -> Unit -> Component A + Component B, but only Component A gets published? Do we still propagate the publish all the way back up?
  3. If Component A is in multiple Units, do all its Units <- Subsections <- Sections get published too?

Plus, we still need to address this TODO to publish down through all the children of a container.

Would also help to see test cases for these.. Please let me know how I can help out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have an unpublished Library that contains a Section -> Subsection -> Unit -> Component, and I publish Component, then from what's said above, the Unit also gets published. But does that mean the Subsection, Section and Library parents also need to be published?

A couple of things:

  1. Libraries aren't explicitly versioned entities, so it's not quite the same case. But if someone were to look at the most recent publish time for a library, I'd say it would show the last time anything was published.
  2. Yes to all the containers being marked as "most recently published at xxx" with the new time. But it would not force them to publish their other children.

If I have Library -> Section -> Subsection -> Unit -> Component A + Component B, but only Component A gets published? Do we still propagate the publish all the way back up?

The Section's "last published" date gets updated, but Component B is not published. This is reading "last published" as "the last time my published state has changed in some way" and not "the last time I published all my children via a publish action at the top level".

If Component A is in multiple Units, do all its Units <- Subsections <- Sections get published too?

Yes.

Plus, we still need to address this TODO to publish down through all the children of a container.

Would also help to see test cases for these.. Please let me know how I can help out?

I paused work on this PR to work on a version that modeled publishing dependencies more explicitly, using the ideas outlined in #317. I'm unlikely to pick that up again until after the conference, so if you want to build on top of that, it would be amazing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pomegranited: FWIW, the bulk of the new logic would be in set_draft_dependencies and _calculate_state_hash in the api module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee

This is reading "last published" as "the last time my published state has changed in some way" and not "the last time I published all my children via a publish action at the top level".

Oohhhh.. ok, that makes much more sense, thank you!

I paused work on this PR to work on a version that #328, using the ideas outlined in #317. I'm unlikely to pick that up again until after the conference, so if you want to build on top of that, it would be amazing.

Will see how far I get on openedx/frontend-app-authoring#1977 before then, but I'm hoping to help out there.

@bradenmacdonald
Copy link
Contributor

BTW, I had previously speculated that this information could be used to optimize get_components_in_published_unit_as_of:

TODO: optimize, perhaps by having the publishlog store a record of all
ancestors of every modified PublishableEntity in the publish.

It's definitely not super important (TBD if we'll even use that function), but I would be interested to know if you see a way to do that.

@ormsbee
Copy link
Contributor Author

ormsbee commented May 14, 2025

BTW, I had previously speculated that this information could be used to optimize get_components_in_published_unit_as_of:

TODO: optimize, perhaps by having the publishlog store a record of all
ancestors of every modified PublishableEntity in the publish.

It's definitely not super important (TBD if we'll even use that function), but I would be interested to know if you see a way to do that.

I can't think of a straightforward way to apply this work to make that function better. I'll chew on it for a bit.

@ormsbee
Copy link
Contributor Author

ormsbee commented May 15, 2025

BTW, one of the things I wish we had in the PublishLogRecord and DraftLogRecord is a cheap way to summarize the full state of a PublishableEntity, inclusive of everything that could cause a side-effect, so that we can have that as a hash that we can easily compare for questions like "does this have any unpublished changes in it". Ideally something that can be composed entirely as a product of the version_nums/uuids + hashes of the things making the side-effects.

I can think of ways to do it, but I don't know if I like it... For instance, because we only care about the state changes within a version, we don't have to worry about ordering of children as part of the hash, since any changes to that will be taken care of by the version of the container itself changing. So I think the inputs would just have to be a list of (UUID + optional hash) for all things that affect the container, in this case the children, sorted by UUID. We'd hash that whole thing together.

So when computing the state for the Unit, we'd only have to look at the UUIDs of all the child Components. When computing the state for the containing Subsection, we only look at the UUID+state for all the Units and hash that.

@ormsbee
Copy link
Contributor Author

ormsbee commented Jun 18, 2025

Closing in favor of the (still draft) #328

@ormsbee ormsbee closed this Jun 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants