-
Notifications
You must be signed in to change notification settings - Fork 18
Add Publish Side Effects #307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
What is the use case for this? We did have one related bug:
Which was fixed with: So I guess having access to the list of all parent containers affected when publishing a component would let us rewrite this, and better handle multiple layers of containers, but it's still all just a wrapper around the same Let me see:
yeah OK - that makes sense to me. |
|
I'm wondering if we could we wire up our platform events like |
|
I suspect that we'll want to emit signals at the Learning Core layer that more or less correspond to the entire DraftPublishLog and PublishLog creation, along with side effects, containers, etc.--so one synchronous signal with everything changed or published, and let the listeners extract what they find interesting. |
63970c2 to
a5076e5
Compare
|
Note to self: Need to rebase this also to take into account the existing 0008 migration. |
8cfe370 to
a493077
Compare
Create the PublishSideEffect model as the publishing analog for the DraftSideEffect model, and create these side effects for parent containers when one of the children is published. This allows us to efficiently query when any subtree of content was affected by a publish.
a493077 to
f504a6a
Compare
|
Ready for review. |
| publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_id=publishable_entity_pk, # pylint: disable=line-too-long # noqa: E501 | ||
| publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_version_id=None, # pylint: disable=line-too-long # noqa: E501 | ||
| ) | ||
| # TODO: Do we need to run distinct() on this? Will fix in https://github.com/openedx/openedx-learning/issues/303 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some rebase issues here - the comment above was removed in https://github.com/openedx/openedx-learning/pull/304/files (and I do see .distinct() below), and same with the large comment section "Could alternately do this query...." which I also removed.
| A container is considered to have been "published" if any of its descendents | ||
| were published. So publishing a Component implicitly publishes any Units it | ||
| was in, which publishes the Subsections those Units were in, etc. If the | ||
| Container is not already in the PublishLog (i.e. its own metadata did not | ||
| change), then we create a PublishLogRecord where the old_version is equal | ||
| to the new_version. We also create a PublishSideEffect for every child | ||
| change that affects a container, regardless of whether that container also | ||
| changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment needs to be clarified, because I am confused by it.
A container is considered to have been "published" if any of its descendents were published.
Publishing a component does not automatically nor necessarily "publish" any containers that use it, and we even have test cases that verify that - which still seem to be passing after this change. So I think either this is wrong, or I misunderstood something here.
publishing a Component implicitly publishes any Units itwas in, which publishes the Subsections those Units were in
Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."
If the Container is not already in the PublishLog (i.e. its own metadata did not change), then we create a PublishLogRecord where the old_version is equal to the new_version. We also create a PublishSideEffect for every child change that affects a container, regardless of whether that container also changed.
Hmm, I would expect the opposite: when a container is not in the (misunderstanding - see below)PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. If the container or an ancestor was changed as part of this publish, it will be included directly in the PublishLog. I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list.
I'm also wondering if you could open a related PR to show how https://github.com/openedx/edx-platform/blob/ba47f1fae5094d9608d7d2299f778afe023533e1/openedx/core/djangoapps/content_libraries/tasks.py#L99-L104 can be updated once this PR is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."
+1, I have the same question as @bradenmacdonald .
Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. ... I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list.
This confuses me @bradenmacdonald , because PublishSideEffect joins together two PublishLogRecord rows (cause and effect). So, wouldn't the container necessarily have to have a PublishLogRecord so it could appear as the effect of the PublishSideEffect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(@ormsbee For the never-published container K and its child C, to capture the PublishSideEffect of C -> K, would we need effect to be a new PublishLogRecord with entity=K, old_version==null and new_version==null?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, I definitely misunderstood that part. Please ignore that part of my comments. I was mostly reading the description and got confused by assuming that PublishSideEffect just listed any container implicitly affected by a publish, in a way completely analogous to PublishLogEntry.
Now that I (hopefully) understand it, I'm wondering if we [will] actually have a use case for tracking which children caused each particular parent to be included in the PublishLog that otherwise wouldn't be? i.e. when would we read PublishSideEffect? But either way, the comment about how it works at least makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A container is considered to have been "published" if any of its descendents were published. So publishing a Component implicitly publishes any Units it was in, which publishes the Subsections those Units were in, etc.
I'm still not sure about this. I feel like we're overloading the term "published" to mean two different things:
- I am at draft version N. My published version is N-M (i.e., M edits have occurred since last publish). Either me or my ancestor is published by the user, so I get a PublishLogRecord for (N-M)->N, and my published version becomes N.
- I am at draft version N. My published version is N-M. My descendent is published, as a side effect of that, I get a PublishLogRecord for (N-M)->(N-M). My published version remains at N-M.
IMO, scenario (1) makes sense to call "publishing". Scenario (2) does not. I don't know what to call it exactly, but does that distinction make sense @bradenmacdonald @ormsbee ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to this, per this Slack thread, @jmakowski1123 confirms that a container is considered to have been "last published" at whatever time that container or any of its children/descendants are published. Copying the exact text of what was agreed to for context in case anyone is looking at this a few years down the line:
Product clarification request...
Setup: I have a Unit U1 with Components C1 and C2 in it. All changes were published at 9:00. Then I make a change to C2. That marks both C2 and U1 as having unpublished changes. Then I go and publish C2 at 10:00. Now neither U1 nor C2 show the "unpublished changes" status.
Question: Is the "last published" time for U1 9:00 or 10:00?
My current understanding is that it should be 10:00. Just like when we say, "the course was most recently published at X time", we mean that, "the published state of the course changed at X time because some part of it was published", not "everything in the course was published at X time". Similarly, my understanding is that a Unit would be considered "last modified" whenever one of its child Components was modified.
I'll try to rewrite the docstring to be clearer on how this works and why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee Ok, I think I understand the justification for why publishing a child propagates up to publishing the parent container.
But where does this up-propagation end? E.g
- If I have an unpublished Library that contains a Section -> Subsection -> Unit -> Component, and I publish Component, then from what's said above, the Unit also gets published. But does that mean the Subsection, Section and Library parents also need to be published?
- If I have Library -> Section -> Subsection -> Unit -> Component A + Component B, but only Component A gets published? Do we still propagate the publish all the way back up?
- If Component A is in multiple Units, do all its Units <- Subsections <- Sections get published too?
Plus, we still need to address this TODO to publish down through all the children of a container.
Would also help to see test cases for these.. Please let me know how I can help out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have an unpublished Library that contains a Section -> Subsection -> Unit -> Component, and I publish Component, then from what's said above, the Unit also gets published. But does that mean the Subsection, Section and Library parents also need to be published?
A couple of things:
- Libraries aren't explicitly versioned entities, so it's not quite the same case. But if someone were to look at the most recent publish time for a library, I'd say it would show the last time anything was published.
- Yes to all the containers being marked as "most recently published at xxx" with the new time. But it would not force them to publish their other children.
If I have Library -> Section -> Subsection -> Unit -> Component A + Component B, but only Component A gets published? Do we still propagate the publish all the way back up?
The Section's "last published" date gets updated, but Component B is not published. This is reading "last published" as "the last time my published state has changed in some way" and not "the last time I published all my children via a publish action at the top level".
If Component A is in multiple Units, do all its Units <- Subsections <- Sections get published too?
Yes.
Plus, we still need to address this TODO to publish down through all the children of a container.
Would also help to see test cases for these.. Please let me know how I can help out?
I paused work on this PR to work on a version that modeled publishing dependencies more explicitly, using the ideas outlined in #317. I'm unlikely to pick that up again until after the conference, so if you want to build on top of that, it would be amazing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pomegranited: FWIW, the bulk of the new logic would be in set_draft_dependencies and _calculate_state_hash in the api module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is reading "last published" as "the last time my published state has changed in some way" and not "the last time I published all my children via a publish action at the top level".
Oohhhh.. ok, that makes much more sense, thank you!
I paused work on this PR to work on a version that #328, using the ideas outlined in #317. I'm unlikely to pick that up again until after the conference, so if you want to build on top of that, it would be amazing.
Will see how far I get on openedx/frontend-app-authoring#1977 before then, but I'm hoping to help out there.
|
BTW, I had previously speculated that this information could be used to optimize openedx-learning/openedx_learning/apps/authoring/units/api.py Lines 284 to 285 in 655c0e3
It's definitely not super important (TBD if we'll even use that function), but I would be interested to know if you see a way to do that. |
I can't think of a straightforward way to apply this work to make that function better. I'll chew on it for a bit. |
|
BTW, one of the things I wish we had in the PublishLogRecord and DraftLogRecord is a cheap way to summarize the full state of a PublishableEntity, inclusive of everything that could cause a side-effect, so that we can have that as a hash that we can easily compare for questions like "does this have any unpublished changes in it". Ideally something that can be composed entirely as a product of the version_nums/uuids + hashes of the things making the side-effects. I can think of ways to do it, but I don't know if I like it... For instance, because we only care about the state changes within a version, we don't have to worry about ordering of children as part of the hash, since any changes to that will be taken care of by the version of the container itself changing. So I think the inputs would just have to be a list of (UUID + optional hash) for all things that affect the container, in this case the children, sorted by UUID. We'd hash that whole thing together. So when computing the state for the Unit, we'd only have to look at the UUIDs of all the child Components. When computing the state for the containing Subsection, we only look at the UUID+state for all the Units and hash that. |
|
Closing in favor of the (still draft) #328 |
Uh oh!
There was an error while loading. Please reload this page.