fix: remove cloud-final.service regression#6286
Conversation
After=multi-user.target can cause issues on non-debian systems, so ensure it is templated Fixes canonicalGH-6245
|
Thanks for proposing this @TheRealFalcon. I don't think that per-distro multi-user.target ordering is the best path forward. Cloud-init should generally behave the same across distros, and a special case for a foundational unit like multi-user.target indicates significant diverging behavior. A common ordering relationship with core systemd services would be more ideal. For example: how should people reason about cloud-init if a While this example isn't particularly interesting, simple checks like this are pretty common. If pressed I'm sure we could come up with significant examples of cloud-init behaving differently as a result of different Also, I couldn't reproduce the issue from the report. I would like to better understand the issue before making this change. |
holmanb
left a comment
There was a problem hiding this comment.
Thanks again @TheRealFalcon. I was able to reproduce the reported behavior.
The regressed behavior is expected in my opinion - this behavior has existed in other distros for years. A simple solution exists for affected users: add After=multi-user.target to the service which is ordered After=cloud-final.service.
I'm -1 to upstream trying to support multiple ordering relationships against multi-user.target because wherever possible cloud-init behavior should match across distros.
|
From #5830 ,
How/Why is this happening? Is this documented in systemd somewhere?
What do you mean by this?
despite any regressions? I'm not sure I signed onto that design goal when I approved #5830 |
I haven't seen it documented, but this is easily observable. I've bumped into this myself in the past when trying to make a service be ordered after cloud-init.
Try to create a service ordered after cloud-init services on Ubuntu. You'll find that an
I hesitate to call this a cloud-init regression. Any observable change in behavior could technically regress somebody's use case, but that doesn't mean that we shouldn't change anything. Stable downstreams might consider this change a regression, but from an upstream perspective which cloud-init feature is regressed? My understanding of the reported issue is that the user wanted their service to run after cloud-init. #5830 changed what they need to do to accomplish that, but it didn't making ordering after cloud-init impossible. A change was observable, but a workaround exists. Unless #5830 broke some cloud-init-supported feature or some crucial distro-packaged external service, I'd suggest that we leave it (and even if it does, I would want to consider the significance of the feature/external service). Besides matching behavior across distros, a unified boot order across distros gives cloud-init more flexibility to deliver cloud-init features which benefit all distros. Perhaps we could document #5830 here instead. Either way, we should document the canonical way to order a service after cloud-init for everyone. If we re-template multi-user.target, this documentation cannot be distro-agnostic - which will render it less useful than a document which diverges based on which distro the user is using. If you agree, I can propose a change to document the breaking change as well as a how-to documenting how to order after cloud-init on systemd. Thoughts? |
Booting your distro without having to do weird workarounds.
At the expense of downstreams patching this themselves such that upstream changes/documentation don't benefit all distros either. I worry we're just pushing any inherent complexity and diverging behavior downstream, which makes it even harder to understand and reason about. Do we know why the
Does cloud-init need that flexibility currently? If not, I don't think the possibility of future needs is a reason for changed behavior now.
Agreed! But I also think that means we shouldn't change functionality unless there's a really good reason to. Bugs, (reasonable) requested functionality, resolving conflicting requirements, better UX. These might all be good reasons. I just don't see a good reason here other than it looks nicer and there may be some point in the future we'll want to make a change or documentation there that impacts all distros.
I'm open to this if you're still -1 |
I haven't seen any reports of an inability to boot any distro without having to do workarounds. Have you? The issue reported, as I understand it, was that a user's custom service was removed from the boot transaction because it introduced a cycle. Cloud-init still behaved, they just needed to update their service ordering accordingly to get their service working. And calling it "weird" is just subjective based on prior experience.
Yeah, there are tradeoffs with both approaches. I take issue with upstreaming divergences when actual requirements are left undocumented along the way. Left in the hands of the original maintainers they would have had bug reports associated with patches which justified their divergences. In some ways, cloud-init feels like an upstream that tries to also be every possible downstream which is hardly sustainable.
If I remember correctly, when I dug into the git history previously, all I found was a maintainer importing changes as a template from CentOS 7 with justification being "that's the way they had it". Cloud-init upstream had only gained the Repercussions aren't the only thing that concern me. I'm also concerned about a design that exists purely due to prior art. I think "because what is" is a weaker argument than "because this is best". A few users that need to update custom services is a small price to pay in exchange for an ecosystem of users that share the same fundamental boot order and can therefore use the same logic to debug, define, and understand cloud-init's behavior. It's much easier to follow documentation without statements like "To accomplish X on distros A, B, C, D, ... N, do ____ but on Ubuntu and Debian do _____." on every page. Reducing variances in boot order will help reduce that in the long run.
Gentle reminder that we are talking about changed behavior 9 months ago, not changed behavior now. This PR would be a change in behavior now, which would force a reversal at some point in the future for those that have already adjusted to the effects of #5830.
Looks nicer isn't one of my reasons for wanting this. Besides the points above and in #5818, for me one of the biggest advantages of eliminating unnecessary divergences in boot ordering is that it reduces the problem space in all ways - when trying to debug, understand, or define cloud-init behavior. Supporting every possible ordering in upstream means that upstream takes responsibility for every possible ordering when a bug is reported or future design is considered. Simplicity is a beautiful thing, and it lends itself to maintainability.
I still have some doubts in the position that I'm arguing. My biggest concern is that some significant system service will be affected by #5830 in an unforeseen way on some slow-moving distro. That said I'm not too concerned about this possibility, because the most likely scenario is that Ubuntu would also have the same service packaged, which would mean that a patch would already have to exist in either Ubuntu or the other downstream which breaks or introduces the cycle, respectively. In either case a path forward to break the cycle would already exist: their downstream patch could be dropped to match Ubuntu or Ubuntu's downstream patch could be used as an example demonstrating a fix. (not sure if I articulated that well) I'd like to more forward with my proposal above, if you're still open to this. We can always bring back this PR if a stronger justification presents itself. |
|
Thanks for the conversation. You raise good points. I'll close this now. |
Proposed Commit Message
Additional Context
#6245 (comment)
Test Steps
Merge type