Skip to content

fix: remove cloud-final.service regression#6286

Closed
TheRealFalcon wants to merge 1 commit into
canonical:mainfrom
TheRealFalcon:final-service
Closed

fix: remove cloud-final.service regression#6286
TheRealFalcon wants to merge 1 commit into
canonical:mainfrom
TheRealFalcon:final-service

Conversation

@TheRealFalcon
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon commented Jul 3, 2025

Proposed Commit Message

fix: remove cloud-final.service regression

After=multi-user.target can cause issues on non-debian systems,
so ensure it is templated

Fixes GH-6245

Additional Context

#6245 (comment)

Test Steps

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

After=multi-user.target can cause issues on non-debian systems,
so ensure it is templated

Fixes canonicalGH-6245
@TheRealFalcon TheRealFalcon requested a review from holmanb July 3, 2025 15:40
@holmanb
Copy link
Copy Markdown
Member

holmanb commented Jul 3, 2025

Thanks for proposing this @TheRealFalcon.

I don't think that per-distro multi-user.target ordering is the best path forward. Cloud-init should generally behave the same across distros, and a special case for a foundational unit like multi-user.target indicates significant diverging behavior. A common ordering relationship with core systemd services would be more ideal.

For example: how should people reason about cloud-init if a runcmd like this behaves different on different distros:

systemctl is-active multi-user.target && echo "do something"

While this example isn't particularly interesting, simple checks like this are pretty common. If pressed I'm sure we could come up with significant examples of cloud-init behaving differently as a result of different multi-user.target ordering.

Also, I couldn't reproduce the issue from the report. I would like to better understand the issue before making this change.

@holmanb holmanb self-assigned this Jul 3, 2025
Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment

Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @TheRealFalcon. I was able to reproduce the reported behavior.

The regressed behavior is expected in my opinion - this behavior has existed in other distros for years. A simple solution exists for affected users: add After=multi-user.target to the service which is ordered After=cloud-final.service.

I'm -1 to upstream trying to support multiple ordering relationships against multi-user.target because wherever possible cloud-init behavior should match across distros.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

From #5830 ,

The issue is that systemd automatically adds a Before=multi-user.target to your service.

How/Why is this happening? Is this documented in systemd somewhere?

this behavior has existed in other distros for years

What do you mean by this?

wherever possible cloud-init behavior should match across distros.

despite any regressions? I'm not sure I signed onto that design goal when I approved #5830

@holmanb
Copy link
Copy Markdown
Member

holmanb commented Jul 8, 2025

From #5830 ,

The issue is that systemd automatically adds a Before=multi-user.target to your service.

How/Why is this happening? Is this documented in systemd somewhere?i

I haven't seen it documented, but this is easily observable. I've bumped into this myself in the past when trying to make a service be ordered after cloud-init.

this behavior has existed in other distros for years

What do you mean by this?

Try to create a service ordered after cloud-init services on Ubuntu. You'll find that an After=multi-user.target is required to avoid the cycle, regardless of which series of Ubuntu. #5830 made all systemd distros behave the same, so now you can create a service which is ordered after cloud-init in the same way whether you run on Ubuntu or any other distro.

wherever possible cloud-init behavior should match across distros.

despite any regressions? I'm not sure I signed onto that design goal when I approved #5830

I hesitate to call this a cloud-init regression. Any observable change in behavior could technically regress somebody's use case, but that doesn't mean that we shouldn't change anything. Stable downstreams might consider this change a regression, but from an upstream perspective which cloud-init feature is regressed? My understanding of the reported issue is that the user wanted their service to run after cloud-init. #5830 changed what they need to do to accomplish that, but it didn't making ordering after cloud-init impossible. A change was observable, but a workaround exists. Unless #5830 broke some cloud-init-supported feature or some crucial distro-packaged external service, I'd suggest that we leave it (and even if it does, I would want to consider the significance of the feature/external service). Besides matching behavior across distros, a unified boot order across distros gives cloud-init more flexibility to deliver cloud-init features which benefit all distros.

Perhaps we could document #5830 here instead. Either way, we should document the canonical way to order a service after cloud-init for everyone. If we re-template multi-user.target, this documentation cannot be distro-agnostic - which will render it less useful than a document which diverges based on which distro the user is using.

If you agree, I can propose a change to document the breaking change as well as a how-to documenting how to order after cloud-init on systemd. Thoughts?

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

but from an upstream perspective which cloud-init feature is regressed?

Booting your distro without having to do weird workarounds.

Besides matching behavior across distros, a unified boot order across distros gives cloud-init more flexibility to deliver cloud-init features which benefit all distros.

At the expense of downstreams patching this themselves such that upstream changes/documentation don't benefit all distros either. I worry we're just pushing any inherent complexity and diverging behavior downstream, which makes it even harder to understand and reason about. Do we know why the After=multi-user.target got templated in the first place? If not, it seems like hubris on our part to think we can just take it out with no repercussions. And to be clear, this is a change in my mentality on my part. I approved it previously.

flexibility

Does cloud-init need that flexibility currently? If not, I don't think the possibility of future needs is a reason for changed behavior now.

Any observable change in behavior could technically regress somebody's use case, but that doesn't mean that we shouldn't change anything

Agreed! But I also think that means we shouldn't change functionality unless there's a really good reason to. Bugs, (reasonable) requested functionality, resolving conflicting requirements, better UX. These might all be good reasons. I just don't see a good reason here other than it looks nicer and there may be some point in the future we'll want to make a change or documentation there that impacts all distros.

If you agree, I can propose a change to document the breaking change as well as a how-to documenting how to order after cloud-init on systemd. Thoughts?

I'm open to this if you're still -1

@holmanb
Copy link
Copy Markdown
Member

holmanb commented Jul 9, 2025

but from an upstream perspective which cloud-init feature is regressed?

Booting your distro without having to do weird workarounds.

I haven't seen any reports of an inability to boot any distro without having to do workarounds. Have you? The issue reported, as I understand it, was that a user's custom service was removed from the boot transaction because it introduced a cycle. Cloud-init still behaved, they just needed to update their service ordering accordingly to get their service working. And calling it "weird" is just subjective based on prior experience.

Besides matching behavior across distros, a unified boot order across distros gives cloud-init more flexibility to deliver cloud-init features which benefit all distros.

At the expense of downstreams patching this themselves such that upstream changes/documentation don't benefit all distros either. I worry we're just pushing any inherent complexity and diverging behavior downstream, which makes it even harder to understand and reason about.

Yeah, there are tradeoffs with both approaches. I take issue with upstreaming divergences when actual requirements are left undocumented along the way. Left in the hands of the original maintainers they would have had bug reports associated with patches which justified their divergences. In some ways, cloud-init feels like an upstream that tries to also be every possible downstream which is hardly sustainable.

Do we know why the After=multi-user.target got templated in the first place? If not, it seems like hubris on our part to think we can just take it out with no repercussions. And to be clear, this is a change in my mentality on my part. I approved it previously.

If I remember correctly, when I dug into the git history previously, all I found was a maintainer importing changes as a template from CentOS 7 with justification being "that's the way they had it". Cloud-init upstream had only gained the After=multi-user.target around a year prior to the template import, so my working theory at the time of investigation was that CentOS hadn't updated to the latest upstream ordering yet when the differences were upstreamed. "That's the way they had it" doesn't tell anybody why they had it that way and is hardly justification for such a significant change.

Repercussions aren't the only thing that concern me. I'm also concerned about a design that exists purely due to prior art. I think "because what is" is a weaker argument than "because this is best". A few users that need to update custom services is a small price to pay in exchange for an ecosystem of users that share the same fundamental boot order and can therefore use the same logic to debug, define, and understand cloud-init's behavior. It's much easier to follow documentation without statements like "To accomplish X on distros A, B, C, D, ... N, do ____ but on Ubuntu and Debian do _____." on every page. Reducing variances in boot order will help reduce that in the long run.

I don't think the possibility of future needs is a reason for changed behavior now.

Gentle reminder that we are talking about changed behavior 9 months ago, not changed behavior now. This PR would be a change in behavior now, which would force a reversal at some point in the future for those that have already adjusted to the effects of #5830.

Any observable change in behavior could technically regress somebody's use case, but that doesn't mean that we shouldn't change anything

Agreed! But I also think that means we shouldn't change functionality unless there's a really good reason to. Bugs, (reasonable) requested functionality, resolving conflicting requirements, better UX. These might all be good reasons. I just don't see a good reason here other than it looks nicer and there may be some point in the future we'll want to make a change or documentation there that impacts all distros.

Looks nicer isn't one of my reasons for wanting this. Besides the points above and in #5818, for me one of the biggest advantages of eliminating unnecessary divergences in boot ordering is that it reduces the problem space in all ways - when trying to debug, understand, or define cloud-init behavior. Supporting every possible ordering in upstream means that upstream takes responsibility for every possible ordering when a bug is reported or future design is considered. Simplicity is a beautiful thing, and it lends itself to maintainability.

If you agree, I can propose a change to document the breaking change as well as a how-to documenting how to order after cloud-init on systemd. Thoughts?

I'm open to this if you're still -1

I still have some doubts in the position that I'm arguing. My biggest concern is that some significant system service will be affected by #5830 in an unforeseen way on some slow-moving distro.

That said I'm not too concerned about this possibility, because the most likely scenario is that Ubuntu would also have the same service packaged, which would mean that a patch would already have to exist in either Ubuntu or the other downstream which breaks or introduces the cycle, respectively. In either case a path forward to break the cycle would already exist: their downstream patch could be dropped to match Ubuntu or Ubuntu's downstream patch could be used as an example demonstrating a fix. (not sure if I articulated that well)

I'd like to more forward with my proposal above, if you're still open to this. We can always bring back this PR if a stronger justification presents itself.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

Thanks for the conversation. You raise good points. I'll close this now.

@holmanb holmanb mentioned this pull request Jul 9, 2025
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants