Add module 'write-deferred-files' executed in stage 'final'#916
Conversation
|
Hey @lucendio , unfortunately we cannot just swap these two modules for backwards compatibility reasons. One of the comments in that bug gives the reasoning and some possible ways to solve this:
|
|
Hey @TheRealFalcon , while, from a logical point of view, I understand the reasoning, although I honestly doubt that sb actually does this replacing I agree with your observation about A completely different approach could be to change how a user is being created from python - to eliminate the argument brought up. In some places on the internet I read sth about In any case, I'd like to help moving this dangling issue towards a solution :) |
This sounds reasonable to me. I'd be ok with it.
We would still have the issue of write_file need to run before creating the user, right? I think this would be a little more invasive that I'd prefer, especially since it would involve dragging a new dependency into cloud-init. If you want to go ahead with the |
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.) |
Yes. this approach would still involve placing
I would argue, that it is easier to teach users what So, to summarize, as it stands, we currently have two different approaches:
While the first one would extend the user API, the second seems to introduce another dependency and would change the default order. @TheRealFalcon I'd like to start working on this. Would you mind removing the |
|
hey @lucendio , thanks for driving this. Your plan sounds good to me. Let me know if you have any additional questions or need help. |
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.) |
|
Let's not let this one die on the vine. I have a need for this as well, by injecting files (openvpn, ~/.ssh/config and others) at I like the idea of having a specific construct of when to trigger the I'm inclined to support the idea of a global Thoughts? |
During a follow-up discussion it was desiced to extend the user interface instead of shuffling modules around. So, this commit reverts the initial approach of the PR canonical#916 before starting to implement the solution being discused. This reverts commit 1888f4f.
|
I put together a first proposal for the implementation. Most of the details can be found in the commit messages. The branch used for the proposal is the continuation of the one referred in this PR. I chose this specific location, because I can see cases, in which users would like to refer to users/groups created through package installation. Does this make sense? The current approach introduces sth like this: but additionally, I could also imagine the following: So, not only Please let me know what you think about the current implementation. I'd love to hear your feedback! PS: @TheRealFalcon would you mind re-opning this PR, so I can change the origin branch? Also, please let me know if this doesn't meet your level of code quality. And, I ended up making |
Quality overall looks good. There's no problem with reusing write_files in the way you did. I noticed a number of pep8-type things (mostly line lengths), but other than that the implementation mostly looks good. I haven't done a full review yet though. Backing up to your earlier comment though...
Now that you bring it up, I like this approach a lot more. When you had initially mentioned adding the files option to user-groups, I was thinking it would modifying the module directly. If we're already adding a "deferred" module that will run later in boot, I think this approach using I don't think we should have this and the What do you think? |
|
I have to admit, I almost couldn't resist the urge to implement Anyway, I must say, I still believe there is value in having both.
I guess, that could be solved by linking. I didn't render the docs yet, but I hope it works referring from one scheme description to another. And being able to define files that belong to a user right next to it and as simple as: It would even be possible to simplify
Oh, you mean, calling the So, I hope you reconsider. It's not a blocker, because it's more impotent to even get this functionality in. Maybe other folks have opinions too :) |
I don't think this will work. Root is still executing everything, so the user's HOME env var won't be set.
Having one write files definition that can write at 3 separate times seems excessive to me. We also need to consider cohesiveness and maintenance burden. In general, most modules have their yaml definition and operate only on that. There are obviously exceptions to this "rule", but I think it's important to keep those exceptions to a minimum. Having a write-files definition that is defined in the users yaml but not actually during user handling would lead to a lot of confusion. My preference would be that we choose one. Either keep the users files definition (handled in the users module) or add the defer key to write-files that get's executed at final time. Why do you think it's important to have both? |
blackboxsw
left a comment
There was a problem hiding this comment.
@lucendio thank you for picking up this work and running with it and for driving conversation here with James.
I know there has been a bit of back and forth here trying to define the best support for this deferred set of write_files operations and I appreciate the effort you've put into this so far.
I recognize you are coming from the perspective of solving the creation of files in a user-based directories with proper perms etc. It seems you've added some helper functionality to aid in creation specific to the users. While the extension to the "users" module directives are helpful, it feels a bit like we are adding an alias with sugar which can already be done by explicit directives in the existing 'write_files:' schema.
I feel like providing two ways to do things in this case might be more of a burden from doc/maintenance stand point than just doing one thing that serves a more broad set of needs.
I'd vote instead that this PR tries limiting scope to only adding the ability to perform write_files type operations during cloud-init final stage.
We can do that in one of two ways, both of which involve adding a cc_write_files_deferred.py module:
- Add a "defer: true" config directive to existing
write_files:schema, cc_write_files.py will ignore deferred items, cc_write_files_deferred.py will consume alldefer: trueitems underwrite_files:
-- OR -- - Don't add a defer:true option to
write_files:schema, just add awrite_files_deferred:top-level key which cc_write_files_deferred will consume
Are there any specific configuration directives missing from the existing write_files schema that don't ultimately meet the needs you tried addressing with your extract_deferred_files that maybe could be added to the base functionality of write_files ? It seems maybe you'd like the option to discover the user:group based on the working directory you are operating in?
I guess I'd probably lean toward option 2 for a couple reasons:
- If someone has a need to write deferred files or earlier files I'm just guessing they have a group of those operations that they would like to perform at the same stage in cloud-init. To more easily group those write_file operations with minimal #cloud-config under a single top-level key "write_files_deferred:" instead of appending defer:true on every item under the current
write_files:directive. - Most cloud-init config modules cc_*py are scoped to own unique top-level #cloud-config keys. So, it probably makes things a bit more intuitive for cc_write_files.py to honor
write_files:config key and cc_write_files_deferred: to honorwrite_files_deferred:key instead of two modules having to pop or filter unrelated items out of their config.
In either case here's a diff against your PR that might explain what I was thinking for option 2
https://paste.ubuntu.com/p/5MF77rqgMV/
But if we think option 1 is best: here's a diff to support a defer: true and avoid honoring a new top-level #cloud-config key.
https://paste.ubuntu.com/p/MbZ3y5fdrh/
If that would be the case, I would completely agree with you. I was aiming for two times, where times refers to the amount of modules writing files across all stages:
Originally, the idea was to not expose a dedicated top-level key for
One could say the oppositem too. Explicitly marking files whose creation is deferred, and finding it nicer to have all files managed by cloud-init in one place. No new top-level key one would need to be aware of, just a flag to slightly change the behaviour or preventing its creation from failing.
In my opinion, that is exactly what logic behind a user interface is supposed to do. Often, humans understand data very differently compared to how machines do. That being said, well-structured and maintainable code is not necessarily dependent and still very much possible.
Important is probably a bit strong. I'd say it feels intuitive to me: files, that I define in a user object would surely be created after the user has been created and would be owned by that user. And, a file marked as deferred is most likely be created at the very end of a cloud-init run, after anything that might interfere with the creation, did already happen. But I guess intuition is also somewhat subjective :) Above, I just wanted to make my case again to ensure that there is no misunderstanding about my train of thought and the implementation I propose. Just the perspective of a cloud-init user. Despite our different point of views, I think we all share the goal of reaching consensus. And since you are the maintainers, I would like you to have the final verdict, because I'd rather have a solution addressing the original issue. If you like, I can still carry out the implementation. One last technical thought though: From a backward compatibility perspective, does it make more or less sense to introduce a new top-level key or just extend existing schemes, when adding a new module to one of the stage lists in Side note: only way where we would not need to link to
@blackboxsw btw GH supports syntax highlighting for git diffs ;) although, a link to a diff served by GH is probably even more readable. diff --git a/cloudinit/config/cc_write_files.py b/cloudinit/config/cc_write_files.py
index 8601e707..b19e47f6 100644
--- a/cloudinit/config/cc_write_files.py
+++ b/cloudinit/config/cc_write_files.py
@@ -151,6 +151,15 @@ schema = {
``path`` exists. Default: **false**.
"""),
},
+ 'defer': {
+ 'type': 'boolean',
+ 'default': False,
+ 'description': dedent("""\
+ Whether to defer writing files until cloud-init's
+ final stage, after user creation and
+ package install.
+ """),
+ },
},
'required': ['path'],
'additionalProperties': False
@@ -163,7 +172,7 @@ __doc__ = get_schema_doc(schema) # Supplement python help()
def handle(name, cfg, _cloud, log, _args):
- files = cfg.get('write_files')
+ files = get_deferred_files(cfg)
if not files:
log.debug(("Skipping module named %s,"
" no/empty 'write_files' key in configuration"), name)
@@ -172,6 +181,14 @@ def handle(name, cfg, _cloud, log, _args):
write_files(name, files)
+def get_deferred_files(cfg, deferred=False):
+ """Filer files based on whether the items is deferred."""
+ filtered_files = []
+ for f in cfg.get('write_files', []):
+ filtered_files.append(f) if f.get("deferred", False) == deferred
+ return filtered_files
+
+
def canonicalize_extraction(encoding_type):
if not encoding_type:
encoding_type = '' |
I also agree with this point.
Thank you for your flexibility @lucendio , and also thank you for the explanation. It's helpful to understand everybody's perspectives. Both myself and @blackboxsw prefer to keep the write-files definition out of the users-groups module, so let's limit it to one of the two choices listed here. I know @blackboxsw said he prefer's option 2, but I actually prefer option 1. I think both of us are flexible on this point and we've all explained our reasonings, so feel free to be the tie-breaker. If you're still interested in carrying out the implementation, we still welcome it.
I think it makes sense to just extend existing schemas, and then the new module can refer people to the other schema in the docstring. |
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.) |
|
@lucendio , are you still working on this? |
Yes. Although, little progress due to some vacation days. I plan to pick of the pace over the upcoming weeks. I'm sorry for letting this go stale! |
During a follow-up discussion it was desiced to extend the user interface instead of shuffling modules around. So, this commit reverts the initial approach of the PR canonical#916 before starting to implement the solution being discused. This reverts commit 1888f4f.
THe main idea is to introduce a second module that takes care of writing files, but in the 'final' stage. While the introduction of a second module would allow for choosing the appropriate place withing the order of modules (and stages), there is no addition top-level directive being added to the cloud configuration schema. Instead, existing schemes are being extended, like the 'users' directive. The new module 'write-deferred-files' tries to reuse as much as possible of the 'write-files' functionality, including the scheme or the 'write_files' function. further details on how it would be used, are shown in the scheme definition of the module.
This is an initial small chuck of unittests, which verifie that
(a) a broken scheme ('path' is required) will log - similar
to the scheme testing in test_handler_write_files.py; it
even reuses some of the test data
(b) when gathering file configurations from the 'users'
directive, the defaults indicated by the docs are being
set
As this module is build upon the 'write-files' module, a larger
part is already covered by unit tests. Still, more tests -
especially integration tests - are going to be added before
getting this branch into mainline.
This change set implements the feedback from recent discussions [1] on how to best
support writing files at some later point during execution, so that, for example, a
user who is configured as file owner would exist when the file is being created.
THe general idea here is to keep the structure of the handlers of both modules
(write_files[_deferred]) as similar as possible, since they do pretty much the
same. The main difference is the lambda function used to filter deferred or not
deferred files.
For now, there is just some unit testing for the module handler. Integration tests
should be added after the implementation has been reviewed.
Specific details worth mentioning:
* 'validate_cloudconfig_schema' has been moved to the top of the handler functions,
because
a) even if the 'write_files' top-level directive is configured and not empty, it
might still be empty for one of the two write-file modules
b) the schema validation is a formal check and has no awareness of any logical
semantics, so it makes sense to first verify the former before moving forward
* the new module handling deferred files has been renamed
* write_files.handle has been slightly restructured to adhere to the changes in
write_files_deferred.handle
* revert all changes made to cc_users_groups.py in this branche while working on
the deferred files feature
[1] canonical#916
* mostly line length or too many spaces
This also fixes the last flake8 complaints.
If scheme validation fails, it will just warn but not stop the execution. As @TheRealFalcon put it: it's better to log a warning and do what the user intended rather than log a warning and do the wrong thing.
THis test is supposed to verify that files flagged as deferred are not being created by the 'write_files' module.
The test is supposed to verify that:
* deferred file writings are indeed happening at a later
stage (compared to the default behaviour of write_files)
* deferred files are created after users & groups have
been created
* thus deferred files can be owned by such users
Should cause a failure during integration testing
This reverts commit a401f72.
This reverts commit e718485.
The test is supposed to verify that:
* deferred file writings are indeed happening at a later
stage (compared to the default behaviour of write_files)
* deferred files are created after users & groups have
been created
* thus deferred files can be configured with and owned by
users or groups created during the same cloud-init run
Co-authored-by: James Falcon <james.falcon@canonical.com>
|
Thanks for all of your work on this @lucendio ! I'm pushing a rebase to fix the CI failure. If it passes I'll get this merged. |
1cb24aa to
28b4bb3
Compare
Got confirmation my review is sufficient
My pleasure. And, thank you for your collaboration. |
|
I would like to point out what I consider is a bug. In my case the default user is called ubuntu ;) After running everthing: I guess the correct way should be to inherit the ownership for the folder as well, unless pointed out in another way. What do you think? |
|
That sounds reasonable, @jlanza. Would you mind creating a bug report? |
|
I just ran into this issue where I'm trying to assign a user as owner of a file, but cloud init says the user couldn't be found:
This PR was meant to fix that, but I'm still getting it on both ubuntu v22 and v24. Is there a way to know starting from what release this would be fixed? |
|
@marcospgp , this functionality is already present on all supported Ubuntu releases including 20.04, 22.04, and 24.04. |
Proposed Commit Message
Additional Context
The reasoning in full length can be found here: https://bugs.launchpad.net/cloud-init/+bug/1486113
Test Steps
Please see this comment to find a reproducible example.
Checklist: