OKD: Add authorized keys if missing after ignition #2393
fortinj66 wants to merge 1 commit into openshift:master from fortinj66:add-authorized_keys_if_missing_after_ignition
|
If this is accepted, #2388 will not be needed |
|
Example log from my test cluster (OKD vSphere IPI): |
|
/ok-to-test |
|
/test okd-e2e-vsphere |
|
sorry for the churn... git and I are not seeing eye to eye after I've made formatting changes with tabs/spaces... |
|
/retest |
|
Looks good to me. Needs to be squashed into a single commit with a short description (Ignition may create dropins in authorized_keys.d instead of a single authorized_keys file, so in order to properly manage this file MCO should convert it into a single authorized_keys entry) |
kikisdeliveryservice left a comment:
Some questions/comments
Once any additional changes needed are made and accepted I'll squash this down and update the commit comment... |
so this covers OCP overall & OKD if there are no sshkeys?
I believe so... I can't test OCP though...
Ignition on OKD may create drop-ins in authorized_keys.d instead of a single authorized_keys file, so in order to properly manage this file the MCO should convert it into a single authorized_keys entry
|
Q: I feel like we've had this discussion before and the take-away was that OKD was going to fix the true underlying issue elsewhere. Was that shelved? |
With this change I can log into the server and machine-config-daemon-firstboot.service completes. Here is the example log from my cluster with this change: |
|
Part of the issue I see is that RHCOS != FCOS. FCOS behaves differently. Look at what we had to do to make DNS work properly because of systemd-resolved (#2377). I think this is one of those differences that has now reared its head. Note that OKD 4.6 has the same behavior, but does not have #2087 applied. I'm guessing that 4.5 has the same... |
|
I commented in the bug. However, the must-gather indicates that the cluster is very unhappy, from the start. Before we proceed any further, can we confirm the bug with a clean must-gather? |
I can rebuild my cluster. I'm guessing you want it from the latest 4.7 nightly? I'll run the "vanilla" install without my changes... |
#2087 breaks OKD ssh authentication because an authorized_keys file for the core user is never created, and the authorized_keys.d/ignition file is now unavailable/disabled because of #2087. @vrutkovs has reported this also... #2283. I'll post the directory structure once the cluster is up and I can |
|
I'm also going to provision an OCP cluster... |
This was a known tradeoff agreed upon in the initial implementation. This PR doesn't feel like the correct fix to mitigate that tradeoff and, as we discussed previously, more work elsewhere is probably required. |
|
I'm fine with leaving that change if FCOS adopts setting We can't do that in OKD custom payload, as it would be applied only after -firstboot has passed, which means we can't debug a node before the pivot |
New OKD cluster. ssh with key does not work...
OKD Directory Structure for core after cluster install
OCP Directory Structure for core after cluster install
If you look at the logs for ignition, you can see where the files are created...
OKD
OCP |
|
If the reason for this PR is a bootstrap, and OKD already uses a custom installer, it seems that the better place to handle the drift is in the custom installer, no? This allows the MCO and Ignition to maintain the aforementioned agreement while providing for the debuggability. https://github.com/darkmuggle/installer/commit/ca7adb5b5305209ab4f9d4a21fbdfa24070b0c1f (YMMV, POC commit) |
I don't really know anything about the installer internals but I'm OK with the concept... I tested the symlink and it works, and as expected MCO correctly removes the symlink and creates a regular file when I update 99_master_ssh. @vrutkovs will need to chime in on this... |
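The symlink behavior described in the comment above can be illustrated with a small Go sketch. It is not the installer commit or MCO code: the function names `linkAuthorizedKeys`, `replaceWithRegularFile` and `runDemo`, the example keys, and the temporary directory are all hypothetical, and it assumes a Linux filesystem where `rename(2)` replaces the link itself rather than the file it points to.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkAuthorizedKeys (hypothetical name) points authorized_keys at the
// Ignition drop-in using a relative symlink.
func linkAuthorizedKeys(sshDir string) error {
	return os.Symlink(
		filepath.Join("authorized_keys.d", "ignition"),
		filepath.Join(sshDir, "authorized_keys"),
	)
}

// replaceWithRegularFile (hypothetical name) writes keys to a temporary file
// and renames it over authorized_keys. The rename replaces the symlink
// itself, so the drop-in it pointed to is not modified.
func replaceWithRegularFile(sshDir string, keys []byte) error {
	target := filepath.Join(sshDir, "authorized_keys")
	tmp := target + ".tmp"
	if err := os.WriteFile(tmp, keys, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, target)
}

// runDemo builds a throwaway .ssh layout, links it, then replaces the link.
// It returns what was readable through the symlink and whether the final
// authorized_keys is a regular file.
func runDemo() (viaLink string, regularAfter bool, err error) {
	sshDir, err := os.MkdirTemp("", "ssh-symlink-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(sshDir)

	dropinDir := filepath.Join(sshDir, "authorized_keys.d")
	if err := os.Mkdir(dropinDir, 0o700); err != nil {
		return "", false, err
	}
	// Placeholder key material, not a real key.
	oldKey := []byte("ssh-ed25519 AAAAexample-a core@example\n")
	if err := os.WriteFile(filepath.Join(dropinDir, "ignition"), oldKey, 0o600); err != nil {
		return "", false, err
	}

	if err := linkAuthorizedKeys(sshDir); err != nil {
		return "", false, err
	}
	target := filepath.Join(sshDir, "authorized_keys")
	data, err := os.ReadFile(target) // follows the symlink
	if err != nil {
		return "", false, err
	}

	newKey := []byte("ssh-ed25519 AAAAexample-b core@example\n")
	if err := replaceWithRegularFile(sshDir, newKey); err != nil {
		return "", false, err
	}
	info, err := os.Lstat(target) // does not follow symlinks
	if err != nil {
		return "", false, err
	}
	return string(data), info.Mode().IsRegular(), nil
}

func main() {
	viaLink, regularAfter, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read through symlink: %q\nregular file after update: %v\n", viaLink, regularAfter)
}
```

The sketch only shows the file-level mechanics; it does not address the timing concern raised elsewhere in this thread about when such a symlink would be applied.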
IMO, a viable workaround that resolves the problem without changing the MCO and maintains the agreements between various stakeholders is the preferable path forward for the time being. Since the FCOS branch for the installer is for OKD use-cases it seems like an appropriate place to handle this specific bootstrap problem. Without a clear argument for why the installer cannot handle this nuanced case, this PR will require further review and stakeholder input (and thus delay merging). |
Not for long, we're about to get merged in master soon. Also, this won't help with workers pivot debugging.
Why should we not change MCO? There are two locations where Ignition may write an ssh key - and MCO only manages one of them. It sounds like MCO is the component which needs to be updated to support both? And this PR does exactly that? |
|
Also, with the addition of a couple of lines of code, we can remove the authorized_keys.d/ignition file and eliminate the need for #2087. Remember, this only runs at first boot and is a no-op for RHCOS/OCP unless and until RHCOS implements authorized_keys.d |
FCOS is using [1] which is tried after The CoreOS and MCO teams made a decision earlier due to a security concern and resolution before the RHCOS support for [1] |
@darkmuggle I don't think we are suggesting that the MCO support authorized_keys.d... In fact we are offering the opposite solution, which is to create the file used by the supported method, authorized_keys. The issue revolves around the fact that the authorized_keys file is never created anywhere; not in ignition and not during first boot. This PR forces the creation of authorized_keys from the ignition-created authorized_keys.d/ignition file if it exists, so that it can be supported by the MCO as normal. And if we also delete authorized_keys.d/ignition afterwards, the security issue also goes away... |
The source of truth for MCO to apply any config is the MachineConfig, which we can validate at any point in time by looking at the applied MC and rendered config. For authorized ssh keys, MCO manages and keeps track of only the ~/.ssh/authorized_keys file. If MCO starts writing ~/.ssh/authorized_keys.d/ignition implicitly into ~/.ssh/authorized_keys, MCO will have no way to know where these extra ssh keys are coming from, as they won't be reflected in the applied MC or rendered MC. Also, since MCO doesn't manage them, it won't be able to delete these keys, which the user may want later on. |
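To make the drift concern above concrete, here is a minimal Go sketch that compares the keys found in an authorized_keys file with the keys listed in a rendered config and reports the ones the config does not account for. This is purely illustrative and is not how the MCO validates anything: `untrackedKeys` is a hypothetical helper and the key strings are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// untrackedKeys (hypothetical helper) returns the keys present in the
// on-disk authorized_keys content that do not appear in the list of keys
// taken from a rendered config. Blank lines and comments are ignored.
func untrackedKeys(onDisk string, rendered []string) []string {
	known := make(map[string]bool, len(rendered))
	for _, k := range rendered {
		known[strings.TrimSpace(k)] = true
	}

	var extra []string
	for _, line := range strings.Split(onDisk, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !known[line] {
			extra = append(extra, line)
		}
	}
	return extra
}

func main() {
	// Placeholder key material, not real keys.
	rendered := []string{"ssh-ed25519 AAAAexample-a core@example"}
	onDisk := "ssh-ed25519 AAAAexample-a core@example\n" +
		"ssh-ed25519 AAAAexample-b other@example\n"

	for _, k := range untrackedKeys(onDisk, rendered) {
		fmt.Println("not in rendered config:", k)
	}
}
```

Any key reported by such a comparison is one that could not be traced back to a MachineConfig, which is the situation the comment above wants to avoid.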
|
I surrender... I'm not an owner or member so I don't have any real responsibility for this codebase. Although I think my solution is appropriate I will not continue to argue for its merits. The ssh issue still remains without a solution. I hope someone is able to come up with one before OKD 4.7 is released. Regards, |
|
Let me take a step back here. @fortinj66, the designed and implemented behavior is this:
#2393 (comment) implies that we need an installer change for SSH keys to work on the bootstrap node on FCOS, I assume because the MCO doesn't manage the bootstrap node. If step 2 in the above list is not working, that sounds like an MCO bug, and I assume we should fix it rather than changing the architecture. But I'm a CoreOS person, not an MCO person, and I can't speak for the MCO folks. |
I do not believe this is actually the case. There does not seem to be a "sync" anywhere for authorized_keys in OCP or OKD. If there is, please reference the code, as I cannot find it and I've looked... You can update it after the cluster is up and that works fine, but it does not create a new authorized_keys file from 99_master_ssh or 99_worker_ssh. I can't speak to the installer. @vrutkovs has already made a comment above with his thoughts... |
And I agree, after the cluster is up... but there is no sync during cluster creation; otherwise the authorized_keys file would get created, and it doesn't. For RHCOS, it is created at ignition. For FCOS, it is created in authorized_keys.d/ignition. All this PR does is copy authorized_keys.d/ignition to authorized_keys so that ssh will work... and since this key matches what is in the bootstrap 99_master_ssh and 99_worker_ssh, MCO is also in sync. That is where the difference/disconnect is. This PR does not meet your architectural guidance, so I have withdrawn it
|
I wonder if this can be fixed with a symlink (like @darkmuggle suggested above, see https://github.com/darkmuggle/installer/commit/ca7adb5b5305209ab4f9d4a21fbdfa24070b0c1f) - but in https://github.com/openshift/okd-machine-os instead of the installer - this would then account for both bootstrap and non-bootstrap nodes. |
The symlink would be applied only post-pivot, so the host can't be ssh'd to if it can't pivot |
|
Before pivoting the config to disable |
|
Experimenting with this in openshift/okd-machine-os#85 |
OKD Ignition does not create /home/core/.ssh/authorized_keys. It creates /home/core/.ssh/authorized_keys.d/ignition instead.
This patch creates /home/core/.ssh/authorized_keys from /home/core/.ssh/authorized_keys.d/ignition if missing during
machine-config-daemon-firstboot.service.
This puts the system in sync with rendered machine configs.
I do not believe this will affect OCP. My understanding is that Ignition on OCP creates /home/core/.ssh/authorized_keys, so this will never be applied. (Modified so that it handles no ssh keys being passed at cluster creation.)
- What I did
Created writeMissingAuthorizedKeys() function which is called in updateFiles()
- How to verify it
/home/core/.ssh/authorized_keys file exists
- Description for the changelog
Add authorized keys if missing after ignition
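For readers who want a feel for the behavior described above, the following Go sketch shows one way a "copy if missing" step could look. It is not the code from this PR: only the function name `writeMissingAuthorizedKeys` comes from the description, while its signature, the `sshDir` parameter, the `runDemo` helper, and the file mode are assumptions. Ownership handling (for example, making the file owned by `core`) is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeMissingAuthorizedKeys copies <sshDir>/authorized_keys.d/ignition to
// <sshDir>/authorized_keys when the latter does not exist. It reports whether
// a file was written. The signature is an assumption for this sketch.
func writeMissingAuthorizedKeys(sshDir string) (bool, error) {
	target := filepath.Join(sshDir, "authorized_keys")
	dropin := filepath.Join(sshDir, "authorized_keys.d", "ignition")

	// An existing authorized_keys is left untouched.
	if _, err := os.Lstat(target); err == nil {
		return false, nil
	} else if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}

	keys, err := os.ReadFile(dropin)
	if errors.Is(err, fs.ErrNotExist) {
		// No drop-in either (for example, no SSH keys were passed at
		// cluster creation): nothing to do.
		return false, nil
	}
	if err != nil {
		return false, err
	}

	// 0600 is an assumed mode for this sketch.
	if err := os.WriteFile(target, keys, 0o600); err != nil {
		return false, err
	}
	return true, nil
}

// runDemo (hypothetical) creates a throwaway directory containing only the
// drop-in, runs the function, and returns the resulting authorized_keys.
func runDemo() (string, error) {
	sshDir, err := os.MkdirTemp("", "ssh-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(sshDir)

	dropinDir := filepath.Join(sshDir, "authorized_keys.d")
	if err := os.Mkdir(dropinDir, 0o700); err != nil {
		return "", err
	}
	// Placeholder key material, not a real key.
	key := []byte("ssh-ed25519 AAAAexample core@example\n")
	if err := os.WriteFile(filepath.Join(dropinDir, "ignition"), key, 0o600); err != nil {
		return "", err
	}

	if _, err := writeMissingAuthorizedKeys(sshDir); err != nil {
		return "", err
	}
	out, err := os.ReadFile(filepath.Join(sshDir, "authorized_keys"))
	return string(out), err
}

func main() {
	out, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Because the function returns early when authorized_keys already exists, running it on a host where Ignition wrote that file directly would change nothing, which matches the expectation stated above for OCP.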