Skip to content

OKD: Add authorized keys if missing after ignition#2393

Closed
fortinj66 wants to merge 1 commit intoopenshift:masterfrom
fortinj66:add-authorized_keys_if_missing_after_ignition
Closed

OKD: Add authorized keys if missing after ignition#2393
fortinj66 wants to merge 1 commit intoopenshift:masterfrom
fortinj66:add-authorized_keys_if_missing_after_ignition

Conversation

@fortinj66
Copy link
Copy Markdown
Contributor

@fortinj66 fortinj66 commented Feb 6, 2021

OKD Ignite does not create /home/core/.ssh/authorized_keys. It creates /home/core/.ssh/authorized_keys.d/ignition instead.
This patch creates /home/core/.ssh/authorized_keys from /home/core/.ssh/authorized_keys.d/ignition if missing during
machine-config-daemon-firstboot.service.

This puts the system in sync with rendered machine configs.

I do not believe this will effect OCP. My understanding is that ignite on OCP creates /home/core/.ssh/authorized_keys so this will never be applied (Modified so that it would handle no ssh keys being passed as cluster creation)

- What I did

Created writeMissingAuthorizedKeys() function which is called in updateFiles()

- How to verify it
/home/core/.ssh/authorized_keys file exists

- Description for the changelog
Add authorized keys if missing after ignition

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2021
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

Hi @fortinj66. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 6, 2021
@fortinj66 fortinj66 changed the title WIP Add authorized keys if missing after ignition [WIP] Add authorized keys if missing after ignition Feb 6, 2021
@fortinj66
Copy link
Copy Markdown
Contributor Author

If this is accepted, #2388 will not be needed

@fortinj66
Copy link
Copy Markdown
Contributor Author

Example log from my test cluster (OKD vSphere IPI):

Feb 06 19:34:14 localhost systemd[1]: Starting Machine Config Daemon Firstboot...
Feb 06 19:34:14 localhost machine-config-daemon[2382]: I0206 19:34:14.103968    2382 rpm-ostree.go:261] Running captured: rpm-ostree status --json
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.760772    2382 daemon.go:224] Booted osImageURL:  (33.20210117.3.1)
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.862158    2382 daemon.go:231] Installed Ignition binary version: 2.9.0
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.865477    2382 update.go:598] Checking Reconcilable for config mco-empty-mc to rendered-master-9c89b3d73b2747f940dd156d73c30d3f
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.866931    2382 update.go:1931] Starting update from mco-empty-mc to rendered-master-9c89b3d73b2747f940dd156d73c30d3f: &{osUpdate:true kargs:false fips:false passwd
:false files:false units:false kernelType:false extensions:true}
Feb 06 19:34:15 localhost root[2869]: machine-config-daemon[2382]: Starting update from mco-empty-mc to rendered-master-9c89b3d73b2747f940dd156d73c30d3f: &{osUpdate:true kargs:false fips:false passwd:false files:false units:false ker
nelType:false extensions:true}
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.871445    2382 update.go:1216] Updating files
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.871925    2382 update.go:1510] Creating missing /home/core/.ssh/authorized_keys
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.872032    2382 update.go:1762] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
Feb 06 19:34:15 localhost machine-config-daemon[2382]: I0206 19:34:15.873484    2382 update.go:1294] Deleting stale data

@vrutkovs
Copy link
Copy Markdown
Contributor

vrutkovs commented Feb 6, 2021

/ok-to-test
/test okd-e2e-vsphere
/test e2e-vsphere

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 6, 2021
@fortinj66
Copy link
Copy Markdown
Contributor Author

/test okd-e2e-vsphere
/test e2e-vsphere

@fortinj66
Copy link
Copy Markdown
Contributor Author

fortinj66 commented Feb 7, 2021

sorry for the churn... git and I are not seeing eye to eye after I've made formatting changes with tabs/spaces...

@fortinj66
Copy link
Copy Markdown
Contributor Author

/retest

@fortinj66
Copy link
Copy Markdown
Contributor Author

/test okd-e2e-vsphere
/test e2e-vsphere

@fortinj66
Copy link
Copy Markdown
Contributor Author

/retest

3 similar comments
@fortinj66
Copy link
Copy Markdown
Contributor Author

/retest

@fortinj66
Copy link
Copy Markdown
Contributor Author

fortinj66 commented Feb 7, 2021 via email

@fortinj66
Copy link
Copy Markdown
Contributor Author

/retest

@vrutkovs
Copy link
Copy Markdown
Contributor

vrutkovs commented Feb 8, 2021

Looks good to me. Needs to be squashed into a single commit with a short description (Ignition may create dropins in authorized_keys.d instead of a single authorized_keys file, so in order to properly manage this file MCO should convert it into a single authorized_keys entry)

Copy link
Copy Markdown
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions/comments

Comment thread pkg/daemon/update.go Outdated
Comment thread pkg/daemon/update.go Outdated
Comment thread pkg/daemon/update.go Outdated
Comment thread pkg/daemon/update.go Outdated
@fortinj66
Copy link
Copy Markdown
Contributor Author

Looks good to me. Needs to be squashed into a single commit with a short description (Ignition may create dropins in authorized_keys.d instead of a single authorized_keys file, so in order to properly manage this file MCO should convert it into a single authorized_keys entry)

Once any additional changes needed are made and accepted I'll squash this down and update the commit comment...

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fortinj66
To complete the pull request process, please assign darkmuggle after the PR has been reviewed.
You can assign the PR to them by writing /assign @darkmuggle in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread pkg/daemon/update.go Outdated
Copy link
Copy Markdown
Contributor

@kikisdeliveryservice kikisdeliveryservice Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this covers OCP overall & OKD if there are no sshkeys?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so... I can't test OCP though...

ignition

Ignition on OKD may create dropins in authorized_keys.d instead of a single authorized_keys file, so in order
to properly manage this file MCO should convert it into a single authorized_keys entry
@kikisdeliveryservice kikisdeliveryservice added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 9, 2021
@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] Add authorized keys if missing after ignition [WIP] OKD: Add authorized keys if missing after ignition Feb 9, 2021
@fortinj66
Copy link
Copy Markdown
Contributor Author

/retest
/test okd-e2e-vsphere
/test e2e-vsphere

@fortinj66
Copy link
Copy Markdown
Contributor Author

/retest

@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

kikisdeliveryservice commented Feb 9, 2021

Q: I feel like we've had this discussion before and the take-away was that OKD was going to fix the true underlying issue elsewhere. Was that shelved?

@fortinj66
Copy link
Copy Markdown
Contributor Author

That does mean that on FCOS, unlike on RHCOS, the machine cannot be SSHed to until the MCO is running. But your change still has that property, does it not?

With this change I can log into the server machine-config-daemon-firstboot.service completes.

Here is the example log from my cluster with this change:

Feb 08 22:17:23 localhost systemd[1]: Starting Machine Config Daemon Firstboot...
Feb 08 22:17:24 localhost machine-config-daemon[2384]: I0208 22:17:24.041365    2384 rpm-ostree.go:261] Running captured: rpm-ostree status --json
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.103372    2384 daemon.go:224] Booted osImageURL:  (33.20210117.3.1)
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.210231    2384 daemon.go:231] Installed Ignition binary version: 2.9.0
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.212791    2384 update.go:598] Checking Reconcilable for config mco-empty-mc to rendered-master-b62a234c88f45eccb72d33c17c34a532
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.214076    2384 update.go:1939] Starting update from mco-empty-mc to rendered-master-b62a234c88f45eccb72d33c17c34a532: &{osUpdate:true kargs:false fips:false passw>
Feb 08 22:17:26 localhost root[2871]: machine-config-daemon[2384]: Starting update from mco-empty-mc to rendered-master-b62a234c88f45eccb72d33c17c34a532: &{osUpdate:true kargs:false fips:false passwd:false files:false units:false ke>
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.221307    2384 update.go:1216] Updating files
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.222066    2384 update.go:1518] Creating missing /home/core/.ssh/authorized_keys
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.222145    2384 update.go:1770] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
Feb 08 22:17:26 localhost machine-config-daemon[2384]: I0208 22:17:26.223390    2384 update.go:1294] Deleting stale data

@fortinj66
Copy link
Copy Markdown
Contributor Author

Part of the issues I see is that RHCOS != FCOS. FCOS behaves differently. Look at what we had to do to make DNS work properly because of systemd-resolved (#2377)

I think this is one of those differences that has now reared its head.

Note that OKD 4.6 has the same behavior, but does not have #2087 applied. I'm guessing that 4.5 has the same...

@darkmuggle
Copy link
Copy Markdown

I commented in the bug. However, the must-gather indicates that the cluster is very unhappy, from the start. Before we proceed any further, can we confirm the bug with a clean must-gather?

@fortinj66
Copy link
Copy Markdown
Contributor Author

I commented in the bug. However, the must-gather indicates that the cluster is very unhappy, from the start. Before we proceed any further, can we confirm the bug with a clean must-gather?

I can rebuild my cluster. I'm guessing you want it from the latest 4.7 nightly? I'll run the "vanilla" install without my changes...

@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

kikisdeliveryservice commented Feb 9, 2021

I'd like to know what problem this is really solving bc I thought we already came to a final solution last year on handling authorized_keys in OKD via: #2087

It seems like the base of this is reverting the decision that was already made last year? #2388

@fortinj66
Copy link
Copy Markdown
Contributor Author

I'd like to know what problem this is really solving bc I thought we already came to a final solution last year on handling authorized_keys in OKD via: #2087

#2087 breaks OKD ssh authentication because an authorized_keys file for the core user is never created and the authorized_keys.d/ignition file is now unavailable/disabled because of #2087

@vrutkovs has reported this also... #2283

I'll post the directory structure once the cluster is up and I can oc debug node/xxx into it...

@fortinj66
Copy link
Copy Markdown
Contributor Author

I'm also going to provision an OCP cluster...

@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

kikisdeliveryservice commented Feb 9, 2021

I'd like to know what problem this is really solving bc I thought we already came to a final solution last year on handling authorized_keys in OKD via: #2087

#2087 breaks OKD ssh authentication because an authorized_keys file for the core user is never created and the authorized_keys.d/ignition file is now unavailable/disabled because of #2087

@vrutkovs has reported this also... #2283

I'll post the directory structure once the cluster is up and I can oc debug node/xxx into it...

This was a known tradeoff agreed up on the initial implementation. This PR doesn't feel like the correct fix to mitigating that tradeoff and as we discussed previously, more work elsewhere is probably required.

@vrutkovs
Copy link
Copy Markdown
Contributor

vrutkovs commented Feb 9, 2021

I'm fine with leaving that change if FCOS adopts setting WRITE_AUTHORIZED_KEYS_FRAGMENT env var (just like RHCOS apparently does).

We can't do that in OKD custom payload, as it would be applied only after -firstboot has passed, which means we can't debug a node before the pivot

@fortinj66
Copy link
Copy Markdown
Contributor Author

New OKD cluster. ssh with key does not work...

Newest must-gather

OKD Directory Structure for core after cluster install

sh-5.0# pwd
/var/home/core
sh-5.0# find .
.
./.bash_logout
./.bash_profile
./.bashrc
./.ssh
./.ssh/authorized_keys.d
./.ssh/authorized_keys.d/ignition

OCP Directory Structure for core after cluster install

sh-4.4# pwd
/var/home/core
sh-4.4# find .
.
./.bash_logout
./.bash_profile
./.bashrc
./.ssh
./.ssh/authorized_keys
sh-4.4#

If you look at the logs for ignition, you can see where the files are created...

OKD

Feb 09 22:04:37 localhost ignition[676]: INFO     : Ignition 2.9.0
Feb 09 22:04:37 localhost ignition[676]: INFO     : Stage: files
Feb 09 22:04:37 localhost ignition[676]: INFO     : reading system config file "/usr/lib/ignition/base.d/00-core.ign"
Feb 09 22:04:37 localhost ignition[676]: DEBUG    : parsing config with SHA512: ff6a5153be363997e4d5d3ea8cc4048373a457c48c4a5b134a08a30aacd167c1e0f099f0bdf1e24c99ad180628cd02b767b863b5fe3a8fce3fe1886847eb8e2e
Feb 09 22:04:37 localhost ignition[676]: INFO     : reading system config file "/usr/lib/ignition/base.d/30-afterburn-sshkeys-core.ign"
Feb 09 22:04:37 localhost ignition[676]: DEBUG    : parsing config with SHA512: a30a1921169d5a3b58ef9b25de60783be1add6ea8d05fd44a0746cb60dd1b8a8b34ab51eec5eb14eecc2df2ab6ba1cd3fd7351eed65793d22316ab262a857d95
Feb 09 22:04:37 localhost ignition[676]: INFO     : no config dir at "/usr/lib/ignition/base.platform.d/vmware"
Feb 09 22:04:37 localhost ignition[676]: INFO     : Adding "root-ca" to list of CAs
Feb 09 22:04:37 localhost ignition[676]: INFO     : files: ensureUsers: op(1): [started]  creating or modifying user "core"
Feb 09 22:04:37 localhost ignition[676]: DEBUG    : files: ensureUsers: op(1): executing: "useradd" "--root" "/sysroot" "--create-home" "--password" "*" "--comment" "CoreOS Admin" "--groups" "adm,sudo,systemd-journal,wheel" "core"
Feb 09 22:04:37 localhost ignition[676]: INFO     : files: ensureUsers: op(1): [finished] creating or modifying user "core"
Feb 09 22:04:37 localhost ignition[676]: INFO     : files: ensureUsers: op(2): [started]  adding ssh keys to user "core"
Feb 09 22:04:37 localhost ignition[676]: wrote ssh authorized keys file for user: core

OCP

Feb 09 22:33:57 localhost ignition[985]: INFO     : Ignition 2.6.0
Feb 09 22:33:57 localhost ignition[985]: INFO     : Stage: files
Feb 09 22:33:57 localhost ignition[985]: INFO     : reading system config file "/usr/lib/ignition/base.ign"
Feb 09 22:33:57 localhost ignition[985]: DEBUG    : parsing config with SHA512: ff6a5153be363997e4d5d3ea8cc4048373a457c48c4a5b134a08a30aacd167c1e0f099f0bdf1e24c99ad180628cd02b767b863b5fe3a8fce3fe1886847eb8e2e
Feb 09 22:33:57 localhost ignition[985]: INFO     : Adding "root-ca" to list of CAs
Feb 09 22:33:57 localhost ignition[985]: INFO     : files: ensureUsers: op(1): [started]  creating or modifying user "core"
Feb 09 22:33:57 localhost ignition[985]: DEBUG    : files: ensureUsers: op(1): executing: "useradd" "--root" "/sysroot" "--create-home" "--password" "*" "--comment" "CoreOS Admin" "--groups" "adm,sudo,systemd-journal,wheel" "core"
Feb 09 22:33:57 localhost ignition[985]: INFO     : files: ensureUsers: op(1): [finished] creating or modifying user "core"
Feb 09 22:33:57 localhost ignition[985]: INFO     : files: ensureUsers: op(2): [started]  adding ssh keys to user "core"
Feb 09 22:33:57 localhost ignition[985]: INFO     : files: ensureUsers: op(2): [finished] adding ssh keys to user "core"
Feb 09 22:33:57 localhost ignition[985]: wrote ssh authorized keys file for user: core

@darkmuggle
Copy link
Copy Markdown

If the reason for this PR is a bootstrap, and OKD already uses a custom installer, it seems that the better place to handle the drift is in the custom installer, no? This allows the MCO and Ignition to maintain the aforementioned agreement while providing for the debuggability.

https://github.com/darkmuggle/installer/commit/ca7adb5b5305209ab4f9d4a21fbdfa24070b0c1f (YMMV, POC commit)

@fortinj66
Copy link
Copy Markdown
Contributor Author

fortinj66 commented Feb 10, 2021

If the reason for this PR is a bootstrap, and OKD already uses a custom installer, it seems that the better place to handle the drift is in the custom installer, no? This allows the MCO and Ignition to maintain the aforementioned agreement while providing for the debuggability.

darkmuggle/installer@ca7adb5 (YMMV, POC commit)

I don't really know anything about the installer internals but I'm OK with the concept... I tested the symlink and it works, and MCO correctly removes the symlink and creates a regular file when I update 99_master_ssh as expected

@vrutkovs will need to chime in on this...

@darkmuggle
Copy link
Copy Markdown

@vrutkovs will need to chime in on this...

IMO, a viable workaround that resolves the problem without changing the MCO and maintains the agreements between various stakeholders is the preferable path forward for the time being. Since the FCOS branch for the installer is for OKD use-cases it seems like an appropriate place to handle this specific bootstrap problem. Without a clear argument for why using the installer to handle this nuanced case will require further review and stakeholder input (and thus delay merging).

@vrutkovs
Copy link
Copy Markdown
Contributor

vrutkovs commented Feb 10, 2021

OKD already uses a custom installer

Not for long, we're about to get merged in master soon. Also, this won't help with workers pivot debugging.

a viable workaround that resolves the problem without changing the MCO

Why should we not change MCO? There are two locations where Ignition may write an ssh key - and MCO only manages one of them. It sounds like MCO is the component which needs to be updated to support both? And this PR does exactly that?

@fortinj66
Copy link
Copy Markdown
Contributor Author

Also, with the addition of a couple of lines of code, we can remove the authorized_keys.d/ignition file and eliminate the need for #2087

Remember, this only runs at first boot and is a noop for RHCOS/OCP until and if RHCOS implements authorized_keys.d

@darkmuggle darkmuggle removed the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 10, 2021
@darkmuggle
Copy link
Copy Markdown

darkmuggle commented Feb 10, 2021

Why should we not change MCO? There are two locations where Ignition may write an ssh key - and MCO only manages one of them. It sounds like MCO is the component which needs to be updated to support both? And this PR does exactly that?

FCOS is using [1] which is tried after ~/.ssh/authorized_keys. The reason why the MCO should not manage both locations is that the FCOS implementation of .ssh/authorized_keys.d via [2] is unmanaged by the MCO -- the MCO doesn't create the files. Rather, Ignition wrote a file to a specific location. Assuming that the MCO has ownership over ~/.ssh/authorized_keys.d/ignition is a bit pretentious. When RHCOS supports ~/.ssh/authorized_keys.d the MCO will have to develop a more nuanced understanding of which configuration it owns instead of blindly assuming the ownership without a clear and agreed-upon design.

The CoreOS and MCO teams made a decision earlier due a security concern and resolution before the RHCOS support for ~/.ssh/authorized_keys.d is implemented to solve a debug pain point without full stakeholder discussion is premature.

[1]

AuthorizedKeysCommand
            Specifies a program to be used to look up the user's public keys.  The program must be owned by root, not writable by group or others and specified by an absolute path.  Arguments to AuthorizedKeysCommand accept the tokens described in the
            TOKENS section.  If no arguments are specified then the username of the target user is used.

            The program should produce on standard output zero or more lines of authorized_keys output (see AUTHORIZED_KEYS in sshd(8)).  AuthorizedKeysCommand is tried after the usual AuthorizedKeysFile files and will not be executed if a matching key
            is found there.  By default, no AuthorizedKeysCommand is run.

[2] https://github.com/coreos/ssh-key-dir

@fortinj66
Copy link
Copy Markdown
Contributor Author

Why should we not change MCO? There are two locations where Ignition may write an ssh key - and MCO only manages one of them. It sounds like MCO is the component which needs to be updated to support both? And this PR does exactly that?

FCOS is using [1] which is tried after ~/.ssh/authorized_keys. The reason why the MCO should not manage both locations is that the FCOS implementation of .ssh/authorized_keys.d via [2] is unmanaged by the MCO -- the MCO doesn't create the files. Rather, Ignition wrote a file to a specific location. Assuming that the MCO has ownership over ~/.ssh/authorized_keys.d/ignition is a bit pretentious. When RHCOS supports ~/.ssh/authorized_keys the MCO will have to develop a more nuanced understanding of which configuration it owns instead of blindly assuming the ownership without a clear and agreed upon design.

The CoreOS and MCO teams made a decision earlier due a security concern and resolution before the RHCOS support for ~/.ssh/authorized_keys.d is implemented to solve a debug pain point without full stakeholder discussion is premature.

[1]

AuthorizedKeysCommand
            Specifies a program to be used to look up the user's public keys.  The program must be owned by root, not writable by group or others and specified by an absolute path.  Arguments to AuthorizedKeysCommand accept the tokens described in the
            TOKENS section.  If no arguments are specified then the username of the target user is used.

            The program should produce on standard output zero or more lines of authorized_keys output (see AUTHORIZED_KEYS in sshd(8)).  AuthorizedKeysCommand is tried after the usual AuthorizedKeysFile files and will not be executed if a matching key
            is found there.  By default, no AuthorizedKeysCommand is run.

[2] https://github.com/coreos/ssh-key-dir

@darkmuggle I don't think we are suggesting that the MCO support authorized_keys.d... If fact we are offering the opposite solution which is to create the supported method of using authorized_keys. The issue revolves around the fact that the authorized_keys file is never created anywhere; not in ignition and not during first boot.

This forces the creation of authorized_keys from the ignition created authorized_keys.d/ignition file if it exists and it can be supported by the MCO as normal. and if we also delete authorized_keys.d/ignition afterwards, the security issue also goes away...

@sinnykumari
Copy link
Copy Markdown
Contributor

This forces the creation of authorized_keys from the ignition created authorized_keys.d/ignition file if it exists and it can be supported by the MCO as normal. and if we also delete authorized_keys.d/ignition afterwards, the security issue also goes away...

Source of truth for MCO to apply any config is through MachineConfig which we can validate at any point of time by looking at applied MC and rendered config. For authorized ssh keys, MCO manages and keep track of only ~/.ssh/authorized_keys file. If MCO will start writing ~/.ssh/authorized_keys.d/ignition implicitly into ~/.ssh/authorized_keys , MCO will have no way to know from where these extra ssh keys are coming as they won't get reflected in applied MC or rendered MC. Also, since MCO doesn't manages them, it won't be able to delete these keys which user may want later on.

@fortinj66
Copy link
Copy Markdown
Contributor Author

I surrender... I'm not an owner or member so I don't have any real responsibility for this codebase. Although I think my solution is appropriate I will not continue to argue for its merits.

The ssh issue still remains without a solution. I hope someone is able to come up with one before OKD 4.7 is released.

Regards,
--John

@fortinj66 fortinj66 closed this Feb 10, 2021
@fortinj66 fortinj66 deleted the add-authorized_keys_if_missing_after_ignition branch February 10, 2021 19:01
@bgilbert
Copy link
Copy Markdown
Contributor

Let me take a step back here. @fortinj66, the designed and implemented behavior is this:

  1. On FCOS, Ignition writes ~/.ssh/authorized_keys.d/ignition because we want to support allowing multiple tools to manage their own sets of SSH keys. On RHCOS, there is only one set of tools that manages SSH keys (the combination of Ignition on first boot + MCO afterward) so we just use ~/.ssh/authorized_keys, and there are no plans to switch to ~/.ssh/authorized_keys.d/ignition.
  2. Once the MCO starts managing the system, it syncs ~/.ssh/authorized_keys from the MachineConfig. At that point, SSH keys should work on both FCOS and RHCOS. The MCO does not touch ~/.ssh/authorized_keys.d/ignition.
  3. On FCOS, the MCO also needs to disable ~/.ssh/authorized_keys.d/ignition, or else old keys will continue to work after they've been removed from the MachineConfig.

#2393 (comment) implies that we need an installer change for SSH keys to work on the bootstrap node on FCOS, I assume because the MCO doesn't manage the bootstrap node. If step 2 in the above list is not working, that sounds like an MCO bug, and I assume we should fix it rather than changing the architecture. But I'm a CoreOS person, not an MCO person, and I can't speak for the MCO folks.

@fortinj66
Copy link
Copy Markdown
Contributor Author

2. Once the MCO starts managing the system, it syncs ~/.ssh/authorized_keys from the MachineConfig. At that point, SSH keys should work on both FCOS and RHCOS. The MCO does not touch ~/.ssh/authorized_keys.d/ignition.

I do not believe this is actually the case. There does not seem to be a "sync" any where for authorized_keys in OCP or OKD. If there is, please reference the code as I cannot find it and I've looked... You can update it after the cluster is up and that works fine, but it does not create a new authorized_keys file from 99_master_ssh or 99_worker_ssh.

I can't speak to the installer. @vrutkovs has already made a comment above with his thoughts...

@fortinj66
Copy link
Copy Markdown
Contributor Author

Source of truth for MCO to apply any config is through MachineConfig which we can validate at any point of time by looking at applied MC and rendered config. For authorized ssh keys, MCO manages and keep track of only ~/.ssh/authorized_keys file. If MCO will start writing ~/.ssh/authorized_keys.d/ignition implicitly into ~/.ssh/authorized_keys , MCO will have no way to know from where these extra ssh keys are coming as they won't get reflected in applied MC or rendered MC. Also, since MCO doesn't manages them, it won't be able to delete these keys which user may want later on.

And I agree, after the cluster is up... but there is not a sync during cluster creation, otherwise the authorized_keys file would get created and it doesn't. for RHCOS, it is created at ignition. for FCOS, it is created in authorized_keys.d/ignition. All this PR does is copy authorized_keys.d/ignition to authorized_keys so that ssh will work... and since this key matches what is in the bootstrap 99_master_ssh and 99_worker_ssh, MCO is also in sync.

That is were the difference/disconnect is. This PR does not meet your architectural guidance, so I have withdrawn it

@LorbusChris
Copy link
Copy Markdown
Contributor

I wonder if this can be fixed with a symlink (like @darkmuggle suggested above, see https://github.com/darkmuggle/installer/commit/ca7adb5b5305209ab4f9d4a21fbdfa24070b0c1f) - but in https://github.com/openshift/okd-machine-os instead of the installer - this would then account for both bootstrap and non-bootstrap nodes.

@vrutkovs
Copy link
Copy Markdown
Contributor

this would then account for both bootstrap and non-bootstrap nodes.

The symlink would be applied only post-pivot, so the host can't be ssh'd to if it can't pivot

@LorbusChris
Copy link
Copy Markdown
Contributor

Before pivoting the config to disable ~/.ssh/authorized_keys.d doesn't exist yet and everything should still work, since the MCO hasn't written the sshd config fragment at that point. Or am I missing something?

@vrutkovs
Copy link
Copy Markdown
Contributor

Experimenting with this in openshift/okd-machine-os#85

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants