A user on Slack reported that after upgrading their Flux components, the image-automation-controller (which currently still depends on the Git libraries from this controller, and recently switched to using `libgit2` only) stopped working with the following error:
{"level":"error","ts":"2021-07-01T17:52:47.736Z","logger":"controller-runtime.manager.controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@example.com/repo.git', error: Certificate"}
While isolating the issue, we discovered that although the `known_hosts` entry in their `Secret` did contain an `ssh-rsa` item that matched the host key of the server, validation still resulted in a false mismatch.
Once the user had updated the `known_hosts` entry in the `Secret` with the output of `ssh-keyscan example.com 2>/dev/null | base64` (containing an `ssh-rsa` and an `ssh-ed25519` item), the image-automation-controller started working again.
My educated guess is that the custom code we have for validating host keys with `libgit2` (https://github.com/fluxcd/source-controller/blob/main/pkg/git/libgit2/transport.go#L147-L239) does not behave correctly in all cases, as the error logged by the controller matches the `git2go.ErrCertificate` returned by the `certCallback`.
Slack thread reference: https://cloud-native.slack.com/archives/CLAJ40HV3/p1625162540293300