Update HCP bootstrapping to support existing clusters#16916
Conversation
jjacobson93
left a comment
There was a problem hiding this comment.
Just one non-blocking comment/question. Looks good
| // IsCloudEnabled returns true if a cloud.resource_id is set and the server mode is enabled | ||
| func (c *RuntimeConfig) IsCloudEnabled() bool { | ||
| if c == nil { | ||
| return false |
There was a problem hiding this comment.
What's the case where RuntimeConfig is nil here?
There was a problem hiding this comment.
There are some tests in command/agent that hit that and panic'd: https://app.circleci.com/pipelines/github/hashicorp/consul/56048/workflows/a6e357e8-4c0f-49a7-a22b-6872e2f391f8/jobs/1349145/parallel-runs/3
a8e7574 to
7956dbc
Compare
jmurret
left a comment
There was a problem hiding this comment.
👏 nice work. just had two small things, but one could possibly cause a subtle bug, so asking for changes/discussion.
| } | ||
|
|
||
| func (b *builder) cloudConfigVal(v *CloudConfigRaw) (val hcpconfig.CloudConfig) { | ||
| val.ResourceID = os.Getenv("HCP_RESOURCE_ID") |
There was a problem hiding this comment.
I think you will need to use LookupEnv here similar to what the SDK uses or else this will set an empty string if it is null. Previous iterations of the SDK also checked for null rather than empty string and would fail. So, probably best to only set this when the env var is actually configured.
There was a problem hiding this comment.
The val variable here is already a zero-value of hcp.CloudConfig because of the named return, so if HCP_RESOURCE_ID is not set then it stays empty. The SDK is checking that it's set first because that function is mutating potentially non-empty config.
I updated this to make the val variable declaration clearer. This isn't really a case to use named returns: style guide.
hanshasselberg
left a comment
There was a problem hiding this comment.
Thanks, this looks great! I like the combined flow and the tests!
I left a couple nits and I have one question: When a cluster with consul 1.14 with CCM integration is upgraded to Consul 1.15, will it rebootstrap because of the missing sucess marker?
|
|
||
| func persistBootstrapConfig(dataDir, cfgJSON string) error { | ||
| func persistBootstrapConfig(dir, cfgJSON string) error { | ||
| if cfgJSON == "" { |
There was a problem hiding this comment.
nit: to further remove differences between new and existing cluster flow, an empty config {} could be stored here.
There was a problem hiding this comment.
What if we send that empty JSON string down from HCP? It would simplify some of the conditional logic in persistAndProcessConfig
There was a problem hiding this comment.
Take a look at the last commit, I updated that bit to expect {} from CCM
There was a problem hiding this comment.
I like the idea to always send valid json 👍
9b3b0ed to
97b080b
Compare
22c7a30 to
9a28f9e
Compare
We want to move away from injecting an initial management token into Consul clusters linked to HCP. The reasoning is that by using a separate class of token we can have more flexibility in terms of allowing HCP's token to co-exist with the user's management token. Down the line we can also more easily adjust the permissions attached to HCP's token to limit it's scope. With these changes, the cloud management token is like the initial management token in that iit has the same global management policy and if it is created it effectively bootstraps the ACL system.
The HCP management token will now be sent in a special field rather than as Consul's "initial management" token configuration. This commit also updates the mock HCP server to more accurately reflect the behavior of the CCM backend.
We want to allow users to link Consul clusters that already exist to HCP. Existing clusters need care when bootstrapped by HCP, since we do not want to do things like change ACL/TLS settings for a running cluster. Additional changes: * Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK requires HTTPS to fetch a token from the Auth URL, even if the backend server is mocked. By pulling the hcp.Client creation out we can modify its TLS configuration in tests while keeping the secure behavior in production code. * Add light validation for data received/loaded. * Sanitize initial_management token from received config, since HCP will only ever use the CloudConfig.MangementToken.
9a28f9e to
f798be1
Compare
Replaces #16788
Description
We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.
The plan is to only send down a management token to existing clusters
and nothing else (at least initially).
This PR adds a few commits to that end:
through Raft as we do for the initial management token.
clusters that are going to be supported.
Best reviewed by commit
Testing & Reproduction steps
PR Checklist
TODO