Test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault.#9214
Conversation
This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db.
This reverts commit a776452.
|
|
||
| // SetRaftAddressProviders sets a ServerAddressProvider for all the nodes in a | ||
| // cluster. | ||
| func SetRaftAddressProviders(t testing.T, cluster *vault.TestCluster, provider raftlib.ServerAddressProvider) { |
There was a problem hiding this comment.
How come this is no longer used/required?
There was a problem hiding this comment.
I'm doing it inside the ReusableRaftStorage now
vishalnayak
left a comment
There was a problem hiding this comment.
Outside of the one TODO that you have out there, it is looking good to me. Nice work!
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
|
|
||
| isLeader, _, _, err := core.Leader() | ||
| if err != nil { | ||
| t.Fatal(err) |
There was a problem hiding this comment.
I would probably not error out here. Allow errors until the timeout elapses in case the cluster is still coming up and behaving strangely.
There was a problem hiding this comment.
(but still report the last error encountered upon timeout, don't just report a timeout)
| time.Sleep(time.Second) | ||
| } | ||
|
|
||
| return 0, fmt.Errorf("timeout waiting leader") |
There was a problem hiding this comment.
0 is a valid index, I would return -1 here.
| } else { | ||
| cluster.UnsealCore(t, core) | ||
| } | ||
| cluster.UnsealCore(t, core) |
There was a problem hiding this comment.
I don't think this is is equivalent logic to what was previously here. We don't want to use recovery keys to unseal we want to wait for the stored key to do the job for us.
There was a problem hiding this comment.
It is equivalent logic to what was originally here though -- we currently don't have any tests that call this function with stored keys.
| } | ||
|
|
||
| // AwaitLeader waits for one of the cluster's nodes to become leader. | ||
| func AwaitLeader(t testing.T, cluster *vault.TestCluster) (int, error) { |
There was a problem hiding this comment.
We already have a wait for leader function, do we need both?
There was a problem hiding this comment.
I needed a function that returned the index of the leader core
| basePort_ShamirToTransit_Pre14 = 20000 | ||
| basePort_TransitToShamir_Pre14 = 21000 | ||
| basePort_ShamirToTransit_Post14 = 22000 | ||
| basePort_TransitToShamir_Post14 = 23000 |
There was a problem hiding this comment.
A little while back we did a lot of work to remove static ports from our tests which removed a lot of flakiness due to port collisions. Instead of re-introducing static ports can we use the in-memory cluster listener and a random API port? Here's an example:
Lines 182 to 193 in 6e2b91f
| var conf = vault.CoreConfig{ | ||
| Logger: logger.Named("migrateFromShamirToTransit"), | ||
| Logger: logger.Named("migrateFromShamirToTransit"), | ||
| DisablePerformanceStandby: true, |
There was a problem hiding this comment.
How come we need to disable performance standbys here?
| var conf = vault.CoreConfig{ | ||
| Logger: logger.Named("initializeShamir"), | ||
| Logger: logger.Named("initializeShamir"), | ||
| DisablePerformanceStandby: true, |
There was a problem hiding this comment.
Same here, do we need this disabled?
| // joinRaftFollowers unseals the leader, and then joins-and-unseals the | ||
| // followers one at a time. We assume that the ServerAddressProvider has | ||
| // already been installed on all the nodes. | ||
| func joinRaftFollowers(t *testing.T, cluster *vault.TestCluster, useStoredKeys bool) { |
There was a problem hiding this comment.
Why not use the test helpers version of this?
* master: (31 commits) changelog++ changelog++ Ui/replication status discoverability (#8705) Update CHANGELOG.md Counter that increments on every secret engine lease creation. (#9244) Add password_policy field to Azure docs (#9249) Replaced ClusterMetricSink's cluster name with an atomic.Value. (#9252) Fix database creds rotation panic for nil resp (#9258) changelog++ changelog++ Move sdk/helper/random -> helper/random (#9226) UI: Disallow kv2 with too large 'max versions' value (#9242) Allow mTLS for mysql secrets engine (#9181) docs: add sample revocation for mongodb (#9245) Add new Telemetry config options (#9238) Add a simple sealed gauge, updated when seal status changes (#9177) Test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault. (#9214) Configure metrics wrapper with the "global" object, not just the fanout. (#9099) changelog++ Add backend type to audit logs (#9167) ...
…1.4 Vault. (#9214) * move adjustForSealMigration to vault package * fix adjustForSealMigration * begin working on new seal migration test * create shamir seal migration test * refactor testhelpers * add VerifyRaftConfiguration to testhelpers * stub out TestTransit * Revert "refactor testhelpers" This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db. * get shamir test working again * stub out transit join * work on transit join * remove debug code * initTransit now works with raft join * runTransit works with inmem * work on runTransit with raft * runTransit works with raft * cleanup tests * TestSealMigration_TransitToShamir_Pre14 * TestSealMigration_ShamirToTransit_Pre14 * split for pre-1.4 testing * add simple tests for transit and shamir * fix typo in test suite * debug wrapper type * test debug * test-debug * refactor core migration * Revert "refactor core migration" This reverts commit a776452. * begin refactor of adjustForSealMigration * fix bug in adjustForSealMigration * clean up tests * clean up core refactoring * fix bug in shamir->transit migration * stub out test that brings individual nodes up and down * refactor NewTestCluster * pass listeners into newCore() * simplify cluster address setup * simplify extra test core setup * refactor TestCluster for readability * refactor TestCluster for readability * refactor TestCluster for readability * add shutdown func to TestCore * add cleanup func to TestCore * create RestartCore * stub out TestSealMigration_ShamirToTransit_Post14 * refactor address handling in NewTestCluster * fix listener setup in newCore() * remove unnecessary lock from setSealsForMigration() * rename sealmigration test package * use ephemeral ports below 30000 * work on post-1.4 migration testing * clean up pre-1.4 test * TestSealMigration_ShamirToTransit_Post14 works for non-raft * work on raft TestSealMigration_ShamirToTransit_Post14 * clean up test code * refactor TestClusterCore * clean up TestClusterCore * stub out some temporary tests * use HardcodedServerAddressProvider in seal migration tests * work on raft for TestSealMigration_ShamirToTransit_Post14 * always use hardcoded raft address provider in seal migration tests * debug TestSealMigration_ShamirToTransit_Post14 * fix bug in RestartCore * remove debug code * TestSealMigration_ShamirToTransit_Post14 works now * clean up debug code * clean up tests * cleanup tests * refactor test code * stub out TestSealMigration_TransitToShamir_Post14 * set seals properly for transit->shamir migration * migrateFromTransitToShamir_Post14 works for inmem * migrateFromTransitToShamir_Post14 works for raft * use base ports per-test * fix seal verification test code * simplify seal migration test suite * simplify test suite * cleanup test suite * use explicit ports below 30000 * simplify use of numTestCores * Update vault/external_tests/sealmigration/seal_migration_test.go Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com> * Update vault/external_tests/sealmigration/seal_migration_test.go Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com> * clean up imports * rename to StartCore() * Update vault/testing.go Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com> * simplify test suite * clean up tests Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
This PR introduces unit tests that test Shamir-to-Transit and Transit-to-Shamir Seal Migration for post-1.4 Vault.
The following changes have been made to the test suite to support these changes:
vault.Coresin avault.TestCluster, via the new methodsStopCore()andRestartCore(). This allows tests to simulate bringing an individual node in a cluster up and down.raft.ServerAddressProvidercan optionally be provided at the time theTestClusteris created, if the cluster is using integrated raft storage. In addition, there is an option calledJoinRaftFollowersthat specifies that the raft follower nodes should be joined to the raft leader at the time the cluster is started.teststorage.ReusableRaftStorageare now created with araft.ServerAddressProvider.