Add upgrade integration test#693
Conversation
Add an integration test that roughly mimics many of the manual cloud SRU tests. Also refactored some of the image setup code to make it easier to use in non-fixture code.
d788750 to
397718f
Compare
| print('executing: {}'.format(command)) | ||
| print(instance.execute(command)) |
There was a problem hiding this comment.
Printing these isn't ideal; one has to pass -s to pytest to get the output, and even then it's interleaved with log output:
executing: cloud-id
INFO pycloudlib.instance:instance.py:166 executing: sh -c cloud-id
lxd
Perhaps write the full output to a before and after file (and log the names used)?
| 'systemd-analyze blame', | ||
| 'cloud-init analyze show', | ||
| 'cloud-init analyze blame', | ||
| 'cat $NETCFG_FILE', |
There was a problem hiding this comment.
$NETCFG_FILE isn't set here, we should substitute the appropriate value in.
| 'hostname', | ||
| 'dpkg-query --show cloud-init', | ||
| 'cat /run/cloud-init/result.json', | ||
| '! grep Trace /var/log/cloud-init.log', |
There was a problem hiding this comment.
As we aren't checking the exit codes here, we can drop the ! prefix.
| 'cloud-id' | ||
| ] | ||
| for command in commands: | ||
| print('executing: {}'.format(command)) |
There was a problem hiding this comment.
"==== {} ====".format(command) or similar would make these easier to pick out.
|
I pushed an update to the refactor, but haven't updated the actual test yet. I realize I hurriedly put together that test and pushed it to have something out before I left for the holiday, but I didn't really consider the test itself review-ready yet. Sorry about that! |
OddBloke
left a comment
There was a problem hiding this comment.
The rerefactor looks good to me now, just some nits, thanks!
|
I think this is ready for a full review. We still just log the things we were logging manually before. Eventually it'd be nice to make this test a little "smarter". |
| f.write(instance.execute(command) + '\n') | ||
|
|
||
|
|
||
| @pytest.mark.sru_2020_11 |
There was a problem hiding this comment.
Is this mark appropriate here? Not sure if we're marking all tests we're including in an SRU, or only marking tests that are only relevant to that SRU.
There was a problem hiding this comment.
I think this is good. We may have to at some point add -m "not sru_2020_11" for our typical ci runs when we want to exclude them. Our final SRU pass run can just run -m sru_2020_11 to assert all tests have be validated. We could kick this off as a separate jenkins job at some point to get a passing jenkins log as reference for success. I think applying that mark to all sru manual tests is appropriate.
| def test_upgrade(session_cloud: IntegrationCloud): | ||
| source = get_validated_source() | ||
| if not source.installs_new_version(): | ||
| pytest.skip("Current install method not supported for this test") |
There was a problem hiding this comment.
Let's parameterize the {source.value} in this message to tell us specifically what didn't match
There was a problem hiding this comment.
Skip text isn't actually printed anywhere. There's probably a way to make it show somewhere, but we haven't done it 😂
There was a problem hiding this comment.
The message printed for me as I run pytest -rxXs.
blackboxsw
left a comment
There was a problem hiding this comment.
Looks really good @TheRealFalcon a couple of requests and I think we are good. I'd like to actually see some asserts on hostname using sample cloud-config somehow to make sure plumbing is working across reboot and also check cloud-init init for Tracebacks prior to a reboot as this is something that cached datasources would break across upgrade in a few previous SRUs
| f.write(instance.execute(command) + '\n') | ||
|
|
||
|
|
||
| @pytest.mark.sru_2020_11 |
There was a problem hiding this comment.
I think this is good. We may have to at some point add -m "not sru_2020_11" for our typical ci runs when we want to exclude them. Our final SRU pass run can just run -m sru_2020_11 to assert all tests have be validated. We could kick this off as a separate jenkins job at some point to get a passing jenkins log as reference for success. I think applying that mark to all sru manual tests is appropriate.
| def test_upgrade(session_cloud: IntegrationCloud): | ||
| source = get_validated_source() | ||
| if not source.installs_new_version(): | ||
| pytest.skip("Current install method not supported for this test") |
There was a problem hiding this comment.
The message printed for me as I run pytest -rxXs.
| with session_cloud.launch(launch_kwargs=launch_kwargs) as instance: | ||
| _output_to_compare(instance, before_path, netcfg_path) | ||
| instance.install_new_cloud_init(source, take_snapshot=False) | ||
| instance.instance.restart() |
There was a problem hiding this comment.
I think we need to fail on seeing tracebacks after reset. Also I think we need to check cloud-init init and no Tracebacks as we do in ec2 manual sru test
I also think we should try to inject some userdata that exercises parts of cloud-init on the upgrade test to ensure that various pieces are working
We could use and validate sample jinja template as we did in manual test runs to assert that hostname returned actually contains the right cloud-id from instance data
So if we can use
| instance.instance.restart() | |
| instance.instance.execute('cloud-init init') | |
| tracebacks = instance.instance.execute('grep Traceback /var/log/cloud-init.log') | |
| assert tracebacks == '' | |
| instance.restart(raise_on_cloudinit_failure=True) # Make sure we fail this test if cloud-init fails on next clean boot |
There was a problem hiding this comment.
I think we need to fail on seeing tracebacks after reset. Also I think we need to check cloud-init init and no Tracebacks as we do in ec2 manual sru test
Would instance.restart(raise_on_cloudinit_failure=True) accomplish this? Why would cloud-init init need to run again if it has already run on boot?
There was a problem hiding this comment.
@TheRealFalcon, some individuals who use cloud-init on a running system and don't want a reboot attempt to upgrade cloud-init and re start cloud-init services via systemctl this particular use-case, while not recommended, has raised a number of upgrade issues in the past with cloud-init and pickled DS cache handling across that upgrade path. So it's helpful for us to spot check this as it can be valuable in ascertaining if some of our pickling upgrade paths have broken across re-constitution of that cached datasource. If we do this only after reboot, some platforms invalidate their datasource cache forcing a clean run instead of a dirty run of cloud-init. A fully clean run won't raise such pickle deserializing errors.
blackboxsw
left a comment
There was a problem hiding this comment.
+1 thanks a lot for this @TheRealFalcon
Proposed Commit Message
Add upgrade integration test
Add an integration test that roughly mimics many of the manual cloud
SRU tests. Also refactored some of the image setup code to make it
easier to use in non-fixture code.
Additional Context
Test Steps
pytest tests/integration_tests/test_example.py
Checklist: