ci: require secrets.PYCLOUDLIB_TOML, add lxd_vm and ec2 scheduled jobs#6715
ci: require secrets.PYCLOUDLIB_TOML, add lxd_vm and ec2 scheduled jobs#6715blackboxsw wants to merge 7 commits into
Conversation
4f73b0f to
ed8690d
Compare
| awk '/cloud-init version: /{printf DEB_VERSION=$NF; exit}' pytest-${{ inputs.platform }}-${{ inputs.release }}-${{ inputs.image_type }}.log | ||
| awk '/image-serial: /{printf IMAGE_SERIAL=$NF; exit}' pytest-${{ inputs.platform }}-${{ inputs.release }}-${{ inputs.image_type }}.log | ||
| shell: bash |
There was a problem hiding this comment.
What is are the awk commands doing here?
And why bash?
There was a problem hiding this comment.
Ahh I was going to use these to help create a top-level report output that would announce what version of cloud-init was installed during the test. I'll drop this from this PR until I have a working approach. I didn't like how opaque our GH workflow runs are when compared to jenkins jobs which announce the version of cloud-init being tested.
There was a problem hiding this comment.
I was originally stuffing env vars them into GITHUB_ENV per these docs and the ctrf.io step can extract and use the environment variables for report headers or summary.
b16406d to
be806da
Compare
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Setup LXD | ||
| if: ${{ env.REQUIRED_SECRET != '' and contains(fromJSON('["lxd_vm", "lxd_container"]'), env.CLOUD_INIT_PLATFORM ) }} |
There was a problem hiding this comment.
Avoid the setup-lxd step if we are not on CLOUD_INIT_OS_PLATFORM lxd_container of lxd_vm.
| test '${{ secrets.PYCLOUDLIB_TOML }}' != '' || echo "ERROR: Missing required repo secrets.PYCLOUDLIB_TOML non-empty value." | ||
| test '${{ secrets.PYCLOUDLIB_TOML }}' == '' && exit 1 |
There was a problem hiding this comment.
This checks for the same thing twice, I think an if / else would be cleaner
There was a problem hiding this comment.
Done. We only really care to exit 1 in the face of no secrets.PYCLOUDLIB_TOML or empty string. So, I just put the operation in a single if clause.
| - name: Checkout | ||
| if: ${{ env.REQUIRED_SECRET != '' }} |
There was a problem hiding this comment.
Since env.REQUIRED_SECRET is assigned from secrets.PYCLOUDLIB_TOML, and the first step has an exit 1 when secrets.PYCLOUDLIB_TOML is empty, is this not redundant?
Same comment elsewhere.
There was a problem hiding this comment.
Dropped all unneessary conditionals checks. An exit 1 above will prevent running these steps anyway.
0c75a04 to
3937aae
Compare
|
@blackboxsw let me know when this is ready for re-review |
| @@ -0,0 +1,13 @@ | |||
| name: "Daily: Integration Resolute lxd_vm" | |||
There was a problem hiding this comment.
The naming scheme isn't going to work with future expansion of new series, additional platforms, etc.
I would prefer that we do it in a way that isn't going to force us to do the same thing again the next time we add more platforms - something a little bit more sustainable would be better.
Also - numbers simultaneously increasing and decreasing is chaotic.
There was a problem hiding this comment.
I have dropped all unrelated renames and moved all integration platform runs to the end of the ordered list. Each platform is now grouped and reserves 10 workflow test slots for each platform.
Platform group prefix will be the following for each platform:
- 100: lxd_container
- 110: lxd_vm
- 120: ec2
The oldest tested series (22.04) will be the first index at prefix 100-. Each additional supported series will increment from that initial platform index:
- XX0: 22.04
- XX1: 24.04
- XX2: 25.10
- XX3: 26.04
|
@holmanb I have updated this branch and linked a successful runs on Ec2. |
Rename the reusable dispatch workflow to 100-dispatch-common.yml to establish a convention where 1xx = reusable/callable workflows. Workflow indexes 11x/12x/13x are caller workflows grouped by platform: lxd_container, lxd_vm and ec2. Move scheduled daily-integration-*-lxd-container.yml files to 11x-daily-integration-*-lxd_container.yml equivalent. Each platform group of scheduled workflows will be ordered by increasing Ubuntu series: 22.04, 24.04, 25.10, 26.04.
…er temp - Declare PYCLOUDLIB_TOML, SSH_PRIVATE_KEY, and SSH_PUBLIC_KEY as required secrets on the workflow_call trigger. - Add an early assertion step that exits immediately if PYCLOUDLIB_TOML is empty. - Replace the ssh-keygen throwaway key with a new 'Setup SSH' step that writes the injected public/private key pair to ~/.ssh, for use in authentication to instances under test - base64-decode PYCLOUDLIB_TOML secret and write it to $RUNNER_TEMP/pycloudlib.toml to keep secrets off the persistent filesystem. Add cleanup to remove the file unconditionally on exit.
Add workflow files 120-123 for lxd_vm, daily, on all supported releases. Add workflow files 130-133 for EC2, twice weekly, all supported releases All new caller workflows expose an install_source workflow_dispatch input, defaulting to ppa:cloud-init-dev/daily, and forward the three required secrets to the common dispatch workflow.
Avoid dependency on ctrf.io integrations and shift to pytest-json-ctrf for report formatting needs. - Add pytest-json-ctrf to integration-requirements.txt; this plugin writes a native CTRF JSON report file and provide --ctrf to pytest. - Set a descriptive title that includes platform, release, image_type, and install_source. Job and environment cleanup: - Move CLOUD_INIT_OS_IMAGE, CLOUD_INIT_OS_IMAGE_TYPE, CLOUD_INIT_CLOUD_INIT_SOURCE, and CLOUD_INIT_LOCAL_LOG_PATH from the job-level env block into the 'Run integration Tests' step env, keeping the job env focused on credentials and config that steps other than the test runner need. - Add a default of 'ppa:cloud-init-dev/daily' to the workflow_call install_source input so callers that omit it still get a sensible value. - Add install_source workflow_dispatch input with the same default to the lxd_container caller workflows (110-113) and pass it through.
|
@holmanb I believe I have addressed all questions or suggestions. I have refactored the commits into logically separate commits for ease of review. Final successful run https://github.com/blackboxsw/cloud-init/actions/runs/24911475438/job/72953869594. |
| uses: ./.github/workflows/100-dispatch-common.yml | ||
| with: | ||
| release: resolute | ||
| platform: ec2 |
There was a problem hiding this comment.
I see now that an issue in one of my latest comments was introduced by following one of the recommendations suggested here.
Please self-review for consistency with the existing code when applying suggested changes.
There was a problem hiding this comment.
Given that we now decided below to shift this to a required parameter, all workflows will provide image_type for both cheduled and manual workflow dispatch.
| run: | | ||
| sh -c 'echo "${{ secrets.PYCLOUDLIB_TOML}}" | base64 -d > "$PYCLOUDLIB_CONFIG"' |
There was a problem hiding this comment.
We'll prefer to validate that the content is base64-encoded. Will perform this validation by checking for a non-empty decoded PYCLOUDLIB_CONFIG after processing.
There was a problem hiding this comment.
After review, we'll allow the base64 -d failure and non-zero exit code leave us with an adequate breadcrumb to resolve this.
| options: | ||
| - generic | ||
| - minimal | ||
| required: false |
There was a problem hiding this comment.
I would have suggested making this required instead
There was a problem hiding this comment.
Shifted to required: true. All scheduled workflows provide image_type: generic currently.
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Setup LXD | ||
| if: ${{ contains(fromJSON('["lxd_vm", "lxd_container"]'), env.CLOUD_INIT_PLATFORM ) }} | ||
| uses: canonical/setup-lxd@8c6a87bfb56aa48f3fb9b830baa18562d8bfd4ee # v0.1.2 |
There was a problem hiding this comment.
I'm uncertain what point you think has been missed here. I have updated the commit hash and comment to align with the upstream repo tag # v1. This is already resolved. in the second commit from the top. 9cd0ff6.
| - name: Checkout | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Setup LXD | ||
| if: ${{ contains(fromJSON('["lxd_vm", "lxd_container"]'), env.CLOUD_INIT_PLATFORM ) }} |
There was a problem hiding this comment.
This workflow is shared and generic. Why is an lxd specialization used here?
There was a problem hiding this comment.
@holmanb the lxd specialization is here because we only want to install the lxd snaps if we are launching LXD backends for integration testing, Azure,GCE and Ec2 don't need this step. I've added a comment.
There was a problem hiding this comment.
I understand why it is needed. But I still don't understand why it is needed here.
There was a problem hiding this comment.
It is in 100-dispatch-common because we allow manual "Run workflow" dispatch directly from this workflow providing a dropdown choice to select the platform which can be lxd_vm or lxd_container. As a result, this conditional logic needs to be in here to allow that flexibility for a manual run which can include lxd_*
| # Dump secrets using a subprocess to avoid accidental leaks while debugging. | ||
| sh -c 'printf "%s\n" "$SSH_PUBLIC_KEY" > ~/.ssh/cloudinit_id_rsa.pub' | ||
| sh -c 'printf "%s\n" "$SSH_PRIVATE_KEY" > ~/.ssh/cloudinit_id_rsa' | ||
| chmod 600 ~/.ssh/cloudinit_id_rsa |
There was a problem hiding this comment.
nit: where code fails on err, this is probably fine
but as a pattern it is undesirable to chmod after writing your secret, because there is a window of time between the two commands where the secret can be accessed by code with lower permissions
There was a problem hiding this comment.
Good thought. Fixed with an early touch ~/.ssh/cloudinit_id_rsa && chmod 600 ~/.ssh/cloudinit_id_rsa
|
|
@holmanb all your comments and copilot's are addressed. I've commented iwth success runs for lxd_vm and lxc_container. |
|
#6864 just landed so Alternate distros CI would pass if we rebased. |
Address review comments from copilot and Brett: - update setup-lxd pinning - validate presence of non-empty required SSH_*_KEY secrets - simplifiy -z secret test conditionals to avoid multi-line issues - use install -m 600 before creating SSH_PRIVATE_KEY file - test and exit 1 on non-encoded PYCLOUDLIB_TOML repo secret - set image_type: generic as defaults in 100-dispatch-common and ec2 job.
|
Renamed PYCLOUDLIB_TOML -> PYCLOUDLIB_TOML_B64 to indicate intent for value. |
| # Dump secrets using a subprocess to avoid accidental leaks while debugging. | ||
| sh -c 'printf "%s\n" "$SSH_PUBLIC_KEY" > ~/.ssh/cloudinit_id_rsa.pub' | ||
| # Create empty cloudinit_id_rsa with file mode 600. | ||
| touch ~/.ssh/cloudinit_id_rsa && chmod 600 ~/.ssh/cloudinit_id_rsa |
There was a problem hiding this comment.
I think this would be cleaner: install -m 600 <(echo $var) file
There was a problem hiding this comment.
I had used that before, but install command isn't available in the workflow and resulted in an error.
There was a problem hiding this comment.
Well crud, it was a typo 'intall command not found`. Shifting it back.
| PYCLOUDLIB_CONFIG: ${{ runner.temp }}/pycloudlib.toml | ||
| CLOUD_INIT_LOCAL_LOG_PATH: ${{ github.workspace }}/cloud_init_test_logs | ||
| run: | | ||
| tox -e integration-tests -- --ctrf=${{ github.workspace }}/report.json --color=yes ${{ inputs.filter_tests || 'tests/integration_tests' }} |
There was a problem hiding this comment.
Why is a default value conditionally set in test logic?
There was a problem hiding this comment.
You are right, it is unnecessary, when absent we won't provide any test paths and cloud-init integration tests will default to tests/integration_tests
There was a problem hiding this comment.
Actually, in testing, the absence of that default/fallback results in dependency errors from unittests this happens because we are already providing posargs which causes tox to drop the default of tests/integration_tests from tox.ini and try to run all tests in all directories. So, unittest dependencies end up 'missing' from integration-test target in this case.
So, we still need a default of fiilter_tests is unset or empty. Given that our workflow dispatch doesn't have a default, the value of filter_tests is an empty string if unspecified. We need the fallback conditional on an empty value to ensure we pass a reasonable posarg to pyttest for the integration-test target. So this logic needs to remain.
| run: | | ||
| sh -c 'echo "${{ secrets.PYCLOUDLIB_TOML_B64}}" | base64 -d > "$PYCLOUDLIB_CONFIG"' | true | ||
| if [ ! -s $PYCLOUDLIB_CONFIG ]; then | ||
| echo "PYCLOUDLIB_TOML_B64 repo secret is not a base64-encoded string" |
There was a problem hiding this comment.
I would just let it exit on error rather than suppressing the error and trying to add a message.
This message assumes a particular failure mode, when in fact multiple failure modes are possible.
There was a problem hiding this comment.
Sure, dropped, it will fail in either case. without a specific message you will still see a `base64 error invalid input"
| - name: Assert required repo secrets are set | ||
| run: | | ||
| if [ -z "$REQUIRED_SECRET" ]; then | ||
| echo "ERROR: Missing required repo secrets.PYCLOUDLIB_TOML_B64 non-empty value." |
There was a problem hiding this comment.
The error message has some issues.
The non-empty value part is confusing. Also, putting a variable name in an error message means this could unknowingly become stale.
Rather than trying to diagnose the root cause of an issue in an error message, why not just state the problem in a clear way?
There was a problem hiding this comment.
I have adapted the message to:
ERROR: Missing required repo secret. Please provide the necessary repo secret at ${{ github.repository }}/settings/secrets/action.
| type: string | ||
| default: 'ppa:cloud-init-dev/daily' | ||
| schedule: | ||
| # Run Mon & Thurs for high-cost test runs |
There was a problem hiding this comment.
higher than lxd* on the local worker. We can measure this once we turn on the new workflows.
| with: | ||
| release: jammy | ||
| platform: ec2 | ||
| install_source: ${{ inputs.install_source || 'ppa:cloud-init-dev/daily' }} |
There was a problem hiding this comment.
Isn't this value conditionally set elsewhere?
There was a problem hiding this comment.
No, this is only conditionally set if we manually use workflow-dispatch via Run workflow button on the specific job. It is not set automatically for the scheduled workflow runner so we need a default in that case.
| pycloudlib>=1!10.0.2,<1!11 | ||
|
|
||
| pytest-timeout | ||
| pytest-json-ctrf # Used for GH ctrf.io test report dashboard |
There was a problem hiding this comment.
It looks like this project is no longer supported?
There was a problem hiding this comment.
Meh, this README update happened after opening this original PR, having seen recent releases this year prior to that event. Strange, pypi also has an updated release one month ago, despite this project mentioning unsupported nature now. I guess we can go back to the supported junit-to-ctrf integrations-config that is currently in tip of main.
- junit-to-ctrf is part of the ctrf projecthttps://github.com/ctrf-io/junit-to-ctrf. And it's under active maintenance with active security advisory handling
| - lxd_container | ||
| - lxd_vm | ||
| - ec2 | ||
| image_type: |
There was a problem hiding this comment.
yes each parameter is needed to manually dispatch the workflow as seen when clicking the run workflow button on the workflow dispatch page. Without these values being selectable, we can't manually dispatch the workflow as those required inputs would be missing.
There was a problem hiding this comment.
Yes, I see that you made it required. What is it needed for?
|
Also, I just unresolved some copilot reviews with comments. |
ef54ca4 to
fbbc886
Compare
blackboxsw
left a comment
There was a problem hiding this comment.
Thank you again @holmanb for finding the gaps or inconsistencies. I responded to everything I've seen open. I 'm not sure what else we need to do with setup-lxd though.
| - lxd_container | ||
| - lxd_vm | ||
| - ec2 | ||
| image_type: |
There was a problem hiding this comment.
yes each parameter is needed to manually dispatch the workflow as seen when clicking the run workflow button on the workflow dispatch page. Without these values being selectable, we can't manually dispatch the workflow as those required inputs would be missing.
| options: | ||
| - generic | ||
| - minimal | ||
| required: false |
There was a problem hiding this comment.
Shifted to required: true. All scheduled workflows provide image_type: generic currently.
| - name: Assert required repo secrets are set | ||
| run: | | ||
| if [ -z "$REQUIRED_SECRET" ]; then | ||
| echo "ERROR: Missing required repo secrets.PYCLOUDLIB_TOML_B64 non-empty value." |
There was a problem hiding this comment.
I have adapted the message to:
ERROR: Missing required repo secret. Please provide the necessary repo secret at ${{ github.repository }}/settings/secrets/action.
| run: | | ||
| sh -c 'echo "${{ secrets.PYCLOUDLIB_TOML}}" | base64 -d > "$PYCLOUDLIB_CONFIG"' |
There was a problem hiding this comment.
After review, we'll allow the base64 -d failure and non-zero exit code leave us with an adequate breadcrumb to resolve this.
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Setup LXD | ||
| if: ${{ contains(fromJSON('["lxd_vm", "lxd_container"]'), env.CLOUD_INIT_PLATFORM ) }} | ||
| uses: canonical/setup-lxd@8c6a87bfb56aa48f3fb9b830baa18562d8bfd4ee # v0.1.2 |
There was a problem hiding this comment.
I'm uncertain what point you think has been missed here. I have updated the commit hash and comment to align with the upstream repo tag # v1. This is already resolved. in the second commit from the top. 9cd0ff6.
| with: | ||
| release: jammy | ||
| platform: ec2 | ||
| install_source: ${{ inputs.install_source || 'ppa:cloud-init-dev/daily' }} |
There was a problem hiding this comment.
No, this is only conditionally set if we manually use workflow-dispatch via Run workflow button on the specific job. It is not set automatically for the scheduled workflow runner so we need a default in that case.
| uses: ./.github/workflows/100-dispatch-common.yml | ||
| with: | ||
| release: resolute | ||
| platform: ec2 |
There was a problem hiding this comment.
Given that we now decided below to shift this to a required parameter, all workflows will provide image_type for both cheduled and manual workflow dispatch.
| - name: Checkout | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Setup LXD | ||
| if: ${{ contains(fromJSON('["lxd_vm", "lxd_container"]'), env.CLOUD_INIT_PLATFORM ) }} |
There was a problem hiding this comment.
It is in 100-dispatch-common because we allow manual "Run workflow" dispatch directly from this workflow providing a dropdown choice to select the platform which can be lxd_vm or lxd_container. As a result, this conditional logic needs to be in here to allow that flexibility for a manual run which can include lxd_*
| # Dump secrets using a subprocess to avoid accidental leaks while debugging. | ||
| sh -c 'printf "%s\n" "$SSH_PUBLIC_KEY" > ~/.ssh/cloudinit_id_rsa.pub' | ||
| # Create empty cloudinit_id_rsa with file mode 600. | ||
| touch ~/.ssh/cloudinit_id_rsa && chmod 600 ~/.ssh/cloudinit_id_rsa |
There was a problem hiding this comment.
Well crud, it was a typo 'intall command not found`. Shifting it back.
Migrate away from deprecated pytest-json-ctrf reverting to use ctrf intrgration junit-to-ctrf.
Improve our public scheduled integration test coverage to cover lxd_vm and ec2 platforms.
Running integration tests will require 3 repository secrets
PYCLOUDLIB_TOML_B64,SSH_PUBLIC_KEYandSSH_PRIVATE_KEYin order to configure pycloudlib to run integration tests.This PR renames and orders Github workflow files to aid visibility when manually reviewing and running actions.
Add assertion on a non-empty Github repo-level secret named
PYCLOUDLIB_TOML_B64which willbe base64 decoded and written to a temporary file which is set in the environment variable
PYCLOUDLIB_CONFIG. This config is used sa runtime integration test configuration for pycloudlib.Add an assertion step in 20-dispatch-common.yml to error when
secrets.PYCLOUDLIB_TOML_B64 is absent and conditional logic which skips
each integration test step to avoid wasting runner cycles on integration tests.
Proposed Commit Message
See individual commits
Additional Context
Test Steps
Failed run example empty or absent secret PYCLOUDLIB_TOML_B64 https://github.com/blackboxsw/cloud-init/actions/runs/21699149948/job/62575866822
Success RUN with PYCLOUDLIB_TOML_B64 secret set to "[ec2]"
https://github.com/blackboxsw/cloud-init/actions/runs/24084983529/job/70255277685
Merge type