[skip-ci] TMT: run system tests on Fedora #24369

Merged
openshift-merge-bot[bot] merged 1 commit into containers:main from lsm5:tmt-fedora-centos on May 21, 2025

@lsm5
Member

@lsm5 lsm5 commented Oct 25, 2024

This commit introduces TMT test jobs triggered via packit to run system tests on testing-farm infrastructure. Tests are run for Fedora 41, 42 and rawhide on x86_64. The same test plan will be reused by Fedora for bodhi, zuul and fedora-ci gating tests. Packit will handle syncing of test plan and sources from upstream to downstream.

TODO:
1. Enable jobs for CentOS Stream and aarch64 envs.
2. Enable separate set of jobs for release branches as they need to be tested with official distro packages, not with bleeding-edge packages.

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci Bot added release-note-none do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 25, 2024
@openshift-ci
Contributor

openshift-ci Bot commented Oct 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lsm5

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2024
@lsm5 lsm5 force-pushed the tmt-fedora-centos branch from af89e2d to fbd02d2 Compare October 25, 2024 15:14
@github-actions

A friendly reminder that this PR had no activity for 30 days.

@lsm5 lsm5 force-pushed the tmt-fedora-centos branch from fbd02d2 to ab31b4b Compare November 25, 2024 13:45
@github-actions github-actions Bot removed the stale-pr label Nov 26, 2024
@lsm5 lsm5 force-pushed the tmt-fedora-centos branch 12 times, most recently from 1276d46 to 002ca1c Compare November 27, 2024 13:16
@packit-as-a-service

Ephemeral COPR build failed. @containers/packit-build please check.

@lsm5 lsm5 force-pushed the tmt-fedora-centos branch 4 times, most recently from 5df7e93 to a2a9383 Compare November 28, 2024 11:03
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2024
@lsm5 lsm5 force-pushed the tmt-fedora-centos branch from a2a9383 to ff774a1 Compare November 28, 2024 12:29
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2024
@lsm5 lsm5 force-pushed the tmt-fedora-centos branch from ff774a1 to c92c69f Compare November 28, 2024 13:35
Collaborator

@inknos inknos left a comment

Just nitpicking here and there but overall, I think it looks good

Comment thread .packit.yaml Outdated
Comment thread rpm/podman.spec Outdated
Comment thread .packit.yaml
Comment on lines +99 to +112
targets:
  - fedora-rawhide
  - fedora-42
  - fedora-41
Collaborator

Is there a reason not to use Packit's target aliases to track Fedora releases? As written, these targets will need to be bumped.

Member Author

I did this because the F40 and aarch64 tests were failing. aarch64 had some criu issues; I don't quite remember what went wrong on F40.

This is only a short-term measure. I would like to follow up by enabling CentOS Stream, all active Fedora releases, and aarch64.
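
For context, a minimal sketch of what the alias-based variant asked about above could look like in `.packit.yaml`. This is an illustration, not the configuration from this PR: `fedora-all` is a Packit target alias that expands to the currently supported Fedora releases plus rawhide, and the job definition is reduced to the bare minimum.

```yaml
# Illustrative sketch only, not the job definition merged in this PR.
jobs:
  - job: tests
    trigger: pull_request
    targets:
      # Alias maintained by Packit; avoids bumping fedora-41/fedora-42 by hand.
      - fedora-all
```

The trade-off from the reply above still applies: an alias would also pull in releases (F40 at the time) where the tests were known to fail.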

@inknos
Collaborator

inknos commented Feb 27, 2025

LGTM

@lsm5
Member Author

lsm5 commented Mar 3, 2025

@containers/podman-maintainers Need reviews. PTAL.

@lsm5
Member Author

lsm5 commented Mar 3, 2025

@containers/podman-maintainers PTAL. Reviews appreciated.

Member

@Luap99 Luap99 left a comment

I think there are some major questions here that I see no good answer to.

What is the overall goal of this? Why should this run on every PR? We already run the same set in cirrus, so why duplicate the runs? That just leads to even more flakes.

Second, and even worse, we are testing packages that are updated on the fly, so any random PR might start failing. We need a proper image build process like we have in cirrus, where packages are only updated via a PR.
(I do think testing with up-to-date packages makes sense on our smaller repos where there is little risk of breakage; on podman, however, we know that it will break quite often.) Also, packit jobs do not integrate with our merge bot protection anyway, which just confuses everyone about whether they need to look at these tests or not.

Another issue I see is that the task on github has a single name for all tests, so I have to click at least three times until I actually see what failed. With our cirrus setup I see the log with the failing test right away on the github page, with a meaningful task name. The logs also do not seem to use our custom logformatter. This is pretty bad, as it makes them so much harder to parse. I now must scroll through the entire log just to see that there was a flake...

Finally, there is no way to filter tests based on the changed sources like we do in cirrus, which means all tests run all the time, and the packit tasks seem slower than our cirrus setup. That alone is a major downgrade/regression.

@lsm5
Member Author

lsm5 commented Mar 3, 2025

Thanks @Luap99. The overall long-term goal is to get rid of Cirrus. I understand we're a long way from that. Your concerns are valid given the traffic on podman and the Packit UX not fitting into our workflow, and I've already mentioned some of the issues to the packit and testing-farm teams. I think some of those, like stable images, might be addressable already, but I have yet to verify that.

For now, I would like rpms to be tested at the very least before we cut upstream releases.

So how about this: I could make these tests run only on release branch PRs, with triggers based on PR labels. On the main branch, the jobs will be seen in the list, but won't run.

If release branch PRs are still a no-go, we could even limit them to only the release PR. I guess in that case someone would need to add the label manually and run a slash-packit test command in a github comment, unless the label addition can be automated for release PRs.
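
A rough sketch of how the label-gated variant described above might be expressed in `.packit.yaml`, assuming Packit's `require` and `manual_trigger` job options behave as documented. The label name `run-tmt-tests` is hypothetical and the target list is shortened; this is not the configuration from this PR.

```yaml
# Illustrative sketch only; the label name is an assumption.
jobs:
  - job: tests
    trigger: pull_request
    # Do not start on every push; wait for a `/packit test` comment.
    manual_trigger: true
    # Only run when the PR carries this (hypothetical) label.
    require:
      label:
        present:
          - run-tmt-tests
    targets:
      - fedora-rawhide
```

With both options set, a PR without the label would be skipped, and a labeled PR would still need the explicit comment before any test run starts.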

@Luap99
Member

Luap99 commented Mar 3, 2025

Yeah to be clear I am not against this per se.

I just want to make clear that this workflow may be a significant step down from cirrus. We did a lot of work to get testing close to 30 mins, so I am not willing to go back to something slower with fewer features.

Obviously there are upsides with tmt that cirrus will never be able to match. Upstream/downstream testing via the same sources is clearly useful and makes your life better, as we catch the gating issues upstream (because, well, they should be using the same env now). The /packit commands are also nicer for retriggering all failed tests, compared to having to click on each test one by one in cirrus.
So it is not like I don't see the good things about this.

To me the issue here is that we have cirrus and tmt running at the same time; this is extremely confusing for any contributor and maintainer. The current flakes are already hard to explain to contributors, i.e. when they can and cannot ignore failures. It is hard to tell who is going to fix what.

And unless we find a way to have an aggregate status task, like "Total Success", I see no way to have tmt in our CI env ever enforcing tests via merge protection. (I believe that is already tracked somewhere, as we mentioned it to them before.)
And without actual merge protection there is no way we can consider fully switching to it.

> I think some of those like stable images might be addressable already, but I have yet to verify that.

I believe so, at least that is what I was told in the past. It should be possible to use our own custom images, which I think is quite important for CI stability.

@lsm5
Member Author

lsm5 commented Mar 4, 2025

@Luap99 ack .

So, how would you like to proceed with this? Would you be ok with enabling this only on release branches / release PRs?

I would like to have these jobs be run somewhere at least once before we cut an upstream release. I'd be totally happy for now if we run them only on release PRs. I'll need to make some changes to the existing draft for that.

Btw, if these Packit jobs are added but kept disabled on the main branch, they'll currently look like below, which can be confusing. I've asked the Packit team for a way to hide such jobs or customize their status: packit/packit-service#2678 (comment)

[screenshot: Packit job statuses as listed on the PR]

Also RE: time limits

  1. If you look at the Fedora rawhide log, Fedora 42 log and Fedora 41 log, all 4 sets of tests run in parallel on separate environments, so they aren't blocking each other. The local tests finish in the 20-30 min range, while the remote tests finish within 15 mins.

  2. These test jobs currently depend on the packit copr rpm build jobs, which could increase the total time. We could also have the test jobs run independently of the copr job, but of course we would lose the benefits of actual rpm testing in that case.
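
Point 2 could be sketched roughly as follows, assuming Packit's `skip_build` option for `tests` jobs. With it, the test job would not wait for a Copr build, so the rpms built from the PR would not be what gets tested. The target list is shortened and this is not the configuration from this PR.

```yaml
# Illustrative sketch only.
jobs:
  - job: tests
    trigger: pull_request
    # Run the tmt plan without first building rpms in Copr.
    skip_build: true
    targets:
      - fedora-rawhide
```

This trades the rpm build time for the loss of real package testing, which is the downside named in point 2.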

@Luap99
Member

Luap99 commented Mar 4, 2025

> So, how would you like to proceed with this? Would you be ok with enabling this only on release branches / release PRs?

The branch itself doesn't matter to me, I am fine with main as well. But what we need is clear documentation on how these are supposed to be treated, i.e. all contributors and maintainers are free to ignore all packit tasks as they are not enforced as part of https://github.com/containers/podman/blob/main/CONTRIBUTING.md#continuous-integration
And it is your responsibility to react to any failures: if there are new bugs, report them as issues, etc.

But that is just my opinion; this probably needs the opinion of all maintainers working on podman here.

> While these test jobs currently depend on the packit copr rpm build jobs which could increase the total time, we could also have these test jobs run independently of copr job, but of course we lose the benefits of actual rpm testing in that case.

I think that is fine; the rpm job seems to take around 6 mins, which seems reasonable. 20-30 mins seems a bit much for the sys tests. I would have expected them to run much faster than that, but I guess this is not actually about the podman tests but rather the tmt setup creating the test machines, etc., which is included as part of this and which cirrus doesn't show by default. That said, the cirrus schedule time is very fast for most things, at less than 1 minute.

The github tasks do not report the job duration like they do with cirrus, so I cannot know what the real time is here. 30-40 mins is acceptable to me, but that means the end-to-end time from push until all tasks are green.


I should likely start a doc somewhere with all my requirements for packit/tmt jobs (but as long as packit is optional and cirrus is the source of truth they are not blockers for this)

@packit-as-a-service

Ephemeral COPR build failed. @containers/packit-build please check.

@lsm5
Member Author

lsm5 commented Mar 5, 2025

copr builds are failing currently: https://status.packit.dev/issues/2025-03-05-copr-issue/ . Anyway..

> The branch itself doesn't matter to me, I am fine with main as well. But what we need is clear documentation on how these are supposed to be treated, i.e. all contributors and maintainers are free to ignore all packit tasks as they are not enforced as part of https://github.com/containers/podman/blob/main/CONTRIBUTING.md#continuous-integration And it is your responsibility to react to any failures, if there are new bugs report them as issues, etc...

Included another commit updating CONTRIBUTING.md. PTAL. I'm cool with owning Packit and TMT issues, and with everyone else ignoring them, until we reach the point of Packit being our primary CI.

But I hope that if any PR causes genuine packaging issues, suggested changes to rpm/podman.spec will not be ignored. If that's a no-go, I'm fine with handling the rpm changes too.

> But that is just my opinion, probably needs the opinion from all maintainers working on podman here.

Ack. I'll ping the others too.

@packit-as-a-service

Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@packit-as-a-service

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.
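
The wording of this bot comment is configurable on the repository side. A minimal sketch of the relevant `.packit.yaml` fragment, assuming Packit's top-level `notifications.failure_comment.message` key; the message string is copied from the bot comment above, and the rest of the file is omitted.

```yaml
# Illustrative sketch only; assumes the notifications.failure_comment.message key.
notifications:
  failure_comment:
    message: "[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore."
```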

@packit-as-a-service

Cockpit tests failed for commit 1a96d55. @martinpitt, @jelly, @mvollmer please check.

@Luap99
Member

Luap99 commented Apr 7, 2025

@lsm5 about

# [11:30:49.788033359] # /usr/bin/podman run --network testnet-t557-y9n5yryz --rm quay.io/libpod/testimage:20241011 cat /etc/resolv.conf
# [11:30:50.181337368] nameserver 172.12.7.1
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: correct search domain
# #| expected: =~ search dns.podman.\*
# #|   actual:    nameserver 172.12.7.1
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I assume you test podman-next netavark rpms?
I created #25819 for it, which should fix the test again since we no longer want that behavior.

@lsm5
Member Author

lsm5 commented Apr 7, 2025

> @lsm5 about
>
> # [11:30:49.788033359] # /usr/bin/podman run --network testnet-t557-y9n5yryz --rm quay.io/libpod/testimage:20241011 cat /etc/resolv.conf
> # [11:30:50.181337368] nameserver 172.12.7.1
> # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> # #|     FAIL: correct search domain
> # #| expected: =~ search dns.podman.\*
> # #|   actual:    nameserver 172.12.7.1
> # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I assume you test podman-next netavark rpms? I created #25819 for it, which should fix the test again since we no longer want that behavior.

Yes, thanks a lot!

@lsm5
Member Author

lsm5 commented Apr 11, 2025

/packit retest-failed

@lsm5
Member Author

lsm5 commented Apr 22, 2025

@containers/podman-maintainers PTAL.

This commit introduces TMT test jobs triggered via packit to run system
tests on testing-farm infrastructure. Tests are run for
Fedora 41, 42 and rawhide on x86_64. The same
test plan will be reused by Fedora for bodhi, zuul and fedora-ci gating
tests. Packit will handle syncing of test plan and sources from upstream
to downstream.

Packit failure notification has also been updated to be less noisy and
let people know they are free to ignore any failures.

TODO:
1. Enable jobs for CentOS Stream and aarch64 envs.
2. Enable separate set of jobs for release branches as they need to be
   tested with official distro packages, not with bleeding-edge
   packages.

Signed-off-by: Lokesh Mandvekar <lsm5@fedoraproject.org>
@baude
Member

baude commented May 6, 2025

LGTM

@lsm5
Member Author

lsm5 commented May 20, 2025

@containers/podman-maintainers mergeme

@l0rd
Member

l0rd commented May 21, 2025

/lgtm

@lsm5
Member Author

lsm5 commented May 21, 2025

/cherrypick v5.5

@openshift-cherrypick-robot
Collaborator

@lsm5: new pull request created: #26173

Details

In response to this:

/cherrypick v5.5

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none

8 participants