[CCXDEV-16280] feat: add konflux-dep-bumps skill #104
Open
lenasolarova wants to merge 25 commits into RedHatInsights:master from lenasolarova:feat/add-konflux-dep-bumps-skill
6f63553 feat: add konflux-dep-bumps skill (lenasolarova)
b19f7c2 fix: update rebase instructions — checkbox not /rebase comment (lenasolarova)
6f7680a docs: add CI/workflow gotchas from triage experience (lenasolarova)
a69384a docs: document bonfire-tekton failure handling in konflux-dep-bumps s… (lenasolarova)
356ff30 fix: add Go bin to PATH before running make test (lenasolarova)
a33372d Revert "fix: add Go bin to PATH before running make test" (lenasolarova)
f487982 chore: document linter patterns and fix workflow learnings in skill (lenasolarova)
e25287e chore: trigger rebase (lenasolarova)
94150c0 chore: trigger rebase (lenasolarova)
f3eb0d6 chore: trigger rebase (lenasolarova)
c44af9c chore: document linter version mismatch pitfall in skill (lenasolarova)
79b0c37 chore: clarify linter version check must happen before each commit (lenasolarova)
10c724b chore: document go mod tidy as fix for broken go.sum on bot branches (lenasolarova)
114c27d chore: suggest closing empty bot PRs after go mod tidy produces no diff (lenasolarova)
938078e chore: clarify merge vs close for empty bot PRs after go mod tidy (lenasolarova)
2ef8ebd chore: link Konflux navigation and debugging skills for pipeline log … (lenasolarova)
fbeb798 chore: try rebase before go mod tidy for broken go.sum (lenasolarova)
e89d912 chore: clarify oc logs only works while pipeline is running (lenasolarova)
dbd1376 chore: correct log availability - Konflux retains pods after completion (lenasolarova)
7fdc4ed chore: rewrite step 4 to reflect actual practice (lenasolarova)
f2b240e chore: add linter to verify step (lenasolarova)
c8ba326 chore: fix step 7 - bot PRs don't auto-rebase, must be triggered manu… (lenasolarova)
714e293 chore: broaden cross-repo pattern description to include same-issue-p… (lenasolarova)
e9a618a chore: remove common failure patterns table and duplicate go.mod fix … (lenasolarova)
b0d0fd1 chore: fix step 6 fork wording, CI gotcha, and maintenance status tri… (lenasolarova)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,234 @@ | ||
| --- | ||
| name: konflux-dep-bumps | ||
| description: Triage and fix failing Konflux/MintMaker dependency bump PRs (bot-authored, auto-merge enabled). Use when a Renovate/MintMaker PR is stuck due to CI failures. Covers Go and Python repos in the RedHatInsights org. | ||
| --- | ||
|
|
||
| # Konflux Dependency Bump Triage Skill | ||
|
|
||
| MintMaker (Renovate via Konflux) opens dependency bump PRs with auto-merge enabled. When CI fails the PR stalls. This skill covers triage, investigation, and resolution. | ||
|
|
||
| --- | ||
|
|
||
| ## Step 0 — Get the current open Konflux bot PRs | ||
|
|
||
| **Source of truth:** https://github.com/RedHatInsights/processing-tools/tree/master/open_mr_pr/github | ||
|
|
||
| Check the `open-prs-konflux.md` file there. The date at the top of that file tells you when it was last generated. **If the date does not match today's date, the file is stale — run the fetcher locally and inform the user before proceeding:** | ||
|
|
||
| ```bash | ||
| cd /Users/lsolarov/Documents/processing-tools-gh/open_mr_pr/github | ||
| python3 list_repos_prs.py | ||
| cat open-prs-konflux.md | ||
| ``` | ||
|
|
||
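The staleness check in Step 0 could be scripted along these lines. This is a minimal sketch, not part of the PR: the helper name `report_is_fresh` is hypothetical, and it assumes the date at the top of `open-prs-konflux.md` is written as `YYYY-MM-DD` somewhere in the first five lines, which the skill text does not confirm.

```bash
# report_is_fresh FILE: succeed when the first YYYY-MM-DD date found in the
# first five lines of FILE equals today's date (local time).
# Assumption: the generated report carries an ISO-style date near the top.
report_is_fresh() {
  local stamp
  stamp=$(head -n 5 "$1" | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1)
  [ -n "$stamp" ] && [ "$stamp" = "$(date +%F)" ]
}
```

A possible use is `report_is_fresh open-prs-konflux.md || python3 list_repos_prs.py`, so the fetcher only runs when the report is out of date.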
| --- | ||
|
|
||
| ## Step 1 — Check what is failing | ||
|
|
||
| For each stuck PR: | ||
|
|
||
| ```bash | ||
| gh pr checks <PR_NUMBER> --repo RedHatInsights/<REPO> | ||
| ``` | ||
|
|
||
| Note every failing check. The most common are: Go tests, lint, BDD tests, Konflux pipeline, enterprise contract, artifact update (`renovate/artifacts`). Multiple failures often share a single root cause — fix the root and the rest clear. | ||
|
|
||
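To list only the failing checks, the output of `gh pr checks` can be filtered. The sketch below is an illustration rather than part of the PR: `failing_checks` is a hypothetical helper, and it assumes the piped (non-terminal) output is tab-separated with the check name in the first column and a state such as `pass` or `fail` in the second. Verify that layout against your `gh` version before relying on it.

```bash
# failing_checks: read tab-separated check rows on stdin and print the name
# (column 1) of every row whose state (column 2) is exactly "fail".
# Assumption: input looks like "<name>\t<state>\t<elapsed>\t<link>".
failing_checks() {
  awk -F '\t' '$2 == "fail" { print $1 }'
}
```

Typical use would be `gh pr checks <PR_NUMBER> --repo RedHatInsights/<REPO> | failing_checks`, which gives a short list to compare across PRs when looking for a shared root cause.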
| --- | ||
|
|
||
| ## Step 2 — Try the cheap fixes first | ||
|
|
||
| Before investigating root cause, try these in order. They resolve a large portion of stuck PRs with no code change. | ||
|
|
||
| **Rebase** — covers go.mod drift or Renovate artifact failures where Renovate hasn't yet attempted to fix it: | ||
|
|
||
| Push an empty commit to the bot PR branch — this re-triggers CI and prompts Renovate to rebase with fresh artifacts: | ||
|
|
||
| ```bash | ||
| BRANCH=$(gh pr view <PR_NUMBER> --repo RedHatInsights/<REPO> --json headRefName --jq '.headRefName') | ||
| git fetch origin "$BRANCH" | ||
| git checkout "$BRANCH" | ||
| git commit --allow-empty -m "chore: trigger Renovate rebase" | ||
| git push origin "$BRANCH" | ||
| ``` | ||
|
|
||
| Note: `/rebase` as a comment does **not** work in these repos. | ||
|
|
||
| **`go mod tidy` directly** — use this when Renovate already ran artifact updates but produced a broken go.sum (e.g. missing checksum entries for a major version bump). **Try the empty commit rebase first** — it may be enough if Renovate's artifact update just didn't run properly. Only move to `go mod tidy` if the rebase attempt doesn't fix it. Clone the bot branch, run `go mod tidy`, verify `go build ./...` passes, then push go.mod and go.sum directly to the bot branch: | ||
|
|
||
| ```bash | ||
| BRANCH=$(gh pr view <PR_NUMBER> --repo RedHatInsights/<REPO> --json headRefName --jq '.headRefName') | ||
| git clone git@github.com:RedHatInsights/<REPO>.git /tmp/<REPO>-fix | ||
| cd /tmp/<REPO>-fix | ||
| git fetch origin "$BRANCH" && git checkout -b fix-go-sum FETCH_HEAD | ||
| go mod tidy | ||
| go build ./... | ||
| git add go.mod go.sum | ||
| git commit -m "chore: run go mod tidy to fix missing go.sum entries" | ||
| git push origin "fix-go-sum:$BRANCH" | ||
| ``` | ||
|
|
||
| If `go mod tidy` reverts all of Renovate's changes (go.mod ends up the same as master, or the diff undoes what the bot introduced), the bot PR has no real effect. Let the user know the PR is empty and present the options — do not act yourself: | ||
| - **Let pipelines run and merge** — preferred, because merging signals to Renovate that the bump has been handled and prevents it from recreating the PR. | ||
| - **Close** — may cause Renovate to recreate the same PR, so only do this if you're sure the bump is invalid and you want to block it. | ||
|
|
||
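Whether `go mod tidy` has undone the whole bump can be checked mechanically before reporting to the user. The following is a minimal sketch under stated assumptions: `bump_is_empty` is a hypothetical helper, it only compares `go.mod` and `go.sum`, and it treats the base ref you pass (for example `origin/master`) as the state the bot PR started from.

```bash
# bump_is_empty BASE_REF: succeed (exit 0) when go.mod and go.sum in the
# working tree are identical to BASE_REF, meaning the tidy undid the bump.
# Any non-zero exit means the files differ or git could not compare them.
bump_is_empty() {
  git diff --quiet "$1" -- go.mod go.sum
}
```

For example, `go mod tidy && bump_is_empty origin/master && echo "bot PR is empty"` flags the case described above. The decision to merge or close still belongs to the user.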
| **Retest** — covers flaky or environment-dependent failures (infrastructure errors clearly unrelated to the dependency change, e.g. Kafka unreachable, OCM API client errors, DB dial failures). Check whether the Konflux pipeline itself passed even if GitHub Actions failed — that is a strong signal the failure is environmental: | ||
|
|
||
| ```bash | ||
| gh pr comment <PR_NUMBER> --repo RedHatInsights/<REPO> --body "/retest" | ||
| ``` | ||
|
|
||
| Note: use `/retest`, not `/ok-to-test` — the latter may not be wired up in all repos. | ||
|
|
||
| If any of these resolves it, move on. If not, proceed to root cause investigation. | ||
|
|
||
| **Exception — bonfire-tekton `deploy-application` failures:** these cannot be triaged with cheap fixes and cannot be triaged without the logs. Use the Konflux navigation and debugging skills (linked in Step 3) to extract the PipelineRun URL and read the logs directly before asking the user for anything. The causes are varied — a missing image tag can result from a failed Konflux build pipeline, a component pointing at the wrong or outdated Quay repo, or changes in multiple repos landing simultaneously. | ||
|
|
||
| --- | ||
|
|
||
| ## Step 3 — Read the logs and identify the root cause | ||
|
|
||
| **GitHub Actions failures** — use `gh run view`: | ||
|
|
||
| ```bash | ||
| gh run view <RUN_ID> --repo RedHatInsights/<REPO> --log-failed 2>&1 | grep -E "undefined|cannot use|no field|incompatible|conflict|Error|FAILED" | head -40 | ||
| ``` | ||
|
|
||
| **Konflux pipeline failures (bonfire-tekton, on-pull-request, or any check containing "konflux")** — for any Konflux failure you must always attempt to read the actual logs before asking the user. Use the two external skills linked below in sequence: | ||
|
|
||
| 1. **[navigating-github-to-konflux-pipelines](https://github.com/konflux-ci/skills/blob/main/skills/navigating-github-to-konflux-pipelines/SKILL.md)** — extracts the PipelineRun URL from the GitHub check run (via `gh api` check-runs, filtering for "konflux" in the check name), then parses the URL to get cluster, namespace, and pipelinerun name. | ||
| 2. **[debugging-pipeline-failures](https://github.com/konflux-ci/skills/blob/main/skills/debugging-pipeline-failures/SKILL.md)** — uses `kubectl`/`oc` with the extracted cluster/namespace/pipelinerun to read logs, inspect TaskRun status, and identify the root cause. | ||
|
|
||
| Always attempt this yourself first. In Konflux, pipeline run pods are retained after completion so logs remain accessible via `kubectl`/`oc` even on finished runs. Only fall back to asking the user to paste a log snippet from the Konflux UI if the cluster token is expired or the pods have been pruned. | ||
|
|
||
| **Understand what broke before deciding how to fix it.** The PR description lists every bumped package with links to release notes — read them for the relevant version. Look specifically for breaking changes: removed fields, renamed types, changed function signatures, altered dependency requirements. | ||
|
|
||
| Key questions to answer: | ||
| - Which bumped package introduced the breakage? | ||
| - Is it the bumped package itself that broke, or something that depends on it? | ||
| - Is the broken package archived / unmaintained? | ||
| - Does a newer compatible version of the affected package exist already? | ||
| - Is the breakage in this repo's own code, or in a shared library that this repo depends on? | ||
|
|
||
| **The last question matters most for scoping the fix.** If the breakage originates in a shared library, fixing it there unblocks all downstream repos at once. Always check whether a direct dependency of the repo is pulling in the broken package transitively (`go mod graph`, `go.mod` indirect entries) before deciding where to fix. | ||
|
|
||
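The transitive-dependency check mentioned above can be sketched as a filter over `go mod graph`, which prints one `parent child@version` edge per line. The helper name `who_requires` is hypothetical and the module paths in the usage line are placeholders. This is an illustration under those assumptions, not part of the PR.

```bash
# who_requires MODULE: read `go mod graph` output on stdin and print every
# module that directly requires MODULE (at any version), one per line.
who_requires() {
  local target="$1"
  awk -v target="$target" '
    {
      # Each line is "<parent>[@version] <child>@version".
      child = $2
      sub(/@.*/, "", child)        # drop the version suffix from the child
      if (child == target) print $1
    }
  ' | sort -u
}
```

Running `go mod graph | who_requires github.com/example/broken-module` in the affected repo shows which direct or indirect dependency pulls the broken package in. If the result is a shared library rather than the repo itself, that library is the place to fix.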
| --- | ||
|
|
||
| ## Step 4 — Choose the right fix | ||
|
|
||
| The most common fixes are code changes in the repo itself (adding constants, fixing API usage, updating imports) or `go mod tidy` to fix artifacts. Beyond that: | ||
|
|
||
| **If the breaking package is archived or unmaintained** — a replacement may be needed, but do not do this autonomously. Explain the situation to the user: what the package is, why it's unmaintained, and what the replacement candidate would be. This is a team decision. Wait for sign-off before touching anything. | ||
|
|
||
| **If the PR is simply incompatible with no clear fix path** — do not pin the dependency back. Just leave the PR unmerged and let the user know. Pinning introduces technical debt and Renovate will keep reopening the PR anyway. | ||
|
|
||
| **Do not** apply fixes directly to repos that get the broken package transitively. Fix it at the source (the shared library), verify the downstream effect with a local `replace` directive, then open one PR instead of many. | ||
|
|
||
| --- | ||
|
|
||
| ## Step 5 — Verify the fix before opening a PR | ||
|
|
||
| **Always verify locally before pushing — no exceptions.** | ||
|
|
||
| For a fix in a shared library, verify the downstream effect by pointing a dependent repo at the local fix using a `replace` directive: | ||
|
|
||
| ```bash | ||
| # In the downstream repo, temporarily add a replace directive to go.mod: | ||
| go mod edit -replace=github.com/RedHatInsights/<SHARED-LIB>=/path/to/local/fix | ||
|
|
||
| go mod tidy | ||
| go build ./... | ||
| go test ./... | ||
| ``` | ||
|
|
||
| If the broken package disappears from `go.mod` and all tests pass, the fix is correct. | ||
|
|
||
| For any fix repo, always run all available tests and the linter locally: | ||
|
|
||
| ```bash | ||
| # Go | ||
| go build ./... && go test ./... | ||
| # Also check Makefile for additional targets (BDD, integration, e2e) | ||
| grep -E "^test|^bdd|^e2e|^integration" Makefile | ||
|
|
||
| # Python | ||
| pip install -r requirements.txt && python -m pytest | ||
|
|
||
| # Linter — use the bumped version from the bot PR's .pre-commit-config.yaml, not the current one on master | ||
| golangci-lint run ./... | ||
| ``` | ||
|
|
||
| **Do not push without local tests and linter passing.** | ||
|
|
||
| --- | ||
|
|
||
| ## Step 6 — Open the fix PR | ||
|
|
||
| Use your existing fork, create a branch from the upstream default branch, apply the fix, run tests, then open a PR. | ||
|
|
||
| The PR description must include: | ||
| - What broke and why (cite the specific breaking change from release notes with a link) | ||
| - Why this fix is the right approach (not just what changed) | ||
| - A link to the Konflux bot PR it unblocks | ||
| - If fixing a shared library: note which downstream repos are affected | ||
|
|
||
| After opening, comment on the stuck Konflux bot PR linking to the fix and noting it can be retested once the fix merges. | ||
|
|
||
| --- | ||
|
|
||
| ## Step 7 — After the fix merges | ||
|
|
||
| Bot PRs do not auto-rebase when a fix lands. Trigger it manually using one of: | ||
|
|
||
| ```bash | ||
| # Preferred — updates the branch via GitHub API (merges master into the bot branch) | ||
| gh pr update-branch <PR_NUMBER> --repo RedHatInsights/<REPO> | ||
|
|
||
| # Alternative — push an empty commit to the bot branch to retrigger CI | ||
| BRANCH=$(gh pr view <PR_NUMBER> --repo RedHatInsights/<REPO> --json headRefName --jq '.headRefName') | ||
| git fetch origin "$BRANCH" && git checkout "$BRANCH" | ||
| git commit --allow-empty -m "chore: trigger rebase" | ||
| git push origin "$BRANCH" | ||
| ``` | ||
|
|
||
| For fixes in shared libraries, downstream bot PRs will not pick up the fix until Renovate opens a new bump including the updated shared library version. | ||
|
|
||
| --- | ||
|
|
||
| ## CI / workflow gotchas | ||
|
|
||
| - **`gh run rerun` does not re-fetch reusable workflows.** When a workflow uses `uses: some-repo/.github/workflows/foo.yaml@master`, the `@master` SHA is resolved once when the run is first created and baked in. `gh run rerun` replays with that same SHA — even if master has since changed. To pick up a new workflow version, push a new commit to trigger a completely fresh run. | ||
|
|
||
| - **Multiple PRs with the same root cause — fix them all at once.** When a wave of bot PRs hits with identical failures (e.g. missing go.sum entry across 5 repos), clone each branch, run `go mod tidy`, and push in one session rather than one by one. | ||
|
|
||
| - **After your fix PR merges, update the bot PR branch** using `gh pr update-branch` or an empty commit (see Step 7). If the conflict is in go.mod/go.sum, run `go mod tidy` directly on the bot branch after updating it. | ||
|
|
||
| - **Coverage drop from removing covered code is not a regression to fix with arbitrary tests.** If you delete duplicated or misplaced code that happened to be well-covered, overall coverage may dip. The right response is to explain why to the team — not to add pointless tests just to hit a number. If coverage is enforced as a CI gate and blocking merges, consider making it non-blocking (`continue-on-error: true`) rather than gaming the percentage. | ||
|
|
||
| ## Triage standards | ||
|
|
||
| - **One root cause can affect many repos.** Always check the full list of open Konflux PRs before starting — a pattern across repos may point to a shared dependency or shared library as the source, or the same issue appearing independently in each repo due to the bumped package (e.g. a linter version bump introducing the same violation type across all repos). | ||
| - **Check the package's maintenance status** (archived? last commit date? open issues?) before deciding on a fix strategy. An archived package needs replacement — always raise this with the user before acting. | ||
| - **Check release dates.** If a breaking change was released very recently, downstream packages may not have had time to react yet. Document this and park the PR rather than applying a workaround. | ||
| - **The renovate.json in these repos is centrally managed** (synced from `processing-tools`) — do not edit it in downstream repos to work around dependency conflicts. Fix the conflict properly. | ||
| - **If a failure involves a processing-tools version bump** (pre-commit hooks, shared workflows, shared scripts), always ask the user before fixing it in the downstream repo. The right fix may be upstream in `processing-tools` itself — which is our repo and where the change should live. Fixing it downstream is a workaround; fixing it upstream unblocks all repos at once. | ||
| - **Never add files to repos unnecessarily.** The fix should be the minimum change that resolves the incompatibility — a go.mod update, an import swap, a version bump. Not a new source file unless genuinely required. | ||
|
|
||
| --- | ||
|
|
||
| ## Similar failures across repos after the same update | ||
|
|
||
| When multiple repos fail on the same check after a similar bot PR (e.g. all failing after a shared tooling version bump), treat it as one investigation, not many. Pull the logs for a couple of repos and compare — if the root cause is the same, one fix approach applies to all. | ||
|
|
||
| **Ask the user how to fix before acting.** A shared tooling bump is a processing-tools concern — confirm the approach with the user first. The correct fix is to address the underlying issue rather than suppress it in config. | ||
|
|
||
| **When making the fix across repos:** | ||
| - Clone fresh to `/tmp` rather than fighting a local clone that may have a stale lock or uncommitted changes. | ||
| - Branch from upstream master/main, not from the bot PR branch. | ||
| - **Run the linter with the bumped version, not the current one on master — before every commit.** Your fix PR runs lint against master's tooling version, so passing there means nothing for the bot PR. Check the bot PR's `.pre-commit-config.yaml` for the bumped version, install it, and run it against the full repo after each change. Only commit when that version reports no new violations. This avoids the back-and-forth where each fix exposes the next wave of truncated linter output. | ||
| - Run `go build ./...` and `go test ./...` locally before pushing — not just the build. | ||
| - When using `replace_all` to swap a string literal for a constant, the replacement will also hit the constant's own definition. Always verify the const declaration still has the string literal, not a self-reference. | ||
| - After code changes, run the linter locally before pushing — formatters can reformat code after a substitution and cause the pre-commit hook to report "files were modified". | ||
|
|
||
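Finding the bumped linter version in the bot PR's `.pre-commit-config.yaml` can be sketched as below. The helper name `hook_rev` is hypothetical, and the sketch assumes the usual pre-commit layout in which each `- repo:` entry is followed by its own `rev:` line. Whether a given repo pins golangci-lint this way is also an assumption to verify.

```bash
# hook_rev CONFIG REPO_SUBSTRING: print the `rev:` pinned for the first
# pre-commit repo entry whose URL contains REPO_SUBSTRING.
hook_rev() {
  awk -v want="$2" -v sq="'" '
    # A new "- repo:" entry starts; remember whether it is the one we want.
    /^[[:space:]]*-[[:space:]]*repo:/ { in_repo = (index($0, want) > 0) }
    in_repo && /^[[:space:]]*rev:/ {
      rev = $2
      gsub("[\"" sq "]", "", rev)  # strip optional quotes around the value
      print rev
      exit
    }
  ' "$1"
}
```

With the bot branch checked out, `hook_rev .pre-commit-config.yaml golangci-lint` prints the pinned revision, which can then be compared with the output of `golangci-lint --version` before each commit.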
| **Opening fix PRs:** one per repo, linked to the bot PR it unblocks, with a description explaining what changed and why the fix is correct. | ||
|
|
||
| **Pre-existing CI failures.** Verify any failing check was already failing on the bot PR before assuming your change broke it. If it was — note it in your PR description and move on. | ||
Remove hardcoded user-specific absolute path.
The path `/Users/lsolarov/Documents/processing-tools-gh/open_mr_pr/github` is specific to the author's machine and will fail for all other users attempting to follow this runbook.
Additionally, consider adding a note above the command block clarifying that users should first navigate to their local clone of the `processing-tools` repository.