WIP: OCPBUGS-81712: fixes race condition when mirroring operator catalogs#1390
WIP: OCPBUGS-81712: fixes race condition when mirroring operator catalogs#1390aguidirh wants to merge 6 commits intoopenshift:mainfrom
Conversation
Root Cause: During mirrorToDisk operations, catalog images specified by tag are resolved at two different times: 1. Collection phase (CollectAll) - creates working-dir structure 2. Mirroring phase (Batch.Worker) - mirrors the catalog image If the catalog tag is updated between these phases (pointing to a different digest), the working-dir structure uses one digest but the tarball contains a different digest. This causes diskToMirror to fail in air-gapped environments when it cannot find the catalog in the working-dir and attempts to fetch from the source registry. Fix: Pin catalog images to their resolved digest during collection for mirrorToDisk and mirrorToMirror operations. This ensures the same immutable digest reference is used throughout the entire workflow. Changes: - filtered_collector.go:293-302: Pin catalog to digest in m2d/m2m modes - filtered_collector.go:88-93: Update CatalogToFBCMap key to use digest - filtered_collector_test.go: Update test expectations for digest-based source references while preserving tag-based destinations Impact: - Prevents race conditions when Red Hat updates catalogs during mirroring - Ensures diskToMirror works reliably in air-gapped environments - Preserves original catalog tags in destination registry Signed-off-by: Alex Guidi <aguidi@redhat.com>
Root Cause: With the catalog pinning fix in filtered_collector.go, the CatalogToFBCMap keys now use digest format in M2D/M2M modes: docker://registry/path@sha256:abc123... However, pinSingleCatalogDigest() performs lookups using the original tag reference from the ImageSetConfiguration: docker://registry/path:v4.7 This causes the lookup to fail, preventing generation of pinned ISC/DISC files. Fix: Add a fallback lookup mechanism that searches CatalogToFBCMap by matching the OperatorFilter.Catalog field when direct key lookup fails. This allows pinned ISC generation to work with both tag-based keys (D2M mode) and digest-based keys (M2D/M2M modes). Changes: - pin_catalogs.go:86-101: Add findCatalogByOriginalRef() helper function that searches map by OperatorFilter.Catalog field - pin_catalogs.go:116-126: Update pinSingleCatalogDigest() to try direct lookup first, then fall back to findCatalogByOriginalRef() - pin_catalogs_test.go: Add TestPinSingleCatalog_M2DMapKeyFormat and TestPinCatalogDigests_M2DMapKeyFormat to verify digest-based lookups Impact: - Pinned ImageSetConfiguration generation now works in M2D/M2M modes - Maintains backward compatibility with D2M tag-based lookups - Users can generate pinned ISC/DISC for repeatable mirroring Signed-off-by: Alex Guidi <aguidi@redhat.com>
|
@aguidirh: This pull request references Jira Issue OCPBUGS-81712, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughCatalog handling now uses resolved catalog digests for non‑OCI catalogs: collector writes digest‑keyed entries and rewrites catalog refs to digest‑pinned form; pinning falls back to a linear search by original tag when direct digest-key lookup fails. Tests updated; one file contains commented test pauses. Changes
Sequence Diagram(s)sequenceDiagram
participant Collector as Collector\n(internal/pkg/operator)
participant Pinning as PinningLogic\n(internal/pkg/config)
participant CatalogMap as CatalogMap\n(in-memory map)
participant Mirror as Mirroring\n(disk/mirror)
Collector->>Pinning: request catalog resolution (imgSpec)
alt direct digest-key exists
Pinning->>CatalogMap: lookup by ReferenceWithTransport (digest key)
CatalogMap-->>Pinning: CatalogFilterResult (OriginDigest)
else fallback to original tag
Pinning->>CatalogMap: linear search by OperatorFilter.Catalog (original tag)
CatalogMap-->>Pinning: CatalogFilterResult (OriginDigest) or not found
end
Pinning-->>Collector: resolved digest-pinned catalog ref or not found
Collector->>Mirror: mirror using digest-pinned catalogImage (set targetTag from original tag if unset)
Mirror-->>Collector: mirror result / error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aguidirh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
This PR is a working in progress. There are still a lot of tests to be performed. Reminder for myself: Test OCI scenario, |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/pkg/cli/executor.go (1)
879-881: Remove the test-only pause block before merge.Even commented out, this is dead debug code in the main M2D flow and is easy to accidentally re-enable in a follow-up/cherry-pick.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/cli/executor.go` around lines 879 - 881, Remove the dead test-only pause block that starts with the TODO comment and the commented println/Scanln lines (the lines containing "PAUSED - Update catalog, press Enter..." and the preceding "//TODO REMOVE ME, THIS IS ONLY FOR TEST PURPOSES."). Delete those three commented lines from internal/pkg/cli/executor.go so no debug pause code remains in the main M2D flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/pkg/operator/filtered_collector.go`:
- Around line 89-94: The map key is being set to consts.DockerProtocol +
image.WithDigest(..., result.Digest) which uses the rebuilt catalog digest and
causes lookup misses; change the assignment of mapKey in filtered_collector.go
to use the source/catalog digest (the same value used as Origin in
collectOperator and ExecutorSchema.RebuildCatalogs) instead of result.Digest,
preserving the transport prefix (e.g., consts.DockerProtocol +
image.WithDigest(catalogDigest, ...)) so CatalogToFBCMap is indexed by the
original catalog digest that filterOperator/collectOperator expect.
---
Nitpick comments:
In `@internal/pkg/cli/executor.go`:
- Around line 879-881: Remove the dead test-only pause block that starts with
the TODO comment and the commented println/Scanln lines (the lines containing
"PAUSED - Update catalog, press Enter..." and the preceding "//TODO REMOVE ME,
THIS IS ONLY FOR TEST PURPOSES."). Delete those three commented lines from
internal/pkg/cli/executor.go so no debug pause code remains in the main M2D
flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: de6c182e-9972-4c7e-bb09-f7c24a3b6d45
📒 Files selected for processing (5)
internal/pkg/cli/executor.gointernal/pkg/config/pin_catalogs.gointernal/pkg/config/pin_catalogs_test.gointernal/pkg/operator/filtered_collector.gointernal/pkg/operator/filtered_collector_test.go
Fix lint error: add space after // in TODO comment The golangci-lint check failed due to gocritic commentFormatting rule: commentFormatting: put a space between // and comment text Changed: //TODO REMOVE ME To: // TODO REMOVE ME Signed-off-by: Alex Guidi <aguidi@redhat.com>
Fix CatalogToFBCMap key to use source catalog digest The mapKey construction was using result.Digest (from CatalogFilterResult) but should use the source catalog digest to match what collectOperator sets as the catalogImage. This ensures the map key matches the Origin reference used in CopyImageSchema, enabling successful lookups during catalog rebuild. Changed: mapKey = consts.DockerProtocol + image.WithDigest(imgSpec.Name, result.Digest) To: catalogDigest, err := o.getCatalogDigest(ctx, op) mapKey = consts.DockerProtocol + image.WithDigest(imgSpec.Name, catalogDigest) Signed-off-by: Alex Guidi <aguidi@redhat.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/pkg/operator/filtered_collector.go (1)
89-100: Consider returning the source catalog digest fromcollectOperatorto avoid duplicate registry lookups.For M2D/M2M non-OCI cases,
getCatalogDigestis called twice: once here at line 92, and again withincollectOperatorat line 252. Both calls make the same registry request. The duplicate call is necessary here becauseresult.DigestfromcollectOperatormay contain the rebuilt/filtered catalog digest rather than the source digest.This is functionally correct but inefficient. Consider extending
CatalogFilterResultto include aSourceDigestfield, populated bycollectOperator, so it can be reused here without a second registry call.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/operator/filtered_collector.go` around lines 89 - 100, collectOperator currently performs the registry lookup but only returns result.Digest (which may be the rebuilt digest), causing filtered_collector.go to call getCatalogDigest again; extend CatalogFilterResult with a new SourceDigest field, populate it inside collectOperator with the original/source catalog digest when you perform the initial registry lookup, and then in filtered_collector.go (where mapKey is built using getCatalogDigest) use result.SourceDigest instead of calling getCatalogDigest; keep a safe fallback to call getCatalogDigest only if result.SourceDigest is empty to avoid breaking behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/pkg/operator/filtered_collector.go`:
- Around line 89-100: collectOperator currently performs the registry lookup but
only returns result.Digest (which may be the rebuilt digest), causing
filtered_collector.go to call getCatalogDigest again; extend CatalogFilterResult
with a new SourceDigest field, populate it inside collectOperator with the
original/source catalog digest when you perform the initial registry lookup, and
then in filtered_collector.go (where mapKey is built using getCatalogDigest) use
result.SourceDigest instead of calling getCatalogDigest; keep a safe fallback to
call getCatalogDigest only if result.SourceDigest is empty to avoid breaking
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 4afdd5a5-2d8d-45e1-92ac-a630a499c8f4
📒 Files selected for processing (1)
internal/pkg/operator/filtered_collector.go
Fix CatalogFilterResult to always use source catalog digest - Rename Digest field to OriginDigest for clarity - Fix filterOperator to use catalogDigest instead of filteredImageDigest when catalog is already filtered, ensuring OriginDigest always contains the source catalog digest - This fixes: 1. Correct map key lookup in executor.go during catalog rebuild 2. Valid pinned ISC/DISC references (filtered digest doesn't exist in source registry) - Update all references to use the OriginDigest field name Signed-off-by: Alex Guidi <aguidi@redhat.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/pkg/operator/filtered_collector.go (1)
89-100: DuplicategetCatalogDigestcall creates minor performance overhead.
getCatalogDigestis called here (line 92) and again insidecollectOperator(line 252). For M2D/M2M with non-OCI catalogs, this results in two registry calls per catalog when resolving tag-based references.Consider refactoring
collectOperatorto return the resolvedcatalogDigestalongsideresult, allowing the map key construction to reuse it:// Example: modify collectOperator signature result, catalogDigest, err := o.collectOperator(ctx, op, relatedImages, copyImageSchemaMap) // Then use catalogDigest for mapKey constructionThis would eliminate the duplicate network call and any theoretical TOCTOU window.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/operator/filtered_collector.go` around lines 89 - 100, The code currently calls getCatalogDigest twice for non-OCI catalogs (once here and again inside collectOperator), causing duplicate registry calls; refactor collectOperator to return the resolved catalogDigest along with its existing result so the caller can reuse it when building mapKey instead of invoking getCatalogDigest again — update collectOperator signature to return (result, catalogDigest, error) and change the call site to use the returned catalogDigest when constructing mapKey (consts.DockerProtocol + image.WithDigest(imgSpec.Name, catalogDigest)), removing the extra getCatalogDigest invocation and keeping existing behavior for other code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/pkg/operator/filtered_collector.go`:
- Around line 89-100: The code currently calls getCatalogDigest twice for
non-OCI catalogs (once here and again inside collectOperator), causing duplicate
registry calls; refactor collectOperator to return the resolved catalogDigest
along with its existing result so the caller can reuse it when building mapKey
instead of invoking getCatalogDigest again — update collectOperator signature to
return (result, catalogDigest, error) and change the call site to use the
returned catalogDigest when constructing mapKey (consts.DockerProtocol +
image.WithDigest(imgSpec.Name, catalogDigest)), removing the extra
getCatalogDigest invocation and keeping existing behavior for other code paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: f532bb76-49a7-48c0-9452-fa753b51c1f1
📒 Files selected for processing (4)
internal/pkg/api/v2alpha1/type_internal.gointernal/pkg/config/pin_catalogs.gointernal/pkg/config/pin_catalogs_test.gointernal/pkg/operator/filtered_collector.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/pkg/config/pin_catalogs_test.go
Enables using result.OriginDigest for map key construction, avoiding an extra getCatalogDigest call (will be done in follow-up commit)
|
@aguidirh: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/jira refresh |
|
@aguidirh: This pull request references Jira Issue OCPBUGS-81712, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Description
This PR fixes a race condition in oc-mirror v2's operator catalog mirroring that causes diskToMirror operations to fail in air-gapped environments.
Root Cause: During mirrorToDisk, catalog images are resolved by tag at two different times:
CollectAll()) - creates working-dir structure based on resolved digestBatch.Worker()) - mirrors the catalog imageIf Red Hat updates a catalog between these phases (the tag now points to a different digest), the working-dir structure is created with one digest but the tarball contains a different digest. During diskToMirror, oc-mirror cannot find the catalog in the working-dir and attempts to fetch it from the source registry (registry.redhat.io), which fails in air-gapped environments.
Solution: Pin catalog images to their resolved digest during collection for mirrorToDisk and mirrorToMirror operations. This ensures the same digest is used throughout the entire workflow, preventing the race condition. Additionally, fix the pinned ImageSetConfiguration generator to handle digest-based CatalogToFBCMap keys when generating pinned ISC/DISC files.
Github / Jira issue: OCPBUGS-81712
Type of change
Code Changes
Primary Fix: Catalog Digest Pinning
File: internal/pkg/operator/filtered_collector.go
Lines 89-100: Updated
CatalogToFBCMapkey generationLines 301-310: Pin catalog to digest during collection
File: internal/pkg/operator/filtered_collector_test.go
Secondary Fix: Pinned ISC/DISC Generation
File: internal/pkg/config/pin_catalogs.go
Lines 86-101: Added
findCatalogByOriginalRef()helper functionOperatorFilter.CatalogLines 116-126: Updated
pinSingleCatalogDigest()lookup logicfindCatalogByOriginalRef()for M2D/M2M digest-based keysFile: internal/pkg/config/pin_catalogs_test.go
TestPinSingleCatalog_M2DMapKeyFormatTestPinCatalogDigests_M2DMapKeyFormatHow Has This Been Tested?
Unit Tests
All existing unit tests pass
Manual Reproduction Test
A complete reproduction kit is provided below to verify the bug and validate the fix.
Prerequisites
podman,skopeo, andjqinstalledSetup (One-Time)
export QUAY_USERNAME=your-quay-usernamemkdir -p /tmp/fake-catalog/configs cd /tmp/fake-catalogCreate the files below (Dockerfile, configs, scripts)
Run the reproduction script:
chmod +x /tmp/fake-catalog/reproduce-bug.sh cd /tmp/fake-catalog ./reproduce-bug.shThis will:
This simulates Red Hat pushing a new catalog version (moving the tag to a different digest).
Verification
Without the fix:
With the fix:
Cleanup for Re-testing
This removes:
/tmp/bug-reproduction/directory~/.oc-mirrorcacheReproduction Kit Files
1. Dockerfile
2. configs/catalog.json (Initial catalog - v1.0)
{ "schema": "olm.package", "name": "my-test-operator", "defaultChannel": "stable" } { "schema": "olm.channel", "name": "stable", "package": "my-test-operator", "entries": [ { "name": "my-test-operator.v1.0.0" } ] } { "schema": "olm.bundle", "name": "my-test-operator.v1.0.0", "package": "my-test-operator", "image": "quay.io/operatorhubio/prometheus:v0.47.0", "properties": [ { "type": "olm.package", "value": { "packageName": "my-test-operator", "version": "1.0.0" } } ], "relatedImages": [ { "name": "prometheus", "image": "quay.io/prometheus/prometheus:v2.22.1" } ] }3. configs/catalog-v2.json (Updated catalog - v2.0)
{ "schema": "olm.package", "name": "my-test-operator", "defaultChannel": "stable" } { "schema": "olm.channel", "name": "stable", "package": "my-test-operator", "entries": [ { "name": "my-test-operator.v1.0.0" }, { "name": "my-test-operator.v1.1.0", "replaces": "my-test-operator.v1.0.0" } ] } { "schema": "olm.bundle", "name": "my-test-operator.v1.0.0", "package": "my-test-operator", "image": "quay.io/operatorhubio/prometheus:v0.47.0", "properties": [ { "type": "olm.package", "value": { "packageName": "my-test-operator", "version": "1.0.0" } } ], "relatedImages": [ { "name": "prometheus", "image": "quay.io/prometheus/prometheus:v2.22.1" } ] } { "schema": "olm.bundle", "name": "my-test-operator.v1.1.0", "package": "my-test-operator", "image": "quay.io/operatorhubio/prometheus:v0.50.0", "properties": [ { "type": "olm.package", "value": { "packageName": "my-test-operator", "version": "1.1.0" } } ], "relatedImages": [ { "name": "prometheus", "image": "quay.io/prometheus/prometheus:v2.30.0" }, { "name": "alertmanager", "image": "quay.io/prometheus/alertmanager:v0.23.0" } ] }4. reproduce-bug.sh
5. cleanup.sh
6. README.md