fix(gitlab): auto-discover registry hostname for self-hosted GitLab instances#28
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdded runtime auto-discovery and caching of GitLab container registry hostnames by querying the GitLab API and extracting registry information from repository location data. Falls back to the original registry URL if discovery fails. Changes
Sequence DiagramsequenceDiagram
participant Client
participant GitLabRegistryClient
participant Cache as discoveredRegistryHost
participant GitLabAPI as GitLab API
Client->>GitLabRegistryClient: GetImagesToScan(ctx)
GitLabRegistryClient->>Cache: Check cached hostname
alt hostname cached
Cache-->>GitLabRegistryClient: return cached value
else cache miss or empty
GitLabRegistryClient->>GitLabAPI: discoverRegistryHost(ctx)
GitLabAPI->>GitLabAPI: GET /api/v4/projects
GitLabAPI->>GitLabAPI: GET /api/v4/projects/{id}/registry/repositories
GitLabAPI-->>GitLabRegistryClient: repository Location data
GitLabRegistryClient->>GitLabRegistryClient: extractHostFromLocation()
alt extraction succeeds
GitLabRegistryClient->>Cache: store discovered hostname
else extraction fails
GitLabRegistryClient->>Cache: store empty string (fallback)
end
end
GitLabRegistryClient->>GitLabRegistryClient: build registry using hostname
GitLabRegistryClient-->>Client: return images with registry keys
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
registryclients/gitlab.go (3)
223-251: Duplicated URL normalization logic withgetGitLabAPIBaseURL.Lines 233-250 largely duplicate the scheme normalization and fallback logic from
getGitLabAPIBaseURL(lines 69-86). Consider extracting the shared normalization into a helper to reduce code duplication.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@registryclients/gitlab.go` around lines 223 - 251, The URL scheme normalization and fallback logic duplicated between getDiscoveryBaseURL and getGitLabAPIBaseURL should be extracted into a single helper (e.g., normalizeGitLabURL or buildGitLabAPIHost) that accepts the raw Registry.RegistryURL, ensures https:// if missing, parses the URL and returns the scheme+host (or host fallback) so both getDiscoveryBaseURL and getGitLabAPIBaseURL call this helper and format their final "/api/v4" or other path-specific returns; update getDiscoveryBaseURL to call the new helper instead of reimplementing the trimming, parsing, and fallback loop and remove the duplicated code in getGitLabAPIBaseURL.
273-281: Consider thread safety ifGetImagesToScanmay be called concurrently.The struct fields
discoveredRegistryHost(and potentiallydiscoveryAttemptedif added) are read and written without synchronization. If multiple goroutines invokeGetImagesToScanon the same client instance, this could cause a data race.If concurrent use is expected, consider using
sync.Oncefor the discovery call.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@registryclients/gitlab.go` around lines 273 - 281, The code reads/writes the field discoveredRegistryHost inside GetImagesToScan without synchronization; make the discovery thread-safe by using sync.Once (or a mutex) to perform discoverRegistryHost exactly once and store the result in discoveredRegistryHost (and remove any unsynchronized discoveryAttempted flag); update GetImagesToScan to call a new initDiscovery method that uses a sync.Once (or locks) to call discoverRegistryHost and set discoveredRegistryHost, and then use discoveredRegistryHost for registryHost selection to avoid data races.
266-281: Caching only works for successful discovery; fallback scenarios will re-attempt on every call.When
discoverRegistryHostreturns("", nil)(e.g., no matching repos, empty location field), nothing is cached. Subsequent calls toGetImagesToScanwill retry discovery, incurring repeated API round-trips.Consider using a sentinel value or a separate
discoveryAttempted boolfield to distinguish "never tried" from "tried but fell back."♻️ Suggested approach
type GitLabRegistryClient struct { Registry *armotypes.GitlabImageRegistry Options *common.RegistryOptions - discoveredRegistryHost string // cached result from discoverRegistryHost; empty means not yet discovered + discoveredRegistryHost string // cached result from discoverRegistryHost + discoveryAttempted bool // true once discoverRegistryHost has been called }Then in
GetImagesToScan:registryHost := g.Registry.RegistryURL if g.discoveredRegistryHost != "" { registryHost = g.discoveredRegistryHost - } else if discoveredHost, err := g.discoverRegistryHost(ctx); err != nil { + } else if g.discoveryAttempted { + // Already attempted discovery; use fallback + } else if discoveredHost, err := g.discoverRegistryHost(ctx); err != nil { log.Printf("gitlab registry: failed to discover registry host, falling back to RegistryURL %q: %v", g.Registry.RegistryURL, err) + g.discoveryAttempted = true } else if discoveredHost != "" { g.discoveredRegistryHost = discoveredHost + g.discoveryAttempted = true registryHost = discoveredHost + } else { + g.discoveryAttempted = true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@registryclients/gitlab.go` around lines 266 - 281, GetImagesToScan repeatedly re-calls discoverRegistryHost when discoverRegistryHost returns ("", nil) because only non-empty discoveredRegistryHost is cached; add caching of the "attempted" state so negative results are remembered. Modify the GitLabRegistryClient struct to include a boolean (e.g., discoveredRegistryHostAttempted) and in GetImagesToScan call discoverRegistryHost only when that flag is false; after calling discoverRegistryHost set discoveredRegistryHostAttempted = true and set discoveredRegistryHost = discoveredHost (possibly empty) so future calls will not re-attempt discovery; keep existing fallback to Registry.RegistryURL when discoveredRegistryHost is empty.registryclients/gitlab_test.go (1)
397-404: Remove unnecessary loop variable capture for Go 1.24.1.The
repos := reposcapture was needed for Go versions prior to 1.22, but since the project targets Go 1.24.1, loop variables are captured per-iteration by default. This line can be removed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@registryclients/gitlab_test.go` around lines 397 - 404, Remove the unnecessary per-iteration capture `repos := repos` inside the loop that registers handlers for mux.HandleFunc; since the project targets Go 1.24.1 loop variables are correctly captured per-iteration, simply delete the `repos := repos` line and leave the loop using `projectID` and `repos` directly (the fmt.Sprintf("/api/v4/projects/%d/registry/repositories", projectID) and mux.HandleFunc callback remain unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@registryclients/gitlab_test.go`:
- Around line 419-430: The fallback branch of the test for discoverRegistryHost
currently only asserts that got == "" when tt.wantFallbackToURL is true; update
that branch to also assert no unexpected error by checking err == nil and
calling t.Fatalf (or similar) if err != nil, so the test fails on unexpected
errors; locate the conditional using tt.wantFallbackToURL and the variables got
and err in gitlab_test.go and add the error nil-check before or alongside the
existing got == "" assertion.
---
Nitpick comments:
In `@registryclients/gitlab_test.go`:
- Around line 397-404: Remove the unnecessary per-iteration capture `repos :=
repos` inside the loop that registers handlers for mux.HandleFunc; since the
project targets Go 1.24.1 loop variables are correctly captured per-iteration,
simply delete the `repos := repos` line and leave the loop using `projectID` and
`repos` directly (the fmt.Sprintf("/api/v4/projects/%d/registry/repositories",
projectID) and mux.HandleFunc callback remain unchanged).
In `@registryclients/gitlab.go`:
- Around line 223-251: The URL scheme normalization and fallback logic
duplicated between getDiscoveryBaseURL and getGitLabAPIBaseURL should be
extracted into a single helper (e.g., normalizeGitLabURL or buildGitLabAPIHost)
that accepts the raw Registry.RegistryURL, ensures https:// if missing, parses
the URL and returns the scheme+host (or host fallback) so both
getDiscoveryBaseURL and getGitLabAPIBaseURL call this helper and format their
final "/api/v4" or other path-specific returns; update getDiscoveryBaseURL to
call the new helper instead of reimplementing the trimming, parsing, and
fallback loop and remove the duplicated code in getGitLabAPIBaseURL.
- Around line 273-281: The code reads/writes the field discoveredRegistryHost
inside GetImagesToScan without synchronization; make the discovery thread-safe
by using sync.Once (or a mutex) to perform discoverRegistryHost exactly once and
store the result in discoveredRegistryHost (and remove any unsynchronized
discoveryAttempted flag); update GetImagesToScan to call a new initDiscovery
method that uses a sync.Once (or locks) to call discoverRegistryHost and set
discoveredRegistryHost, and then use discoveredRegistryHost for registryHost
selection to avoid data races.
- Around line 266-281: GetImagesToScan repeatedly re-calls discoverRegistryHost
when discoverRegistryHost returns ("", nil) because only non-empty
discoveredRegistryHost is cached; add caching of the "attempted" state so
negative results are remembered. Modify the GitLabRegistryClient struct to
include a boolean (e.g., discoveredRegistryHostAttempted) and in GetImagesToScan
call discoverRegistryHost only when that flag is false; after calling
discoverRegistryHost set discoveredRegistryHostAttempted = true and set
discoveredRegistryHost = discoveredHost (possibly empty) so future calls will
not re-attempt discovery; keep existing fallback to Registry.RegistryURL when
discoveredRegistryHost is empty.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8a9bfc0f-a8d0-4d8e-aa65-fa53a3cf7a71
📒 Files selected for processing (2)
registryclients/gitlab.goregistryclients/gitlab_test.go
There was a problem hiding this comment.
Pull request overview
Adds GitLab registry-host auto-discovery so self-hosted GitLab instances with a separate Container Registry hostname can be scanned successfully (avoiding /v2/ auth challenges that resolve to dependency_proxy on the web host).
Changes:
- Extend GitLab registry repository model to include the API
locationfield and use it to discover the actual registry hostname. - Update
GetImagesToScanto prefer the discovered registry host (with caching) and fall back to the configuredRegistryURLif discovery fails. - Add unit + httptest “integration-style” coverage for the discovery logic.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
registryclients/gitlab.go |
Implements registry-host discovery + caching and switches image references to use the discovered host. |
registryclients/gitlab_test.go |
Adds tests validating discovery behavior (including an httptest server scenario). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
65d7373 to
97a78f9
Compare
Self-hosted GitLab instances can expose the container registry on a separate hostname (e.g. gitlab-reg.example.com vs gitlab.example.com). Authenticating against the web URL returns service=dependency_proxy in the /v2/ challenge, causing 403 errors during image scanning. GetImagesToScan now calls discoverRegistryHost before registry ops. It fetches each project's registry repositories from the GitLab API, reads the location field, and extracts the actual registry hostname. On any error it falls back to RegistryURL so existing setups are unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add sync.Once-based cache in resolveRegistryHost so concurrent callers
share a single API round-trip and there is no data race on the cached field
- Extract resolveRegistryHost helper (discovery → scheme-stripped fallback)
and use it in GetImagesToScan, making the call path testable
- Remove log.Printf (library code; silent fallback as stated in PR description)
- Normalize RegistryURL with extractHostFromLocation before name.NewRegistry
so scheme/path in RegistryURL no longer causes a parse error
- Optimize discoverRegistryHost inner loop with a map[string]struct{} set
to reduce complexity from O(n³) to O(n²)
- Fix inline comment to accurately describe scheme-preservation behaviour
- Test: add err==nil assertion in fallback branches of discoverRegistryHost tests
- Test: add TestGitLabRegistryClient_resolveRegistryHost with 4 subtests
covering discovery preference, fallback, scheme stripping, and caching
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s and tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
532c1f2 to
23d9e81
Compare
- Add comment explaining why discoverRegistryHost builds its own base URL instead of reusing getGitLabAPIBaseURL (different heuristics needed) - Add TODO noting getUserProjects pagination could be optimized with filtered project search for large GitLab instances - Remove redundant discoverRegistryHost_integration test (covered by table-driven test) - Add GetImagesToScan end-to-end test verifying image map keys reference the discovered registry host, not the original RegistryURL Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove sync.Once caching, resolveRegistryHost wrapper, extractHostFromLocation helper, and duplicate URL-building logic. Reuse getGitLabAPIBaseURL() instead of building a separate base URL. Inline host extraction in discoverRegistryHost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Locationfield togitLabRepositorystruct (populated from GitLab API response)discoverRegistryHost()method that queries the GitLab API to find the actual container registry hostname from thelocationfield of configured repositoriesGetImagesToScanto use the discovered hostname instead ofRegistryURLdirectlyRegistryURLif discovery fails (backward compatible)Problem
Self-hosted GitLab instances sometimes have a separate registry hostname (e.g.
gitlab-reg.example.com) that differs from the GitLab web URL (gitlab.example.com). When a user enters the GitLab web URL in the ARMO UI,GetImagesToScanwas using it directly for container registry operations. GitLab's/v2/challenge on the web URL returnsservice=dependency_proxyinstead ofservice=container_registry, causing a 403 Forbidden error.The GitLab API always returns the correct
locationfield (containing the actual registry hostname) for each registry repository. We now use this to auto-discover the right hostname.Test plan
go test ./registryclients/ -vTestGitLabRegistryClient_discoverRegistryHost(5 subtests) verifies discovery and fallback behaviorTestGitLabRegistryClient_resolveRegistryHost(4 subtests) verifies caching, fallback, and scheme strippingTestGitLabRegistryClient_GetImagesToScan_usesDiscoveredHostverifies image map keys use the discovered registry host🤖 Generated with Claude Code