
fix(gitlab): auto-discover registry hostname for self-hosted GitLab instances #28

Merged
jnathangreeg merged 5 commits into main from fix/gitlab-registry-host-discovery
Apr 14, 2026

Conversation

@rotemamsa rotemamsa commented Apr 13, 2026

Summary

  • Adds Location field to gitLabRepository struct (populated from GitLab API response)
  • Adds discoverRegistryHost() method that queries the GitLab API to find the actual container registry hostname from the location field of configured repositories
  • Updates GetImagesToScan to use the discovered hostname instead of RegistryURL directly
  • Falls back silently to RegistryURL if discovery fails (backward compatible)
  • Caches the discovered hostname to avoid redundant API calls on subsequent invocations

Problem

Self-hosted GitLab instances sometimes have a separate registry hostname (e.g. gitlab-reg.example.com) that differs from the GitLab web URL (gitlab.example.com). When a user enters the GitLab web URL in the ARMO UI, GetImagesToScan was using it directly for container registry operations. GitLab's /v2/ challenge on the web URL returns service=dependency_proxy instead of service=container_registry, causing a 403 Forbidden error.
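
To make the failure mode concrete, here is a minimal sketch of reading the `service` parameter out of a `/v2/` Bearer challenge. The helper name `challengeParam` and the realm URLs are assumptions for illustration only; the two `service` values are the ones described above. The comma split is a simplification that would not survive a quoted value containing commas.

```go
package main

import (
	"fmt"
	"strings"
)

// challengeParam returns the value of a key="value" parameter from a
// Bearer WWW-Authenticate challenge, or "" if the parameter is absent.
// Hypothetical helper: it splits on commas, so it is only suitable for
// parameters whose values contain no commas (such as service).
func challengeParam(header, key string) string {
	rest, ok := strings.CutPrefix(header, "Bearer ")
	if !ok {
		return ""
	}
	for _, part := range strings.Split(rest, ",") {
		k, v, found := strings.Cut(strings.TrimSpace(part), "=")
		if found && k == key {
			return strings.Trim(v, `"`)
		}
	}
	return ""
}

func main() {
	// Illustrative header values; the realm URLs are assumptions.
	web := `Bearer realm="https://gitlab.example.com/jwt/auth",service="dependency_proxy"`
	reg := `Bearer realm="https://gitlab.example.com/jwt/auth",service="container_registry"`

	fmt.Println(challengeParam(web, "service")) // challenge seen on the web host
	fmt.Println(challengeParam(reg, "service")) // challenge seen on the registry host
}
```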

The GitLab API always returns the correct location field (containing the actual registry hostname) for each registry repository. We now use this to auto-discover the right hostname.
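
As a rough illustration of the extraction step, the sketch below pulls the host out of a `location` value. `hostFromLocation` is a hypothetical name and the sample locations are made up; the code in this PR may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// hostFromLocation returns the registry host (including any port) from a
// repository location such as "registry.example.com:5050/group/project".
// It returns "" when no host can be extracted, which callers can treat
// as "fall back to RegistryURL".
func hostFromLocation(location string) string {
	location = strings.TrimSpace(location)
	// Drop a scheme if one happens to be present.
	if _, after, ok := strings.Cut(location, "://"); ok {
		location = after
	}
	// Everything before the first slash is the host.
	host, _, _ := strings.Cut(location, "/")
	return host
}

func main() {
	fmt.Println(hostFromLocation("gitlab-reg.example.com/group/project/app"))
	fmt.Println(hostFromLocation("https://gitlab-reg.example.com:5050/group/app"))
}
```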

Test plan

  • All existing tests pass: go test ./registryclients/ -v
  • New test TestGitLabRegistryClient_discoverRegistryHost (5 subtests) verifies discovery and fallback behavior
  • New test TestGitLabRegistryClient_resolveRegistryHost (4 subtests) verifies caching, fallback, and scheme stripping
  • New test TestGitLabRegistryClient_GetImagesToScan_usesDiscoveredHost verifies image map keys use the discovered registry host
  • Verify with customer's on-prem environment: web URL → discovers actual registry URL → successful scan

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 13, 2026 08:36

coderabbitai Bot commented Apr 13, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 707c311e-624a-4158-82c1-3012681d946d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Added runtime auto-discovery and caching of GitLab container registry hostnames by querying the GitLab API and extracting registry information from repository location data. Falls back to the original registry URL if discovery fails.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitLab Registry Discovery: registryclients/gitlab.go | Introduced discoveredRegistryHost field and helper methods (discoverRegistryHost, getDiscoveryBaseURL, extractHostFromLocation) to query GitLab's API and cache effective registry hostname. Extended gitLabRepository with location JSON field for extraction. Modified GetImagesToScan to accept ctx and use discovered hostname (with fallback) instead of static RegistryURL; added logging for discovery failures. |
| Registry Discovery Tests: registryclients/gitlab_test.go | Added integration test for discovery flow and comprehensive table-driven test covering hostname extraction from repository locations, fallback scenarios, and empty-result edge cases. Introduced HTTP test server mocking and JSON unmarshaling for API responses. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant GitLabRegistryClient
    participant Cache as discoveredRegistryHost
    participant GitLabAPI as GitLab API
    
    Client->>GitLabRegistryClient: GetImagesToScan(ctx)
    GitLabRegistryClient->>Cache: Check cached hostname
    alt hostname cached
        Cache-->>GitLabRegistryClient: return cached value
    else cache miss or empty
        GitLabRegistryClient->>GitLabAPI: discoverRegistryHost(ctx)
        GitLabAPI->>GitLabAPI: GET /api/v4/projects
        GitLabAPI->>GitLabAPI: GET /api/v4/projects/{id}/registry/repositories
        GitLabAPI-->>GitLabRegistryClient: repository Location data
        GitLabRegistryClient->>GitLabRegistryClient: extractHostFromLocation()
        alt extraction succeeds
            GitLabRegistryClient->>Cache: store discovered hostname
        else extraction fails
            GitLabRegistryClient->>Cache: store empty string (fallback)
        end
    end
    GitLabRegistryClient->>GitLabRegistryClient: build registry using hostname
    GitLabRegistryClient-->>Client: return images with registry keys

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • matthyx

Poem

🐰 A registry hostname quest so fine,
Now caching discoveries in time,
When API calls reveal the way,
We'll use that host throughout the day,
With fallbacks ready, just in case—
Our GitLab search finds its true place! 🎀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: implementing auto-discovery of registry hostname for self-hosted GitLab instances, which is the core objective addressing the problem of separate registry hostnames. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
registryclients/gitlab.go (3)

223-251: Duplicated URL normalization logic with getGitLabAPIBaseURL.

Lines 233-250 largely duplicate the scheme normalization and fallback logic from getGitLabAPIBaseURL (lines 69-86). Consider extracting the shared normalization into a helper to reduce code duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@registryclients/gitlab.go` around lines 223 - 251, The URL scheme
normalization and fallback logic duplicated between getDiscoveryBaseURL and
getGitLabAPIBaseURL should be extracted into a single helper (e.g.,
normalizeGitLabURL or buildGitLabAPIHost) that accepts the raw
Registry.RegistryURL, ensures https:// if missing, parses the URL and returns
the scheme+host (or host fallback) so both getDiscoveryBaseURL and
getGitLabAPIBaseURL call this helper and format their final "/api/v4" or other
path-specific returns; update getDiscoveryBaseURL to call the new helper instead
of reimplementing the trimming, parsing, and fallback loop and remove the
duplicated code in getGitLabAPIBaseURL.
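
A minimal sketch of what such a shared helper could look like, assuming the only requirements are "default to https, keep scheme and host, drop the path". `normalizeBaseURL` and `apiBaseURL` are hypothetical names, and the real functions may apply extra heuristics that are not visible in this thread.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeBaseURL is a hypothetical shared helper: it defaults the scheme
// to https, parses the URL, and returns only scheme://host.
func normalizeBaseURL(raw string) (string, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return "", fmt.Errorf("empty registry URL")
	}
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	return u.Scheme + "://" + u.Host, nil
}

// apiBaseURL shows one caller appending its own path to the shared result.
func apiBaseURL(raw string) (string, error) {
	base, err := normalizeBaseURL(raw)
	if err != nil {
		return "", err
	}
	return base + "/api/v4", nil
}

func main() {
	for _, raw := range []string{"gitlab.example.com", "http://gitlab.example.com:8080/some/path"} {
		got, err := apiBaseURL(raw)
		fmt.Println(got, err)
	}
}
```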

273-281: Consider thread safety if GetImagesToScan may be called concurrently.

The struct fields discoveredRegistryHost (and potentially discoveryAttempted if added) are read and written without synchronization. If multiple goroutines invoke GetImagesToScan on the same client instance, this could cause a data race.

If concurrent use is expected, consider using sync.Once for the discovery call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@registryclients/gitlab.go` around lines 273 - 281, The code reads/writes the
field discoveredRegistryHost inside GetImagesToScan without synchronization;
make the discovery thread-safe by using sync.Once (or a mutex) to perform
discoverRegistryHost exactly once and store the result in discoveredRegistryHost
(and remove any unsynchronized discoveryAttempted flag); update GetImagesToScan
to call a new initDiscovery method that uses a sync.Once (or locks) to call
discoverRegistryHost and set discoveredRegistryHost, and then use
discoveredRegistryHost for registryHost selection to avoid data races.
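
For reference, a sketch of the `sync.Once` pattern this comment suggests. It is not the merged code: `hostResolver` and its fields are hypothetical, and `discover` stands in for the real API call. Because the fallback is stored inside the `Once`, a failed or empty discovery is also remembered, so no separate "attempted" flag is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// hostResolver caches the outcome of a single discovery attempt.
// All names here are illustrative.
type hostResolver struct {
	fallback string
	discover func() (string, error) // stand-in for the GitLab API lookup

	once sync.Once
	host string
}

// resolve runs discovery at most once, even under concurrent callers,
// and returns the discovered host or the fallback.
func (r *hostResolver) resolve() string {
	r.once.Do(func() {
		r.host = r.fallback
		if h, err := r.discover(); err == nil && h != "" {
			r.host = h
		}
	})
	return r.host
}

func main() {
	calls := 0
	r := &hostResolver{
		fallback: "gitlab.example.com",
		discover: func() (string, error) {
			calls++ // only ever executed inside once.Do
			return "gitlab-reg.example.com", nil
		},
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.resolve()
		}()
	}
	wg.Wait()

	fmt.Println(r.resolve(), calls) // prints "gitlab-reg.example.com 1"
}
```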

266-281: Caching only works for successful discovery; fallback scenarios will re-attempt on every call.

When discoverRegistryHost returns ("", nil) (e.g., no matching repos, empty location field), nothing is cached. Subsequent calls to GetImagesToScan will retry discovery, incurring repeated API round-trips.

Consider using a sentinel value or a separate discoveryAttempted bool field to distinguish "never tried" from "tried but fell back."

♻️ Suggested approach
 type GitLabRegistryClient struct {
 	Registry               *armotypes.GitlabImageRegistry
 	Options                *common.RegistryOptions
-	discoveredRegistryHost string // cached result from discoverRegistryHost; empty means not yet discovered
+	discoveredRegistryHost string // cached result from discoverRegistryHost
+	discoveryAttempted     bool   // true once discoverRegistryHost has been called
 }

Then in GetImagesToScan:

 	registryHost := g.Registry.RegistryURL
 	if g.discoveredRegistryHost != "" {
 		registryHost = g.discoveredRegistryHost
-	} else if discoveredHost, err := g.discoverRegistryHost(ctx); err != nil {
+	} else if g.discoveryAttempted {
+		// Already attempted discovery; use fallback
+	} else if discoveredHost, err := g.discoverRegistryHost(ctx); err != nil {
 		log.Printf("gitlab registry: failed to discover registry host, falling back to RegistryURL %q: %v", g.Registry.RegistryURL, err)
+		g.discoveryAttempted = true
 	} else if discoveredHost != "" {
 		g.discoveredRegistryHost = discoveredHost
+		g.discoveryAttempted = true
 		registryHost = discoveredHost
+	} else {
+		g.discoveryAttempted = true
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@registryclients/gitlab.go` around lines 266 - 281, GetImagesToScan repeatedly
re-calls discoverRegistryHost when discoverRegistryHost returns ("", nil)
because only non-empty discoveredRegistryHost is cached; add caching of the
"attempted" state so negative results are remembered. Modify the
GitLabRegistryClient struct to include a boolean (e.g.,
discoveredRegistryHostAttempted) and in GetImagesToScan call
discoverRegistryHost only when that flag is false; after calling
discoverRegistryHost set discoveredRegistryHostAttempted = true and set
discoveredRegistryHost = discoveredHost (possibly empty) so future calls will
not re-attempt discovery; keep existing fallback to Registry.RegistryURL when
discoveredRegistryHost is empty.
registryclients/gitlab_test.go (1)

397-404: Remove unnecessary loop variable capture for Go 1.24.1.

The repos := repos capture was needed for Go versions prior to 1.22, but since the project targets Go 1.24.1, loop variables are captured per-iteration by default. This line can be removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@registryclients/gitlab_test.go` around lines 397 - 404, Remove the
unnecessary per-iteration capture `repos := repos` inside the loop that
registers handlers for mux.HandleFunc; since the project targets Go 1.24.1 loop
variables are correctly captured per-iteration, simply delete the `repos :=
repos` line and leave the loop using `projectID` and `repos` directly (the
fmt.Sprintf("/api/v4/projects/%d/registry/repositories", projectID) and
mux.HandleFunc callback remain unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@registryclients/gitlab_test.go`:
- Around line 419-430: The fallback branch of the test for discoverRegistryHost
currently only asserts that got == "" when tt.wantFallbackToURL is true; update
that branch to also assert no unexpected error by checking err == nil and
calling t.Fatalf (or similar) if err != nil, so the test fails on unexpected
errors; locate the conditional using tt.wantFallbackToURL and the variables got
and err in gitlab_test.go and add the error nil-check before or alongside the
existing got == "" assertion.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8a9bfc0f-a8d0-4d8e-aa65-fa53a3cf7a71

📥 Commits

Reviewing files that changed from the base of the PR and between 8c09b0f and 65d7373.

📒 Files selected for processing (2)
  • registryclients/gitlab.go
  • registryclients/gitlab_test.go

Comment thread registryclients/gitlab_test.go Outdated

Copilot AI left a comment

Pull request overview

Adds GitLab registry-host auto-discovery so self-hosted GitLab instances with a separate Container Registry hostname can be scanned successfully (avoiding /v2/ auth challenges that resolve to dependency_proxy on the web host).

Changes:

  • Extend GitLab registry repository model to include the API location field and use it to discover the actual registry hostname.
  • Update GetImagesToScan to prefer the discovered registry host (with caching) and fall back to the configured RegistryURL if discovery fails.
  • Add unit + httptest “integration-style” coverage for the discovery logic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| registryclients/gitlab.go | Implements registry-host discovery + caching and switches image references to use the discovered host. |
| registryclients/gitlab_test.go | Adds tests validating discovery behavior (including an httptest server scenario). |

Comment thread registryclients/gitlab.go Outdated
Comment thread registryclients/gitlab.go
Comment thread registryclients/gitlab.go Outdated
Comment thread registryclients/gitlab.go Outdated
Comment thread registryclients/gitlab.go
Comment thread registryclients/gitlab.go
@rotemamsa rotemamsa force-pushed the fix/gitlab-registry-host-discovery branch from 65d7373 to 97a78f9 Compare April 13, 2026 09:21
Comment thread registryclients/gitlab.go Outdated
rotemamsa and others added 3 commits April 13, 2026 16:39
Self-hosted GitLab instances can expose the container registry on a
separate hostname (e.g. gitlab-reg.example.com vs gitlab.example.com).
Authenticating against the web URL returns service=dependency_proxy in
the /v2/ challenge, causing 403 errors during image scanning.

GetImagesToScan now calls discoverRegistryHost before registry ops.
It fetches each project's registry repositories from the GitLab API,
reads the location field, and extracts the actual registry hostname.
On any error it falls back to RegistryURL so existing setups are unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add sync.Once-based cache in resolveRegistryHost so concurrent callers
  share a single API round-trip and there is no data race on the cached field
- Extract resolveRegistryHost helper (discovery → scheme-stripped fallback)
  and use it in GetImagesToScan, making the call path testable
- Remove log.Printf (library code; silent fallback as stated in PR description)
- Normalize RegistryURL with extractHostFromLocation before name.NewRegistry
  so scheme/path in RegistryURL no longer causes a parse error
- Optimize discoverRegistryHost inner loop with a map[string]struct{} set
  to reduce complexity from O(n³) to O(n²)
- Fix inline comment to accurately describe scheme-preservation behaviour
- Test: add err==nil assertion in fallback branches of discoverRegistryHost tests
- Test: add TestGitLabRegistryClient_resolveRegistryHost with 4 subtests
  covering discovery preference, fallback, scheme stripping, and caching

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
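
The set-based lookup mentioned in this commit can be illustrated generically. The sketch below is not the loop from `discoverRegistryHost`; `matchConfigured` and the sample paths are hypothetical and only show how a `map[string]struct{}` replaces a nested scan with constant-time membership checks.

```go
package main

import "fmt"

// matchConfigured returns the API repository paths that also appear in the
// configured list. Building the set once avoids rescanning the configured
// slice for every API entry.
func matchConfigured(configured, apiPaths []string) []string {
	want := make(map[string]struct{}, len(configured))
	for _, c := range configured {
		want[c] = struct{}{}
	}
	var matched []string
	for _, p := range apiPaths {
		if _, ok := want[p]; ok {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	configured := []string{"group/app", "group/api"}
	fromAPI := []string{"group/web", "group/api"}
	fmt.Println(matchConfigured(configured, fromAPI)) // prints "[group/api]"
}
```
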
…s and tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rotemamsa rotemamsa force-pushed the fix/gitlab-registry-host-discovery branch from 532c1f2 to 23d9e81 Compare April 13, 2026 13:41
- Add comment explaining why discoverRegistryHost builds its own base URL
  instead of reusing getGitLabAPIBaseURL (different heuristics needed)
- Add TODO noting getUserProjects pagination could be optimized with
  filtered project search for large GitLab instances
- Remove redundant discoverRegistryHost_integration test (covered by
  table-driven test)
- Add GetImagesToScan end-to-end test verifying image map keys reference
  the discovered registry host, not the original RegistryURL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rotemamsa rotemamsa requested a review from jnathangreeg April 14, 2026 07:44
Remove sync.Once caching, resolveRegistryHost wrapper, extractHostFromLocation
helper, and duplicate URL-building logic. Reuse getGitLabAPIBaseURL() instead
of building a separate base URL. Inline host extraction in discoverRegistryHost.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jnathangreeg jnathangreeg merged commit f87b89f into main Apr 14, 2026
1 check passed