fix: marketplace build respects GITHUB_HOST for GHE repos #1009

Merged
danielmeppiel merged 5 commits into main from fix/1008-marketplace-build-ghe on Apr 28, 2026

@sergio-sisternes-epam
Collaborator

@sergio-sisternes-epam sergio-sisternes-epam commented Apr 27, 2026

Description

apm marketplace build hardcoded github.com in four places, so GITHUB_HOST had no effect on ref resolution, token lookup, or metadata fetch. This PR threads the existing default_host() / build_https_clone_url() / AuthResolver pattern (already used by apm install) through the marketplace build pipeline, and decouples auth from marketplace generation by reusing existing resolution infrastructure.

Fixes #1008
Related: #1010 (ADO marketplace support -- not covered here; URL parsing accepts ADO forms but downstream resolution still uses GITHUB_HOST)

Changes

Phase A -- Bug fix (commit 11a9d27)

ref_resolver.py -- RefResolver accepts an optional host parameter (defaults to GITHUB_HOST or github.com). Both list_remote_refs() and resolve_ref_sha() use build_https_clone_url() instead of hardcoded github.com.
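A minimal sketch of the host fallback and URL construction described above. The function names are hypothetical stand-ins, not the real default_host() / build_https_clone_url(); the lower-casing and the ".git" suffix are assumptions made for illustration.

```python
import os
from urllib.parse import urlparse


def sketch_default_host(env=None):
    """Return the configured host, falling back to github.com.

    Hypothetical stand-in for default_host(); the real helper may
    normalise the value differently.
    """
    env = os.environ if env is None else env
    host = (env.get("GITHUB_HOST") or "").strip().lower()
    return host or "github.com"


def sketch_clone_url(host, owner_repo):
    """Build an unauthenticated HTTPS clone URL for owner/repo on host.

    Assumption: a ".git" suffix is appended; the real helper may not do so.
    """
    return f"https://{host}/{owner_repo}.git"


if __name__ == "__main__":
    host = sketch_default_host({"GITHUB_HOST": "corp.ghe.com"})
    print(urlparse(sketch_clone_url(host, "acme/tools")).hostname)
```

Passing the environment as a mapping keeps the sketch testable without mutating os.environ.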

builder.py -- MarketplaceBuilder stores a normalised host and HostInfo:

  • _resolve_github_token() resolves against the configured host, not "github.com"
  • _fetch_remote_metadata() uses the GitHub REST API for GHES/GHE Cloud (since raw.githubusercontent.com is github.com-only), skips metadata for non-GitHub hosts, and short-circuits tokenless GHE Cloud requests
  • AuthResolver import moved to top of try block to fix a scoping issue when auth_resolver is pre-injected
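The metadata routing in the bullets above could look roughly like the sketch below. The function name and signature are hypothetical; the host kinds ("github", "ghe_cloud", "ghes"), the Accept header, and the /api/v3 fallback come from the review diagrams further down, while the "Bearer" header scheme and sending the token on the raw CDN path are assumptions.

```python
from typing import Dict, Optional, Tuple


def sketch_metadata_request(
    host: str,
    host_kind: str,
    api_base: Optional[str],
    token: Optional[str],
    source_repo: str,
    sha: str,
    path: str = "apm.yml",
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (url, headers) for the metadata fetch, or None to skip it."""
    # Non-GitHub hosts: metadata is optional enrichment, so skip quietly.
    if host_kind not in ("github", "ghe_cloud", "ghes"):
        return None
    # GHE Cloud without a token would only produce a 401, so skip as well.
    if host_kind == "ghe_cloud" and not token:
        return None

    headers: Dict[str, str] = {}
    if token:
        # Assumption: Bearer scheme; the token never goes into the URL.
        headers["Authorization"] = f"Bearer {token}"

    if host == "github.com":
        # The raw CDN only serves github.com repositories.
        return f"https://raw.githubusercontent.com/{source_repo}/{sha}/{path}", headers

    # GHES / GHE Cloud: use the REST contents API and ask for the raw body.
    base = api_base or f"https://{host}/api/v3"
    headers["Accept"] = "application/vnd.github.raw"
    return f"{base}/repos/{source_repo}/contents/{path}?ref={sha}", headers
```

Returning None for the skip cases mirrors the "fail closed without error" behaviour the reviewers describe below.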

Phase B -- Resolution decoupling (commit 239064d)

ref_resolver.py -- RefResolver accepts an optional token parameter. When set, git ls-remote uses authenticated URLs (x-access-token), so private GHES repos work without separate git credential setup.
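As an illustration of the authenticated URL form, here is a small sketch with a hypothetical function name. It is not the real build_https_clone_url(); the percent-encoding of the token and the ".git" suffix are assumptions.

```python
from typing import Optional
from urllib.parse import quote, urlparse


def sketch_auth_clone_url(host: str, owner_repo: str, token: Optional[str] = None) -> str:
    """Build an HTTPS clone URL, embedding the token as x-access-token when given."""
    if not token:
        return f"https://{host}/{owner_repo}.git"
    # Percent-encode so an unusual token cannot break the URL structure.
    return f"https://x-access-token:{quote(token, safe='')}@{host}/{owner_repo}.git"


if __name__ == "__main__":
    parsed = urlparse(sketch_auth_clone_url("corp.ghe.com", "acme/tools", "s3cret"))
    # Print only non-secret parts; the raw URL should never be logged.
    print(parsed.username, parsed.hostname)
```

Because the credential lives in the URL, callers have to treat the whole string as a secret.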

builder.py -- Extracted lazy _ensure_auth() method with _auth_resolved sentinel for true idempotency (including offline mode). Called from _get_resolver() so both resolve() and build() benefit from authenticated git ls-remote. Resolver is eagerly initialised before the thread pool to prevent a race condition. Fixed _host_info type annotation (Optional["HostInfo"] with TYPE_CHECKING guard).
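The lazy-auth shape described above can be sketched as follows. SketchBuilder and its lookup_token callback are hypothetical and only mirror the structure (sentinel set on every path, resolver created before the pool); they are not the real MarketplaceBuilder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


class SketchBuilder:
    """Hypothetical stand-in showing sentinel-guarded lazy auth."""

    def __init__(self, lookup_token: Callable[[], Optional[str]], offline: bool = False):
        self._lookup_token = lookup_token
        self._offline = offline
        self._token: Optional[str] = None
        self._auth_resolved = False  # sentinel: True once auth has been attempted
        self._resolver: Optional[dict] = None

    def _ensure_auth(self) -> None:
        if self._auth_resolved:
            return
        if not self._offline:
            self._token = self._lookup_token()
        # Set the sentinel on every path, including offline and "no token found".
        self._auth_resolved = True

    def _get_resolver(self) -> dict:
        if self._resolver is None:
            self._ensure_auth()
            self._resolver = {"token": self._token}
        return self._resolver

    def resolve(self, packages: List[str]) -> List[str]:
        # Eager initialisation on the calling thread, so workers only read.
        resolver = self._get_resolver()

        def work(name: str) -> str:
            mode = "auth" if resolver["token"] else "anon"
            return f"{name}:{mode}"

        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(work, packages))
```

A boolean sentinel is used instead of checking the token for None because "no credentials found" is itself a resolved state.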

resolver.py -- _resolve_url_source() now delegates to DependencyReference.parse() instead of hardcoding github.com prefix matching. This reuses the existing resolution infrastructure (as suggested by @danielmeppiel) and gives marketplace type: url sources broader URL form acceptance. Note: the URL's host is not preserved -- downstream resolution uses the configured GITHUB_HOST. True cross-host resolution is tracked in #1010.
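To make the "host is not preserved" note concrete, here is a simplified sketch. sketch_owner_repo is a hypothetical helper that handles HTTPS URLs only; it is not DependencyReference.parse(), which per the description accepts more URL forms.

```python
from urllib.parse import urlparse


def sketch_owner_repo(url: str) -> str:
    """Reduce an HTTPS Git URL to owner/repo, discarding the host."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected owner/repo in {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # The hostname is dropped here; downstream resolution uses GITHUB_HOST.
    return f"{owner}/{repo}"


if __name__ == "__main__":
    for host in ("github.com", "gitlab.com", "corp.ghe.com"):
        print(host, "->", sketch_owner_repo(f"https://{host}/acme/tools.git"))
```

Every host in the loop yields the same owner/repo, which is why marketplace authors should not rely on the URL's host to select where resolution happens.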

Review feedback (commits 51a1760, 94fe220)

Addressed findings from both the APM Review Panel and Copilot code review:

  • Added _auth_resolved sentinel to _ensure_auth() for true idempotency (panel + Copilot)
  • Fixed offline branch to also set sentinel (Copilot)
  • Clarified _resolve_url_source() docstring: host is not preserved (panel)
  • Added test documenting host-is-ignored behaviour for non-GitHub URLs (panel)
  • Split CHANGELOG entry into two bullets; reworded to clarify GITHUB_HOST drives resolution (panel + Copilot)
  • Updated marketplace-authoring docs to warn against cross-host URL reliance (Copilot)

Tests and docs

  • 22 new tests covering GHE host resolution, metadata fetch paths, token injection, lazy auth, cross-source URL resolution, and host-is-ignored behaviour
  • Updated CHANGELOG, marketplace-authoring guide, and apm-usage skill resource
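Per the Phase A commit notes, URL assertions in the tests go through urlparse rather than substring checks. A minimal sketch of that style, using a hypothetical helper name:

```python
from urllib.parse import urlparse


def clone_url_matches(url: str, host: str, owner_repo: str) -> bool:
    """Check scheme, host, and path separately instead of using substrings."""
    parsed = urlparse(url)
    return (
        parsed.scheme == "https"
        and parsed.hostname == host
        and parsed.path in (f"/{owner_repo}", f"/{owner_repo}.git")
    )


if __name__ == "__main__":
    print(clone_url_matches("https://corp.ghe.com/acme/tools.git", "corp.ghe.com", "acme/tools"))
```

A substring check such as "corp.ghe.com" in url would also accept a URL that merely carries the host name in its path, which the component-wise comparison rejects.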

Type of change

  • Bug fix
  • New feature (cross-source URL parsing in marketplace)
  • Documentation
  • Maintenance / refactor (auth decoupling)

Testing

  • Tested locally
  • All existing tests pass (6,649 passed)
  • Added tests for new functionality (22 new tests)

Thread the existing default_host() / build_https_clone_url() / AuthResolver
pattern (used by apm install) through the marketplace build pipeline.

Changes:
- RefResolver: accept optional host parameter, use build_https_clone_url()
  instead of hardcoded github.com for git ls-remote URLs
- MarketplaceBuilder: resolve tokens against configured host, use REST API
  for metadata fetch on GHES/GHE Cloud (raw.githubusercontent.com is
  github.com-only), skip metadata for non-GitHub hosts
- Fix AuthResolver import scoping so classify_host() works when
  auth_resolver is pre-injected
- Add GHE Cloud early-exit when no token (avoids pointless 401)

Tests:
- Update URL assertions to use urlparse (test convention)
- Add 4 RefResolver GHE host tests
- Add 3 metadata fetch path tests (GHES REST API, non-GitHub skip,
  GHE Cloud no-token skip)
- Add builder host env test

Docs:
- CHANGELOG: Fixed entry under [Unreleased]
- marketplace-authoring guide: GHES section
- apm-usage authentication skill: marketplace build example

Closes #1008

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam added the panel-review label (Trigger the apm-review-panel gh-aw workflow) on Apr 27, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (two minor pre-merge suggestions; neither is a blocker)


Per-persona findings

Python Architect:

This is a routine bug-fix PR: two existing classes (MarketplaceBuilder, RefResolver) receive a host parameter; no new abstractions, no hierarchy changes. One class diagram and one flow diagram apply.

OO / class diagram

classDiagram
    direction LR
    class MarketplaceBuilder {
        <<Builder>>
        +_host: str
        +_host_info: Optional[object]
        +_github_token: Optional[str]
        +_get_resolver() RefResolver
        +_resolve_github_token() Optional[str]
        +_fetch_remote_metadata(pkg) Optional[Dict]
        +build() BuildResult
    }
    class RefResolver {
        <<Service>>
        +_host: str
        +list_remote_refs(owner_repo) List[RemoteRef]
        +resolve_ref_sha(owner_repo, ref) str
    }
    class AuthResolver {
        <<Strategy>>
        +classify_host(host) HostInfo
        +resolve(host) AuthContext
    }
    class HostInfo {
        <<ValueObject>>
        +kind: str
        +api_base: str
    }
    class AuthContext {
        <<ValueObject>>
        +token: str
        +source: str
    }
    class default_host {
        <<Pure>>
        +default_host() str
        +build_https_clone_url(host, repo) str
    }
    class ResolvedPackage {
        <<ValueObject>>
        +source_repo: str
        +sha: str
        +subdir: Optional[str]
    }
    MarketplaceBuilder *-- RefResolver : creates lazily
    MarketplaceBuilder ..> AuthResolver : classify_host and resolve
    MarketplaceBuilder ..> HostInfo : stores as _host_info
    MarketplaceBuilder ..> ResolvedPackage : reads in _fetch_remote_metadata
    MarketplaceBuilder ..> default_host : reads host at init
    RefResolver ..> default_host : reads host at init
    AuthResolver ..> HostInfo : returns
    AuthResolver ..> AuthContext : returns
    class MarketplaceBuilder:::touched
    class RefResolver:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

Execution flow diagram

flowchart TD
    A["apm marketplace build\ncli.py"] --> B["MarketplaceBuilder.__init__()\nbuilder.py\n_host = default_host() or 'github.com'"]
    B --> C["_prefetch_metadata(resolved)\nbuilder.py:589"]
    C --> D["_resolve_github_token()\nbuilder.py:547\nsets _host_info AND _github_token"]
    D --> E["[NET] AuthResolver.classify_host(self._host)\nsrc/apm_cli/core/auth.py:134\nreturns HostInfo(kind, api_base)"]
    D --> F["[NET] resolver.resolve(self._host)\nreturns AuthContext.token"]
    F --> G["pool.submit(_fetch_remote_metadata, pkg)\nbuilder.py:553\nfor each resolved package"]
    G --> H{"host_kind?"}
    H -->|"not github/ghe_cloud/ghes"| I["logger.debug skip\nreturn None"]
    H -->|"ghe_cloud and no token"| J["logger.debug skip\nreturn None"]
    H -->|"self._host == 'github.com'"| K["[NET] raw.githubusercontent.com\n/{source_repo}/{sha}/{path}/apm.yml\nurllib.request.urlopen"]
    H -->|"ghes or ghe_cloud+token"| L["[NET] {api_base}/repos/{source_repo}\n/contents/{file}?ref={sha}\nAccept: application/vnd.github.raw\nurllib.request.urlopen"]
    K --> M["yaml.safe_load(raw) -> dict"]
    L --> M
    M --> N["return metadata dict"]

Design patterns

  • Used in this PR: Lazy initialization -- _host_info is populated as a side effect of _resolve_github_token() (called once before the thread pool in _prefetch_metadata()). _get_resolver() already used this pattern; _host_info extends it consistently.
  • Pragmatic suggestion: Move AuthResolver.classify_host(self._host) into __init__() (directly after self._host is set) rather than as a side effect of _resolve_github_token(). classify_host is a pure, cheap operation and its placement inside a method named "resolve token" is surprising. No new abstraction needed -- one line moved up. This eliminates the Optional[object] None-guard in _fetch_remote_metadata and makes the class invariant explicit: _host_info is always populated after __init__.

CLI Logging Expert: All new log calls use logger.debug() at library layer -- correct. No _rich_* or CommandLogger calls introduced in builder or resolver. The two new debug messages ("Skipping metadata fetch for %s (non-GitHub host: %s)" and "Skipping metadata fetch for %s (GHE Cloud requires auth)") follow the "named thing, reason" style and pass the verbose-mode "So What?" test. No user-visible output changes. No issues.


DevX UX Expert: This is a silent behavior fix -- no new flags, no command surface changes. The key UX property preserved: GITHUB_HOST set once, all apm commands obey it. Previously apm install respected it but apm marketplace build did not; now the mental model is consistent. The marketplace-authoring.md addition is clean: 2-line runnable example, cross-link to the authentication docs. The apm-usage/authentication.md skill update is correctly scoped (one line showing apm marketplace build uses the same env convention). No new flags to document in cli-commands.md (no surface change). No blocking issues.


Supply Chain Security Expert: Reviewed against the threat model:

  • Identity: Metadata URLs use pkg.source_repo and pkg.sha from the already-resolved ResolvedPackage (sourced from the lockfile). The SHA-pinning integrity model is untouched.
  • Integrity: apm.yml metadata is informational enrichment only; it does not affect install integrity decisions.
  • Token scope: resolver.resolve(self._host) routes through AuthResolver for the configured host -- correct. Token appears only in the Authorization header, not in any URL or log line.
  • Fail closed: Non-GitHub hosts return None (skip); GHE Cloud without a token returns None (skip). Both fail closed without error -- appropriate since metadata is optional enrichment.
  • api_base fallback construction: f"https://{self._host}/api/v3" when api_base is not set. self._host comes from os.environ.get("GITHUB_HOST", "github.com"). This is not a path traversal vector (network, not filesystem). No path security guard is needed here.
  • One observation: the _host_info is None fallback in _fetch_remote_metadata defaults host_kind to "github", meaning a non-GitHub host could reach the URL-construction branch if called out of sequence. In the production path this cannot happen (the call order is guaranteed via _prefetch_metadata). Moving classify_host to __init__ (per the Python Architect suggestion) would make this invariant structural rather than relying on call order.

No new supply-chain surface opened.


Auth Expert: Activated -- the PR changes resolver.resolve("github.com") to resolver.resolve(self._host) and introduces AuthResolver.classify_host(self._host).

  • Token resolution: Correct fix. Previously token resolution always happened against github.com even with GITHUB_HOST=corp.ghe.com. Now it routes through the full precedence chain (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill) for the configured host, consistent with resolve_for_dep().
  • AuthResolver import scoping: The lazy from ..core.auth import AuthResolver was moved to the top of the try block. This is the correct fix for the scoping issue when auth_resolver is pre-injected -- classify_host() can now be called regardless of whether a custom resolver was provided.
  • Thread safety: _host_info is set with if self._host_info is None: guard. Since _resolve_github_token() is called from the main thread before executor.submit(), there is no TOCTOU risk in the current implementation.
  • One concern (minor): self._host_info: Optional[object] should be Optional["HostInfo"] (with a TYPE_CHECKING import). The weak object type means type checkers cannot catch incorrect attribute access on this field. Suggested fix: add from typing import TYPE_CHECKING block with if TYPE_CHECKING: from ..core.auth import HostInfo and annotate _host_info: Optional["HostInfo"].
  • AuthResolver precedence invariant: Unchanged. The precedence diagram in docs/getting-started/authentication.md does not need updating -- the PR adds a call site, not a new strategy.
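The environment-variable part of the precedence chain listed above can be sketched like this. The function is hypothetical and not the real AuthResolver: it omits the final git credential fill step and any host-specific handling, and the org-to-suffix mapping is an assumption.

```python
from typing import Mapping, Optional, Tuple


def sketch_resolve_token(
    env: Mapping[str, str], org: Optional[str] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Return (token, source_name) using the first non-empty variable found."""
    candidates = []
    if org:
        # Assumption: org names map to suffixes by upper-casing and
        # replacing "-" with "_".
        candidates.append("GITHUB_APM_PAT_" + org.upper().replace("-", "_"))
    candidates += ["GITHUB_APM_PAT", "GITHUB_TOKEN", "GH_TOKEN"]
    for name in candidates:
        value = env.get(name)
        if value:
            return value, name
    # The real chain would fall back to git credential fill at this point.
    return None, None
```

Returning the source name alongside the token mirrors the AuthContext.source field shown in the class diagram.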

OSS Growth Hacker: This fix completes APM's GHES story: apm install already respected GITHUB_HOST; now apm marketplace build does too. For enterprise teams that use APM on GHES, this removes a silent failure mode that was invisible until build time. The CHANGELOG entry is clean and story-shaped. The doc addition gives enterprise users a self-contained 2-line recipe. Side-channel to CEO: once merged, this is worth a release-note beat that frames APM's complete GHE support posture -- "APM now fully supports GitHub Enterprise across install and marketplace build workflows" -- with a concrete GITHUB_HOST example. This is directly on the enterprise conversion surface and should be included in the next v0.10.x or v0.11.0 release announcement.


CEO arbitration

Specialists agree: this is a correct, well-tested, well-documented bug fix from an external contributor. The two minor suggestions (move classify_host to __init__, fix Optional[object] type annotation) are non-blocking quality improvements. Neither changes behavior in the production path -- _prefetch_metadata() guarantees _resolve_github_token() runs before _fetch_remote_metadata() -- but both make the code easier to reason about and type-safe. The PR is in draft; the author should address these before marking ready. No specialist disagreements to arbitrate. The Growth Hacker's release-beat note is filed for the maintainer's release planning (not a merge gate). Disposition: APPROVE pending the two suggestions below.


Required actions before merge

  1. Move classify_host to __init__ (src/apm_cli/marketplace/builder.py): After self._host: str = default_host() or "github.com" (line ~154), add self._host_info = AuthResolver.classify_host(self._host) (requires the lazy import to become an eager import, or use TYPE_CHECKING for the type hint and keep the lazy import). Remove the if self._host_info is None: guard in _resolve_github_token() and the if self._host_info else "github" fallback in _fetch_remote_metadata(). This makes the class invariant explicit and eliminates silent fallback behavior.

  2. Fix _host_info type annotation (src/apm_cli/marketplace/builder.py, line ~155): Change self._host_info: Optional[object] = None to self._host_info: Optional["HostInfo"] = None with a TYPE_CHECKING guard: if TYPE_CHECKING: from ..core.auth import HostInfo. This is a one-line change and ensures type checkers can validate _host_info.kind and _host_info.api_base access.
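A standalone sketch of the TYPE_CHECKING pattern from item 2. The class name is hypothetical, and the absolute import path is an assumption (the suggestion uses the relative form from ..core.auth import HostInfo).

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen only by type checkers; never imported at runtime, so there is
    # no import cost and no circular-import risk.
    from apm_cli.core.auth import HostInfo


class SketchTypedBuilder:
    """Hypothetical stand-in for the builder's typed attributes."""

    def __init__(self, host: str = "github.com") -> None:
        self._host: str = host
        # The quoted name is a forward reference resolved by the type checker.
        self._host_info: Optional["HostInfo"] = None
```

With this annotation a type checker can validate accesses such as _host_info.kind, which Optional[object] cannot do.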


Optional follow-ups

  • Once merged, include a release-note beat for the next release that tells the complete GHE support story across apm install and apm marketplace build -- the Growth Hacker flags this as a concrete enterprise conversion surface.
  • Future: if a third host-routing branch is added to _fetch_remote_metadata (e.g., Bitbucket Server), consider a small HostMetadataStrategy abstraction; at 3 branches inline is still appropriate.
  • The _resolve_github_token() method now does two things (token resolution + host classification). If it grows further, consider splitting into _init_host_info() and _resolve_github_token(). Not needed now.

Generated by PR Review Panel for issue #1009 · 641.3K

…yReference for URL sources

Phase B of #1008 -- decouples authentication from marketplace generation
and reuses existing resolution infrastructure for cross-source compatibility.

Changes:
- RefResolver: accept optional token for authenticated git ls-remote
- Builder: extract lazy _ensure_auth() called from _get_resolver() so
  both resolve() and build() benefit from authenticated ls-remote
- Builder: eagerly init resolver before thread pool (race prevention)
- Builder: fix _host_info type annotation (Optional["HostInfo"] with
  TYPE_CHECKING guard)
- resolver.py: _resolve_url_source() now delegates to
  DependencyReference.parse() -- accepts any valid Git URL (GitHub,
  GHES, GitLab, Bitbucket, ADO, SSH) instead of github.com only
- 13 new tests covering token injection, lazy auth, and cross-source
  URL resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam added and removed the panel-review label (Trigger the apm-review-panel gh-aw workflow) on Apr 28, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (with one recommended docstring fix and one follow-up on URL-source semantics)


Per-persona findings

Python Architect:

This is a focused, well-scoped fix. Two commits: one that threads GITHUB_HOST through the marketplace build pipeline, and one that extracts lazy auth and delegates URL-source parsing to DependencyReference.parse(). Neither constitutes a major architectural shift (no new base classes, protocols, or registries), so two diagrams apply.

1. OO / class diagram

classDiagram
    direction LR

    class MarketplaceBuilder {
        <<Builder>>
        -_host str
        -_host_info HostInfo
        -_github_token Optional[str]
        -_resolver Optional[RefResolver]
        -_auth_resolver Optional[AuthResolver]
        +_ensure_auth() None
        +_get_resolver() RefResolver
        +_resolve_github_token() Optional[str]
        +_fetch_remote_metadata(pkg) Optional[dict]
        +resolve() ResolveResult
        +build() BuildReport
    }

    class RefResolver {
        <<IOBoundary>>
        -_host str
        -_token Optional[str]
        -_cache RefCache
        -_lock threading.Lock
        +list_remote_refs(owner_repo) List[RemoteRef]
        +resolve_ref_sha(owner_repo, ref) str
    }

    class AuthResolver {
        <<Strategy>>
        +resolve(host) AuthContext
        +classify_host(host) HostInfo
    }

    class HostInfo {
        <<ValueObject>>
        +kind str
        +api_base Optional[str]
    }

    class AuthContext {
        <<ValueObject>>
        +token Optional[str]
        +source str
    }

    class DependencyReference {
        <<ValueObject>>
        +repo_url str
        +reference Optional[str]
        +is_local bool
        +parse(url) DependencyReference
    }

    class _resolve_url_source {
        <<Pure>>
        accepts any Git URL via DependencyReference
    }

    MarketplaceBuilder *-- RefResolver : lazily creates
    MarketplaceBuilder ..> AuthResolver : calls resolve(host)
    MarketplaceBuilder ..> HostInfo : reads kind, api_base
    AuthResolver ..> HostInfo : returns from classify_host()
    AuthResolver ..> AuthContext : returns from resolve()
    _resolve_url_source ..> DependencyReference : delegates parse()
    RefResolver ..> AuthContext : uses token field

    class MarketplaceBuilder:::touched
    class RefResolver:::touched
    class _resolve_url_source:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

2. Execution flow diagram

flowchart TD
    A["apm marketplace build\n(cli.py)"] --> B["MarketplaceBuilder.build()"]
    B --> C["MarketplaceBuilder.resolve()"]
    C --> D["_get_resolver()\n[TOUCHED]"]
    D --> E["_ensure_auth()\n[TOUCHED]"]
    E --> F{"_github_token\nalready set?"}
    F -- yes --> G["return early"]
    F -- no --> H["_resolve_github_token()"]
    H --> I["[NET] AuthResolver.classify_host(host)\nset _host_info"]
    I --> J["[NET] AuthResolver.resolve(host)\ntoken precedence chain"]
    J --> K["self._github_token = token or None"]
    K --> L["RefResolver(host=, token=)\n[TOUCHED]"]
    L --> M["ThreadPoolExecutor spawned\nfor each package entry"]
    M --> N["RefResolver.list_remote_refs(owner_repo)"]
    N --> O["build_https_clone_url(host, owner_repo, token=)\nproduces x-access-token URL or plain URL"]
    O --> P["[EXEC] subprocess: git ls-remote\nGIT_TERMINAL_PROMPT=0, GIT_ASKPASS=echo"]
    P --> Q["ref + sha resolved"]
    B --> R["_prefetch_metadata(resolved)"]
    R --> S["_ensure_auth() again\n(idempotent if token set;\nre-calls _resolve_github_token if None)"]
    S --> T{"host_kind?"}
    T -- github.com --> U["[NET] urllib: raw.githubusercontent.com"]
    T -- ghes / ghe_cloud --> V["[NET] urllib: {api_base}/repos/.../contents/...\nAccept: application/vnd.github.raw"]
    T -- generic --> W["skip metadata enrichment"]
    T -- ghe_cloud + no token --> X["skip metadata enrichment"]

Design patterns

  • Used in this PR: Lazy Initialization -- _ensure_auth() + _get_resolver() each guard with "already set?" short-circuits; race prevented by pre-calling _get_resolver() before ThreadPoolExecutor in resolve(). Adapter -- _resolve_url_source() delegates to DependencyReference.parse(), normalising any Git URL to owner/repo[#ref] (visible as <<Pure>> + <<ValueObject>> in diagram).
  • Pragmatic suggestion: none -- the current shape is the simplest correct design at this scope. A sentinel value (_UNRESOLVED = object()) could make _ensure_auth() fully idempotent in the no-token case, but it adds complexity that is not justified until there is evidence of repeated AuthResolver.resolve() calls in hot paths.

Minor architecture note: _ensure_auth() docstring says "Short-circuits when already resolved" but when no token is available, self._github_token remains None so the method re-invokes _resolve_github_token() on every subsequent call. In practice both call sites (inside _get_resolver() and _prefetch_metadata()) are pre-pool so the redundancy is harmless, but the docstring promises more than is delivered. Fixing the docstring (or the guard) is the one recommended action below.


CLI Logging Expert: No concerns. All new diagnostic output routes through logger.debug() -- never through _rich_* helpers directly. Token values are stripped from stderr via _redact_token() before appearing in GitLsRemoteError.hint. The GHE-skip debug messages follow the "name the thing" rule ("Skipping metadata fetch for {pkg.name} (non-GitHub host: {host})"). Nothing in this PR changes the user-visible output path; the happy-path surface is identical before and after.


DevX UX Expert: Transparent to the user. A marketplace author on a GHE instance sets GITHUB_HOST (the same env var they already set for apm install) and apm marketplace build now just works. No new flags, no new error messages, no command-surface change. The GHES section added to marketplace-authoring.md is appropriately concise. The type: url acceptance expansion quietly removes a friction point for authors using mixed-host environments. No UX regressions.


Supply Chain Security Expert: Three surfaces reviewed.

  1. Token in subprocess URL: build_https_clone_url(..., token=token) embeds the PAT as x-access-token:{token}@host in the URL passed to git ls-remote. The docstring on build_https_clone_url warns callers not to log the raw URL; _redact_token() in _git_utils.py redacts the (token/redacted)@ pattern in all stderr before it reaches error messages. GIT_TERMINAL_PROMPT=0 / GIT_ASKPASS=echo are set. No credential leakage path identified.

  2. DependencyReference.parse() scope expansion: _resolve_url_source() previously rejected non-GitHub URLs (raised ValueError). It now accepts any URL DependencyReference.parse() can handle (GitLab, Bitbucket, ADO, SSH). The resulting owner/repo is then resolved against the configured GitHub host (self._host), not the URL's own host. This means a GitLab URL silently resolves to owner/repo against the configured host (github.com by default) -- the expansion does not introduce a new network destination (the configured host is always used). The is_local guard prevents path traversal via ./ or ../ sources. No new attack surface identified; the semantic gap (accepting a GitLab URL but hitting GitHub) is a UX confusion issue, not a security one.

  3. Thread safety of auth state: _github_token is set before thread pool creation and read-only inside workers. _host_info is set inside _resolve_github_token() which is called before the pool. No lock needed and none is missing. Safe.
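The stderr redaction mentioned in item 1 could be approximated as below. This is a hypothetical sketch, not the real _redact_token(); the regular expression and the *** placeholder are assumptions.

```python
import re

# Matches the userinfo part of an http(s) URL, e.g. "x-access-token:abc@".
_CRED_IN_URL = re.compile(r"(https?://)[^/\s@]+@")


def sketch_redact(text: str) -> str:
    """Replace any credentials embedded in URLs with a fixed placeholder."""
    return _CRED_IN_URL.sub(r"\1***@", text)


if __name__ == "__main__":
    stderr = "fatal: unable to access 'https://x-access-token:s3cret@corp.ghe.com/acme/tools.git/'"
    print(sketch_redact(stderr))
```

Redacting the whole userinfo segment, rather than searching for a known token value, also covers tokens that git echoes back in percent-encoded form.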


Auth Expert: Activated (fallback self-check: YES -- the PR changes how tokens are injected into git ls-remote URLs, how AuthResolver.classify_host() is used to determine host kind, and changes resolver.resolve("github.com") to resolver.resolve(self._host), which directly changes credential resolution semantics).

The key fix -- changing from the hardcoded resolver.resolve("github.com") to resolver.resolve(self._host) -- is correct and mirrors the pattern used throughout apm install. Token precedence chain (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill) is fully preserved because the fix routes through AuthResolver.resolve() rather than bypassing it.

The x-access-token format used by build_https_clone_url is compatible with GHES and GHE Cloud. The GHE Cloud + no-token -> skip metadata guard is correct: GHE Cloud has no public API surface and a 401 on every package would be wasteful and noisy.

One idempotency gap (noted by Python Architect): when _resolve_github_token() returns None (no credentials found), _ensure_auth() will re-invoke _resolve_github_token() on every subsequent call because the guard condition (self._github_token is not None) never becomes true. This results in repeated AuthResolver.resolve() calls in the no-auth path. It is harmless (all calls are pre-pool, AuthResolver.resolve() is cheap for env-var resolution) but the docstring is misleading. Recommend either a sentinel guard or a docstring correction.

No auth precedence regression. No credential leakage path. No new os.getenv() bypasses of AuthResolver.


OSS Growth Hacker: This is an enterprise unlock. Marketplace authoring previously required a github.com-accessible network path; GHE/GHES users were silently blocked. The fix uses the same GITHUB_HOST env var that GHE users already know from apm install, meaning zero new concepts for the target audience.

Side-channel to CEO: The type: url acceptance expansion (GitLab/Bitbucket/ADO URLs now accepted) is a quiet capability increase that may deserve a sentence in the release note -- it reinforces APM's "works with any Git host" positioning. The CHANGELOG entry is dense; consider splitting into two bullets (GHE host fix / URL-source expansion) in the release narrative to maximize two distinct story beats.

The marketplace-authoring.md GHES section is appropriately minimal. No quickstart changes needed; the GITHUB_HOST pattern is already documented in the auth guide.


CEO arbitration

Specialists are in agreement: this is a correct, well-tested fix with no regressions. The core change -- routing AuthResolver.resolve(self._host) instead of the hardcoded resolve("github.com") -- mirrors the pattern used everywhere else in the codebase and closes a gap that blocked enterprise marketplace authoring. The _ensure_auth() idempotency gap is the only item worth resolving before merge; it is a docstring accuracy issue (the method does not short-circuit in the no-token case), not a correctness bug. The DependencyReference URL-source expansion is net-positive: it removed a github.com-only restriction without introducing a new attack surface. The semantic note (GitLab URLs are accepted but resolved against the configured GitHub host) belongs in a follow-up issue, not as a blocker. Ratified: APPROVE with one pre-merge fix.


Required actions before merge

  1. _ensure_auth() docstring accuracy (src/apm_cli/marketplace/builder.py, _ensure_auth method): The docstring says "Short-circuits when already resolved" but the guard if self._github_token is not None: return does not short-circuit when token resolution returned None (no credentials found). Either update the docstring to "Short-circuits when token is already set to a non-None value" or add a sentinel to make the method truly idempotent. The sentinel approach:
    # In __init__:
    self._auth_resolved: bool = False
    # In _ensure_auth:
    if self._auth_resolved:
        return
    ...
    self._auth_resolved = True
    Either fix is acceptable; the docstring-only fix is lower risk.

Optional follow-ups

  • URL-source host semantics (src/apm_cli/marketplace/resolver.py): _resolve_url_source() now accepts GitLab/Bitbucket/ADO URLs and normalises them to owner/repo, but the downstream RefResolver resolves that owner/repo against the configured GitHub host -- not the URL's own host. The docstring currently says "any valid Git URL (GitHub, GHES, GitLab, Bitbucket, ADO, SSH) is accepted" which implies cross-host resolution that is not implemented. A follow-up issue should clarify intended semantics and either document the limitation or implement true cross-host support.
  • test_non_github_url test intent shift: The old test asserted that GitLab URLs raised ValueError; the new test asserts they resolve to "owner/repo". The test comment ("DependencyReference.parse() handles any valid Git host URL") is accurate but worth pairing with a test that documents the host-is-ignored behavior explicitly, so future contributors don't accidentally implement cross-host resolution that breaks the owner/repo assumption.

Generated by PR Review Panel for issue #1009 · 1.2M

- Add _auth_resolved sentinel to _ensure_auth() for true idempotency
- Clarify _resolve_url_source() docstring: host is not preserved (#1010)
- Split CHANGELOG #1008 entry into GHE fix + URL-source expansion
- Add test documenting host-is-ignored behaviour for non-GitHub URLs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam
Collaborator Author

Review Panel Findings -- Addressed

All findings from both panel reviews have been addressed in 51a17603:

Required fixes

| Finding | Status | Details |
| --- | --- | --- |
| _ensure_auth() idempotency gap | Fixed | Added _auth_resolved: bool sentinel -- method now short-circuits even when no token was found. Docstring updated to match. |
| _host_info: Optional[object] type | Already done | Fixed in Phase B commit (239064d1) with TYPE_CHECKING guard. |

Optional fixes (also implemented)

| Finding | Status | Details |
| --- | --- | --- |
| _resolve_url_source() docstring overpromises | Fixed | Clarified that URL host is not preserved; downstream uses configured GITHUB_HOST. Cross-ref to #1010. |
| Test for host-is-ignored behaviour | Added | test_url_host_is_not_preserved_in_output -- 4 hosts (github.com, gitlab.com, bitbucket.org, corp.ghe.com) all resolve to same owner/repo. |
| CHANGELOG bullet split | Done | Split into two distinct story beats: GHE host fix + URL-source expansion. |

Validation

  • 6,649 unit tests pass (16.3s)
  • ASCII compliance verified on all changed lines

@sergio-sisternes-epam sergio-sisternes-epam marked this pull request as ready for review April 28, 2026 10:12
Copilot AI review requested due to automatic review settings April 28, 2026 10:12
Contributor

Copilot AI left a comment

Pull request overview

Fixes apm marketplace build to respect GITHUB_HOST (consistent with the install/auth infrastructure) so GHES/GHE Cloud repos can be resolved/authenticated correctly, and expands marketplace type: url parsing by delegating to DependencyReference.parse().

Changes:

  • Thread default_host() / build_https_clone_url() + host/token handling through marketplace ref resolution (git ls-remote) and metadata fetching.
  • Add lazy auth resolution (_ensure_auth) so both resolve() and build() use authenticated ref resolution when available.
  • Update docs/changelog and add unit tests for GHE host behavior and URL parsing.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
|---|---|
| src/apm_cli/marketplace/ref_resolver.py | Adds host + optional token support; builds git ls-remote URLs via shared host utilities. |
| src/apm_cli/marketplace/builder.py | Uses default host, lazy auth resolution, and host-aware metadata fetch (raw CDN vs REST API). |
| src/apm_cli/marketplace/resolver.py | Delegates type: url resolution to DependencyReference.parse() instead of github.com-only matching. |
| tests/unit/marketplace/test_ref_resolver.py | Adds URL parsing assertions + GHE host/token-in-URL coverage. |
| tests/unit/marketplace/test_marketplace_resolver.py | Adds tests for URL-source parsing behavior (including host stripping). |
| tests/unit/marketplace/test_builder.py | Adds tests for host-kind branching in _fetch_remote_metadata() and _ensure_auth() behavior. |
| tests/unit/commands/test_marketplace_build.py | Confirms GITHUB_HOST is respected by MarketplaceBuilder. |
| docs/src/content/docs/guides/marketplace-authoring.md | Documents GHES usage for marketplace build. |
| packages/apm-guide/.apm/skills/apm-usage/authentication.md | Updates auth guide to mention marketplace build respects GITHUB_HOST. |
| CHANGELOG.md | Adds Unreleased Fixed entries for GHE host support and URL parsing behavior. |

Comment thread CHANGELOG.md Outdated
Comment thread src/apm_cli/marketplace/resolver.py
Comment thread src/apm_cli/marketplace/builder.py Outdated
Comment thread docs/src/content/docs/guides/marketplace-authoring.md Outdated
- Fix _ensure_auth() offline branch to set _auth_resolved sentinel
- Clarify CHANGELOG and docs: URL host is not preserved, GITHUB_HOST required
- Update marketplace-authoring.md to warn against cross-host URL reliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel added and removed the panel-review label (Trigger the apm-review-panel gh-aw workflow) Apr 28, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (with one optional follow-up noted below)


Per-persona findings

Python Architect:

The PR is a well-scoped bug fix that threads host-awareness through the marketplace build pipeline. Three files carry the change: builder.py (MarketplaceBuilder), ref_resolver.py (RefResolver), and resolver.py (_resolve_url_source). The changes are internally consistent and proportional to the problem.

1. OO / Class Diagram

classDiagram
    direction LR
    class MarketplaceBuilder {
        <<Service>>
        -_host str
        -_host_info Optional[HostInfo]
        -_auth_resolved bool
        -_github_token Optional[str]
        -_resolver Optional[RefResolver]
        +build() MarketplaceOutput
        +resolve() list
        -_ensure_auth() None
        -_get_resolver() RefResolver
        -_resolve_github_token() Optional[str]
        -_fetch_remote_metadata(pkg) Optional[dict]
    }
    class RefResolver {
        <<Service>>
        -_host str
        -_token Optional[str]
        -_cache RefCache
        -_lock Lock
        +list_remote_refs(owner_repo) List[RemoteRef]
        +resolve_ref_sha(owner_repo, ref) str
    }
    class AuthResolver {
        <<Strategy>>
        +classify_host(host) HostInfo
        +resolve(host) AuthContext
    }
    class HostInfo {
        <<ValueObject>>
        +kind str
        +api_base Optional[str]
    }
    class AuthContext {
        <<ValueObject>>
        +token str
        +source str
    }
    class DependencyReference {
        <<ValueObject>>
        +repo_url str
        +reference Optional[str]
        +is_local bool
        +parse(url) DependencyReference
    }
    class github_host_utils {
        <<Module>>
        +default_host() Optional[str]
        +build_https_clone_url(host, repo, token) str
    }
    class resolver_module {
        <<Module>>
        +_resolve_url_source(source) str
    }
    class MarketplaceBuilder:::touched
    class RefResolver:::touched
    class resolver_module:::touched
    MarketplaceBuilder *-- RefResolver : creates and owns
    MarketplaceBuilder ..> AuthResolver : token and host classification
    MarketplaceBuilder ..> HostInfo : branches on kind (github/ghes/ghe_cloud/generic)
    MarketplaceBuilder ..> github_host_utils : default_host()
    RefResolver ..> github_host_utils : build_https_clone_url()
    AuthResolver ..> HostInfo : returns
    AuthResolver ..> AuthContext : returns
    resolver_module ..> DependencyReference : delegates URL parsing
    note for MarketplaceBuilder "Lazy init: _ensure_auth() is idempotent\n_auth_resolved flag prevents re-entry\n_host_info set as side-effect in _resolve_github_token()"
    classDef touched fill:#fff3b0,stroke:#d47600

2. Execution Flow Diagram

flowchart TD
    A["apm marketplace build (cli.py)"] --> B["MarketplaceBuilder.build()"]
    B --> C["_get_resolver() [eager, pre-thread-pool]"]
    C --> D["_ensure_auth()"]
    D --> E{_auth_resolved?}
    E -- yes --> F["return (idempotent)"]
    E -- no --> G{offline mode?}
    G -- yes --> H["_auth_resolved = True, token = None"]
    G -- no --> I["_resolve_github_token()"]
    I --> J["[NET] AuthResolver.classify_host(self._host) -> HostInfo"]
    J --> K["[NET] AuthResolver.resolve(self._host) -> AuthContext"]
    K --> L["self._github_token = ctx.token\nself._host_info = HostInfo\n_auth_resolved = True"]
    L --> M["RefResolver(host=self._host, token=self._github_token)"]
    M --> N["ThreadPoolExecutor: _resolve_references()"]
    N --> O["[EXEC] RefResolver.list_remote_refs(owner_repo)"]
    O --> P["build_https_clone_url(host, owner_repo, token)"]
    P --> Q["[EXEC] git ls-remote --tags --heads url.git\nGIT_TERMINAL_PROMPT=0, GIT_ASKPASS=echo"]
    B --> R["_prefetch_metadata(resolved)"]
    R --> S["_ensure_auth() (idempotent)"]
    R --> T["ThreadPoolExecutor: _fetch_remote_metadata(pkg)"]
    T --> U{host_info.kind?}
    U -- generic --> V["return None (skip, no HTTP)"]
    U -- ghe_cloud no token --> W["return None (skip, no HTTP)"]
    U -- github.com --> X["[NET] urllib GET raw.githubusercontent.com/repo/sha/apm.yml\nAuthorization: token ..."]
    U -- ghes/ghe_cloud with token --> Y["[NET] urllib GET api_base/repos/repo/contents/apm.yml?ref=sha\nAccept: application/vnd.github.raw\nAuthorization: token ..."]
    X --> Z["yaml.safe_load(raw) -> dict"]
    Y --> Z

3. Design patterns

  • Used in this PR: Lazy init with idempotency flag (_ensure_auth() + _auth_resolved) -- prevents repeated auth resolution across the _get_resolver() and _prefetch_metadata() call sites; shown as the note on MarketplaceBuilder in the class diagram above.
  • Used in this PR: Adapter -- _resolve_url_source() now delegates to DependencyReference.parse() rather than reimplementing URL parsing, eliminating the github.com-only hard-coding.
  • Pragmatic suggestion: The host-dispatch logic in _fetch_remote_metadata() (if/elif on host_kind) could eventually be a small Strategy object (MetadataFetcher per host kind), but only if a third host type with distinct behavior emerges. At current scope, the if/elif is the simplest correct design.

One code smell (not blocking): _host_info is set as a side effect inside _resolve_github_token(). If _resolve_github_token() raises and the exception is caught inside the try/except-all, _host_info may remain None; _fetch_remote_metadata() defensively falls back to host_kind = "github" in that case, so the failure mode is a graceful degradation to the github.com CDN path rather than a crash. The smell is the implicit coupling, not a correctness bug.
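
The host-kind dispatch and its fallback can be sketched as a pure function. This is a reading of the flow diagram, not the code in builder.py: `plan_metadata_request` and its parameters are hypothetical names, and the unauthenticated-GHES and missing-`api_base` branches are assumptions of the sketch.

```python
from typing import Dict, Optional, Tuple

Request = Tuple[str, Dict[str, str]]


def plan_metadata_request(
    kind: Optional[str],
    api_base: Optional[str],
    repo: str,
    sha: str,
    token: Optional[str],
    file_path: str = "apm.yml",
) -> Optional[Request]:
    """Return (url, headers) for the metadata fetch, or None to skip it."""
    kind = kind or "github"  # mirrors the fallback when host classification is missing
    headers: Dict[str, str] = {}
    if token:
        headers["Authorization"] = f"token {token}"

    if kind == "github":
        return f"https://raw.githubusercontent.com/{repo}/{sha}/{file_path}", headers
    if kind in ("ghes", "ghe_cloud"):
        if kind == "ghe_cloud" and not token:
            return None  # tokenless GHE Cloud: skip, no HTTP
        if not api_base:
            return None  # assumption of this sketch: no API base means nothing to call
        headers["Accept"] = "application/vnd.github.raw"
        return f"{api_base}/repos/{repo}/contents/{file_path}?ref={sha}", headers
    return None  # generic (non-GitHub) host: skip, no HTTP


print(plan_metadata_request("ghes", "https://corp.ghe.com/api/v3", "acme/tools", "abc123", "t0ken"))
```

Keeping the decision separate from the HTTP call makes each branch testable without network access, which is where a Strategy object would eventually slot in if a third host type appears.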


CLI Logging Expert: No output path changes. The PR adds only logger.debug() calls with %s-placeholder format strings, consistent with the established pattern. Debug messages correctly use pkg.name and self._host for concrete context ("Skipping metadata fetch for %s (non-GitHub host: %s)"). No _rich_* helpers or CommandLogger phases are touched. Clean.


DevX UX Expert: No CLI surface changes -- no new flags, no new commands, no help text changes. The fix is transparent for github.com users. For GHES users, the pattern is export GITHUB_HOST=corp.ghe.com && apm marketplace build, which is identical to the existing apm install pattern -- zero new mental model required.

The docs addition in marketplace-authoring.md is accurate, concise, and includes a working example. The caveat that type: url host is ignored (with #1010 forward reference) is the right level of honesty -- it prevents users from assuming cross-host resolution works when it does not. The authentication skill resource (packages/apm-guide/.apm/skills/apm-usage/authentication.md) is updated in the same PR per Rule 4.

One note: the cli-commands.md reference doc is not updated, but marketplace build already exists there with no GHE-specific flag -- so no update is required.


Supply Chain Security Expert: No new security surface introduced.

  1. Token in git ls-remote URL: build_https_clone_url() embeds x-access-token:{token}@ -- this is the pre-existing pattern used across APM for git operations. GIT_TERMINAL_PROMPT=0 + GIT_ASKPASS=echo prevent interactive prompt fallbacks. stderr is passed through _redact_token() before any error logging (ref_resolver.py lines 236, 252, 313, 329). The github_host.py docstring explicitly notes "callers must avoid logging raw token-bearing URLs" and this caller does not log the URL.

  2. GHES REST API path construction: {api_base}/repos/{pkg.source_repo}/contents/{file_path}?ref={sha}. source_repo is the owner/repo coordinate from the marketplace.yml, validated upstream. file_path is {subdir}/apm.yml where subdir comes from the resolved package -- not from user URL input. No path traversal surface.

  3. _resolve_url_source host stripping: Non-github.com URLs now parse to owner/repo and resolve against GITHUB_HOST. This is documented and tracked in #1010 (feat: ADO marketplace support). Not a security regression: the owner/repo coordinate goes through the same auth and integrity pipeline as a github-source entry. The risk is user misconfiguration (wrong host), not adversarial injection.

  4. Fail-closed behavior: Non-GitHub host -> metadata skipped (not fetched with no auth). GHE Cloud without token -> metadata skipped. Auth exception -> token None, debug-logged, not raised. These are all correct graceful-degradation paths that fail to "no enrichment" rather than to a security bypass.
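
Two of the safeguards in item 1 are easy to show in isolation. The following is a sketch only: `redact_credentials` and `non_interactive_git_env` are hypothetical helpers, not the project's `_redact_token()`, and the regex is one plausible way to scrub a credential-bearing URL. The two environment variables are the ones named above.

```python
import os
import re
from typing import Dict, Mapping

# Matches the credential portion of an https URL such as
# https://x-access-token:SECRET@host/owner/repo.git
_CREDENTIAL_RE = re.compile(r"(https?://)[^/\s@]+@")


def redact_credentials(text: str) -> str:
    """Hypothetical redactor: replace any user:password@ URL prefix with '***@'."""
    return _CREDENTIAL_RE.sub(r"\1***@", text)


def non_interactive_git_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy of the environment with git's interactive credential prompts disabled."""
    env = dict(base)
    env["GIT_TERMINAL_PROMPT"] = "0"
    env["GIT_ASKPASS"] = "echo"
    return env


stderr = "fatal: unable to access 'https://x-access-token:ghp_example@corp.ghe.com/acme/tools.git/'"
print(redact_credentials(stderr))
env = non_interactive_git_env(os.environ)  # would be passed as env= to subprocess.run
```

Redacting at the point where stderr is captured, rather than at each log call, means a new log statement cannot reintroduce the leak.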


Auth Expert: Activated -- the PR changes token resolution from hardcoded github.com to self._host, uses AuthResolver.classify_host(), and injects the resolved token into RefResolver.

Token resolution chain: AuthResolver.resolve(self._host) follows the documented precedence (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill). Previously this chain was called against "github.com" regardless of GITHUB_HOST, causing GHES tokens to be missed. The fix is correct.
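
The environment-variable part of that precedence can be sketched as a first-match lookup. This is illustrative only and is not `AuthResolver`: `pick_env_token` is a hypothetical name, the way the org suffix is normalised is an assumption, and the final `git credential fill` step is deliberately left out.

```python
from typing import List, Mapping, Optional, Tuple


def pick_env_token(env: Mapping[str, str], org: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (token, source_name) from the first populated variable, else None."""
    candidates: List[str] = []
    if org:
        # Assumption: the org suffix is upper-cased with '-' mapped to '_'.
        candidates.append("GITHUB_APM_PAT_" + org.upper().replace("-", "_"))
    candidates += ["GITHUB_APM_PAT", "GITHUB_TOKEN", "GH_TOKEN"]
    for name in candidates:
        value = env.get(name)
        if value:
            return value, name
    return None  # a real resolver would fall through to `git credential fill` here


env = {"GITHUB_TOKEN": "b", "GH_TOKEN": "c"}
print(pick_env_token(env))
```

The lookup itself is host-independent; the bug described above was in which host the chain was asked about, not in the order of the variables.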

Thread safety: _ensure_auth() is called before ThreadPoolExecutor spawns workers (eagerly via _get_resolver() at line 396, confirmed in the diff). _auth_resolved, _github_token, and _host_info are all set before workers read them. AuthContext is frozen. Consistent with the Auth Expert guidance.
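
The ordering argument is the usual "write once on the calling thread, read many in workers" pattern. A miniature version, with a hypothetical `Builder` class and a placeholder token rather than real resolution, looks like this:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


class Builder:
    """Hypothetical miniature of the builder: auth state is written once, then only read."""

    def __init__(self) -> None:
        self._github_token: Optional[str] = None
        self._auth_resolved = False

    def _ensure_auth(self) -> None:
        if not self._auth_resolved:
            self._github_token = "example-token"  # placeholder for real resolution
            self._auth_resolved = True

    def _resolve_one(self, repo: str) -> str:
        # Workers only read the state; nothing here mutates it.
        return f"{repo}:{'auth' if self._github_token else 'anon'}"

    def build(self, repos: List[str]) -> List[str]:
        self._ensure_auth()  # runs on the calling thread, before any worker exists
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(self._resolve_one, repos))


print(Builder().build(["acme/tools", "acme/docs"]))
```

Because the state is fully written before the pool is created, no lock is needed around the token; calling `_ensure_auth()` lazily from inside a worker would not have that property.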

Offline mode: _ensure_auth() short-circuits with _auth_resolved = True and _github_token = None. Offline builds do not attempt network auth. Correct.

Side-effect coupling (minor): self._host_info = AuthResolver.classify_host(self._host) is set inside _resolve_github_token() rather than in _ensure_auth() directly. The defensive fallback in _fetch_remote_metadata() (host_kind = self._host_info.kind if self._host_info else "github") handles the None case, so the failure mode is graceful. This is a mild design smell -- not a correctness bug.

ADO / non-GitHub hosts: _resolve_url_source() now parses ADO-style URLs via DependencyReference.parse(), extracting owner/repo and resolving against GITHUB_HOST. The documented limitation (cross-host resolution deferred to #1010) is appropriate.

No regressions to AuthResolver precedence, host classification, or credential leakage surface.


OSS Growth Hacker: This fix closes a gap that blocked enterprise customers from using apm marketplace build on GHES, while apm install already worked. With this PR, APM has a consistent GHES story across its main commands.

Story angle for release notes: "APM marketplace now works with GitHub Enterprise Server -- export GITHUB_HOST=corp.ghe.com && apm marketplace build resolves, authenticates, and fetches metadata from your GHES instance using the same token you already configured." This reinforces the enterprise-readiness frame without requiring new auth setup.

The CHANGELOG entries are clean raw material. The type: url limitation note and #1010 forward reference show maturity -- shipping the immediate fix while being transparent about what's next builds community trust.

Side-channel to CEO: GHES parity across install + marketplace build is a concrete enterprise unlock. Worth a dedicated bullet in the next release post targeting enterprise DevEx leads.


CEO arbitration

The five specialists and the Auth Expert are in strong agreement: this is a correct, well-tested, well-documented bug fix. The lazy-init pattern with _ensure_auth() is clean; the _host_info side-effect in _resolve_github_token() is a minor cleanliness observation, not a correctness issue, and fixing it inline would increase the PR's diff scope without adding safety value -- track in a follow-up if desired. The security posture is neutral (no new surfaces, pre-existing token-in-URL pattern, explicit graceful-degradation paths). The Auth Expert confirms no regressions to AuthResolver precedence or host classification.

The Growth Hacker's framing is sound: GHES parity across install and marketplace build is a concrete enterprise unlock worth calling out in the release narrative. No strategic concerns.

Ratification: APPROVE. The change ships the right fix at the right scope.


Required actions before merge

  1. None. The disposition is a clean APPROVE. The _host_info side-effect coupling in _resolve_github_token() is noted but is not blocking -- it is a gracefully handled degradation path with defensive code in _fetch_remote_metadata().

Optional follow-ups

  • #1010 (feat: ADO marketplace support, cross-host resolution): The type: url sources strip the URL host and resolve against GITHUB_HOST. True cross-host resolution (routing to the URL's actual host) is the natural next step; the existing #1010 reference in the code and docs is the right place to track this.
  • _host_info side-effect refactor: Move the AuthResolver.classify_host(self._host) call from inside _resolve_github_token() into _ensure_auth() directly, making the data-flow explicit and removing the implicit side effect. Low urgency; purely a readability improvement.
  • Release narrative: The OSS Growth Hacker suggests a dedicated bullet in the next release post highlighting GHES parity across install and marketplace build for enterprise audiences.

Generated by PR Review Panel for issue #1009 · ● 1.2M ·

@danielmeppiel enabled auto-merge April 28, 2026 16:50
@danielmeppiel added this pull request to the merge queue Apr 28, 2026
Merged via the queue into main with commit 3fdaa94 Apr 28, 2026
39 checks passed
@danielmeppiel deleted the fix/1008-marketplace-build-ghe branch April 28, 2026 16:58
@danielmeppiel added this to the 0.11.0 milestone Apr 29, 2026

Labels

panel-review Trigger the apm-review-panel gh-aw workflow

Development

Successfully merging this pull request may close these issues.

[BUG] apm marketplace build fails for GitHub Enterprise Server repositories

3 participants