support virtual packages on generic git hosts (Gitea) #587
ganesanviji wants to merge 26 commits into microsoft:main
@microsoft-github-policy-service agree
Review Feedback
Thanks @ganesanviji for adding Gitea support! The raw URL download approach is a good idea. A few issues need addressing:
1. API version change breaks GitLab (critical)
Changing Options:
2. Virtual package detection too broad
3. Bare
danielmeppiel
left a comment
As per previous comment
Hi @danielmeppiel, thanks for the review. I have addressed all of the review suggestions:
1. API version change breaks GitLab (critical)
Addressed with the preferred approach. For non-GitHub/GHE hosts we now attempt If that returns a non-200 we fall through to API version negotiation, trying
2. Virtual package detection too broad
We did not use
The distinction is driven by a new
3. Bare
@danielmeppiel - Could you please review the changes as soon as possible and let me know whether any further changes or explanations are needed? It would be very helpful to have Gitea support included in the next APM release.
Pull request overview
Adds broader support for installing virtual packages from non-GitHub Git hosts (with a focus on Gitea), by updating dependency parsing heuristics and expanding the downloader’s raw/API fetching logic, plus new regression tests around hostname classification and generic-host URL handling.
Changes:
- Add is_gitlab_hostname() and use it during virtual-package detection to treat GitLab nested-group paths as repo paths by default.
- Extend generic-host downloads with a raw URL attempt and API version “negotiation”.
- Add unit tests covering GitLab hostname detection, Gitea/generic URL parsing expectations, and generic-host download behavior.
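The attempt order described above (raw URL first, then API v1, then API v3) can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `build_generic_host_candidates` is a hypothetical helper, and the URL shapes are assumptions based on the `host/owner/repo/raw/ref/file` and `/api/vN/repos/{owner}/{repo}/contents/...` forms mentioned in this thread.

```python
from urllib.parse import quote


def build_generic_host_candidates(host, owner, repo, file_path, ref):
    """Return the URLs to try, in order, for a generic (non-GitHub) host.

    Hypothetical helper: the URL shapes are assumed from the review thread.
    """
    path = quote(file_path, safe="/")  # keep "/" between path segments
    ref_q = quote(ref, safe="")
    # 1) Raw file URL served directly by the web UI.
    raw_url = f"https://{host}/{owner}/{repo}/raw/{ref_q}/{path}"
    # 2) Contents API, trying v1 (Gitea/Gogs) before v3 (GitHub-compatible).
    api_urls = [
        f"https://{host}/api/{version}/repos/{owner}/{repo}/contents/{path}?ref={ref_q}"
        for version in ("v1", "v3")
    ]
    return [raw_url, *api_urls]


candidates = build_generic_host_candidates(
    "gitea.myorg.com", "owner", "repo", "skill.md", "main"
)
```

A caller would walk `candidates` in order and stop at the first successful response.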
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_github_host.py | Adds tests for GitLab hostname detection. |
| tests/unit/test_generic_git_urls.py | Adds Gitea/generic-host virtual package detection regression tests. |
| tests/test_github_downloader.py | Adds tests for generic-host raw download + API version fallback behavior. |
| src/apm_cli/utils/github_host.py | Introduces is_gitlab_hostname() helper. |
| src/apm_cli/models/dependency/reference.py | Adjusts virtual-package detection and standard URL parsing behavior for generic hosts/GitLab. |
| src/apm_cli/deps/github_downloader.py | Adds generic-host raw fetch and API version candidate list changes. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 5
class TestGitLabApiVersionNegotiation:
    """API version negotiation: v1 -> v3 -> v4 for generic hosts."""

    def setup_method(self):
        with patch.dict(os.environ, {}, clear=True), _CRED_FILL_PATCH:
            self.downloader = GitHubPackageDownloader()

    def test_gitlab_v4_reached_after_v1_and_v3_return_404(self):
        """GitLab uses /api/v4/ -- negotiation must try v1, v3, then v4."""
        dep_ref = DependencyReference.parse("gitlab.myorg.com/owner/repo")
        expected = b"gitlab file content"

        side_effects = [
            _make_resp(404),  # raw URL
            _make_resp(404),  # v1
            _make_resp(404),  # v3
            _make_resp(200, expected),  # v4
        ]
        with patch.object(self.downloader, "_resilient_get", side_effect=side_effects) as mock_get:
            result = self.downloader.download_raw_file(dep_ref, "skill.md", "main")

        assert result == expected
        urls = [c[0][0] for c in mock_get.call_args_list]
        assert "/api/v1/" in urls[1]
        assert "/api/v3/" in urls[2]
        assert "/api/v4/" in urls[3]
These tests assume a GitLab fallback of '/api/v4/repos/{owner}/{repo}/contents/...', but that's not a valid GitLab API shape (GitLab uses /api/v4/projects/.../repository/files...). As written, this test suite will lock in behavior that won't work against real GitLab instances and may hide regressions. Either remove the GitLab framing (treat this as "try v1 then v3 for Gitea/Gogs") or update both implementation and tests to use GitLab's actual endpoints.
# GitLab supports nested groups (group/subgroup/repo), so the full
# path is the repo -- no shorthand subdirectory splitting.
# Use https://gitlab.com/group/subgroup/repo.git for GitLab nested
# groups; shorthand subdirectory syntax is not supported for GitLab.
# All other generic hosts (Gitea, Bitbucket, self-hosted, etc.) use
# the owner/repo convention, so extra segments are a virtual subdir.
This PR changes how virtual packages are detected/handled for generic FQDN hosts (and introduces GitLab-specific nested-group behavior). The Starlight docs and the apm-guide usage doc currently document virtual package rules and the "dict form required when shorthand is ambiguous" note, but they don't describe the generic-host behavior being introduced here (e.g., whether subdirectory virtual packages are supported via shorthand on non-GitHub hosts, or require object form). Please update the relevant docs pages so users of Gitea/self-hosted hosts know which syntax is supported and when they must use the object form.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot apply changes based on the comments in this thread
sergio-sisternes-epam
left a comment
Thanks for the continued work on this, @ganesanviji! The raw URL approach and API version negotiation are solid ideas. Two blockers need attention before this can merge:
- Tests vs implementation mismatch (4 CI failures): min_base_segments = 2 for non-GitLab generic hosts means 3+ segment paths are treated as virtual, but test_three_segment_gitea_path_is_not_virtual and siblings expect them NOT to be virtual. One side needs to change -- please clarify the intended behavior for nested-group repos on Gitea/generic hosts and align tests with implementation.
- GitLab /api/v4/repos/... doesn't exist: GitLab's v4 API uses /api/v4/projects/{id}/repository/files/..., not the /repos/.../contents/ format. Suggest removing v4 from the candidate list (stick to v1/v3 for Gitea/Gogs) or adding a separate GitLab-specific code path.
Happy to help iterate on the virtual package detection design if useful!
elif is_gitlab_hostname(validated_host):
    min_base_segments = len(path_segments)
else:
    min_base_segments = 2
Blocking: This sets min_base_segments = 2 for all non-GitLab generic hosts (Gitea, Bitbucket, self-hosted git). That means gitea.myorg.com/group/subgroup/repo parses as repo_url="group/subgroup" + virtual_path="repo" + is_virtual=True.
But your tests (test_three_segment_gitea_path_is_not_virtual, test_four_segment_generic_path_without_indicators_is_not_virtual) expect is_virtual=False with repo_url="group/subgroup/repo".
This is the root cause of the 4 CI failures. Either:
- Change this to min_base_segments = len(path_segments) (all generic hosts treat full path as repo, require dict form for virtual), or
- Update the tests to match the current 2-segment split behavior.
The Copilot bot suggested the first option -- I agree that's safer, since Gitea also supports nested orgs/groups.
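To make the two options concrete, here is a small sketch of how the choice of min_base_segments changes the parse. `split_repo_and_virtual_path` is a hypothetical stand-in for the real parsing code, which is not shown in this thread; only the splitting rule is illustrated.

```python
def split_repo_and_virtual_path(path, min_base_segments=None):
    """Split a shorthand path into (repo_url, virtual_path).

    Hypothetical helper. With min_base_segments=None the whole path is the
    repo (the "len(path_segments)" option); with 2, anything past owner/repo
    is treated as a virtual subdirectory.
    """
    segments = [s for s in path.split("/") if s]
    if min_base_segments is None:
        min_base_segments = len(segments)
    repo_url = "/".join(segments[:min_base_segments])
    virtual_path = "/".join(segments[min_base_segments:]) or None
    return repo_url, virtual_path


# Current behavior (2-segment split): the real repo name is lost.
two_segment = split_repo_and_virtual_path("group/subgroup/repo", 2)
# Suggested behavior: the full path is the repo, nothing is virtual.
full_path = split_repo_and_virtual_path("group/subgroup/repo")
```

The same input yields two different repos depending on the rule, which is why the shorthand cannot be resolved offline on hosts that allow nested groups.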
Yes, you are correct. However, when I apply that logic and try to install from a Gitea repo with the structure below, the installation does not complete. I therefore added this logic so that the following form works when it is referenced in the apm.yml file:
apm install gitea.host.com/group/repo/skills/create-pull-request#Skill_Feature
@danielmeppiel / @sergio-sisternes-epam - I have addressed the review comments. Could you please check my comments and approve the PR?
danielmeppiel
left a comment
Thanks for pushing this forward — Gitea / generic-git-host support is genuinely wanted by the community, and the dict-form download path here is salvageable. Requesting changes because the current shape has a few hard blockers that should land before this can merge. None of them require throwing the PR out — most are scope reductions.
Blockers
1. Parsing heuristic for non-GitLab generic hosts is broken on nested groups.
min_base_segments=2 causes gitea.example.com/group/subgroup/repo to parse as repo_url=group/subgroup, virtual=repo — which loses the actual repo name. The PR's own tests in test_generic_git_urls.py assert the opposite behavior and currently fail in CI. This is a fundamental ambiguity: shorthand host/a/b/c cannot be disambiguated offline on platforms that support nested groups (Gitea 1.19+, Bitbucket Server, GitLab).
Recommendation: drop the shorthand-virtual heuristic for generic hosts entirely and require dict form (repo: + path:) for ambiguous cases — this matches existing APM convention and is unambiguous. If shorthand for generic hosts is wanted, an explicit // separator (Go-modules style) would be the clean approach, but that's a larger design discussion best left to a follow-up.
2. /api/v4/repos/{owner}/{repo}/contents/... is not a real GitLab endpoint.
GitLab v4 actually exposes /api/v4/projects/{owner}%2F{repo}/repository/files/{path}?ref={ref}. The current path will always 404 against real GitLab; tests pass only because the HTTP layer is mocked. Either swap in the correct endpoint or remove the v4 attempt altogether.
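As a rough sketch of the endpoint shape quoted above, the project path and the file path both need to be URL-encoded as single segments. `gitlab_file_url` is a hypothetical helper, not part of this PR, and it simply follows the `/api/v4/projects/{owner}%2F{repo}/repository/files/{path}?ref={ref}` form named in this comment.

```python
from urllib.parse import quote


def gitlab_file_url(host, project_path, file_path, ref):
    """Build a GitLab v4 repository-files URL (hypothetical helper).

    Both the project path ("owner/repo" or "group/subgroup/repo") and the
    file path are encoded with safe="" so that "/" becomes "%2F".
    """
    project_id = quote(project_path, safe="")
    encoded_file = quote(file_path, safe="")
    encoded_ref = quote(ref, safe="")
    return (
        f"https://{host}/api/v4/projects/{project_id}"
        f"/repository/files/{encoded_file}?ref={encoded_ref}"
    )


url = gitlab_file_url("gitlab.myorg.com", "owner/repo", "docs/skill.md", "main")
```

Because the project path is one encoded segment, nested groups fit the same shape without any owner/repo splitting.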
3. v4 only appears in the fallback ref loop, not the primary attempt.
GitLab servers that only speak v4 will never be reached on the initial ref=main request, so files appear missing even when present. If v4 stays, mirror it across primary + fallback.
4. ~156 lines of unrelated YAML-serialization regression tests deleted with no replacement.
Please restore those — they cover behavior orthogonal to this PR and shouldn't be collateral.
5. CI is red. The four failing tests in test_generic_git_urls.py need to be green before re-review.
Non-blocking — worth addressing in this PR if quick
- The fallback ref loop swallows non-404 errors (401/500 reported as "file not found"). Worth bubbling auth/server errors up with the host name so users can debug.
- No verbose_callback breadcrumbs on the up-to-7 silent attempts per file; when this fails, users have no way to see which endpoint was tried.
- is_gitlab_hostname() using startswith("gitlab.") is fragile (org-managed hostnames often don't follow that convention). Convention-based detection is fine as a heuristic, but please document the limitation.
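One way the first two points could be handled is sketched below, under stated assumptions: `fetch_first_available`, `FetchError`, and the `get`/`log` callables are all hypothetical names for illustration, and the real downloader's `_resilient_get` and `verbose_callback` signatures are not reproduced here.

```python
class FetchError(RuntimeError):
    """Raised when a host answers with something other than 200 or 404."""


def fetch_first_available(host, urls, get, log=None):
    """Try each URL in order; return the first 200 body, or None if all 404.

    `get(url)` is an injected callable returning (status_code, body), and
    `log` is an optional breadcrumb callback. Both are assumptions.
    """
    for url in urls:
        status, body = get(url)
        if log is not None:
            log(f"tried {url} -> HTTP {status}")  # one breadcrumb per attempt
        if status == 200:
            return body
        if status != 404:
            # Auth and server errors are not "file not found": surface them
            # together with the host name so the user can debug.
            raise FetchError(f"{host} returned HTTP {status} for {url}")
    return None
```

Only a 404 moves on to the next candidate here; a 401 or 500 stops the loop immediately instead of being reported as a missing file.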
Tracked separately (NOT blockers for this PR)
I've filed #773 to track architectural debt this PR exposes — primarily the get_unique_key() lockfile identity collision (which becomes much easier to hit with multi-host support) plus a few related polish items in the download/auth path. Those are pre-existing issues that shouldn't gate your contribution.
Summary
Suggested split: keep the dict-form Gitea download path in this PR, drop the shorthand heuristic, fix the v4 endpoint (or remove it), restore the deleted tests, get CI green. That's a tight, reviewable, mergeable PR. Happy to re-review as soon as those land.
@danielmeppiel / @sergio-sisternes-epam - Could you please review my comment below on this change and advise? This is the one failure I am facing in CI. I think we can approve this and adjust the CI logic based on this change. What are your thoughts?
Hi @danielmeppiel / @sergio-sisternes-epam
Blockers Resolved
All blockers from the review have been addressed in the latest commits:
…etection These two test classes were accidentally removed from the branch. Restoring them from upstream main (8665f4b) to ensure full coverage is preserved alongside the Gitea virtual package detection changes.
APM Review Panel Verdict
Disposition: REQUEST_CHANGES (two minor pre-merge fixes required; core approach is sound)
Per-persona findings
Python Architect: This is a routine-scope PR: extends one private method (
1. OO / class diagram
classDiagram
direction LR
class GitHubPackageDownloader {
<<IOBoundary>>
+download_raw_file(dep_ref, file_path, ref) bytes
-_download_github_file(dep_ref, file_path, ref, verbose_callback) bytes
-_try_raw_download(owner, repo, ref, file_path) bytes
-_resilient_get(url, headers, timeout) Response
}
class DependencyReference {
<<ValueObject>>
+host str
+repo_url str
+virtual_path str
+is_virtual bool
+_detect_virtual_package(dep_str) classmethod
+_parse_standard_url(dep_str, is_virtual, virtual_path, validated_host) classmethod
}
class GitHubHostUtils {
<<Pure>>
+is_github_hostname(h) bool
+is_gitlab_hostname(h) bool
+is_azure_devops_hostname(h) bool
+is_supported_git_host(h) bool
}
class AuthResolver {
<<Strategy>>
+resolve(host, org, port) AuthContext
}
class AuthContext {
<<ValueObject>>
+token str
}
GitHubPackageDownloader ..> DependencyReference : reads
GitHubPackageDownloader *-- AuthResolver : delegates
AuthResolver ..> AuthContext : returns
DependencyReference ..> GitHubHostUtils : uses
note for GitHubPackageDownloader "Chain of Responsibility: raw URL -> API v1 -> API v3 (new, generic hosts)"
note for GitHubHostUtils "is_gitlab_hostname: defined and tested but has zero production call sites in this PR"
class GitHubPackageDownloader:::touched
class DependencyReference:::touched
class GitHubHostUtils:::touched
classDef touched fill:#fff3b0,stroke:#d47600
2. Execution flow diagram
flowchart TD
A["download_raw_file(dep_ref, file_path, ref)"]
B["_download_github_file()\nsrc/apm_cli/deps/github_downloader.py"]
C{"host == github.com\nand no token?"}
D["[NET] _try_raw_download()\nraw.githubusercontent.com CDN"]
E{"status 200?"}
F["return content (bytes)"]
G{"host not github.com\nand not .ghe.com?\nNEW BLOCK"}
H["[NET] raw_url = (host/redacted) token ... if token"]
I{"status 200?"}
J["Build api_url_candidates list\ngithub.com: api.github.com\nghe.com: api.HOST\ngeneric: /api/v1/ then /api/v3/"]
K["[NET] _resilient_get(api_url_candidates[0])"]
L{"raise_for_status\npasses?"}
M["[NET] try api_url_candidates[1:] in order"]
N{"any non-404?"}
O{"ref in main/master?"}
P["raise RuntimeError\n(specific ref -- no fallback)"]
Q["Build fallback_url_candidates\nsame host dispatch, fallback_ref"]
R["[NET] try fallback_url_candidates"]
S{"any 200?"}
T["raise RuntimeError\n(tried ref + fallback_ref)"]
U["raise HTTPError\n(401/403 auth/rate-limit)"]
A --> B
B --> C
C -->|yes| D
D --> E
E -->|yes| F
E -->|no| G
C -->|no| G
G -->|yes| H
H --> I
I -->|yes| F
I -->|no| J
G -->|no| J
J --> K
K --> L
L -->|yes| F
L -->|404| M
M --> N
N -->|yes| F
N -->|no -- all 404| O
O -->|specific ref| P
O -->|main or master| Q
Q --> R
R --> S
S -->|yes| F
S -->|no| T
L -->|401/403| U
Design patterns
Three findings:
CLI Logging Expert: No issues. The new generic host code paths reuse the existing
DevX UX Expert: No CLI command surface changes -- no new flags, subcommands, or help text. Error messages follow the existing
Supply Chain Security Expert: No blocking security issues.
Auth Expert: Activated --
OSS Growth Hacker: Strong growth signal from this external contributor PR.
CEO arbitration
The panel is aligned. This is a genuine feature expansion from an external contributor with sound core logic, good test coverage, and full backward compatibility. Two items require resolution before merge: (1) the
Required actions before merge
Optional follow-ups
…anesanviji/apm into feat/genric-host-gitea-private
APM Review Panel Verdict
Disposition: REQUEST_CHANGES (two pre-merge blockers: CHANGELOG conflict and test URL assertion convention)
Per-persona findings
Python Architect: This is a routine feature PR (no new abstract bases, no hierarchy restructure). Two mermaid blocks below.
1. OO / class diagram
classDiagram
direction LR
class GitHubPackageDownloader {
<<IOBoundary>>
+download_raw_file(dep_ref, file_path, ref) bytes
+_download_github_file(dep_ref, file_path, ref) bytes
+_try_raw_download(owner, repo, ref, file_path) bytes
+_resilient_get(url, headers, timeout) Response
}
class DependencyReference {
<<ValueObject>>
+host str
+repo_url str
+is_virtual bool
+parse(url) DependencyReference
+_parse_standard_url(...) DependencyReference
+_detect_virtual_package(...) tuple
}
class AuthResolver {
<<Strategy>>
+resolve(host, org, port) AuthContext
}
class AuthContext {
<<ValueObject>>
+token str
+source str
}
GitHubPackageDownloader ..> DependencyReference : reads
GitHubPackageDownloader ..> AuthResolver : resolves token
AuthResolver ..> AuthContext : returns
GitHubPackageDownloader ..> AuthContext : uses for headers
class GitHubPackageDownloader:::touched
class DependencyReference:::touched
classDef touched fill:#fff3b0,stroke:#d47600
2. Execution flow diagram
flowchart TD
A[download_raw_file dep_ref file_path ref] --> G{host == github.com\nand no token?}
G -->|yes| H["[NET] _try_raw_download raw.githubusercontent.com CDN"]
H --> I{200?}
I -->|yes| R1[return content]
I -->|no - try fallback ref| J["[NET] _try_raw_download fallback ref"]
J --> K{200?}
K -->|yes| R2[return content]
G -->|no| L
K -->|no| L
L{generic host?\nnot github.com\nnot .ghe.com} -->|yes| M["[NET] GET host/owner/repo/raw/ref/file\nwith Authorization header if token set"]
M --> N{200?}
N -->|yes| R3[return content]
N -->|no - pass| O
L -->|no| O
O[Build api_url_candidates list] --> P{host type}
P -->|github.com| Q["candidates = [api.github.com/repos/...]"]
P -->|.ghe.com| QQ["candidates = [api.host/repos/...]"]
P -->|generic| QQQ["candidates = [host/api/v1/..., host/api/v3/...]"]
Q --> T["[NET] GET api_url_candidates at index 0"]
QQ --> T
QQQ --> T
T --> U{200?}
U -->|yes| R4[return content]
U -->|404 + remaining candidates| V["[NET] try remaining candidates in order"]
V --> W{any 200?}
W -->|yes| R5[return content]
W -->|no| X{ref in main or master?}
X -->|non-default ref| Y[raise RuntimeError: not found at ref]
X -->|default ref| Z["Build fallback_url_candidates\n(opposite branch)"]
Z --> AA["[NET] try each fallback URL"]
AA --> AB{any 200?}
AB -->|yes| R6[return content]
AB -->|no| AC[raise RuntimeError: not found]
U -->|401 or 403| AD[raise RuntimeError: auth or rate-limit error]
Design patterns
Additional findings:
CLI Logging Expert: No issues. The two new
DevX UX Expert: No CLI surface changes -- no new commands, flags, or help text to review.
Supply Chain Security Expert: No new security vulnerabilities introduced.
Auth Expert: ACTIVATED (fast-path:
One gap: users of private Gitea repos who set
OSS Growth Hacker: This is a strong enterprise-adoption signal. Gitea/Gogs is widely deployed in regulated industries, enterprises that cannot use GitHub.com due to data-residency requirements, and the Chinese developer ecosystem (Gitea is the leading self-hosted git platform in that market). "apm install gitea.myorg.com/owner/repo works" removes a hard adoption blocker for a meaningful segment. Side-channel to CEO: the CHANGELOG entry is story-shaped ("Virtual package support for self-hosted Git services (Gitea, Gogs)..."). Recommend featuring this in the release notes with a one-liner
CEO arbitration
Specialists are aligned. The core approach -- raw URL cascade with v1/v3 API negotiation for generic Git hosts -- is architecturally sound, correctly routes through AuthResolver, and fails closed.
Two items are hard blockers. First, the CHANGELOG.md conflict: the PR branch diverged from main before the 0.9.3 cut, and the contributor appears to have merged or pulled main, causing already-released entries (#917, #884, #849, #887, #882, #915, #885) to re-appear in the diff as additions to
Second, the URL substring assertions in the new tests (lines 1905, 1936, 1937, 1953, 1978 of
The DRY violation (duplicated
The Growth Hacker's documentation gap is a valid follow-up issue, not a gate.
@ganesanviji -- great contribution. Two concrete changes needed (CHANGELOG cleanup + urlparse assertions), then this is ready.
Required actions before merge
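For the urlparse-based assertions mentioned above, a parsed-URL check might look roughly like this. The exact convention the panel requires is not shown in this thread, so `assert_api_url` is a hypothetical helper and the checked fields (scheme, hostname, path prefix, ref query parameter) are assumptions.

```python
from urllib.parse import parse_qs, urlparse


def assert_api_url(url, host, version, ref):
    """Assert on parsed URL components instead of raw substrings.

    Hypothetical test helper; field choices are assumptions.
    """
    parsed = urlparse(url)
    assert parsed.scheme == "https"
    assert parsed.hostname == host
    assert parsed.path.startswith(f"/api/{version}/repos/")
    assert parse_qs(parsed.query).get("ref") == [ref]


# Passes: host, API version, and ref all sit in the expected components.
assert_api_url(
    "https://gitea.myorg.com/api/v1/repos/owner/repo/contents/skill.md?ref=main",
    "gitea.myorg.com",
    "v1",
    "main",
)
```

Unlike `"/api/v1/" in url`, this form does not pass when the expected text merely appears somewhere else in the URL, such as inside a query string on a different host.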
Optional follow-ups
Hi @ganesanviji -- friendly check-in: this PR has been in
Concretely, the outstanding feedback is:
Let us know either way -- a quick "still on it" or "going to close it" reply is enough. Thanks for the contribution!
Description
Add support for installing virtual packages from self-hosted Git services like Gitea. Currently, APM only supports virtual packages (subdirectories) on GitHub. This change enables users with Gitea to install packages from subdirectories within repositories.
Changes:
- DependencyReference to recognize subdirectory packages on generic Git hosts (any FQDN)
- /api/v3 to /api/v1 for better compatibility with Gitea and other Git services
More details about the changes:
✅ Change 1: Virtual Package Detection (reference.py)
Analysis: This only affects generic Git hosts, not GitHub. Allows subdirectory packages to be detected as virtual even without specific file extensions. Safe because:
GitHub uses separate logic path (is_generic_host = False)
Validation still requires package markers (apm.yml, SKILL.md, etc.) in the subdirectory
No impact on existing GitHub virtual file detection
✅ Change 2: Authenticated Raw Downloads (github_downloader.py)
Analysis: Improves private repo support. Safe because:
Only applies to generic hosts, not GitHub
Falls back to API if raw fails
Uses standard Authorization header format
✅ Change 3: API Endpoint Update
Analysis: Gitea uses /api/v1/, GitHub uses /api/v3/. Safe because:
GitHub still uses /api/v3/
Gitea API v1 is compatible for contents endpoint
Falls back gracefully if endpoint doesn't exist
Motivation:
Enterprise teams using self-hosted Git services (Gitea) cannot currently use APM to install packages from repository subdirectories. This is a significant limitation for organizations that don't use GitHub. These changes enable APM to work seamlessly across all Git hosting platforms.
Type of change
Testing
Tested locally
All existing tests pass
Added tests for new functionality (if applicable)