refactor: replace per-file GitHub API calls with ZIP archive downloads (#2607)
Conversation
📝 Walkthrough

Adds a GitHubArchiveService to download and parse GitHub ZIP archives (yauzl), rewires TemplateFetcherService to use archive-based reads, adds TemplateRefreshService and providers, updates DI wiring and tests, and introduces TEMPLATE_REFRESH_INTERVAL_SECONDS / TEMPLATE_REFRESH_ENABLED env vars.
Sequence Diagram

```mermaid
sequenceDiagram
    participant TFS as TemplateFetcherService
    participant GAS as GitHubArchiveService
    participant GitHub as GitHub (ZIP)
    participant Parser as yauzl Parser
    participant Cache as Archive Cache
    TFS->>GAS: getArchive(owner, repo, ref)
    GAS->>Cache: lookup cache key
    alt cache hit
        Cache-->>GAS: ArchiveReader
    else cache miss
        GAS->>GitHub: GET /{owner}/{repo}/archive/{ref}.zip
        GitHub-->>GAS: ZIP bytes
        GAS->>Parser: parse ZIP (yauzl)
        Parser-->>GAS: entries + root prefix
        GAS->>Cache: store ArchiveReader
    end
    GAS-->>TFS: ArchiveReader
    TFS->>GAS: readFile(path) / listDirectory(path)
    GAS-->>TFS: file contents / directory entries
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Codecov Report

❌ Patch coverage is 68.84%. Your patch status has failed because the patch coverage (68.84%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2607      +/-   ##
==========================================
+ Coverage   50.06%   50.41%   +0.35%
==========================================
  Files        1025     1029       +4
  Lines       29025    29186     +161
  Branches     6626     6631       +5
==========================================
+ Hits        14531    14715     +184
+ Misses      14213    14076     -137
- Partials      281      395     +114
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@apps/api/src/template/services/github-archive/github-archive.service.ts`:
- Around lines 15-31: The `#cache` in getArchive grows unbounded per ref, causing potential memory bloat. Replace the raw Map<string, Promise<ArchiveReader>> with a bounded cache (e.g., an LRU or size-limited wrapper) so that when getArchive adds a new entry (the cacheKey it builds and the promise from `#downloadAndParse`), older entries are evicted once a maximum size or total memory threshold is reached. Ensure eviction also deletes from the underlying storage so failed downloads still remove their keys (preserve the existing try/catch that deletes on error), and reference the existing symbols (`#cache`, getArchive, `#downloadAndParse`, ArchiveReader) when implementing the bounded-cache replacement (see the first sketch after this list).
- Around lines 60-98: Normalize incoming paths in `#createArchiveReader` so a leading "./" or "/" doesn't break lookups: inside readFile and listDirectory, strip any leading "./" and "/" from the received path (and treat "" or "." as ""), then build fullPath and dirPath by concatenating rootPrefix and the normalized path with exactly one "/" separator (avoiding double slashes), and compute entryPath from the normalized path when pushing DirectoryEntry. Update both readFile and listDirectory to use this normalizedPath variable so all archive lookups are consistent (see the second sketch after this list).
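
For illustration, here is a minimal sketch of the kind of bounded, least-recently-used cache the first comment asks for; the class name `BoundedCache` and the default capacity are assumptions, not code from this PR.

```typescript
// Hypothetical sketch of a size-bounded, LRU-style cache keyed by "owner/repo/ref".
// Map preserves insertion order, so the first key is always the least recently used.
class BoundedCache<V> {
  readonly #store = new Map<string, V>();

  constructor(private readonly maxEntries: number = 8) {}

  get(key: string): V | undefined {
    const value = this.#store.get(key);
    if (value !== undefined) {
      // Re-insert to mark this key as most recently used.
      this.#store.delete(key);
      this.#store.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.#store.delete(key);
    this.#store.set(key, value);
    if (this.#store.size > this.maxEntries) {
      // Evict the least recently used entry once the cap is exceeded.
      const oldest = this.#store.keys().next().value;
      if (oldest !== undefined) this.#store.delete(oldest);
    }
  }

  delete(key: string): void {
    // Keeps the existing delete-on-error path working for failed downloads.
    this.#store.delete(key);
  }
}
```

getArchive could then keep its current flow, swapping the raw Map for something like `new BoundedCache<Promise<ArchiveReader>>()` and leaving the try/catch that deletes failed entries untouched.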
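And a sketch of the path normalization described in the second comment; the helper names are illustrative and not taken from the PR.

```typescript
// Hypothetical helpers: strip a leading "./" or "/" and treat "" / "." as the archive root.
function normalizeArchivePath(path: string): string {
  const stripped = path.replace(/^\.\//, "").replace(/^\/+/, "");
  return stripped === "." ? "" : stripped;
}

// Join the archive root prefix and a normalized path with exactly one "/" separator.
function joinArchivePath(rootPrefix: string, path: string): string {
  const normalized = normalizeArchivePath(path);
  const prefix = rootPrefix.endsWith("/") ? rootPrefix : `${rootPrefix}/`;
  return normalized === "" ? prefix : `${prefix}${normalized}`;
}

// Example: both spellings resolve to the same entry key.
joinArchivePath("repo-main/", "./README.md"); // "repo-main/README.md"
joinArchivePath("repo-main/", "/README.md");  // "repo-main/README.md"
```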
In `@apps/api/src/template/services/template-fetcher/template-fetcher.service.ts`:
- Around lines 81-94: fetchFileContent currently throws for missing refs/files and archive read failures, which lets a transient archive error abort the whole refresh. Update TemplateFetcherService so fetchTemplatesFromReadme wraps calls to fetchFileContent (and any calls to this.#archiveService.getArchive / archive.readFile) in a try/catch, logs the error, and returns a safe fallback (e.g., null or an empty category list) instead of letting the error propagate. Ensure fetchTemplatesFromReadme returns the safe fallback when fetchFileContent fails so the overall template refresh remains resilient (see the sketch below).
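
A sketch of the resilience pattern being requested; the function and parameter names, the logger shape, and the empty-array fallback are assumptions about the surrounding service, not the PR's actual code.

```typescript
// Illustrative only: wrap the archive read so a transient failure degrades to an
// empty category list instead of aborting the whole refresh cycle.
type Category = { title: string; templates: unknown[] };

async function fetchTemplatesFromReadmeSafely(
  fetchFileContent: (path: string) => Promise<string>,
  parseReadme: (markdown: string) => Category[],
  logger: { error: (context: Record<string, unknown>) => void }
): Promise<Category[]> {
  try {
    const readme = await fetchFileContent("README.md");
    return parseReadme(readme);
  } catch (error) {
    // Log and degrade gracefully so one failed archive read does not abort the refresh.
    logger.error({ event: "TEMPLATE_README_FETCH_FAILED", error });
    return [];
  }
}
```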
…late refresh service
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@apps/api/src/template/config/env.config.ts`:
- Around line 5-8: The env schema for TEMPLATE_REFRESH_INTERVAL_SECONDS uses
z.number(), which rejects string env values; update the schema in
apps/api/src/template/config/env.config.ts to use z.coerce.number() instead of
z.number() so environment strings (e.g., "900") are coerced to numbers, keeping
.optional() and .default(15 * 60) as-is and leaving other fields unchanged.
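
A minimal sketch of the suggested schema change, assuming zod is already in use as the comment implies; the field name comes from the comment, everything else is illustrative.

```typescript
import { z } from "zod";

// Illustrative: z.coerce.number() converts env strings like "900" to numbers
// before validation, whereas z.number() would reject them outright.
const envSchema = z.object({
  TEMPLATE_REFRESH_INTERVAL_SECONDS: z.coerce.number().optional().default(15 * 60)
});

// process.env values are always strings:
const parsed = envSchema.parse({ TEMPLATE_REFRESH_INTERVAL_SECONDS: "900" });
console.log(parsed.TEMPLATE_REFRESH_INTERVAL_SECONDS); // 900 (as a number)
```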
In `@apps/api/src/template/services/github-archive/github-archive.service.ts`:
- Around lines 95-108: The extraction silently skips entries when zipfile.openReadStream or the readStream errors; update the handlers in github-archive.service.ts (the zipfile.openReadStream callback and the readStream.on("error") handler) to capture and surface errors instead of just calling zipfile.readEntry(): log the error with context (include relativePath and the original error) via the service logger, or push the error into a collection (e.g., an extractionErrors array) so callers can inspect and report missing files, then continue processing by calling zipfile.readEntry() (see the first sketch after this list).
- Around lines 43-55: The fetch in async `#downloadAndParse`(owner, repo, ref) can hang; wrap the request with an AbortController timeout: create an AbortController, set a timer (e.g., 30s) that calls controller.abort(), pass controller.signal to fetch(url, { signal }), and clear the timer after the fetch completes. Catch the abort error and rethrow a descriptive Error (including the url and timeout) so the caller fails fast instead of hanging in `#downloadAndParse` and downstream in `#extractArchive` / `#createArchiveReader` (see the second sketch after this list).
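
A sketch of the error surfacing requested in the first comment, using the yauzl entry-loop callbacks named there; the `collectEntries` wrapper and the `extractionErrors` shape are assumptions, not the PR's code.

```typescript
import type { Entry, ZipFile } from "yauzl";

// Illustrative fragment: record per-entry failures so callers can inspect missing
// files, while still advancing the entry loop instead of silently skipping.
function collectEntries(zipfile: ZipFile, extractionErrors: { path: string; error: Error }[]): void {
  zipfile.on("entry", (entry: Entry) => {
    zipfile.openReadStream(entry, (err, readStream) => {
      if (err || !readStream) {
        extractionErrors.push({ path: entry.fileName, error: err ?? new Error(`No stream for ${entry.fileName}`) });
        zipfile.readEntry(); // keep processing the remaining entries
        return;
      }
      readStream.on("error", (streamError: Error) => {
        extractionErrors.push({ path: entry.fileName, error: streamError });
        zipfile.readEntry();
      });
      readStream.on("end", () => zipfile.readEntry());
      readStream.resume(); // real code would buffer the chunks here
    });
  });
  zipfile.readEntry(); // kick off the first entry
}
```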
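And a sketch of the timeout pattern from the second comment; the 30-second value comes from the comment, while the function name and error message are illustrative.

```typescript
// Illustrative: abort a hanging archive download after a timeout and rethrow
// a descriptive error so callers fail fast instead of waiting indefinitely.
async function fetchWithTimeout(url: string, timeoutMs = 30_000): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } catch (error) {
    if ((error as { name?: string })?.name === "AbortError") {
      throw new Error(`Timed out after ${timeoutMs}ms while downloading ${url}`);
    }
    throw error;
  } finally {
    clearTimeout(timer);
  }
}
```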
🧹 Nitpick comments (2)
apps/api/src/template/services/github-archive/github-archive.service.ts (1)

19-21: Consider injecting `LoggerService` for observability. This service performs network I/O and archive parsing but has no logging. Injecting `LoggerService` would provide visibility into download times, parsing errors, and cache behavior, consistent with other services in the codebase.

apps/api/src/template/services/template-gallery/template-gallery.service.ts (1)

36-42: Consider injecting `GitHubArchiveService` for improved testability. Creating `new GitHubArchiveService()` directly couples the service instantiation. While functional, injecting it as a dependency would make unit testing easier by allowing mock injection without modifying the constructor call site (see the sketch below).
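
A hedged sketch of the constructor-injection pattern the second nitpick suggests, using tsyringe (already a dependency per the package.json diff below); the class bodies are placeholders, not the PR's implementation.

```typescript
import "reflect-metadata";
import { container, injectable, singleton } from "tsyringe";

// Illustrative stand-ins: the real classes live in the PR; these bodies are placeholders.
@singleton()
class GitHubArchiveService {
  /* download/parse logic elided */
}

@injectable()
class TemplateGalleryService {
  // Constructor injection instead of `new GitHubArchiveService()` lets unit tests
  // substitute a mock via the container or a direct constructor call.
  constructor(readonly archiveService: GitHubArchiveService) {}
}

// The container resolves the shared GitHubArchiveService singleton automatically.
const gallery = container.resolve(TemplateGalleryService);
console.log(gallery.archiveService instanceof GitHubArchiveService); // true
```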
```diff
@@ -111,6 +111,7 @@
     "tsyringe": "^4.10.0",
```
🔄 Carefully review the package-lock.json diff
Resolve the comment if everything is ok
+ node_modules/@types/yauzl 2.10.3
+ node_modules/@types/yazl 3.3.0
+ node_modules/yazl/node_modules/buffer-crc32 1.0.0
+ node_modules/yazl 3.3.1
- node_modules/git-semver-tags/node_modules/conventional-commits-filter 5.0.0
- node_modules/git-semver-tags/node_modules/conventional-commits-parser 6.2.1

… template fetcher
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/api/src/template/services/template-fetcher/template-fetcher.service.ts (1)
144-148: Replace `Promise<any>` with a proper return type. The return type `Promise<any>` violates the coding guideline to never use type `any`. Based on the method logic, this should return the template type from processTemplate or null.

🔧 Suggested fix

```diff
   private async processTemplateSource(
     templateSource: TemplateSource,
     directoryItems: GithubDirectoryItem[],
     options: { includeConfigJson?: boolean }
-  ): Promise<any> {
+  ): Promise<Template | null> {
```

You may need to import the `Template` type if not already imported:

```typescript
import type { Category, Template, TemplateSource } from "../../types/template.ts";
```

As per coding guidelines: `**/*.{ts,tsx,js}`: Never use type `any` or cast to type `any`. Always define the proper TypeScript types.
🤖 Fix all issues with AI agents
In `@apps/api/src/template/config/env.config.ts`:
- Around line 4-9: The TEMPLATE_REFRESH_ENABLED schema currently uses
z.coerce.boolean() which treats any non-empty string (e.g. "false") as true;
replace that with z.preprocess to explicitly map common string representations
("true","1","yes" => true; "false","0","no" => false) before validating with
z.boolean().optional().default(true), using z.preprocess(...) and leaving
TEMPLATE_REFRESH_INTERVAL_SECONDS and GITHUB_PAT unchanged; update the
TEMPLATE_REFRESH_ENABLED declaration to use this explicit string-to-boolean
preprocessing so environment values like "false" correctly become false.
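
A sketch of an explicit string-to-boolean preprocessing step with zod; the mapping mirrors the comment's suggestion, while the helper name is an assumption.

```typescript
import { z } from "zod";

// Illustrative: map common truthy/falsy strings explicitly, because
// z.coerce.boolean() would turn any non-empty string (even "false") into true.
const booleanFromEnv = z.preprocess(value => {
  if (typeof value !== "string") return value;
  if (["true", "1", "yes"].includes(value.toLowerCase())) return true;
  if (["false", "0", "no"].includes(value.toLowerCase())) return false;
  return value; // let z.boolean() reject anything unrecognized
}, z.boolean().optional().default(true));

const envSchema = z.object({ TEMPLATE_REFRESH_ENABLED: booleanFromEnv });

// "false" now parses to false instead of true.
console.log(envSchema.parse({ TEMPLATE_REFRESH_ENABLED: "false" }));
```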
In `@apps/api/src/template/services/github-archive/github-archive.service.ts`:
- Around line 30-43: The cache key in getArchive (cacheKey) currently ignores
the fileFilter so cached promises from a previous call (via this.#cache) can be
reused for different filters; update the caching strategy in getArchive to
include a stable identifier for fileFilter (e.g., append fileFilter?.toString()
or a provided filterId) to the cacheKey or maintain a nested cache keyed by
filter identity, then use that augmented key when calling this.#cache.get,
this.#cache.set and this.#cache.delete so each distinct filter/ref/owner/repo
combination gets its own cached promise from this.#downloadAndParse.
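
One way the augmented key could be built, sketched under the assumption that callers can supply a stable `filterId`; the PR may choose a different identity scheme.

```typescript
// Illustrative: include a stable filter identifier in the cache key so archives
// filtered differently are cached separately.
type FileFilter = (path: string) => boolean;

function buildCacheKey(owner: string, repo: string, ref: string, fileFilter?: FileFilter, filterId?: string): string {
  // filterId (if provided) is preferred; fileFilter?.toString() is a weaker fallback.
  const filterKey = filterId ?? fileFilter?.toString() ?? "no-filter";
  return `${owner}/${repo}/${ref}#${filterKey}`;
}
```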
🧹 Nitpick comments (4)
apps/api/src/template/services/template-gallery/template-gallery.service.spec.ts (1)

246-262: Align `setup` helper signature with test guidelines. The helper should accept a single inline-typed parameter to support overrides and avoid shared state.

As per coding guidelines: Use `setup` function instead of `beforeEach` in test files. The `setup` function must be at the bottom of the root `describe` block, accept a single parameter with inline type definition, avoid shared state, and not have a specified return type.

♻️ Suggested adjustment

```diff
-  function setup() {
+  function setup({ githubPAT = "test-pat" }: { githubPAT?: string } = {}) {
     const logger = mock<LoggerService>();
     const fsMock = mock<FileSystemApi>({
       access: jest.fn(() => Promise.reject(new Error("File not found")))
     });
     const dataFolderPath = "/data";
     const service = new TemplateGalleryService(logger, fsMock, {
-      githubPAT: "test-pat",
+      githubPAT,
       dataFolderPath
     });
```

apps/api/src/template/services/github-archive/github-archive.service.spec.ts (1)

83-99: Align `setup` helper signature with test guidelines. Adopt a single inline-typed parameter for the setup helper.

As per coding guidelines: Use `setup` function instead of `beforeEach` in test files. The `setup` function must be at the bottom of the root `describe` block, accept a single parameter with inline type definition, avoid shared state, and not have a specified return type.

♻️ Suggested adjustment

```diff
-  function setup() {
-    const logger = mock<LoggerService>();
+  function setup({ logger = mock<LoggerService>() }: { logger?: LoggerService } = {}) {
     const service = new GitHubArchiveService(logger);
```

apps/api/src/template/services/template-fetcher/template-fetcher.service.spec.ts (1)

472-483: Align `setup` helper signature with test guidelines. The helper should accept a single inline-typed parameter to support overrides.

As per coding guidelines: Use `setup` function instead of `beforeEach` in test files. The `setup` function must be at the bottom of the root `describe` block, accept a single parameter with inline type definition, avoid shared state, and not have a specified return type.

♻️ Suggested adjustment

```diff
-  function setup() {
+  function setup({ githubPAT = "test-pat" }: { githubPAT?: string } = {}) {
     const templateProcessor = mock<TemplateProcessorService>();
     const logger = mock<LoggerService>();
     const octokit = mockDeep<Octokit>();
     const archiveService = mock<GitHubArchiveService>();
     const service = new TemplateFetcherService(templateProcessor, logger, () => octokit, archiveService, {
-      githubPAT: "test-pat"
+      githubPAT
     });
     return { service, templateProcessor, logger, octokit, archiveService };
   }
```

apps/api/src/template/services/template-fetcher/template-fetcher.service.ts (1)

124-133: Consider extracting the hardcoded chain-registry reference. The `"master"` branch is hardcoded for the cosmos/chain-registry repository. If the default branch changes (as many repos have moved to `"main"`), this would silently fail. Consider extracting this to a constant similar to `REPOSITORIES` for consistency and easier maintenance.

♻️ Suggested refactor

```diff
 export const REPOSITORIES = {
   "awesome-akash": {
     repoOwner: "akash-network",
     repoName: "awesome-akash",
     mainBranch: "master"
   },
   "cosmos-omnibus": {
     repoOwner: "akash-network",
     repoName: "cosmos-omnibus",
     mainBranch: "master"
+  },
+  "chain-registry": {
+    repoOwner: "cosmos",
+    repoName: "chain-registry",
+    mainBranch: "master"
   }
 };
```

Then use it in `fetchChainRegistryData`:

```diff
 private async fetchChainRegistryData(chainPath: string): Promise<GithubChainRegistryChainResponse> {
-  const archive = await this.#archiveService.getArchive("cosmos", "chain-registry", "master", isTemplateFile);
+  const { repoOwner, repoName, mainBranch } = REPOSITORIES["chain-registry"];
+  const archive = await this.#archiveService.getArchive(repoOwner, repoName, mainBranch, isTemplateFile);
```
Summary
- Replaces per-file `repos.getContent` API calls per refresh cycle with 3 ZIP archive downloads from GitHub's CDN (not rate-limited)
- New `GitHubArchiveService` that downloads and parses repo archives using `yauzl`, providing `readFile()` and `listDirectory()` via an `ArchiveReader` interface
- Uses `raw.githubusercontent.com` instead of the GitHub API
- New `TemplateRefreshService`, so new templates appear without redeploying the API
- Converts `TemplateGalleryService` into a shared DI singleton so the controller and refresh service operate on the same in-memory cache
- The refresh loop uses a `setTimeout` chain (not `setInterval`) to guarantee no overlapping refreshes and clean disposal on shutdown (see the sketch after "How the refresh works")
- Refresh registration is skipped when `GITHUB_PAT` is not configured
How the refresh works

- On each interval (`TEMPLATE_REFRESH_INTERVAL_SECONDS`), the refresh service calls `TemplateGalleryService.refreshCache()`
- `refreshCache()` rebuilds the cache file (SHA-based, so the ZIP download is skipped if nothing changed), then resets the in-memory caches (`#galleriesCache`, `#parsedTemplates`) and clears the `GitHubArchiveService` archive cache to free old entries
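
A minimal sketch of the no-overlap `setTimeout` chain described above, assuming a `refreshCache()` callback and an error logger; the `PeriodicRefresher` class is illustrative, not the PR's TemplateRefreshService.

```typescript
// Illustrative: each run schedules the next one only after the current refresh
// finishes, so refreshes can never overlap; dispose() stops the chain on shutdown.
class PeriodicRefresher {
  #timer: NodeJS.Timeout | undefined;
  #disposed = false;

  constructor(
    private readonly refreshCache: () => Promise<void>,
    private readonly intervalMs: number,
    private readonly onError: (error: unknown) => void
  ) {}

  start(): void {
    this.#scheduleNext();
  }

  dispose(): void {
    this.#disposed = true;
    if (this.#timer) clearTimeout(this.#timer);
  }

  #scheduleNext(): void {
    if (this.#disposed) return;
    this.#timer = setTimeout(async () => {
      try {
        await this.refreshCache();
      } catch (error) {
        this.onError(error); // log and keep the chain alive
      } finally {
        this.#scheduleNext();
      }
    }, this.intervalMs);
  }
}
```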
New files

- `providers/template-gallery.provider.ts`: provides `TemplateGalleryService` via DI
- `providers/template-refresh.provider.ts`: registers `TemplateRefreshService` as `APP_INITIALIZER` with `DisposableRegistry`
- `services/template-refresh/template-refresh.service.ts`: `setTimeout`-chain periodic refresh with error logging and dispose cleanup
Test plan

- Typecheck passes (`npx tsc --noEmit`)
- With `GITHUB_PAT` set, verify the "TEMPLATE_REFRESH_REGISTERED" log appears; temporarily set `TEMPLATE_REFRESH_INTERVAL_SECONDS=30` and confirm refresh logs appear