
perf(azure): batched SKU catalogue lookup eliminates N+1 in recommendation converters#81

Merged
cristim merged 1 commit into feat/multicloud-web-frontend from perf/azure-batched-sku-lookup on Apr 27, 2026

Conversation

@cristim (Member) commented Apr 25, 2026

Summary

Adds a lazy, single-shot SKU catalogue lookup to three of the four Azure recommendation converters (cache, cosmos, database) so the previously-deferred Details fields populate without N+1 SDK calls.

  • Cache: CacheDetails.Shards from armredis.Properties.ShardCount.
  • Cosmos: NoSQLDetails.APIType (sql / mongodb / cassandra / gremlin / table) from the dominant Cosmos account Kind + Capabilities in the subscription.
  • Database: DatabaseDetails.EngineVersion from armsql.CapabilitiesClient.ListByLocation traversal.

Each converter calls a per-service cachedSKULookup (or cachedAPIType on cosmos). The catalogue is fetched ONCE per client lifetime via sync.Once; many converter calls in the same GetRecommendations run hit the in-memory map. A failed catalogue fetch leaves the cache nil and converters fall back to the previous empty-Details behaviour with a one-time WARN log — the conversion itself never fails.

Compute scoped back: common.ComputeDetails (in pkg/common/types.go) currently exposes only InstanceType / Platform / Tenancy / Scope. The two enrichments the catalogue would supply for compute (vCPU, MemoryGB) have no struct fields to write into. Adding fields to a shared pkg/common type would touch all 3 cloud providers, the frontend, and the matchers — out of scope for this perf change. Tracked as a narrower follow-up in known_issues/10_azure_provider.md.

Test plan

New per-service tests in the existing *_test.go:

  • _PopulatesShardsFromSKUCache / _PopulatesAPIType (5 sub-cases) / _PopulatesEngineVersion — cache hit populates the previously-deferred field.
  • _PagerErrorFallsBack / _CapabilitiesErrorFallsBack — fetch error leaves the field empty and conversion still succeeds (graceful-degradation contract).
  • _CachedSKULookup_FetchedOnce / _CachedAPIType_FetchedOnce — many converter calls share a single catalogue fetch (the N+1 invariant pinned in tests).
  • _AmbiguousAPIType — multi-API-type Cosmos subscription leaves APIType empty rather than guessing.
  • Pre-existing _PopulatesAllFields tests now inject empty mock pagers so they don't hit the real Azure API on first lookup.
  • cd providers/azure && go test ./... — all green.
  • go test ./... (root) — all green.
  • go vet ./... — clean.
  • Pre-commit hooks (gofmt, gocyclo, gosec, etc.) — all green.

Closes #49 for cache / cosmos / database. Compute follow-up filed as a narrower issue in the known-issues doc.

…ation converters

The four Azure recommendation converters (compute, database, cache,
cosmosdb) previously left SKU-derived `Details` fields empty because
populating them inline would have triggered an N+1 SDK call per
recommendation. This change wires three of them to a lazily-cached,
single-shot SKU catalogue lookup, gated by `sync.Once` and held for the
client's lifetime.

Per-service:

- **Cache** (`services/cache`): caches `Properties.ShardCount` per SKU
  from `armredis.Client.NewListBySubscriptionPager`. Converter populates
  `CacheDetails.Shards` for Premium-tier clustered caches.
- **Cosmos** (`services/cosmosdb`): caches the dominant Cosmos APIType
  (mongodb / cassandra / gremlin / table / sql) across the
  subscription's accounts via `armcosmos.DatabaseAccountsClient.NewListPager`,
  mapping `account.Kind` + `Capabilities`. Converter populates
  `NoSQLDetails.APIType`. Multi-API-type subscriptions leave APIType
  empty rather than guessing.
- **Database** (`services/database`): caches engine version per SKU
  from `armsql.CapabilitiesClient.ListByLocation`, traversing
  ServerVersion → Editions → ServiceLevelObjectives. Converter
  populates `DatabaseDetails.EngineVersion`.

Each converter calls a per-service `cachedSKULookup` (or `cachedAPIType`
on cosmos). The catalogue is fetched ONCE per client lifetime; many
converter calls in the same `GetRecommendations` run hit the in-memory
map. A failed catalogue fetch leaves the cache nil and converters fall
back to the previous empty-Details behaviour with a one-time WARN log —
the conversion itself never fails, preserving the
graceful-degradation contract.

**Compute scoped back**: `common.ComputeDetails` (in `pkg/common/types.go`)
exposes only InstanceType/Platform/Tenancy/Scope. The two enrichments
the SKU catalogue would supply for compute are vCPU and MemoryGB —
neither has a struct field to write into. Adding fields to the shared
type would touch all 3 cloud providers' converters, the frontend, and
the matchers, well outside this perf change's scope. Documented as a
narrower follow-up in `known_issues/10_azure_provider.md`.

Tests per service in the existing `*_test.go`:

- `_PopulatesShardsFromSKUCache` / `_PopulatesAPIType` /
  `_PopulatesEngineVersion` — cache hit populates the previously-deferred
  field.
- `_PagerErrorFallsBack` / `_CapabilitiesErrorFallsBack` — fetch error
  leaves the field empty and conversion still succeeds.
- `_FetchedOnce` — many converter calls share a single catalogue fetch
  (the N+1 invariant).
- The pre-existing `_PopulatesAllFields` tests now inject empty mock
  pagers so they don't hit the real Azure API on first lookup.

Refs #49
@coderabbitai Bot commented Apr 25, 2026

Warning

Rate limit exceeded

@cristim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 31 minutes and 52 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 31 minutes and 52 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6d7c96a5-6222-434c-83cc-0581b1bff0ec

📥 Commits

Reviewing files that changed from the base of the PR and between 96dccb8 and b3fe6e2.

📒 Files selected for processing (8)
  • known_issues/10_azure_provider.md
  • providers/azure/services/cache/client.go
  • providers/azure/services/cache/client_test.go
  • providers/azure/services/compute/client.go
  • providers/azure/services/cosmosdb/client.go
  • providers/azure/services/cosmosdb/client_test.go
  • providers/azure/services/database/client.go
  • providers/azure/services/database/client_test.go

@cristim (Member, Author) commented Apr 25, 2026

@coderabbitai review

@coderabbitai Bot commented Apr 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim (Member, Author) commented Apr 27, 2026

Two follow-up GitHub issues have been filed for the deferrals this PR explicitly scopes back. Both reference back here and propose mirroring the cachedAPIType / cachedSKULookup patterns this PR establishes.

@cristim cristim merged commit 5e3b54d into feat/multicloud-web-frontend Apr 27, 2026
3 checks passed
@cristim cristim added labels on Apr 28, 2026: triaged (Item has been triaged), priority/p3 (Polish / idea / may never ship), severity/low (Minor harm), urgency/this-quarter (Within the quarter), impact/few (Limited audience), effort/l (Weeks), type/feat (New capability)
@cristim cristim deleted the perf/azure-batched-sku-lookup branch April 29, 2026 10:08
cristim added a commit that referenced this pull request Apr 30, 2026
Adds optional VCPU (int) and MemoryGB (float64) fields to
common.ComputeDetails so per-provider catalogue lookups can enrich
compute recommendations with sizing data. Both fields use json
omitempty — converters that don't yet wire a catalogue leave them at
the zero value and the API payload stays clean.

GetDetailDescription appends " (N vCPU / M GB)" when both values are
> 0, otherwise returns the existing "platform/tenancy" form. Format
uses %g so 16 GB renders as "16 GB" (not "16.000000 GB") while 0.5 GB
SKUs render as "0.5 GB".

Scope decision — schema-only foundation:

- Azure: PR #81 (perf/azure-batched-sku-lookup) introduces the
  cachedSKULookup helper for cache/cosmos/database. Compute was
  scoped back from #81 because these fields didn't exist. With the
  schema in place, a follow-up PR can wire compute on top of #81 once
  it merges (filed as a separate issue post-merge to avoid an
  unmerged-PR dependency in this changeset).
- AWS: ce.EC2InstanceDetails (Cost-Explorer) has no vCPU/Memory
  fields; populating would require a new ec2:DescribeInstanceTypes
  caller + cache + mocks. Substantial net-new code, tracked as a
  separate follow-up.
- GCP: no compute recommendation converter exists today. N/A.
- Frontend: api.Recommendation has no `details` field and the detail
  drawer renders a hard-coded list. Surfacing the new fields needs
  separate API + UI work; tracked as follow-up once at least one
  provider populates the fields.

Tests: extend pkg/common/types_test.go::TestComputeDetails_GetDetail
Description with 4 new table rows — VCPU-only, MemoryGB-only,
integer-GB, and fractional-GB (Azure 0.5 GB SKU shape).

Refs #82
cristim added a commit that referenced this pull request May 3, 2026
closes #148) (#229)

PR #97 added VCPU + MemoryGB fields to common.ComputeDetails (with
omitempty JSON tags). PR #81 introduced the lazy SKU-catalogue pattern
for cache/cosmosdb/database but explicitly scoped back compute because
the destination fields didn't exist. Both prerequisites are now in
place; this wires Azure compute to the same pattern.

Implementation mirrors cache/database verbatim for consistency:

- New unexported vmSKUEntry{vCPUs, memoryGB}.
- New skuCacheOnce sync.Once + skuCacheMap field on ComputeClient.
- cachedSKULookup(ctx, skuName) lazily triggers the catalogue fetch on
  first call; subsequent calls are O(1) map lookups. ok=false on miss
  or fetch error.
- fetchSKUCatalogue reuses the existing createResourceSKUsPager and
  isAvailableInRegion helpers (already used by GetValidResourceTypes),
  walks every page, and reduces virtualMachines SKUs into the map.
  Returns nil on error so the sync.Once-gated cache stays nil and
  converters fall back to the empty-fields path with a one-time WARN.
- extractVMSKUCapabilities pulls vCPUs (Atoi) and MemoryGB
  (ParseFloat) from the SKU's Capabilities name/value list. Unparseable
  or missing capabilities → 0, treated as "unknown".
- populateVMSKUMapFromPage was extracted out of fetchSKUCatalogue to
  stay under the project's gocyclo=10 threshold (matches the
  cache/database extraction pattern from PR #81).
- convertAzureVMRecommendation now takes ctx (was _) and populates
  Details.VCPU/MemoryGB from the cache when both >0. The single caller
  GetRecommendations already passes ctx; no public API change.

Behaviour preserved on failure: a transient ResourceSKUsClient error
no longer breaks the conversion — VCPU/MemoryGB stay at 0 (omitempty
hides them from API payloads), the rest of Details is populated from
the recommendation payload as before, and a WARN is logged once per
client lifetime.

UX improvement: common.ComputeDetails.GetDetailDescription appends
" (<vcpu> vCPU / <memory> GB)" when both fields are >0, so Azure VM
recommendation summaries now include the size string instead of the
SKU name alone.

Tests added in providers/azure/services/compute/client_test.go
(file-scoped vmSKUCatalogueMockPager keeps the shared
mocks.MockResourceSKUsPager surface untouched, mirroring the
cosmosdb / cache test convention):

- TestComputeClient_ConvertAzureVMRecommendation_PopulatesVCPUAndMemoryFromSKUCache:
  catalogue hit populates VCPU=2, MemoryGB=8 for Standard_D2s_v3.
- TestComputeClient_ConvertAzureVMRecommendation_PagerErrorFallsBack:
  fetch error leaves both fields at 0; conversion succeeds.
- TestComputeClient_ConvertAzureVMRecommendation_NoMatchLeavesFieldsZero:
  catalogue miss leaves both fields at 0; conversion succeeds.
- TestComputeClient_CachedSKULookup_FetchedOnce: 10 lookups trigger
  exactly 1 NextPage call (pins the N+1 invariant per PR #81).

Out of scope (separate follow-ups, per the issue body):

- AWS / GCP compute converter wiring.
- Platform / Tenancy / Scope enrichment (require non-ResourceSKUs
  Azure data sources).

Verification: gofmt clean; go vet ./... clean; gocyclo -over 10 finds
no offenders in the modified file; go test
github.com/LeanerCloud/CUDly/providers/azure/services/compute → 42
passing (4 new + all existing). The pre-existing
providers/azure/services/search build failure (missing go.sum entry)
is unrelated to this change.

Closes #148