perf: aggregate parallelism cap + 2 GB Lambda for collect refresh (follow-up to #269) #270
Conversation
The recommendations-collection fan-out has up to four nested levels:
provider → account → service|region → per-region service. Each level was
independently capped, so peak goroutine counts multiplied through the
tree (3 providers × 20 accounts × 30 GCP regions × 2 services = 3,600
in-flight gRPC/HTTP clients on a real multi-account deploy). On a 512 MB
Lambda this exhausted memory before the work could finish — observed in
dev as `signal: killed` at Max Memory Used = 512 MB after ~28s, with the
runtime never reaching the 60s timeout.
A single semaphore stashed on ctx now lets every leaf goroutine (the
goroutine that issues the actual cloud-API call) Acquire one slot before
doing IO and Release after, so aggregate concurrent IO is hard-bounded
by CUDLY_MAX_PARALLELISM (default 20) regardless of how nested the
dispatch is. Intermediate dispatchers (provider, account, GCP region)
do NOT acquire — they only launch sub-goroutines — so no goroutine can
deadlock by holding a permit while waiting for sub-permits.
Mechanics:
- New pkg/concurrency package: WithSharedSemaphore / SharedSemaphore /
Acquire / Release helpers that read the semaphore from ctx, plus
MaxParallelismFromEnv reading CUDLY_MAX_PARALLELISM (default 20).
Acquire/Release are no-ops when no semaphore is on ctx — CLI tools
and unit tests skip the semaphore entirely without per-call branching.
- Scheduler.CollectRecommendations allocates the semaphore (size from
env) and attaches to ctx via concurrency.WithSharedSemaphore BEFORE
the provider fan-out, so the wrapped ctx flows through every nested
level. Logs the effective cap on entry.
- Three leaf fan-outs wired:
- providers/aws/recommendations/client.go::GetAllRecommendations —
5 service goroutines (EC2 / RDS / ElastiCache / OpenSearch /
Redshift)
- providers/azure/recommendations.go::GetRecommendations — 5
service goroutines (compute / database / cache / cosmosdb /
advisor); Acquire/Release boilerplate extracted into a goService
helper so the function stays under gocyclo's 10-branch gate.
- providers/gcp/recommendations.go::collectRegion — 2 sub-fan-out
goroutines (compute / cloudsql) per region
Each leaf calls Acquire(gctx) at the top, defers Release(gctx). On
Acquire failure (parent ctx cancelled mid-wait) the leaf's per-service
err variable is set to ctx.Err() and the goroutine returns nil to the
errgroup — preserves the documented error-isolation contract.
Test mocks updated: persistCollection's UpsertRecommendations and the
notification email's SendNewRecommendationsNotification both run inside
CollectRecommendations after the ctx is wrapped, so their mock setups
were pinning the original (unwrapped) ctx. mock.Anything is the
appropriate looseness — the wrapped ctx is an implementation detail.
New pkg/concurrency tests pin:
- MaxParallelismFromEnv env-knob parser (unset / positive / zero /
negative / non-numeric / explicit-unset)
- Acquire/Release are no-ops when no semaphore on ctx
- WithSharedSemaphore returns ctx unchanged when sem is nil
- 20 goroutines under cap=3 never see >3 in-flight (load-bearing)
- Acquire returns ctx.Err() when cancelled mid-wait
This is a smoke-test config — the user's dev deploy hit OOM with the
existing per-level caps; CUDLY_MAX_PARALLELISM=20 (paired with a Lambda
memory bump to 2 GB in a follow-up commit) should give the dev refresh
headroom to complete. Operators can dial up further via env override.
The dev Lambda hit Max Memory Used = 512 MB and was killed by the runtime at ~28s (Duration 28701ms / Timeout 60s) on a user-triggered recommendations refresh. The collection fan-out (post-#266 #267 #268) keeps many concurrent gRPC/HTTP clients in flight — 30+ GCP regions × 2 services each, plus 5 AWS services and 5 Azure services — and 512 MB is too tight for that working set even with the new shared-semaphore cap.

This is a temporary headroom bump scoped to dev so the user can verify the post-#266/#267/#268/concurrency-semaphore stack actually completes end-to-end on a deployed Lambda. The shared semaphore (preceding commit) is the durable fix for peak concurrency; this just gives the function room to breathe while we observe the new behaviour live.

Only github-dev.tfvars is touched. github-staging.tfvars and github-prod.tfvars stay at 1024 MB — they don't exercise every provider with mostly-broken credentials the way dev does, and bumping prod without first observing dev would be premature. dev.tfvars (the user's local-apply config, gitignored) is also bumped locally to keep parity with the CI deploy until this lands.
📝 Walkthrough

A new package provides a process-wide, environment-configurable weighted semaphore. The scheduler installs a shared semaphore on the context, and AWS/Azure/GCP provider code acquires/releases permits around their leaf cloud-API calls to cap aggregate parallelism while preserving per-service error isolation.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Scheduler
    participant Concurrency as Concurrency Package
    participant Context as Context
    participant Provider as Provider (AWS/Azure/GCP)
    participant CloudAPI as Cloud API
    Scheduler->>Concurrency: MaxParallelismFromEnv()
    activate Concurrency
    Concurrency-->>Scheduler: numeric cap
    deactivate Concurrency
    Scheduler->>Concurrency: WithSharedSemaphore(ctx, sem)
    activate Concurrency
    Concurrency->>Context: attach semaphore
    Concurrency-->>Scheduler: wrapped ctx
    deactivate Concurrency
    Scheduler->>Provider: CollectRecommendations(wrapped ctx)
    rect rgba(100, 150, 200, 0.5)
        note over Provider: service/region fan-out
        par Service A
            Provider->>Concurrency: Acquire(wrapped ctx)
            activate Concurrency
            Concurrency-->>Provider: permit acquired (or wait)
            deactivate Concurrency
            Provider->>CloudAPI: Leaf API call
            CloudAPI-->>Provider: response/error
            Provider->>Concurrency: Release(wrapped ctx)
        and Service B
            Provider->>Concurrency: Acquire(wrapped ctx)
            activate Concurrency
            Concurrency-->>Provider: permit acquired (or wait)
            deactivate Concurrency
            Provider->>CloudAPI: Leaf API call
            CloudAPI-->>Provider: response/error
            Provider->>Concurrency: Release(wrapped ctx)
        end
    end
    Provider-->>Scheduler: merged results/errors
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 5 passed
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/concurrency/concurrency_test.go`:
- Around line 82-95: The worker goroutines call require.NoError(t, Acquire(ctx))
which is illegal from non-test goroutines; instead, have each goroutine capture
the error and report it back to the main test goroutine (e.g., send errors on an
err channel or append to a slice with a mutex) and only call require.NoError
from the main goroutine after wg.Wait(); also ensure Release(ctx) is only
deferred when Acquire returned nil (successful) so the goroutine sends the
Acquire error (or nil) back, main goroutine closes the channel after wg.Wait()
and asserts NoError for each reported error. Reference Acquire, Release,
inflight, updatePeak, goroutines, and wg to locate the logic to change.
In `@providers/aws/recommendations/client.go`:
- Around line 196-244: The current goroutines (the g.Go blocks) acquire a global
semaphore before calling GetRecommendationsForService, which holds a slot for
the entire service sweep; move the semaphore boundary into the Cost Explorer
call itself so each outbound CE request acquires/releases a permit. Remove
concurrency.Acquire/Release from the outer g.Go closures and instead update
GetRecommendationsForService (or the helper it calls) to call
concurrency.Acquire(ctx) immediately before each individual Cost Explorer API
invocation and defer concurrency.Release(ctx) right after that request/response
(including retries/backoffs around only the API call). This ensures the
semaphore limits concurrent CE calls, not per-service sweeps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1a60a4b4-4a48-44a2-8222-e2c11a8bf20c
⛔ Files ignored due to path filters (2)
- pkg/go.sum is excluded by !**/*.sum
- providers/gcp/go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
- internal/scheduler/scheduler.go
- internal/scheduler/scheduler_test.go
- pkg/concurrency/concurrency.go
- pkg/concurrency/concurrency_test.go
- pkg/go.mod
- providers/aws/recommendations/client.go
- providers/azure/recommendations.go
- providers/gcp/go.mod
- providers/gcp/recommendations.go
- terraform/environments/aws/github-dev.tfvars
Two CodeRabbit findings on PR #270:

1. **`pkg/concurrency/concurrency_test.go` (CRITICAL)**: `require.NoError` was called from inside the worker goroutines in `TestSharedSemaphore_BoundsConcurrency`. Testify's documented contract is that `require.*` / `FailNow`-style helpers must run on the test's own goroutine — calling them from a worker uses `runtime.Goexit` on the worker instead of safely stopping the test, which can hang or skip cleanup and produces race-detector noise. Workers now capture their Acquire result on a buffered `errCh` and the main goroutine drains the channel and asserts after `wg.Wait()`. `Release` is deferred only on a successful Acquire, matching the documented pairing contract.

2. **`providers/aws/recommendations/client.go` (MAJOR)**: The per-service `g.Go` goroutines in `GetAllRecommendations` were calling `concurrency.Acquire`/`Release` around the whole `GetRecommendationsForService` sweep — but that helper internally loops over 2 × 3 (term, payment) variants and waits on the rate-limiter inside each retry, so a throttled service would hold a global slot for seconds while no Cost Explorer request was actually in flight. Effective parallelism dropped well below `CUDLY_MAX_PARALLELISM`. The Acquire/Release boundary moves down to each individual `costExplorerClient.GetReservationPurchaseRecommendation` call (and the SP equivalent in `parser_sp.go`). The rate-limiter's `Wait` happens *outside* the permit, so a goroutine waiting on exponential backoff doesn't tie up a slot. Each retry acquires a fresh permit, so throughput now scales with the cap regardless of how many services are individually throttled.
Tests:
- `go test ./recommendations/...` from `providers/aws/` — pass
- `go test -race ./concurrency/...` from `pkg/` — race-clean
- Existing `go test -race ./recommendations/...` reveals a pre-existing data race on `Client.rateLimiter` (concurrent `Reset` / state mutation across the per-service goroutines), introduced by the parallelisation in #269 (`defac6c2a`). Filed as #271 — out of scope for this CR-fix commit; the fix is straightforward (per-goroutine rate-limiter or internal mutex).

Other CodeRabbit checks (Azure / GCP per-service goroutines holding permits across SDK calls) are NOT applied here because:
- Azure's per-service SDK calls don't have an explicit rate-limiter Wait loop equivalent to AWS's — the inner ARM/pricing calls block on network IO, not exponential backoff.
- GCP's `collectRegion` sub-fan-out goroutines do similarly direct SDK calls without an explicit Wait loop.
- CR didn't flag those layers; if a future review surfaces the same argument with concrete throughput evidence, we can mirror the AWS pattern there.
Addressed both CR findings in 092e0b3 (the `require.NoError`-in-goroutine test fix and the per-call Acquire/Release boundary in the AWS client).

Out-of-scope finding surfaced during validation: the pre-existing `Client.rateLimiter` data race, filed as #271.

Azure/GCP not changed — CR's specific argument (permit held across a rate-limiter Wait loop) is structural to AWS's Cost Explorer client, which has an explicit rate-limiter `Wait` retry loop; the Azure and GCP leaf calls block on network IO instead.

@coderabbitai review
The Azure/GCP non-change rationale is sound — the concern is specific to the explicit rate-limiter `Wait` loop in the AWS Cost Explorer path. Good catch on the pre-existing `rateLimiter` race. Triggering a fresh review of the updated commit.
@coderabbitai review

✅ Actions performed: Review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
providers/aws/recommendations/client.go (1)
74-100: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make the retry limiter request-scoped, not client-scoped.
`GetAllRecommendations` fans out multiple service sweeps concurrently, but this path still mutates the shared `c.rateLimiter` via `Reset`/`Wait`/`ShouldRetry`/`GetRetryCount`. That lets concurrent requests clobber each other's retry/backoff state and matches the race already seen under `go test -race`. Please instantiate a fresh limiter per recommendation sweep here, and mirror that change in the Savings Plans path too.

Suggested direction
```diff
-	c.rateLimiter.Reset()
+	rateLimiter := NewRateLimiter()
+	rateLimiter.Reset()
 	for {
-		if waitErr := c.rateLimiter.Wait(ctx); waitErr != nil {
+		if waitErr := rateLimiter.Wait(ctx); waitErr != nil {
 			return nil, fmt.Errorf("rate limiter wait failed: %w", waitErr)
 		}
 		if acqErr := concurrency.Acquire(ctx); acqErr != nil {
 			return nil, fmt.Errorf("concurrency acquire failed: %w", acqErr)
 		}
 		result, err = c.costExplorerClient.GetReservationPurchaseRecommendation(ctx, input)
 		concurrency.Release(ctx)
-		if !c.rateLimiter.ShouldRetry(err) {
+		if !rateLimiter.ShouldRetry(err) {
 			break
 		}
 	}
 	if err != nil {
-		return nil, fmt.Errorf("failed to get RI recommendations after %d retries: %w", c.rateLimiter.GetRetryCount(), err)
+		return nil, fmt.Errorf("failed to get RI recommendations after %d retries: %w", rateLimiter.GetRetryCount(), err)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@providers/aws/recommendations/client.go` around lines 74-100: the shared c.rateLimiter is being mutated across concurrent sweeps; make the limiter request-scoped by creating a fresh local limiter instance at the start of the recommendation loop and use that instead of c.rateLimiter. Concretely, before the for-loop in GetReservationPurchaseRecommendation create a copy/localLimiter (e.g., via a Clone/Copy/New method or by constructing a new limiter with the same config), call localLimiter.Reset()/localLimiter.Wait()/localLimiter.ShouldRetry()/localLimiter.GetRetryCount() inside the loop, and remove mutations of c.rateLimiter; then apply the same change to the Savings Plans path (GetSavingsPlanPurchaseRecommendation) and ensure callers in GetAllRecommendations continue to fan out safely using per-request limiters.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f79e1acf-ebc6-4472-adec-8a90d306780f
📒 Files selected for processing (3)
- pkg/concurrency/concurrency_test.go
- providers/aws/recommendations/client.go
- providers/aws/recommendations/parser_sp.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/concurrency/concurrency_test.go
Summary
Follow-up to #269 (`defac6c2a`) — when that PR was deployed to dev, the new fan-out parallelism exposed two operational problems we couldn't see on the previous serial code path:

- OOM (`Max Memory Used: 512 MB`, `signal: killed`) on the user-triggered refresh — peak in-flight gRPC/HTTP clients across the now-concurrent provider × account × service|region tree exceeded the 512 MB cap.
- Timeout (`Task timed out after 63.05 seconds`) — same root cause; without a global cap, runaway concurrent IO never settles inside the budget.
Two atomic commits address both:

- `perf(concurrency)` — new `pkg/concurrency` package adds a shared semaphore stashed on ctx. Every leaf goroutine across AWS / Azure / GCP fan-outs Acquires one slot before issuing its cloud-API call and Releases after. Aggregate concurrent IO is now hard-bounded at `CUDLY_MAX_PARALLELISM` (default 20) regardless of nesting depth. Intermediates (provider, account, GCP region dispatcher) don't acquire — only leaves do — so no deadlock by holding-permit-while-waiting-for-sub-permits. `Acquire`/`Release` are no-ops when no semaphore is on ctx, so CLI tools and unit tests pass through unchanged. Mirrors the per-fan-out pattern but adds a single global bound; per-level caps still help with launch-side bookkeeping. New tests in `pkg/concurrency`: env-knob parser, no-semaphore no-ops, cap=3 bounds load test (20 goroutines, peak ≤ 3), ctx-cancel propagation. Scheduler mocks use `mock.Anything` for the now-wrapped ctx (`UpsertRecommendations`, `SendNewRecommendationsNotification`). Azure's `GetRecommendations` extracts a `goService` closure to keep the function under the project's 10-branch gocyclo gate after the per-service `Acquire` branches were added.
- `infra(aws/dev)` — bump dev Lambda memory 512 MB → 2 GB in `terraform/environments/aws/github-dev.tfvars`. Temporary headroom scoped to dev — the semaphore is the durable fix; this just gives the function room to breathe while we observe the new behaviour live. Staging/prod untouched.
Test plan
- `go build ./...` from repo root — clean
- `go test ./internal/scheduler/...` — pass (74 tests, including the 3 added in #269, "perf: parallelize recommendations collection at every level", closes #266 #267 #268)
- `go test ./recommendations/...` from `providers/aws/` — 176 pass
- `go test .` from `providers/azure/` — 104 pass
- `go test .` from `providers/gcp/` (recs subset) — 18 pass
- `go test ./concurrency/...` from `pkg/` — 11 pass
- On the deployed dev Lambda: check for the "Recommendations collection: aggregate parallelism cap = 20" log line on entry, and confirm `Max Memory Used` < 2048 MB and Duration < 60s on a full refresh.
🤖 Generated with claude-flow