fix(aws): fetch every Cost Explorer (term, payment) combo (#188) by cristim · Pull Request #195 · LeanerCloud/CUDly

cristim · 2026-04-29T12:59:14Z

Summary

Closes #188. Two-commit fix:

fix(aws): fetch both 1yr and 3yr Cost Explorer terms (#188) — root cause for the filed bug.
feat(aws): also fan out across all 3 Cost Explorer payment options (#188) — same root cause shape, different parameter; landed on user request so the UI's payment-option filter is non-empty.

Why two changes in one PR

PR #189 made recommendation IDs unique enough to keep variants from colliding downstream — but the AWS client was only ever asking Cost Explorer for one specific (term, payment) cell. GetReservationPurchaseRecommendation requires both TermInYears and PaymentOption per request and returns recs only for that single (term, payment) — there is no "give me all of them" mode. The old call hardcoded ("3yr", "partial-upfront"), so every other combo simply never reached the scheduler regardless of how unique the IDs were.

Symptom #188 (only 3-year recs visible) is the term axis. The payment axis has the same shape, with the same hardcoded-default origin and the same downstream wiring already in place.

Fix

GetRecommendationsForService now iterates the Cartesian product of two package constants:

defaultDiscoveryTerms          = ["1yr", "3yr"]
defaultDiscoveryPaymentOptions = ["all-upfront", "partial-upfront", "no-upfront"]

→ 6 Cost Explorer calls per service, 5 services = 30 calls per refresh.

Per-call CE errors are tolerated and skipped (matches GetAllRecommendations's per-service tolerance); only an error where every (term, payment) combo fails is propagated.

Downstream wiring is already correct

After PR #189 + migration 000042:

parser_ri.go:42 tags rec.Term = params.Term and rec.PaymentOption = params.PaymentOption.
scheduler.go:856-858 builds the rec ID from (provider | account | service | region | resource_type | engine | term | payment_option).
recommendations_natural_key_idx (migration 000042) is the matching 8-tuple unique index.
The frontend column-filter already exposes both term and payment-option as facets.

So all 6 variants land as distinct DB rows and render as distinct UI rows for free.

Test plan

TestGetRecommendationsForService: now expects 6 recs and asserts the full {1yr,3yr} × {all-upfront,partial-upfront,no-upfront} combo set is present.
TestGetRecommendationsForService_QueriesEveryCombo (new — replaces the term-only _QueriesBothTerms regression): captures the per-call CE input slice via a new riCalls field on the mock and asserts every one of the 6 (term, payment) pairs was requested. Pins the regression at the API-call layer so a future change that quietly drops a combo fails the test even if the surviving recs parse correctly.
go test ./... clean from repo root and providers/aws/.
Pre-commit hooks (gosec, go vet, gomod tidy, …) clean on both commits.
Manual UI verification (post-merge): pick an account with known recs in Cost Explorer; refresh the Recommendations page; confirm both 1 Year and 3 Years show in the term filter and all three of all-upfront/partial-upfront/no-upfront show in the payment-option filter.

Cost / cadence note

Recommendation collection runs at the daily 2 AM cron (recommendation_schedule = "0 2 * * *") plus an opportunistic refresh capped to once per 6h when users hit the page (defaultCacheTTL = 6 * time.Hour, scheduler.go:99). At ~5 refreshes/day worst case:

Old: 5 services × 1 combo × 5 refreshes/day ≈ 750 CE requests/month/account — just inside the 1,000-request free tier.
New: 5 services × 6 combos × 5 refreshes/day ≈ 4,500 CE requests/month/account — ~3,500 above free tier, ~$35/month/account at $0.01/request.

Trade-off accepted: the user-visible "every term × payment is selectable" win is worth ~$35/account/month for active accounts; idle accounts pay nothing extra (no opportunistic refreshes fire).

🤖 Generated with claude-flow

Summary by CodeRabbit

New Features
- AWS recommendations now aggregate results across all supported reservation term and payment option combinations for broader coverage.
Bug Fixes
- Improved resilience: continues when individual provider queries fail and reports an error only if all variants fail; aborts immediately on cancellation or timeout.
Tests
- Added tests to verify all term/payment combinations are requested and that cancellation short-circuits before further calls.

coderabbitai · 2026-04-29T12:59:20Z

Warning

Rate limit exceeded

@cristim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 27 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3913c5d7-29f7-4569-b17b-ada9bd433c3f

📥 Commits

Reviewing files that changed from the base of the PR and between fab78c7 and 00a2835.

📒 Files selected for processing (4)

internal/database/postgres/migrations/000043_recommendations_add_engine_to_key.down.sql
internal/database/postgres/migrations/000043_recommendations_add_engine_to_key.up.sql
providers/aws/recommendations/client.go
providers/aws/recommendations/client_test.go

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

GetRecommendationsForService now iterates the Cartesian product of discovery terms and payment options, making one Cost Explorer call per (term,payment) pair and aggregating labeled results. Per-combination errors are tolerated; context cancellation short-circuits immediately. GetAllRecommendations propagates context cancellation and stops further processing.

Changes

Cohort / File(s)	Summary
AWS recommendations client `providers/aws/recommendations/client.go`	Replaces the single Cost Explorer request with iteration over `defaultDiscoveryTerms × defaultDiscoveryPaymentOptions`, issuing up to 6 calls per service, aggregating recommendations, labeling them with `Term`/`PaymentOption`, tolerating per-combination errors, and short-circuiting on `ctx.Err()`.
Tests for AWS recommendations `providers/aws/recommendations/client_test.go`	Updates mock to record each `GetReservationPurchaseRecommendation` input; updates expectations to the full 2×3 Cartesian product (6 calls/results); adds a test verifying exactly 6 requests and a test that a canceled context returns `context.Canceled` without making API calls.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant AWSClient
  participant CostExplorer
  participant Aggregator

  Caller->>AWSClient: GetRecommendationsForService(ctx, service)
  AWSClient->>AWSClient: for each term ∈ Terms\n for each payment ∈ Payments
  AWSClient->>CostExplorer: GetReservationPurchaseRecommendation(term,payment)
  alt success
    CostExplorer-->>AWSClient: recommendations
    AWSClient->>Aggregator: append labeled recommendations
  else api error
    CostExplorer-->>AWSClient: error
    AWSClient->>AWSClient: record error, continue
  end
  AWSClient->>AWSClient: if ctx canceled -> return ctx.Err()
  AWSClient-->>Caller: aggregated recommendations (or error if all failed)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

effort/m

Poem

I’m a rabbit in the code, hopping through each pair,
Terms and payments multiply — I fetch them everywhere.
Six tiny recs I gather, labeled tidy and bright,
If the context says “halt!” — I pause mid-flight.
🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: fetching every Cost Explorer (term, payment) combination instead of just one hardcoded pair.
Linked Issues check	✅ Passed	The PR addresses all coding requirements from issue `#188`: includes Term and PaymentOption in recommendation identity, fetches both 1yr and 3yr terms, handles multiple payment options, implements context cancellation short-circuiting, and includes comprehensive tests.
Out of Scope Changes check	✅ Passed	All changes are directly related to the objective of fetching Cost Explorer recommendations across multiple term and payment-option combinations; no unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/issue-188-aws-1yr-recs

_{Review rate limit: 0/5 reviews remaining, refill in 8 minutes and 27 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cristim · 2026-04-29T12:59:31Z

@coderabbitai review

coderabbitai · 2026-04-29T12:59:36Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

providers/aws/recommendations/client_test.go (1)

344-374: Add a mixed outcome test (one term fails, one succeeds).

The production logic now tolerates per-term failures; add a test that verifies successful term results are returned when the other term errors, and a separate case where both fail returns error.

🧪 Suggested additional test shape

+func TestGetRecommendationsForService_OneTermFails_OtherSucceeds(t *testing.T) {
+	// Configure mock to fail first RI call and succeed second.
+	// Assert: err == nil, returned recs include exactly one term's results.
+}
+
+func TestGetRecommendationsForService_AllTermsFail_ReturnsError(t *testing.T) {
+	// Configure mock to fail both RI calls.
+	// Assert: err != nil, recs == nil.
+}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@providers/aws/recommendations/client_test.go` around lines 344 - 374, Add two
new tests alongside TestGetRecommendationsForService_QueriesBothTerms: one where
mockCostExplorerAPI is configured to return an error for one TermInYears and a
successful GetReservationPurchaseRecommendationOutput for the other, asserting
that client.GetRecommendationsForService returns the successful recommendations
and no error; and another where the mock returns errors for both terms and
assert that client.GetRecommendationsForService returns an error. Reuse
NewClientWithAPI, mockCostExplorerAPI, and inspect mockAPI.riCalls and
mockAPI.riRecommendations (or riErrors) to simulate per-term outcomes and verify
the client tolerates a single-term failure but fails when both terms error.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@providers/aws/recommendations/client.go`:
- Around line 123-133: The loop that calls c.GetRecommendations treats all
errors as skippable and only returns an error if every term failed; update the
handling around the c.GetRecommendations(ctx, params) call so that if err
indicates context cancellation/timeout (errors.Is(err, context.Canceled) or
errors.Is(err, context.DeadlineExceeded) or ctx.Err() != nil) you immediately
return nil, err (propagate the canceled/expired context) instead of continuing,
otherwise continue to collect successes as before; reference the
c.GetRecommendations call, lastErr and successCount logic and ensure the final
all-term failure message for service remains unchanged for non-context errors.

---

Nitpick comments:
In `@providers/aws/recommendations/client_test.go`:
- Around line 344-374: Add two new tests alongside
TestGetRecommendationsForService_QueriesBothTerms: one where mockCostExplorerAPI
is configured to return an error for one TermInYears and a successful
GetReservationPurchaseRecommendationOutput for the other, asserting that
client.GetRecommendationsForService returns the successful recommendations and
no error; and another where the mock returns errors for both terms and assert
that client.GetRecommendationsForService returns an error. Reuse
NewClientWithAPI, mockCostExplorerAPI, and inspect mockAPI.riCalls and
mockAPI.riRecommendations (or riErrors) to simulate per-term outcomes and verify
the client tolerates a single-term failure but fails when both terms error.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 968ee110-d979-4d03-b7be-4d830088a376

📥 Commits

Reviewing files that changed from the base of the PR and between b376822 and 0bb7d94.

📒 Files selected for processing (2)

providers/aws/recommendations/client.go
providers/aws/recommendations/client_test.go

cristim · 2026-04-29T17:03:12Z

Updated: extended to also fan out across the three payment options (all-upfront / partial-upfront / no-upfront) in addition to the term loop, so the UI's payment-option filter is no longer empty. Same root cause shape, same downstream wiring (already correct after #189 + migration 000042). Cost note in the PR body — corrected the per-refresh-cadence math after re-checking the actual cron (0 2 * * *) and 6h opportunistic-refresh TTL.

@coderabbitai review

coderabbitai · 2026-04-29T17:03:27Z

@cristim, understood — the extended fan-out across payment options (all-upfront / partial-upfront / no-upfront) on top of the term loop is a natural follow-up to the same root-cause fix. Triggering a fresh review of the updated changes now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

♻️ Duplicate comments (1)

providers/aws/recommendations/client.go (1)
145-147: ⚠️ Potential issue | 🟠 Major

Propagate canceled/expired context immediately instead of continuing.

On Line 145, canceled/deadline errors are treated like skippable combo failures. If earlier combos succeeded, this can return partial data with nil error after cancellation. Please short-circuit on context termination.
Suggested fix
 			recs, err := c.GetRecommendations(ctx, params)
 			if err != nil {
+				if ctx.Err() != nil {
+					return nil, ctx.Err()
+				}
 				lastErr = err
 				continue
 			}
Also applies to: 153-156
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@providers/aws/recommendations/client.go` around lines 145 - 147, Inside the
loop that currently does "if err != nil { lastErr = err; continue }" (the block
that assigns to lastErr), do not treat context cancellation or deadline errors
as skippable: detect context cancellation by checking errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded) (or ctx.Err() !=
nil) and immediately return that error (short-circuit) instead of setting
lastErr and continuing; for non-context errors keep the existing behavior (set
lastErr and continue). Apply the same change to the similar error-handling block
that currently appears around lines 153-156.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@providers/aws/recommendations/client.go`:
- Around line 145-147: Inside the loop that currently does "if err != nil {
lastErr = err; continue }" (the block that assigns to lastErr), do not treat
context cancellation or deadline errors as skippable: detect context
cancellation by checking errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded) (or ctx.Err() != nil) and immediately return that
error (short-circuit) instead of setting lastErr and continuing; for non-context
errors keep the existing behavior (set lastErr and continue). Apply the same
change to the similar error-handling block that currently appears around lines
153-156.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eaa9ac8b-7527-4b82-8894-5ad808f853f0

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb7d94 and 6dc6185.

📒 Files selected for processing (2)

providers/aws/recommendations/client.go
providers/aws/recommendations/client_test.go

Addresses CodeRabbit's actionable finding on PR #195: the (term, payment) loop in GetRecommendationsForService was treating a canceled / deadline-exceeded ctx as a per-combo failure to be tolerated. If earlier combos succeeded before cancellation, the function would return partial data with a nil error; if all combos ran after cancellation, it would burn 6× the wasted Cost Explorer attempts and aggregate them into "all variants failed", masking the real cancellation reason. Fix: after each c.GetRecommendations(ctx, params) error, check ctx.Err() first. If non-nil, return immediately so the caller sees the ctx error verbatim. Per-combo errors (throttle, 5xx) keep the existing skip-and-continue tolerance. Same pattern applied to GetAllRecommendations's per-service block, which had the identical risk: a canceled ctx during one service's fan-out would silently skip to the next service rather than short-circuiting. Test: TestGetRecommendationsForService_ContextCancelShortCircuits — pre-cancel a ctx, mock CE to return context.Canceled on every call, assert the function (a) returns context.Canceled verbatim via errors.Is and (b) issues at most one CE call before bailing (rather than marching through all 6 combos).

cristim · 2026-04-29T18:29:47Z

Addressed CodeRabbit's actionable finding: the (term, payment) loop now short-circuits on ctx.Err() != nil (canceled / deadline-exceeded) instead of treating it as a per-combo failure. Same pattern applied to GetAllRecommendations's per-service block — same risk shape. New regression test TestGetRecommendationsForService_ContextCancelShortCircuits pre-cancels the ctx and asserts (a) the function returns context.Canceled verbatim via errors.Is and (b) at most one Cost Explorer call was attempted (no fan-out-then-aggregate after cancellation).

@coderabbitai review

coderabbitai · 2026-04-29T18:30:33Z

@cristim, the context-cancellation short-circuit is a solid fix — propagating ctx.Err() immediately rather than aggregating it as a per-combo failure prevents the fan-out from continuing on a cancelled/deadline-exceeded context, which would otherwise waste CE quota and delay error surfacing. Applying the same pattern to GetAllRecommendations is the right call given the identical risk shape. The regression test covering both the errors.Is(err, context.Canceled) assertion and the at-most-one-call bound looks thorough. Triggering a fresh review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@providers/aws/recommendations/client_test.go`:
- Around line 414-421: The test currently uses require.ErrorIs which allows
wrapped errors and only checks riCalls <= 1; change the assertion to require or
assert that err equals context.Canceled verbatim (e.g., assert.Equal(t,
context.Canceled, err)) to ensure GetRecommendationsForService returns the exact
ctx.Err(); additionally, strengthen the loop-cancellation check by asserting the
RateLimiter was not invoked (check mockAPI.riCalls == 0) to ensure cancellation
short-circuits before any Cost Explorer/RateLimiter work (referencing
GetRecommendationsForService, mockAPI.riCalls and RateLimiter.Wait).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61634598-a8b3-4fda-8307-8ed47068b9d1

📥 Commits

Reviewing files that changed from the base of the PR and between 6dc6185 and 4df421d.

📒 Files selected for processing (2)

providers/aws/recommendations/client.go
providers/aws/recommendations/client_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

providers/aws/recommendations/client.go

cristim · 2026-04-29T21:17:48Z

Addressed the latest CodeRabbit pass in commit fab78c7: pre-canceled contexts now short-circuit before the Cost Explorer request path, and the regression test asserts the exact context.Canceled error plus zero RI calls.\n\n@coderabbitai review

coderabbitai · 2026-04-29T21:18:03Z

@cristim, understood — moving the ctx.Err() check to before the Cost Explorer request path is a clean improvement: a pre-cancelled context now guarantees zero CE calls rather than tolerating up to one, and tightening the regression assertion to match makes the test more precise and self-documenting. Triggering a fresh review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Closes #188. PR #189 made recommendation IDs unique enough to keep 1yr and 3yr variants from colliding downstream — but the AWS client never asked Cost Explorer for 1yr recs in the first place. The discovery flow hardcoded `Term: "3yr"` in providers/aws/recommendations/client.go, and Cost Explorer's GetReservationPurchaseRecommendation requires a `TermInYears` enum on each request and returns recs only for that specific term — there is no "give me both" mode. So even after #189, the user still saw only 3-year-term AWS recs: the upstream call simply never fetched the 1yr variants. Fix: GetRecommendationsForService now iterates over a small package constant `defaultDiscoveryTerms = ["1yr", "3yr"]` and concatenates the returned recs. Per-term Cost Explorer errors are tolerated and skipped (matching GetAllRecommendations's per-service tolerance) so a single throttle on one term doesn't suppress the other; only an error where *every* term fails is propagated. Downstream: the parser already tags `rec.Term = params.Term`, the scheduler ID encoding from #189 already includes term, and the DB natural-key index from migration 000042 already includes term — no further changes needed for the two recs to land in distinct DB rows and render as distinct rows in the UI. Tests: - TestGetRecommendationsForService: now expects 2 recs (one per term), with terms ["1yr", "3yr"] present. - TestGetRecommendationsForService_QueriesBothTerms (new): captures the per-call CostExplorer input slice via a new `riCalls` field on the mock and asserts both types.TermInYearsOneYear and types.TermInYearsThreeYears were requested. This pins the regression at the API-call layer so a future change that quietly drops one term fails the test even if the parser tagging happens to produce two recs for other reasons. - TestGetAllRecommendations / TestGetAllRecommendations_SomeServicesFail: unchanged assertions still hold (5 services × 2 terms = 10 parsed-and-warned attempts; the test only asserts NotEmpty). Verified `go test ./...` passes from the repo root and from the providers/aws submodule. API-call cost: doubles the Cost Explorer call count for the recommendations refresh (5 services × 2 terms = 10 calls vs 5 previously). The existing rate limiter and 100ms inter-service sleep handle this; refresh runs at the ~15 min cadence already budgeted for.

) Builds on the term loop landed in the prior commit. Same root cause (Cost Explorer requires the parameter on each request and returns recs only for that single value), same fix shape (replace the hardcoded default with a small package-level constant slice and iterate). The prior commit covered TermInYears; this one covers PaymentOption. Before: GetRecommendationsForService → 1 call per service at the hardcoded ("3yr", "partial-upfront") cell. After PR #189's ID-uniqueness fix the absent variants couldn't even surface from the cache; from the user's perspective the "all upfront" / "no upfront" filter values were always empty. After: GetRecommendationsForService iterates the Cartesian product of defaultDiscoveryTerms = ["1yr", "3yr"] defaultDiscoveryPaymentOptions = ["all-upfront", "partial-upfront", "no-upfront"] → 6 calls per service. Per-call CE errors are tolerated and skipped (matches GetAllRecommendations's per-service tolerance); only an error where every (term, payment) combo fails is propagated. Downstream wiring is already correct: - Parser tags rec.PaymentOption = params.PaymentOption (parser_ri.go:42). - Scheduler ID encoding includes PaymentOption (scheduler.go:856-858 — PR #189). - DB natural-key index includes payment_option (migration 000042). - Frontend column-filter already exposes payment-option as a facet. So the three payment variants land as distinct DB rows and render as distinct UI rows for free. Tests: - TestGetRecommendationsForService: now expects 6 recs (one per {term} × {payment} combo) and asserts the full combo set is present. - TestGetRecommendationsForService_QueriesBothTerms → TestGetRecommendationsForService_QueriesEveryCombo: regression expanded to assert all 6 (term, payment) pairs were requested against Cost Explorer, so a future change that quietly drops a combo fails the test even if the surviving recs parse correctly. Cost note: at the actual scheduler cadence (daily 2 AM cron + at most one opportunistic refresh per 6h window when users are active, so ~5 refreshes/day per account), 5 services × 6 combos × 5 refreshes/day ≈ 4,500 Cost Explorer requests/month/account. That's ~3,500 above the 1,000-request free tier, or ~$35/month/account at $0.01/request. Worth the user-visible "every term × payment is selectable" win.

Addresses CodeRabbit's actionable finding on PR #195: the (term, payment) loop in GetRecommendationsForService was treating a canceled / deadline-exceeded ctx as a per-combo failure to be tolerated. If earlier combos succeeded before cancellation, the function would return partial data with a nil error; if all combos ran after cancellation, it would burn 6× the wasted Cost Explorer attempts and aggregate them into "all variants failed", masking the real cancellation reason. Fix: after each c.GetRecommendations(ctx, params) error, check ctx.Err() first. If non-nil, return immediately so the caller sees the ctx error verbatim. Per-combo errors (throttle, 5xx) keep the existing skip-and-continue tolerance. Same pattern applied to GetAllRecommendations's per-service block, which had the identical risk: a canceled ctx during one service's fan-out would silently skip to the next service rather than short-circuiting. Test: TestGetRecommendationsForService_ContextCancelShortCircuits — pre-cancel a ctx, mock CE to return context.Canceled on every call, assert the function (a) returns context.Canceled verbatim via errors.Is and (b) issues at most one CE call before bailing (rather than marching through all 6 combos).

Check context cancellation before each Cost Explorer recommendation variant request so a pre-canceled context returns context.Canceled without touching the API path.

coderabbitai

🧹 Nitpick comments (1)

providers/aws/recommendations/client.go (1)

184-203: Avoid returning partial results on cancellation.

This branch is terminal, so returning allRecommendations alongside ctx.Err() makes the canceled path look partially successful to any future caller. Returning nil, ctx.Err() keeps the contract unambiguous.

Suggested fix

-			if ctx.Err() != nil {
-				return allRecommendations, ctx.Err()
-			}
+			if ctx.Err() != nil {
+				return nil, ctx.Err()
+			}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@providers/aws/recommendations/client.go` around lines 184 - 203, The loop
that aggregates recommendations calls GetRecommendationsForService and on
context cancellation returns allRecommendations along with ctx.Err(), which can
mislead callers into thinking partial results are valid; update both return
sites in the services loop (the early-cancel branch inside "if ctx.Err() != nil"
and the select branch after the time.After) to return nil, ctx.Err() instead of
allRecommendations, ctx.Err() so cancellations are unambiguous (change the
returns that reference allRecommendations to return nil).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@providers/aws/recommendations/client.go`:
- Around line 184-203: The loop that aggregates recommendations calls
GetRecommendationsForService and on context cancellation returns
allRecommendations along with ctx.Err(), which can mislead callers into thinking
partial results are valid; update both return sites in the services loop (the
early-cancel branch inside "if ctx.Err() != nil" and the select branch after the
time.After) to return nil, ctx.Err() instead of allRecommendations, ctx.Err() so
cancellations are unambiguous (change the returns that reference
allRecommendations to return nil).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8395007d-9ae0-4311-89f0-d9c19e2fcae2

📥 Commits

Reviewing files that changed from the base of the PR and between 4df421d and fab78c7.

📒 Files selected for processing (2)

providers/aws/recommendations/client.go
providers/aws/recommendations/client_test.go

Move the recommendations engine-key migration from 000042 to 000043 so it no longer collides with the retry-linkage migration now present on the PR base.

cristim · 2026-04-29T21:25:29Z

Addressed the migration-number collision from the updated base branch: rebased onto current feat/multicloud-web-frontend and renumbered 000042_recommendations_add_engine_to_key to 000043_recommendations_add_engine_to_key so it no longer conflicts with 000042_purchase_executions_retry_linkage. Verification: pre-commit migration conflict check passed; go test ./internal/database/postgres/migrations ./internal/config ./internal/scheduler; from providers/aws, go test ./recommendations. @coderabbitai review

coderabbitai · 2026-04-29T21:25:43Z

@cristim, understood — renumbering 000042_recommendations_add_engine_to_key → 000043_recommendations_add_engine_to_key is the right resolution: it keeps the sequential gap-free ordering, removes the collision with 000042_purchase_executions_retry_linkage, and the pre-commit migration conflict check confirms no remaining issues. Triggering a fresh review of the rebased branch now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Return nil with ctx.Err from GetAllRecommendations cancellation paths so callers cannot accidentally treat partial results as valid.

coderabbitai

🧹 Nitpick comments (1)

providers/aws/recommendations/client.go (1)
117-170: Consider adding runtime metrics for combo fan-out behavior.

Given this now drives up to 6 CE calls per service (and is invoked from scheduler refresh paths), emitting per-combo success/error counters and latency histograms would make throttling and timeout regressions much easier to detect early.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@providers/aws/recommendations/client.go` around lines 117 - 170, Add runtime
metrics around each Cost Explorer combo call inside
GetRecommendationsForService: record a per-combo latency histogram and increment
success/error counters for the specific (term, payment) combo when calling
c.GetRecommendations; include tags/labels for service, term and payment option.
Instrument right before/after the GetRecommendations call (measure start time,
observe duration on return) and increment a success counter on success (where
successCount++ and append occur) and an error counter on each non-context error
(where lastErr is set and continue). Ensure a metric is also incremented if
ctx.Err() causes the function to short-circuit so timeouts/cancels are visible.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@providers/aws/recommendations/client.go`:
- Around line 117-170: Add runtime metrics around each Cost Explorer combo call
inside GetRecommendationsForService: record a per-combo latency histogram and
increment success/error counters for the specific (term, payment) combo when
calling c.GetRecommendations; include tags/labels for service, term and payment
option. Instrument right before/after the GetRecommendations call (measure start
time, observe duration on return) and increment a success counter on success
(where successCount++ and append occur) and an error counter on each non-context
error (where lastErr is set and continue). Ensure a metric is also incremented
if ctx.Err() causes the function to short-circuit so timeouts/cancels are
visible.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 78e4afea-a25a-4ce2-8894-615c6665abaa

📥 Commits

Reviewing files that changed from the base of the PR and between fab78c7 and 7f7fd02.

📒 Files selected for processing (4)

internal/database/postgres/migrations/000043_recommendations_add_engine_to_key.down.sql
internal/database/postgres/migrations/000043_recommendations_add_engine_to_key.up.sql
providers/aws/recommendations/client.go
providers/aws/recommendations/client_test.go

cristim · 2026-04-29T21:29:54Z

Addressed the latest cancellation nitpick: GetAllRecommendations now returns nil, ctx.Err() from both cancellation paths instead of returning partial accumulated results. Verification: from providers/aws, go test ./recommendations; pre-commit Go hooks passed. @coderabbitai review

coderabbitai · 2026-04-29T21:33:29Z

@cristim, returning nil, ctx.Err() from both cancellation paths in GetAllRecommendations is the correct final form — discarding partial results on cancellation avoids handing callers a potentially inconsistent slice while signalling the exact cancellation reason. This is fully consistent with Go's context propagation conventions and mirrors the short-circuit already applied to GetRecommendationsForService. Triggering a fresh review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…205) (#206) * fix(db): idempotently restore retry_execution_id column dropped by migration drift (closes #204, closes #205) Production /api/purchases/* surfaced 500 errors after PR #161 deployed, with the underlying CloudWatch error: ERROR: column "retry_execution_id" does not exist (SQLSTATE 42703) User-facing symptoms (dashboard "Upcoming Scheduled Purchases" panel): * "Failed to load purchase details: Internal server error" (#205) * "Failed to cancel purchase" (#204) Both buttons hit `Handler.GetExecutionByID` whose SELECT references the retry_execution_id and retry_attempt_n columns introduced by PR #168. Root cause: migration version drift. * PR #189 added 000042_recommendations_add_engine_to_key earlier on the same day. Deployed Lambdas applied it as version 42. * PR #168 then added 000042_purchase_executions_retry_linkage, colliding on version 42. * PR #195 resolved the on-disk duplicate by renaming RECOMMENDATIONS to 000043 — the wrong direction. Deployed DBs already had schema_migrations.version=42 marked as the engine content, so on the next migrate.Up() they: - skipped on-disk 000042 (now retry_linkage) entirely (DB version >= 42 already) - applied on-disk 000043 (now engine again) as a no-op via the ADD COLUMN IF NOT EXISTS guards already in that file Net effect: the retry_execution_id / retry_attempt_n columns were never added on prod, but the Go code expects them. Forward-only corrective. The new 000044 replays 000042's schema additions wrapped in IF NOT EXISTS / CREATE INDEX IF NOT EXISTS so it's a no-op on any DB that already has the columns (fresh deploys, manually-fixed envs) and additive on the broken ones. We don't try to repair schema_migrations history — golang-migrate only cares about the high-water version, and we can't tell from outside whether a given environment skipped 42 or applied 43 twice. The migration up + down sql files document the incident and the edge cases inline so future readers don't have to re-derive the history from the git log. Closes #204 Closes #205 * fix(db): repair partially-fixed DBs in 000044, not just no-op (CR pass 1) CR pass 1 on PR #206 flagged that the original 000044 only added columns when missing — a manually-fixed DB with the column but no FK, or with retry_attempt_n nullable, would still be broken after this migration. Rewrite as five idempotent steps that each check pg_catalog / information_schema for the precise piece they own, so every reachable partial state is repaired: 1. Add retry_execution_id column if missing. 2. Add the FK on retry_execution_id -> execution_id if no FK on that column exists. Match by column relationship (pg_constraint join pg_attribute) NOT by constraint name, so a fresh-deploy DB whose 000042 ran cleanly with the auto-generated `purchase_executions_retry_execution_id_fkey` is recognised as already-FK'd and we don't add a second. 3. Add retry_attempt_n column if missing (as nullable, so step 4 can backfill before flipping to NOT NULL). 4. Backfill any NULLs and SET DEFAULT 0 / SET NOT NULL unconditionally — Postgres treats those as no-ops when the constraint is already as requested. 5. Create the partial index if missing. Down stays simple: DROP COLUMN ... CASCADE removes the FK regardless of its name. Comments inline document each reachable partial state and why each step is safe to re-run. Migration is a no-op on healthy DBs, fully repairing on broken ones.

#224) (#231) * fix(frontend/recs): enforce one-variant-per-cell radio selection (closes #224) After PR #195 (issue #188) the recommendations refresh fans out across 2 terms × 3 payments per `(provider, account, service, region, resource_type, engine)` cell — up to 6 alternative rec rows per physical resource. They're alternatives, not additions, but the checkboxes had no mutual-exclusion logic, producing wrong purchase intent in two ways: 1. **Manual checking**: a user could check `1yr/all-upfront` AND `3yr/no-upfront` for the same EC2 m5.large in the same account. Both fed into the purchase plan → double commitment for one resource. 2. **`select-all` was the destructive form**: every visible row got added, so 3 cells × 6 variants = 18 commitments instead of the 3 the user expected when clicking "buy what's recommended". 6× the intended spend per cell. # What changes - New `cellKey(rec)` helper returns the `(provider, cloud_account_id, service, region, resource_type, engine)` prefix (same as the scheduler ID encoding from #189, minus the `(term, payment)` suffix). Recs sharing this key are alternatives for the same physical resource. - Per-row checkbox change handler now enforces radio behaviour: on check, scan the visible-recs list for any sibling in the same cell that's already selected, deselect it FIRST, then add the new rec. Cells are independent — selecting in cell X doesn't touch cell Y's selection. - `select-all` handler rewritten: clear current selection, then call new `pickBestVariantPerCell(recs)` which groups by cell and picks the variant with the highest **effective monthly savings**: `effective = savings - (upfront_cost / (term * 12))`. Amortizing the upfront over the commitment term means a 3yr/all-upfront with a huge lump-sum doesn't beat a 1yr/no-upfront just on raw `savings`. Sibling issue #223 will replace this tiebreaker with "matches resolved GlobalConfig.DefaultTerm + DefaultPayment" when it lands; until then, amortized savings is the right deterministic default. # Out of scope (deliberate) - **Native `<input type="radio" name="cell-X">` markup**. Per the issue: "Designer call — if cell-grouping (sibling issue #226) lands first, radios become visually correct." Stayed with checkboxes-with-radio-behaviour for this PR; markup switch waits for cell-grouping visual. - **Default-select per cell from settings** (sibling #223). # Tests 4 new tests in `frontend/src/__tests__/recommendations.test.ts` inside the `'Issue #224: one-variant-per-cell radio selection'` describe block: (a) Manual toggle within a cell — checking variant B with variant A already selected: A is removed, B is added. Sibling A was not also added; B was not also removed. (b) Cross-cell independence — selecting in cell X must NOT remove cell Y's existing selection. (c) `select-all` collapses 18 → 3: 3 cells × 6 variants. After click, exactly 3 add calls; clearSelectedRecommendations was called first to drop stale state. (d) Tiebreaker pin — single cell with 3 variants whose `(savings, upfront_cost, term)` produce known amortized values ($200, $300, $400). The middle variant wins ($400 effective) despite the high-upfront variant having $1200 raw savings, proving the amortization is actually computed. `tsc --noEmit` clean. `npx jest --testPathPatterns="recommendations"` exit 0. `npm run build` (webpack production) exit 0. * fix(frontend/recs): address CR feedback on PR#231 — term clamp + full-set sibling eviction - pickBestVariantPerCell: fix term=0 clamp from Math.max(1, r.term*12) to Math.max(1, r.term||1)*12 so a malformed zero-term rec is treated as 1yr (12 months) not 1 month; keeps the effective-savings winner correct (CR actionable: 1-month fallback could skew per-cell winner selection) - checkbox change handler: iterate state.getRecommendations() (full loaded set) instead of filtered `recommendations` when evicting same-cell siblings, so variants hidden by a column filter are also deselected, preserving the one-variant-per-cell contract across filter changes (CR actionable: hidden siblings could remain selected, breaking invariant) - test: update issue-#224 test (a) and (b) to mock state.getRecommendations(); add new regression test for the hidden-by-filter scenario * fix(frontend): address issue 224 review comments

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread providers/aws/recommendations/client.go Outdated

cristim changed the title ~~fix(aws): fetch both 1yr and 3yr Cost Explorer terms (#188)~~ fix(aws): fetch every Cost Explorer (term, payment) combo (#188) Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

cristim added bug Something isn't working triaged Item has been triaged priority/p1 Next up; this sprint severity/high Significant harm urgency/this-sprint Within the current sprint impact/all-users Affects every user impact/many Affects most users effort/s Hours type/bug Defect labels Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread providers/aws/recommendations/client_test.go Outdated

cristim added 4 commits April 29, 2026 23:23

fix(aws): short-circuit pre-canceled recommendation fetches

6cb7324

Check context cancellation before each Cost Explorer recommendation variant request so a pre-canceled context returns context.Canceled without touching the API path.

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

fix(db): renumber recommendations engine migration

7f7fd02

Move the recommendations engine-key migration from 000042 to 000043 so it no longer collides with the retry-linkage migration now present on the PR base.

cristim force-pushed the fix/issue-188-aws-1yr-recs branch from fab78c7 to 7f7fd02 Compare April 29, 2026 21:24

fix(aws): avoid partial results on cancellation

00a2835

Return nil with ctx.Err from GetAllRecommendations cancellation paths so callers cannot accidentally treat partial results as valid.

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

cristim merged commit dce96ca into feat/multicloud-web-frontend Apr 29, 2026
4 checks passed

cristim deleted the fix/issue-188-aws-1yr-recs branch April 30, 2026 14:00

cristim mentioned this pull request May 3, 2026

fix(recommendations): AWS 1-year-term recs missing — only 3yr recs render when both exist #188

Closed

This was referenced May 4, 2026

perf: parallelize recommendations collection at every level (closes #266 #267 #268) #269

Merged

perf: aggregate parallelism cap + 2 GB Lambda for collect refresh (followup #269) #270

Merged

Conversation

cristim commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why two changes in one PR

Fix

Downstream wiring is already correct

Test plan

Cost / cadence note

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

cristim commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cristim commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cristim commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cristim commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cristim commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cristim commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cristim commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading