perf(exchange): rec-driven cross-family alternatives via cached Cost Explorer recommendations#79
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces per-recommendation EC2 offerings fan-out with a Postgres-backed purchase-recommendation lookup; maps cached Cost Explorer recommendations to exchange.OfferingOption (amortized upfront, normalization, term seconds), threads the lookup into reshape analysis, removes the peer-family allowlist and offerings API, and updates handlers/tests for rec-driven alternatives. Changes
Sequence Diagram(s)sequenceDiagram
participant H as Handler
participant S as Store/Postgres
participant L as PurchaseRecLookup
participant A as AnalyzeReshapingWithRecs
H->>H: resolveAWSCloudAccountID(ctx)
H->>S: ListCloudAccounts(provider="aws")
S-->>H: cloud account UUID or none
H->>L: build purchaseRecLookupFromStore(region,currency,accountUUID)
L-->>H: lookup closure
H->>A: AnalyzeReshapingWithRecs(RIs, utilization, threshold, region, currency, L)
A->>L: lookup(ctx, region, currency)
L->>S: ListStoredRecommendations(filter: provider="aws", service="ec2", region, currency, accountIDs?)
S-->>L: []RecommendationRecord
L->>L: map records -> OfferingOption (amortize upfront, parse instance, set TermSeconds)
L-->>A: []OfferingOption
A->>A: filter offerings (exclude same-family, primary, term mismatch, $-units)
A-->>H: ReshapeRecommendations with AlternativeTargets
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/handler_ri_exchange.go (1)
227-257:⚠️ Potential issue | 🟠 MajorNormalize
regionbefore the recommendations lookup.Line 227 keeps the raw query-string value, so if the client omits
region,loadAWSConfigWithRegioncan still operate with a default AWS region whileAnalyzeReshapingWithRecsreceives"". That makes the store read unscoped by region and can surface alternatives from unrelated regions on the reshape page.Suggested fix
region := req.QueryStringParameters["region"] cfg, err := h.loadAWSConfigWithRegion(ctx, region) if err != nil { return nil, fmt.Errorf("failed to load AWS config: %w", err) } + if region == "" { + region = cfg.Region + } ec2Client := h.buildReshapeEC2Client(cfg)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_ri_exchange.go` around lines 227 - 257, The handler is using the raw query-string region value which may be empty while loadAWSConfigWithRegion returns a concrete cfg.Region; normalize the region after loading cfg (e.g., set resolvedRegion := region; if resolvedRegion == "" && cfg.Region != "" then resolvedRegion = cfg.Region) and use resolvedRegion instead of region when calling getRIUtilizationCache().getOrFetch and exchange.AnalyzeReshapingWithRecs (and any other downstream region-scoped calls) so the recommendations lookup and utilization cache are scoped to the actual AWS region returned by loadAWSConfigWithRegion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/api/exchange_lookup.go`:
- Around line 113-129: The resolver currently hides runtime failures by
returning an empty string; change resolveAWSCloudAccountID(ctx) to return
(string, error) and propagate errors up so callers can fail closed: call
resolveAWSAccountID(ctx) and if empty return ("", nil) to indicate intentionally
unregistered, but if h.config.ListCloudAccounts(...) returns an error then
return ("", err) (and if it returns zero accounts treat as a non-match with nil
error); update callers such as purchaseRecLookupFromStore to check the error
and, on a non-nil error from resolveAWSCloudAccountID, abort/return the error
rather than treating an empty string as “omit AccountIDs”; keep the matching
loop logic (comparing a.ExternalID to awsAccountID) and return the matched a.ID
on success.
In `@pkg/exchange/reshape.go`:
- Around line 67-80: The PurchaseRecLookup contract currently loses
term/duration info so AnalyzeReshapingWithRecs cannot reject term-mismatched
alternatives; change the PurchaseRecLookup signature to return OfferingOption
values that include a term/duration field (e.g., add a Term or Duration
string/int to OfferingOption) and update all callers to populate that field,
then modify AnalyzeReshapingWithRecs to filter returned OfferingOption entries
by matching term/duration before building AlternativeTargets (ensure the term
check is applied prior to amortisation/cost comparisons so AlternativeTargets
never contains incompatible-term alternatives).
---
Outside diff comments:
In `@internal/api/handler_ri_exchange.go`:
- Around line 227-257: The handler is using the raw query-string region value
which may be empty while loadAWSConfigWithRegion returns a concrete cfg.Region;
normalize the region after loading cfg (e.g., set resolvedRegion := region; if
resolvedRegion == "" && cfg.Region != "" then resolvedRegion = cfg.Region) and
use resolvedRegion instead of region when calling
getRIUtilizationCache().getOrFetch and exchange.AnalyzeReshapingWithRecs (and
any other downstream region-scoped calls) so the recommendations lookup and
utilization cache are scoped to the actual AWS region returned by
loadAWSConfigWithRegion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 206c0346-1819-46ea-9651-e5861f2f7a28
📒 Files selected for processing (8)
internal/api/exchange_lookup.gointernal/api/exchange_lookup_test.gointernal/api/handler_ri_exchange.gointernal/api/handler_ri_exchange_integration_test.gopkg/exchange/reshape.gopkg/exchange/reshape_crossfamily_test.goproviders/aws/services/ec2/client.goproviders/aws/services/ec2/client_offerings_test.go
💤 Files with no reviewable changes (2)
- providers/aws/services/ec2/client_offerings_test.go
- providers/aws/services/ec2/client.go
When the reshape endpoint is called without ?region=, loadAWSConfigWithRegion resolves a default region from the AWS SDK chain but the local `region` string stays empty. That empty value then propagates into the RI utilization cache key and AnalyzeReshapingWithRecs lookup, which scopes neither — so alternatives from unrelated regions could surface on the reshape page. Adopt cfg.Region after the config load when the query string was empty so every downstream consumer sees the same region the AWS clients are talking to. CodeRabbit (PR #79).
The resolver previously hid ListCloudAccounts errors behind an empty-string
return that the caller treated as "no AWS account registered, omit the
AccountIDs filter". A real DB error therefore caused the rec-driven lookup
to run unscoped and could match recommendations from another tenant.
Promote the function to (string, error):
- ("", nil) for the intentionally-unregistered cases (no STS resolution,
no CloudAccount rows, no matching ExternalID) — caller still treats
this as "skip the filter" so first-run/dev keeps working.
- ("", err) for genuine ListCloudAccounts failures — handler now aborts
the reshape request rather than silently broadening scope.
CodeRabbit (PR #79).
AnalyzeReshapingWithRecs surfaced cross-family alternatives without checking that the offering's commitment term matched the source RI's, so a 1y RI could land 3y alternatives in AlternativeTargets (and vice versa) — wrong because AWS only allows reshape exchanges within the same term. Add TermSeconds to RIInfo and OfferingOption (canonical AWS RI duration unit, matches ec2.ReservedInstance.Duration and the existing ConvertibleRI.Duration field). The handler plumbs inst.Duration through to RIInfo; the recs lookup converts rec.Term (years) using 31_536_000 s/year. fillAlternativesFromRecs rejects offerings whose TermSeconds differs from the source before the dollar-units gate runs. Both sides falling back to zero TermSeconds skips the term gate so existing fixtures and any older callers stay green. Regression tests: - pkg/exchange/reshape_crossfamily_test.go: 1y source + (1y, 3y) alternatives → only the 1y survives; zero-term source still passes the legacy mixed-term offering through. - internal/api/exchange_lookup_test.go: 1y rec → 31_536_000s, 3y rec → 3 × 31_536_000s, Term==0 → TermSeconds==0. CodeRabbit (PR #79).
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
internal/api/exchange_lookup.go (1)
141-143:⚠️ Potential issue | 🔴 CriticalFail closed when source AWS account resolution fails at runtime.
Line 141 treats unresolved account identity as
("", nil). Combined with Line 48 (AccountIDsomitted), transient STS/config failures become unscoped recommendation reads across accounts in the region.🔧 Suggested direction
- awsAccountID := h.resolveAWSAccountID(ctx) - if awsAccountID == "" { - return "", nil - } + awsAccountID, err := h.resolveAWSAccountID(ctx) + if err != nil { + return "", fmt.Errorf("resolve source aws account for reshape scope: %w", err) + } + if awsAccountID == "" { + // keep as nil only for explicitly unsupported/unregistered mode + return "", nil + }Also update
resolveAWSAccountIDto return(string, error)so STS/config failures are distinguishable from intentional “no scope” mode.Also applies to: 48-50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/exchange_lookup.go` around lines 141 - 143, Change resolveAWSAccountID to return (string, error) instead of (string, nil) so runtime STS/config failures are distinguishable from an intentional empty scope; update the implementation to return a non-nil error when account resolution fails, and update all call sites in exchange_lookup.go (e.g., the branch that currently does `if awsAccountID == "" { return "", nil }` and the code around the `AccountIDs` omission at the top of the file) to propagate the error and fail closed (return the error up the call chain) rather than treating an empty string as success/unscoped; ensure function signatures and any tests are updated accordingly.
🧹 Nitpick comments (1)
pkg/exchange/reshape.go (1)
449-454: Use a fallback normalization factor inpricingGatePassesfor defensive compatibility.Line 453 uses
src.NormalizationFactordirectly. If a caller providesMonthlyCostbut leavesNormalizationFactorunset, all alternatives are rejected even whensrc.InstanceTypeis parseable.♻️ Suggested refactor
func pricingGatePasses(src RIInfo, off OfferingOption, hasSrc bool) bool { if !hasSrc || src.MonthlyCost <= 0 { return true } - return passesDollarUnitsCheck(src.NormalizationFactor, src.MonthlyCost, src.CurrencyCode, off) + srcNF := src.NormalizationFactor + if srcNF <= 0 { + _, size := parseInstanceType(src.InstanceType) + srcNF = NormalizationFactorForSize(size) + } + return passesDollarUnitsCheck(srcNF, src.MonthlyCost, src.CurrencyCode, off) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/exchange/reshape.go` around lines 449 - 454, pricingGatePasses currently uses src.NormalizationFactor directly which can be zero/unset and wrongly reject alternatives; modify pricingGatePasses to compute a fallback normalization factor when src.NormalizationFactor <= 0 by attempting to derive it from src.InstanceType (e.g. parse vCPU/memory or call an existing parseNormalization helper) and if that fails default to 1.0, then pass that computed value to passesDollarUnitsCheck instead of src.NormalizationFactor; update references to RIInfo fields (NormalizationFactor, MonthlyCost, CurrencyCode, InstanceType) and the call to passesDollarUnitsCheck in pricingGatePasses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@internal/api/exchange_lookup.go`:
- Around line 141-143: Change resolveAWSAccountID to return (string, error)
instead of (string, nil) so runtime STS/config failures are distinguishable from
an intentional empty scope; update the implementation to return a non-nil error
when account resolution fails, and update all call sites in exchange_lookup.go
(e.g., the branch that currently does `if awsAccountID == "" { return "", nil }`
and the code around the `AccountIDs` omission at the top of the file) to
propagate the error and fail closed (return the error up the call chain) rather
than treating an empty string as success/unscoped; ensure function signatures
and any tests are updated accordingly.
---
Nitpick comments:
In `@pkg/exchange/reshape.go`:
- Around line 449-454: pricingGatePasses currently uses src.NormalizationFactor
directly which can be zero/unset and wrongly reject alternatives; modify
pricingGatePasses to compute a fallback normalization factor when
src.NormalizationFactor <= 0 by attempting to derive it from src.InstanceType
(e.g. parse vCPU/memory or call an existing parseNormalization helper) and if
that fails default to 1.0, then pass that computed value to
passesDollarUnitsCheck instead of src.NormalizationFactor; update references to
RIInfo fields (NormalizationFactor, MonthlyCost, CurrencyCode, InstanceType) and
the call to passesDollarUnitsCheck in pricingGatePasses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca274271-69c6-4985-baa2-78fef6f57ae6
📒 Files selected for processing (5)
internal/api/exchange_lookup.gointernal/api/exchange_lookup_test.gointernal/api/handler_ri_exchange.gopkg/exchange/reshape.gopkg/exchange/reshape_crossfamily_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/api/exchange_lookup_test.go
…#79) CodeRabbit review #2 flagged this as CRITICAL: resolveAWSCloudAccountID still returned ("", nil) when the underlying STS call failed, which propagated up as "no AccountIDs filter" and could leak another tenants recommendations on a transient AWS API hiccup. Fix the underlying primitive AND the wrapper: - internal/api/handler.go::resolveAWSAccountID — signature changes from string to (string, error). Three return shapes now distinguished: ("", nil) — AWS SDK config could not load (Azure/GCP host) ("", err) — config loaded but STS GetCallerIdentity failed (accountID, nil) — success The non-AWS-host case stays a clean signal; only real STS failures now surface as errors. - internal/api/exchange_lookup.go::resolveAWSCloudAccountID — propagate the STS error verbatim. Multi-tenant deployments now fail closed on STS hiccups instead of silently issuing an unscoped recommendations read. - internal/api/handler.go::resolveSourceIdentity — keeps the previous best-effort semantics (drops the error explicitly with a comment). populateSourceAccountID already fails loud on an empty AccountID for the AWS-source case, so the security gate is still in place; we just preserve backwards compatibility for non-reshape callers. plus pricingGatePasses NF fallback (CodeRabbit nit on PR #79): - pkg/exchange/reshape.go::pricingGatePasses — when src.NormalizationFactor is zero (callers occasionally leave it unset), derive NF from InstanceType via NormalizationFactorForSize. Avoids the silent reject-everything failure mode for partial RIInfo constructions. Tests: full internal/api + pkg/exchange suites green.
|
Pushed CRITICAL (fail-closed on STS):
NIT (pricingGatePasses NF fallback): when Full |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
Filed four follow-up issues for items I flagged in the review but that are out of scope for this PR:
None block this PR. The CRITICAL + MAJOR + nit are all addressed in |
…Explorer recommendations
The reshape page used to enumerate cross-family alternatives via a
hand-curated `peerFamilyGroups` allowlist plus a per-recommendation
`DescribeReservedInstancesOfferings` API call. The recommendations
table already caches AWS Cost Explorer's `GetReservationPurchaseRecommendation`
output on every scheduler tick — pair underutilised convertibles
against that cache instead of fanning out N×M offering enumerations
on every page load.
What got deleted from `pkg/exchange/reshape.go`:
- `peerFamilyGroups` (hand-curated map; new families were invisible
until somebody updated the file by hand)
- `candidateFamilies()` and `alternativesForTarget()` helpers
- `OfferingLookup` (per-rec lookup signature)
- `AnalyzeReshapingWithOfferings()` and `distinctCandidateTypes()` /
`fillAlternativesFromOfferings()`
What replaces them:
- `PurchaseRecLookup func(ctx, region, currencyCode) ([]OfferingOption, error)`
- `AnalyzeReshapingWithRecs(ctx, ris, util, threshold, region, currency, lookup)`:
ONE region-scoped lookup, then per-rec the offerings are filtered
to cross-family options that pass the existing
`passesDollarUnitsCheck` gate (unchanged) and sorted ascending by
effective monthly cost.
New `internal/api/exchange_lookup.go` adapter
`purchaseRecLookupFromStore(store, accountID)`:
- Calls `store.ListStoredRecommendations(ctx,
RecommendationFilter{Provider: "aws", Service: "ec2", Region:
region, AccountIDs: [accountID]})`.
- Maps each `RecommendationRecord` → `OfferingOption` with
effective monthly = `UpfrontCost / (Term × 12) + MonthlyCost`.
- Filters by source CloudAccount UUID (resolved from STS account
ID → CloudAccount.ExternalID) so reshape can't surface another
tenant's recs. Empty UUID = ambient-credentials degraded mode.
`providers/aws/services/ec2/client.go::FindConvertibleOfferings`
deleted along with its dedicated test file (no remaining callers).
`internal/server/handler_ri_exchange.go::convertForAutoExchange`
unchanged — auto.go uses the base `AnalyzeReshaping` (no
alternatives, primary target only) so the auto-exchange pipeline's
contract is preserved.
Tests:
- `pkg/exchange/reshape_crossfamily_test.go` rewritten:
- Removed `TestCandidateFamilies_*` (function gone).
- Removed base-path cross-family assertion (alternatives now belong
exclusively to the WithRecs path).
- Added `TestAnalyzeReshaping_BaseEmitsNoAlternatives` regression.
- Migrated all `TestAnalyzeReshapingWithOfferings_*` →
`TestAnalyzeReshapingWithRecs_*` (recommendation-driven,
cold-cache, dollar-units filter, lookup error, primary-target
exclusion, legacy m4 family).
- New `internal/api/exchange_lookup_test.go`: region/account/empty/no-recs
filter coverage + UpfrontCost amortisation mapping.
- Integration test `internal/api/handler_ri_exchange_integration_test.go`
switched to seeding recs into the test container's recommendations
table instead of mocking `FindConvertibleOfferings`.
Closes #40
When the reshape endpoint is called without ?region=, loadAWSConfigWithRegion resolves a default region from the AWS SDK chain but the local `region` string stays empty. That empty value then propagates into the RI utilization cache key and AnalyzeReshapingWithRecs lookup, which scopes neither — so alternatives from unrelated regions could surface on the reshape page. Adopt cfg.Region after the config load when the query string was empty so every downstream consumer sees the same region the AWS clients are talking to. CodeRabbit (PR #79).
The resolver previously hid ListCloudAccounts errors behind an empty-string
return that the caller treated as "no AWS account registered, omit the
AccountIDs filter". A real DB error therefore caused the rec-driven lookup
to run unscoped and could match recommendations from another tenant.
Promote the function to (string, error):
- ("", nil) for the intentionally-unregistered cases (no STS resolution,
no CloudAccount rows, no matching ExternalID) — caller still treats
this as "skip the filter" so first-run/dev keeps working.
- ("", err) for genuine ListCloudAccounts failures — handler now aborts
the reshape request rather than silently broadening scope.
CodeRabbit (PR #79).
AnalyzeReshapingWithRecs surfaced cross-family alternatives without checking that the offering's commitment term matched the source RI's, so a 1y RI could land 3y alternatives in AlternativeTargets (and vice versa) — wrong because AWS only allows reshape exchanges within the same term. Add TermSeconds to RIInfo and OfferingOption (canonical AWS RI duration unit, matches ec2.ReservedInstance.Duration and the existing ConvertibleRI.Duration field). The handler plumbs inst.Duration through to RIInfo; the recs lookup converts rec.Term (years) using 31_536_000 s/year. fillAlternativesFromRecs rejects offerings whose TermSeconds differs from the source before the dollar-units gate runs. Both sides falling back to zero TermSeconds skips the term gate so existing fixtures and any older callers stay green. Regression tests: - pkg/exchange/reshape_crossfamily_test.go: 1y source + (1y, 3y) alternatives → only the 1y survives; zero-term source still passes the legacy mixed-term offering through. - internal/api/exchange_lookup_test.go: 1y rec → 31_536_000s, 3y rec → 3 × 31_536_000s, Term==0 → TermSeconds==0. CodeRabbit (PR #79).
…#79) CodeRabbit review #2 flagged this as CRITICAL: resolveAWSCloudAccountID still returned ("", nil) when the underlying STS call failed, which propagated up as "no AccountIDs filter" and could leak another tenants recommendations on a transient AWS API hiccup. Fix the underlying primitive AND the wrapper: - internal/api/handler.go::resolveAWSAccountID — signature changes from string to (string, error). Three return shapes now distinguished: ("", nil) — AWS SDK config could not load (Azure/GCP host) ("", err) — config loaded but STS GetCallerIdentity failed (accountID, nil) — success The non-AWS-host case stays a clean signal; only real STS failures now surface as errors. - internal/api/exchange_lookup.go::resolveAWSCloudAccountID — propagate the STS error verbatim. Multi-tenant deployments now fail closed on STS hiccups instead of silently issuing an unscoped recommendations read. - internal/api/handler.go::resolveSourceIdentity — keeps the previous best-effort semantics (drops the error explicitly with a comment). populateSourceAccountID already fails loud on an empty AccountID for the AWS-source case, so the security gate is still in place; we just preserve backwards compatibility for non-reshape callers. plus pricingGatePasses NF fallback (CodeRabbit nit on PR #79): - pkg/exchange/reshape.go::pricingGatePasses — when src.NormalizationFactor is zero (callers occasionally leave it unset), derive NF from InstanceType via NormalizationFactorForSize. Avoids the silent reject-everything failure mode for partial RIInfo constructions. Tests: full internal/api + pkg/exchange suites green.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/api/exchange_lookup.go (1)
155-165:⚠️ Potential issue | 🔴 CriticalDon't treat a resolved-but-unregistered AWS account as “no filter”.
Lines 160-165 still return
("", nil)on “no matchingExternalID”.purchaseRecLookupFromStoreinterprets that as “omitAccountIDs”, which will read every account's cached recs if other AWS accounts are already registered. The zero-row bootstrap case is fine, but a resolved non-match should fail closed or suppress alternatives.Suggested fix
accounts, err := h.config.ListCloudAccounts(ctx, config.CloudAccountFilter{Provider: &provider}) if err != nil { return "", fmt.Errorf("list cloud accounts for reshape scope: %w", err) } + if len(accounts) == 0 { + return "", nil + } for _, a := range accounts { if a.ExternalID == awsAccountID { return a.ID, nil } } - return "", nil + return "", fmt.Errorf("source aws account %q is not registered", awsAccountID) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/exchange_lookup.go` around lines 155 - 165, The code currently returns ("", nil) when an awsAccountID is resolved but no matching account.ExternalID is found, which causes purchaseRecLookupFromStore to treat this as "no filter" and query all accounts; instead, in the function that calls h.config.ListCloudAccounts (the block using awsAccountID, provider, and config.CloudAccountFilter), return a non-nil error when a resolved awsAccountID does not match any account.ExternalID so callers (e.g., purchaseRecLookupFromStore) can fail closed or suppress alternatives; update the return to something like fmt.Errorf("aws account %s not registered", awsAccountID) and propagate that error to prevent omitting AccountIDs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/api/handler.go`:
- Around line 496-503: The current resolveAWSAccountID swallows all
awsconfig.LoadDefaultConfig errors by returning ("", nil); change it to only
ignore the error if the host is not AWS using sourceCloud(), and otherwise
return the load error so AWS-hosted deployments fail closed. Specifically,
inside Handler.resolveAWSAccountID (the block that sets h.awsCfg and h.awsCfgErr
via h.awsCfgOnce), after checking h.awsCfgErr use sourceCloud() to determine the
runtime cloud: if sourceCloud(ctx) indicates non-AWS return ("", nil) as before,
otherwise return ("", h.awsCfgErr) so callers like resolveAWSCloudAccountID can
handle real AWS config failures; keep existing h.awsCfg and awsCfgOnce usage
unchanged.
---
Duplicate comments:
In `@internal/api/exchange_lookup.go`:
- Around line 155-165: The code currently returns ("", nil) when an awsAccountID
is resolved but no matching account.ExternalID is found, which causes
purchaseRecLookupFromStore to treat this as "no filter" and query all accounts;
instead, in the function that calls h.config.ListCloudAccounts (the block using
awsAccountID, provider, and config.CloudAccountFilter), return a non-nil error
when a resolved awsAccountID does not match any account.ExternalID so callers
(e.g., purchaseRecLookupFromStore) can fail closed or suppress alternatives;
update the return to something like fmt.Errorf("aws account %s not registered",
awsAccountID) and propagate that error to prevent omitting AccountIDs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7c3710fc-ef99-406a-83e3-fcbae4c243f6
📒 Files selected for processing (3)
internal/api/exchange_lookup.gointernal/api/handler.gopkg/exchange/reshape.go
f44b065 to
c3428ad
Compare
|
Rebased onto current |
|
@coderabbitai review |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/handler_ri_exchange_integration_test.go (1)
63-75:⚠️ Potential issue | 🟠 MajorDon't hard-wire the unscoped recommendations path in the shared helper.
reshapeAccountResolveralways returning""means every integration test skipsAccountIDsfiltering. A regression in the scoped lookup would still go green here, even though cross-account isolation is one of the behaviors this PR hardens. Please make the helper accept a resolver override and add one scoped-account case that proves only the resolved tenant's recommendations are returned.Possible helper tweak
-func buildReshapeHandler(store *config.PostgresStore, ec2Fake *fakeReshapeEC2, recsFake *fakeReshapeRecs) *Handler { +func buildReshapeHandler( + store *config.PostgresStore, + ec2Fake *fakeReshapeEC2, + recsFake *fakeReshapeRecs, + resolveAccount func(context.Context) (string, error), +) *Handler { + if resolveAccount == nil { + resolveAccount = func(_ context.Context) (string, error) { return "", nil } + } h := &Handler{ config: store, auth: &mockAuthForExchange{}, apiKey: "test-api-key", reshapeEC2Factory: func(_ aws.Config) reshapeEC2Client { return ec2Fake }, reshapeRecsFactory: func(_ aws.Config) reshapeRecsClient { return recsFake }, - reshapeAccountResolver: func(_ context.Context) (string, error) { return "", nil }, + reshapeAccountResolver: resolveAccount, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_ri_exchange_integration_test.go` around lines 63 - 75, The helper buildReshapeHandler always sets reshapeAccountResolver to return "" causing all tests to exercise the unscoped path; update buildReshapeHandler (the Handler construction) to accept a reshapeAccountResolver override parameter (e.g., resolver func(context.Context)(string,error)) and use that instead of the hard-coded lambda, then add at least one integration test that passes a resolver returning a specific tenant/cloud-account ID and asserts that reshape recommendations returned by the Handler are filtered to that account only (verifying scoped lookup behavior).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/api/handler_ri_exchange_integration_test.go`:
- Around line 119-126: The test's seeded same-family record uses the identical
size as the primary target (m5.large), so it doesn't prove the family-exclusion
logic; update the seed in seedRecsForRegion to use a different size within the
same family (e.g., change the same-family entry from "m5.large" to "m5.2xlarge"
in the RecommendationRecord) and add/assert that this m5.* record is not present
in the cross-family alternatives returned by the handler (repeat the same change
for the other occurrence around lines 149-155) to ensure the same-family filter
is independently validated.
In `@internal/api/handler.go`:
- Around line 552-570: In resolveAWSCallerIdentity, short-circuit non-AWS hosts
before any AWS SDK work: at the top of the function check sourceCloud() != "aws"
and immediately return "", "", nil so you do not call h.awsCfgOnce.Do /
awsconfig.LoadDefaultConfig, create an sts client, or call GetCallerIdentity on
non-AWS deployments; keep existing handling of h.awsCfgErr and the
sts.GetCallerIdentity error path for actual AWS hosts (references:
resolveAWSCallerIdentity, h.awsCfgOnce, h.awsCfgErr,
awsconfig.LoadDefaultConfig, sts.NewFromConfig, GetCallerIdentity, sourceCloud).
---
Outside diff comments:
In `@internal/api/handler_ri_exchange_integration_test.go`:
- Around line 63-75: The helper buildReshapeHandler always sets
reshapeAccountResolver to return "" causing all tests to exercise the unscoped
path; update buildReshapeHandler (the Handler construction) to accept a
reshapeAccountResolver override parameter (e.g., resolver
func(context.Context)(string,error)) and use that instead of the hard-coded
lambda, then add at least one integration test that passes a resolver returning
a specific tenant/cloud-account ID and asserts that reshape recommendations
returned by the Handler are filtered to that account only (verifying scoped
lookup behavior).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9af3d361-3262-4c5f-8f09-787d9eb2d9f6
📒 Files selected for processing (4)
internal/api/exchange_lookup.gointernal/api/handler.gointernal/api/handler_ri_exchange.gointernal/api/handler_ri_exchange_integration_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/api/handler_ri_exchange.go
- internal/api/exchange_lookup.go
CodeRabbit pass-5 follow-up: 62667c8 fixed loadAPIKey so it stops sharing h.awsCfgOnce/h.awsCfgErr with the request-path identity resolver, but the matching regression test was deferred. Adding it here. TestLoadAPIKey_DoesNotPoisonSharedAWSConfigCache asserts logical isolation rather than trying to provoke a config-load failure (LoadDefaultConfig itself rarely errs — credential failures surface later at the SDK call). Two halves per subtest: 1. h.awsCfgErr stays nil regardless of loadAPIKey's outcome — the shared error field must not be written at all. 2. h.awsCfgOnce is still virgin after loadAPIKey returns. Verified with a sentinel sync.Once.Do trick: Do only runs the func if it has not yet been called, so if loadAPIKey had touched the shared Once, the sentinel would no-op and the assertion would fail. Two table-style subtests cover both branches of the loadAPIKey path (empty ARN short-circuit, and non-empty ARN that goes through awsconfig.LoadDefaultConfig). AWS env vars are pinned via t.Setenv so the SDK's default config resolution stays deterministic in CI without hitting ambient credentials. If a future refactor reintroduces h.awsCfgOnce.Do inside loadAPIKey, the second subtest fails on the `fired` assertion — preventing the multi-tenant reshape scope filter from being permanently broken by a transient cold-start AWS config failure.
CodeRabbit pass-5 finding 2: the existing reshape integration tests
all wire reshapeAccountResolver to ("", nil), so only the unscoped
"no AccountIDs filter" path had coverage. The scoped branch — which
is the multi-tenant cross-account leak guard — was untested. A
regression that silently dropped the AccountIDs filter (e.g. a
future refactor reintroducing the resolved-but-unregistered fall-
through bug) would have shipped undetected.
This commit adds TestReshapeRecommendations_Integration_ScopedAccount_FiltersToAccount:
1. Seeds two CloudAccounts (Tenant A + Tenant B) in the test
Postgres container so the cloud_account_id FK on the
recommendations table is satisfied for both rows.
2. Seeds two cross-family recs in the same region — A's c5.large
and B's r5.large — each tagged with its account's UUID.
3. Overrides h.reshapeAccountResolver to return Tenant A's UUID
(the production equivalent of resolveAWSCloudAccountID
returning a registered match).
4. Asserts the response surfaces ONLY c5.large in alternatives,
never r5.large — proving the AccountIDs predicate landed on
the SQL query rather than being dropped.
Also tightens the EndToEnd test's same-family seed: was m5.large
(same SIZE as the primary target, which trivial exact-match dedupe
would also strip), now m5.2xlarge (different size, exercising the
FAMILY-exclusion contract that's actually under test).
A new seedCloudAccount helper keeps the new test readable and
reusable for any future scoped-branch coverage. Existing tests are
untouched beyond the same-family seed clarification.
Tests: go test ./... green at root + pkg/. The new integration
test follows the same setupReshapeHandlerIntegration / docker-
based testcontainers pattern as its siblings, so it runs under
the same CI gate.
|
CodeRabbit pass-5 follow-up — both findings addressed. Finding 1 ( Finding 2 ( @coderabbitai review |
|
Triggering review of the new commits now. ✅ Actions performedReview triggered.
|
CodeRabbit pass-6 inline finding: resolveAWSCallerIdentity used to
call LoadDefaultConfig + STS regardless of host, then check
sourceCloud() only to decide whether the failure was real. That
wasted a config-load attempt on every reshape request from an
Azure/GCP-hosted Lambda — pure overhead since the function will
return ("", "", nil) anyway.
Move the sourceCloud() check to the top of the function. Non-AWS
hosts return immediately, no AWS SDK work attempted. AWS hosts with
a broken SDK config continue to fail closed with the wrapped error
so the multi-tenant scope filter in resolveAWSCloudAccountID
doesn't degrade into an unscoped read.
Comment block updated to make the short-circuit policy explicit.
Tests: full internal/api unit suite green; build clean.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/api/handler_test.go (1)
91-137: Make this regression test hermetic.The test still exercises the real AWS credential/provider chain and Secrets Manager client, so it can vary with ambient profiles or IMDS and become flaky on some CI hosts. Consider disabling metadata lookup explicitly or injecting a stubbed AWS seam so the test only verifies the shared-cache contract.
Suggested tweak
t.Setenv("AWS_REGION", "us-east-1") +t.Setenv("AWS_EC2_METADATA_DISABLED", "true") t.Setenv("AWS_ACCESS_KEY_ID", "") t.Setenv("AWS_SECRET_ACCESS_KEY", "") t.Setenv("AWS_SESSION_TOKEN", "")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/api/handler_test.go` around lines 91 - 137, The test is still hitting the real AWS provider chain; make it hermetic by disabling metadata lookups and/or injecting a fake Secrets Manager: in the test set AWS_EC2_METADATA_DISABLED="true" (alongside the existing AWS_* env vars) to stop IMDS lookups, and refactor the Handler to accept an injectable secrets-fetch seam (e.g., a SecretsClient interface or a loadSecretFunc) used by loadAPIKey so tests can pass a stub that simulates failures without calling AWS; update the test to construct Handler with the stubbed client when invoking Handler.loadAPIKey(secretsARN) to verify the shared-cache behavior deterministically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/api/handler_test.go`:
- Around line 91-137: The test is still hitting the real AWS provider chain;
make it hermetic by disabling metadata lookups and/or injecting a fake Secrets
Manager: in the test set AWS_EC2_METADATA_DISABLED="true" (alongside the
existing AWS_* env vars) to stop IMDS lookups, and refactor the Handler to
accept an injectable secrets-fetch seam (e.g., a SecretsClient interface or a
loadSecretFunc) used by loadAPIKey so tests can pass a stub that simulates
failures without calling AWS; update the test to construct Handler with the
stubbed client when invoking Handler.loadAPIKey(secretsARN) to verify the
shared-cache behavior deterministically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e0c3896a-e413-41d9-ac4a-8fc863289ada
📒 Files selected for processing (2)
internal/api/handler_ri_exchange_integration_test.gointernal/api/handler_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/api/handler_ri_exchange_integration_test.go
|
Pushed Combined with the concurrent session's The buildReshapeHandler signature refactor CodeRabbit suggested isn't applied — |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
CodeRabbit pass-7 nitpick: TestLoadAPIKey_DoesNotPoisonSharedAWSConfigCache
cleared the credential env vars but left IMDS reachable. On CI
runners or dev EC2 hosts where the metadata service is reachable,
the SDK credential chain probes IMDS and waits the full retry
window before falling through to "no credentials" — making the
test slow and host-dependent.
Add t.Setenv("AWS_EC2_METADATA_DISABLED", "true") so the chain
short-circuits past IMDS and the test outcome depends only on
the static env-var configuration we control.
Test still passes; behaviour assertion (the shared cache stays
untouched) is unchanged — this just makes the path deterministic.
|
Pushed All CodeRabbit findings from passes 1–7 should now be addressed. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
… empty-region normalization (closes #151) (#230) Adds two regression-guard tests covering the security paths PR #79 + follow-up commits f44b065 and afc3aa1 hardened. Both fixes landed without coverage; a future refactor could silently break either gate. Test 1 — TestResolveAWSCloudAccountID_FailsClosedOnSTSError Pins commit f44b065's multi-tenant safety contract: when STS GetCallerIdentity fails (transient AWS error, denied perm, expired token), resolveAWSCloudAccountID propagates the error rather than returning ("", nil). The empty-with-no-error path would have purchaseRecLookupFromStore skip the AccountIDs filter and surface another tenant's recs — exactly the leak the production fix prevents. Mock injection: a failing http.RoundTripper installed on the handler's pre-sealed aws.Config makes STS GetCallerIdentity fail without hitting the network. Static dummy credentials + an explicit region let the SDK's signer compose the request before the failing transport runs (otherwise the SDK would error with MissingRegion / NoCredentialProviders before we reached the STS branch). The test asserts the wrapped error AND that ListCloudAccounts was never invoked — proving the resolver short-circuited before the DB read. Test 2 — TestGetReshapeRecommendations_EmptyRegionUsesConfigRegion Pins commit afc3aa1's region-normalization fix: when ?region= is empty (or omitted), the handler adopts cfg.Region (the value the AWS clients are actually talking to) before threading region through to the recs lookup. Without normalization the recs query lands as Region:"" — an unscoped read that surfaces alternatives from every region onto the reshape page. Mock injection mirrors the integration-test helpers but lives in the unit-test file (no //go:build integration tag). MockConfigStore captures every ListStoredRecommendations filter via mock.Run so the assertion can pin filter.Region == cfg.Region. A guard require.NotEmpty(capturedFilters) prevents a silent-pass failure mode if the analyzer ever stops invoking the closure for synthetic RIs. Would-fail-if-reverted verification (manual, before commit): - Commenting out the `if region == "" { region = cfg.Region }` block in handler_ri_exchange.go made Test 2 fail with the captured filter showing Region:"" instead of "us-west-2". - Changing the STS-error branch in exchange_lookup.go from `return "", fmt.Errorf(...)` back to `return "", nil` made Test 1 fail with "An error is expected but got nil". No production code modified — additive tests only.
Summary
peerFamilyGroupsallowlist + per-recommendationDescribeReservedInstancesOfferingsAPI call with a single region-scoped read against the cached AWS Cost Explorer purchase recommendations table that the scheduler already populates.peerFamilyGroups/candidateFamilies/alternativesForTargetfrompkg/exchange/reshape.go; renamesOfferingLookup+AnalyzeReshapingWithOfferings→PurchaseRecLookup+AnalyzeReshapingWithRecs. The existingpassesDollarUnitsCheckgate stays unchanged.internal/api/exchange_lookup.go::purchaseRecLookupFromStoreadapter callsstore.ListStoredRecommendations(...)filtered by provider/service/region (+ optional source CloudAccount UUID for cross-account leak guard).providers/aws/services/ec2/client.go::FindConvertibleOfferingsand its dedicated test file deleted (no remaining callers).auto.gokeeps using the baseAnalyzeReshaping(no alternatives, primary target only) — auto-exchange contract is preserved.Closes #40.
Test plan
go build ./...(all 3 modules)go test -race -count=1 ./...for the main module — all packages greengo test -race -count=1 ./pkg/exchange/...— all greengo test -race -count=1 ./providers/aws/services/ec2/... ./providers/aws/recommendations/...— green (the only pre-existing failure in providers/aws isTestAWSProvider_GetDefaultRegionand is unrelated to this refactor)pkg/exchange/reshape_crossfamily_test.go:TestAnalyzeReshapingWithRecs_*(recommendation-driven, cold cache, dollar-units filter, lookup error, primary-target exclusion, legacy m4 family).internal/api/exchange_lookup_test.go: region/account filter, empty-account degraded mode, no-recs cold cache, error propagation,UpfrontCost / (Term × 12) + MonthlyCostmapping.internal/api/handler_ri_exchange_integration_test.gorewritten to seed recs into the test container's recommendations table instead of mockingFindConvertibleOfferings.Summary by CodeRabbit