Skip to content

feat(commitment-options): probe AWS reserved-offerings live & validate saves#54

Merged
cristim merged 2 commits into
feat/multicloud-web-frontendfrom
feat/runtime-commitment-options
Apr 25, 2026
Merged

feat(commitment-options): probe AWS reserved-offerings live & validate saves#54
cristim merged 2 commits into
feat/multicloud-web-frontendfrom
feat/runtime-commitment-options

Conversation

@cristim
Copy link
Copy Markdown
Member

@cristim cristim commented Apr 24, 2026

Summary

  • Backend probes AWS reserved-offerings APIs (RDS, ElastiCache, OpenSearch, Redshift, MemoryDB, EC2) and persists the discovered (term, payment) combos to two new Postgres tables; probe runs at most once across server restarts.
  • New endpoint GET /api/commitment-options exposes the live data; frontend overlays it onto commitmentOptions.ts at Settings load, silently falling back to hardcoded rules on any failure.
  • Save-side validation: PUT /api/config/service/{id} rejects term/payment combos the cloud doesn't sell (belt-and-braces against the frontend gate).

Design

  • All-or-nothing probe: one service failing aborts the whole run → ErrNoData → frontend fallback. Never persists a partial cache.
  • First-AWS-account credentials: probe uses the first enabled AWS CloudAccount; no AWS connected means ErrNoData.
  • Cross-process race: ON CONFLICT DO NOTHING on both new tables tolerates two Lambda instances racing the first probe.
  • Permissive validation: Validate() returns true when no probe data exists so saves aren't blocked on a fresh install.
  • Savings Plans excluded: different API (DescribeSavingsPlansOfferings), no current restrictions, stays hardcoded.

Test plan

  • go build ./... + go test ./... green across root module and pkg/ (22 packages pass).
  • Unit tests for every probe (6), normalizers (enum/string/legacy), Store Get/Save/HasData (integration-tagged), and Service cache-first + errgroup abort paths.
  • Handler tests: endpoint success + ErrNoData + unexpected-error + empty-AWS-collapses-to-unavailable paths.
  • Save-side validation tests: reject case (400, no SaveServiceConfig), accept case (200).
  • Frontend: jest full suite (1247 tests) green; new fetchAndPopulateCommitmentOptions describe block covers overlay, unavailable, 5xx, network error, all-combos-supported.
  • tsc --noEmit clean; npm run build produces a working bundle.
  • Deployed verification: hit /api/commitment-options from the preview environment and confirm the Settings page reflects the server response (pending CI + deploy).

Review-follow-up fixes (second commit)

Commit 1dcbef7 addresses gaps surfaced by a pre-coderabbit self-review:

  • service.go: probe errors now logged with account ID + originating probe name before collapsing to ErrNoData, so operators can diagnose IAM-permission issues without the cache silently returning "unavailable".
  • handler_commitment_options.go: any error (not just ErrNoData) now collapses to {"status":"unavailable"} at HTTP 200, matching the comment's original contract; non-ErrNoData errors still logged for telemetry.
  • commitmentOptions.ts: overlay now diffs against the canonical STANDARD_TERMS × AWS_PAYMENTS product (not the possibly-already-narrowed existing entry), so re-entering Settings after the server widens its supported set no longer leaves the UI stuck on the stale narrow intersection. New re-widening regression test.
  • openapi.yaml: /api/commitment-options documented with CommitmentOptionsResponse + CommitmentOptionCombo schemas.
  • app.go: comment explains why us-east-1 is hardcoded (reserved offerings are global; widest coverage; GovCloud/China fall back via ErrNoData).
  • handler_commitment_options_test.go: new TestNewHandler_CommitmentOptsWired regression guard.

Follow-up issues (out of this PR's scope)

🤖 Generated with claude-flow

…ate saves

Replaces the hardcoded per-service (term × payment) rules in
frontend/src/commitmentOptions.ts with data harvested from the AWS
reserved-offerings APIs. The probe runs at most once per server lifetime
(persisted in new tables commitment_options_probe_runs +
commitment_options_combos); if the DB already has data, subsequent
Get()s short-circuit on the cache. Any failure along the chain — no
AWS account connected, probe denied, unknown payment shape — collapses
to ErrNoData and the frontend falls back to its hardcoded rules, so
this is wired unconditionally.

Backend:
- internal/commitmentopts/: new package with per-service Probers
  (rds, elasticache, opensearch, redshift, memorydb, ec2), shared
  duration+payment normalizers (tokenizes inputs to handle enum
  stringers and string OfferingTypes, rejects pre-2014 legacy
  utilization values), Postgres-backed Store with ON CONFLICT DO
  NOTHING for cross-process races, and an errgroup-driven Service
  that enforces all-or-nothing persistence — one probe failure
  aborts the whole run so callers never see half-populated cache.
- GET /api/commitment-options returns {status:"ok", aws:{...}} with
  term/payment tuples, or {status:"unavailable"} (HTTP 200) when
  probe data is missing so the frontend's error-toast path stays
  quiet.
- updateServiceConfig (PUT /api/config/service/{id}) now calls
  Validate() via a new checkCommitmentOptionCombo helper before
  persisting. Rejects invalid (aws, service, term, payment) combos
  with 400; permissive-true when probe data is absent so the
  frontend's hardcoded rules remain the gate.
- Handler wiring threads credStore + awsCfg through a BuildConfigFn
  that calls credentials.ResolveAWSCredentialProvider for the first
  enabled AWS account and builds a regional aws.Config for probes.

Frontend:
- fetchAndPopulateCommitmentOptions() fires from loadGlobalSettings
  BEFORE syncAllServiceCommitmentConstraints(), overlays server
  data onto commitmentConfigs.aws (diffing supported combos against
  the full terms × payments product), and silently no-ops on
  network/parse/unavailable.

DB migration 000039 adds the two tables. Safe to drop and re-add
(ON CONFLICT-tolerant; no external references).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 289e4056-b8af-46ff-8b7a-daa8d76a811c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/runtime-commitment-options

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 24, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…y-error, stable frontend overlay

Follow-up to f315a03. Two parallel review agents surfaced six concrete
gaps against PR #54; this commit closes the material ones.

- service.go: probe errors are no longer silently discarded into
  ErrNoData. Both the buildConfig failure and the errgroup.Wait() error
  are logged with account ID + the originating probe name so an
  operator can tell "no AWS account connected" from "IAM permission
  missing on opensearch:DescribeReservedInstanceOfferings".

- handler_commitment_options.go: the endpoint now returns
  {status:"unavailable"} (HTTP 200) for ANY error from the service —
  not just ErrNoData. A transient DB blip or context cancellation no
  longer 500s the Settings overlay fetch. Non-ErrNoData errors are
  logged so telemetry is preserved. The existing
  "UnexpectedError_Propagates" test is rewritten to assert the new
  collapse-to-unavailable contract.

- commitmentOptions.ts: the overlay now diffs against the canonical
  STANDARD_TERMS × AWS_PAYMENTS product instead of the (possibly
  already-narrowed) existing entry. Previously, if the server widened
  its supported set between two Settings visits, the second overlay
  computed the intersection against the stale narrow base and the UI
  stayed restrictive. New test re-widens after a narrow overlay to
  regress-guard.

- openapi.yaml: /api/commitment-options is documented with its
  CommitmentOptionsResponse + CommitmentOptionCombo schemas, matching
  the convention used for every other route.

- app.go: a short comment explains why us-east-1 is hardcoded
  (reserved offerings are global facts; widest coverage; GovCloud /
  China fall back to hardcoded rules via ErrNoData).

- handler_commitment_options_test.go: new TestNewHandler_CommitmentOptsWired
  asserts that HandlerConfig.CommitmentOpts actually lands on the
  Handler — catches a future silent field-rename regression.

Three out-of-scope items (router-level AuthUser enforcement, MemoryDB
probe target verification, page-cap tests for the other 5 probers)
tracked as separate follow-up GitHub issues.
@cristim cristim merged commit c8f58e9 into feat/multicloud-web-frontend Apr 25, 2026
3 checks passed
cristim added a commit that referenced this pull request Apr 25, 2026
…test coverage (#69)

Combined follow-up to PR #54: addresses two issues surfaced during review
that both touch internal/commitmentopts/.

#62 — extract pagination loop into a shared walkPaginated helper

The 6 per-service probers (RDS, ElastiCache, OpenSearch, Redshift,
MemoryDB, EC2) all ran the same paginated-Describe loop body: page 0..N,
fetch, accumulate, check next-page token, break/loop, with a hard
maxPages=5 cap. The cap was previously exercised by a single
TestRDSProber_PageCap test, leaving the other five probers' cap
behaviour silently unverified — a refactor that broke one prober's loop
in isolation could regress the cap without any test failing.

Extracted the shared loop into walkPaginated(ctx, service, fetchPage)
where fetchPage is a per-prober closure that handles each AWS API's
specific quirks (Marker vs NextToken field name, per-item shape
conversion, optional client-side instance-type filter for OpenSearch and
Redshift). Error wrapping ("<service>: <err>") moves to the helper too.

Chose extraction over mirroring TestRDSProber_PageCap five times because
the loops were genuinely identical — only the per-item conversion
differed, which is exactly what the closure now isolates. The new
TestWalkPaginated_* tests cover page-cap, nil-token termination,
empty-string-token termination, token threading across pages, error
wrapping, and item accumulation in a single place. Per-prober Probe
tests still exercise end-to-end wiring (which token field, per-item
conversion, filters) so a wiring break in one prober is still caught.
TestRDSProber_PageCap is retained as an integration smoke test.

Closes #62.

#61 — MemoryDB probe target db.t4g.small empirical risk

Issue flagged that MemoryDB reserved-node coverage has historically
skewed to db.r6g.* tiers, so db.t4g.small may return an empty offerings
list in some regions, causing the cache to persist a probe run with
zero MemoryDB combos. Resolution path was either: verify with AWS creds
and switch to db.r6g.large if sparse, OR document the risk if no creds
are available.

This automated session had no AWS credentials (sts get-caller-identity
NoCredentials), so the probe target stays db.t4g.small. Added a code
comment in probe.go documenting the empirical risk, the verification
command, and the safe fallback (db.r6g.large) so a future engineer with
creds can finish the verification without re-discovering the context.
A follow-up comment will be posted on issue #61 noting that AWS
verification is still needed.

Closes #61.

Local verification: go build ./..., go vet ./..., go test ./... (all
pass; commitmentopts package: ok 3.666s).
@cristim cristim deleted the feat/runtime-commitment-options branch April 26, 2026 22:54
@cristim cristim added triaged Item has been triaged priority/p1 Next up; this sprint severity/high Significant harm urgency/this-sprint Within the current sprint impact/many Affects most users effort/l Weeks type/feat New capability labels Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

effort/l Weeks impact/many Affects most users priority/p1 Next up; this sprint severity/high Significant harm triaged Item has been triaged type/feat New capability urgency/this-sprint Within the current sprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant