fix(recommendations/aws): flush stale on_demand_cost cache + parser visibility (closes #321) #324

Merged
cristim merged 3 commits into feat/multicloud-web-frontend from fix/aws-on-demand-cost-cache-flush on May 6, 2026

Conversation


cristim (Member) commented May 6, 2026

Summary

Closes

Closes #321

Follow-up deferred

The optional frontend change (effectiveSavingsPct returning null on AWS reconstruction fallback) has been deferred to #323. Decision should wait until production data confirms the migration succeeded and on_demand_cost is repopulated. This avoids conflicts with concurrent agents modifying frontend/src/recommendations.ts for issues #317/#318/#319/#320.

Runbook note

Deploy order:

  1. Apply this PR (migration 000049 runs automatically on Lambda cold-start via migrate.go).
  2. Wait for one scheduler tick (daily collector) to repopulate AWS rows with on_demand_cost.
  3. Verify Effective % shows realistic values (20–30% range for typical 1y RI/SP vs. ~86% before fix).

Diagnosis SQL (from issue #321):

SELECT id, payload->>'provider', payload ? 'on_demand_cost', payload->>'on_demand_cost'
FROM recommendations
WHERE payload->>'provider' = 'aws' LIMIT 50;

After migration: rows lacking on_demand_cost will have monthly_cost = null.
After next scheduler tick: all AWS rows should have on_demand_cost populated.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed handling of incomplete AWS recommendation cost data to prevent incorrect monthly cost calculations.
    • Cleaned up database records by marking missing AWS on-demand cost as unknown so they won't be miscomputed.
    • Added runtime warnings when AWS cost fields are missing to surface data collection gaps.
    • Migration is idempotent and targets only affected AWS records.

cristim added 2 commits May 6, 2026 13:19
… 000049)

PR #312 added on_demand_cost population for AWS SP rows but shipped no
cache-invalidation migration. Pre-#312 AWS rows in recommendations.payload
lack the on_demand_cost key, so the frontend falls back to the broken
reconstruction formula (which double-counts amortization), rendering
implausibly high Effective % values (e.g. ~86% instead of ~22%).

Migration 000049 follows #256's pattern: set monthly_cost to null on
AWS rows that lack on_demand_cost, so the frontend renders "—" until
the next scheduler tick repopulates with correct values. The WHERE clause
is scoped strictly to AWS rows missing the field (idempotent on re-run).

Closes #321
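
To make the scoping and idempotency claims concrete, here is a minimal in-memory sketch in Go of the row selection the migration message describes. It is not the migration itself (that is SQL against the `recommendations` table); the `payload` type, the function name `invalidateStaleAWS`, and the sample values are all hypothetical.

```go
package main

import "fmt"

// payload stands in for one row's recommendations.payload JSONB document.
// (Hypothetical type, used only for this illustration.)
type payload map[string]any

// invalidateStaleAWS models the WHERE clause described above: provider is
// "aws" AND the on_demand_cost key is absent. Matching rows get monthly_cost
// set to nil (the JSON null analogue). It returns how many rows matched.
func invalidateStaleAWS(rows []payload) int {
	matched := 0
	for _, p := range rows {
		if p["provider"] != "aws" {
			continue // other providers are never touched
		}
		if _, ok := p["on_demand_cost"]; ok {
			continue // already repopulated by a post-#312 collection
		}
		p["monthly_cost"] = nil
		matched++
	}
	return matched
}

func main() {
	rows := []payload{
		{"provider": "aws", "monthly_cost": 120.0},                         // stale: key missing
		{"provider": "aws", "monthly_cost": 95.0, "on_demand_cost": 130.0}, // fresh
		{"provider": "azure", "monthly_cost": 80.0},                        // out of scope
	}
	invalidateStaleAWS(rows)
	invalidateStaleAWS(rows) // re-running leaves the data in the same state
	for _, p := range rows {
		fmt.Println(p["provider"], p["monthly_cost"])
	}
}
```

Running the function twice leaves the rows unchanged after the first pass, which is the sense in which the update is safe to re-run.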
Add structured warn-logs in the RI and SP parsers when the AWS CE API
fields that populate OnDemandCost are absent from the response:

- parseAWSCostDetails (parser_ri.go): logs when EstimatedMonthlyOnDemandCost
  is nil, which causes OnDemandCost=0 → stored as nil by the scheduler's
  nonZeroPtr helper → frontend reconstruction fallback.

- parseSavingsPlanDetail (parser_sp.go): logs when
  CurrentAverageHourlyOnDemandSpend is nil, same downstream effect.

Both logs follow the respective file's existing logging style (fmt.Printf
for parser_ri.go, log.Printf for parser_sp.go). The logs make it
observable when the fallback path is taken, aiding diagnosis for issue #321
and future regression detection.
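
For readers unfamiliar with the zero-to-nil step mentioned above, the following is a guess at what a helper like `nonZeroPtr` does. The signature and body are assumptions inferred from the name and from the "OnDemandCost=0 → stored as nil" description, not a copy of the scheduler's code.

```go
package main

import "fmt"

// nonZeroPtr is a hypothetical reconstruction of the scheduler helper named
// in the commit message: a zero value is treated as "unknown" and becomes nil.
func nonZeroPtr(v float64) *float64 {
	if v == 0 {
		return nil
	}
	return &v
}

func main() {
	missing := nonZeroPtr(0)     // parser left OnDemandCost at its zero value
	present := nonZeroPtr(131.4) // parser populated OnDemandCost

	fmt.Println("missing stored as nil:", missing == nil)
	if present != nil {
		fmt.Println("present stored as:", *present)
	}
}
```

Under this assumption a nil CE field and a genuine zero are indistinguishable once stored, which is why the warning has to be emitted in the parser rather than downstream.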

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Adds a Postgres migration that nulls monthly_cost in AWS recommendation JSONB payloads missing on_demand_cost, and adds defensive nil-checks with warning logs in the AWS RI and Savings Plans parsers. The down-migration contains explanatory comments and performs no data restoration.

Changes

AWS Cache Invalidation & Parser Observability

| Layer / File(s) | Summary |
|---|---|
| Data Shape / Invalidation: internal/database/postgres/migrations/000049_invalidate_aws_recs_missing_on_demand_cost.up.sql | UPDATE sets monthly_cost to null inside payload JSONB for rows where payload->>'provider'='aws' AND NOT (payload ? 'on_demand_cost'). Documented as idempotent and scoped to AWS. |
| Migration Rollback Note: internal/database/postgres/migrations/000049_invalidate_aws_recs_missing_on_demand_cost.down.sql | Added SQL comment block stating there is no rollback for the payload rewrite; restoring stale on_demand_cost would reintroduce the bug and correct values will be written by the next collection. |
| Parser Observability (RI): providers/aws/recommendations/parser_ri.go | Added import of log and an else branch in parseAWSCostDetails that logs a warning when EstimatedMonthlyOnDemandCost is nil and leaves OnDemandCost at zero (reconstruction fallback expected). |
| Parser Observability (SP): providers/aws/recommendations/parser_sp.go | Added nil-check and log.Printf warning when CurrentAverageHourlyOnDemandSpend is nil, indicating the code will use a reconstruction fallback for effective percentage calculation. |

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • LeanerCloud/CUDly#312: Plumbs AWS on_demand_cost through Savings Plans/RI parsers; closely related at the parser/data plumbing level.
  • LeanerCloud/CUDly#254: Also touches AWS recommendation parsers and handling of on‑demand/recurring cost fields.
  • LeanerCloud/CUDly#256: Introduced the cache-invalidation migration pattern (Azure) that this migration mirrors.

Poem

🐰 A missing cost made numbers sway,
I hopped the DB and cleared the way.
Monthly cost now marked as null,
Logs whisper why the fields were dull.
Next tick brings truth — hop, tally true!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | The PR implements the core requirements from issue #321: migration to invalidate AWS rows missing on_demand_cost, warn-logs in AWS parsers for missing CE API fields. However, backend unit tests covering parser warn-log paths are not present in the changeset. | Add backend unit tests covering the parser warn-log paths in parser_ri.go and parser_sp.go to fulfill the acceptance criteria requirement. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately identifies the main changes: a migration to flush stale on_demand_cost cache and added parser logging for visibility. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to addressing the stale on_demand_cost cache issue: migrations flush the cache, parsers add observability for missing data. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


cristim added the labels priority/p1 (Next up; this sprint), severity/medium (Moderate harm), urgency/this-sprint (Within the current sprint), impact/all-users (Affects every user), effort/s (Hours), type/bug (Defect), and triaged (Item has been triaged) on May 6, 2026

cristim commented May 6, 2026

@coderabbitai review

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/database/postgres/migrations/000049_invalidate_aws_recs_missing_on_demand_cost.down.sql (1)

1-5: Consider adding a no-op statement for consistency with similar intentional no-op migrations.

The rationale for not rolling back is sound and matches the pattern in 000046_invalidate_monthly_cost_cache.down.sql, which has identical intent. While golang-migrate accepts comment-only down migrations (as 000046 demonstrates), migrations marked as deliberate no-ops in this codebase include SELECT 1; for explicitness (see 000006_ensure_admin_user.down.sql). Adding it here would improve consistency:

-- No rollback for the payload rewrite: the original AWS rows lacking
-- on_demand_cost were themselves stale (pre-PR-#312 data). Reverting
-- monthly_cost from null back to those stale values would re-introduce
-- the very bug this migration fixes. The correct values will be written
-- on the next scheduled collection.
SELECT 1;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@internal/database/postgres/migrations/000049_invalidate_aws_recs_missing_on_demand_cost.down.sql`
around lines 1 - 5, Add an explicit no-op SQL statement to the down migration to
match the repository pattern: in the migration
000049_invalidate_aws_recs_missing_on_demand_cost.down.sql, append a deliberate
no-op (e.g., SELECT 1;) after the existing comment so the file is not
comment-only and is consistent with other intentional no-op migrations like
000046_invalidate_monthly_cost_cache.down.sql and
000006_ensure_admin_user.down.sql.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@providers/aws/recommendations/parser_ri.go`:
- Around line 130-136: Replace the fmt.Printf call used when
EstimatedMonthlyOnDemandCost is nil in the RI recommendation handling with
log.Printf to match parser_sp.go behavior and ensure stderr/timestamped logging;
update the warning call that references rec.Service and rec.Account to use
log.Printf, and add "log" to the file imports if it's not already imported
(target the nil-check block that currently calls fmt.Printf in parser_ri.go).

🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21d26fce-661e-4dba-9ec1-e77854046463

📥 Commits

Reviewing files that changed from the base of the PR and between 7032056 and d686131.

📒 Files selected for processing (4)
  • internal/database/postgres/migrations/000049_invalidate_aws_recs_missing_on_demand_cost.down.sql
  • internal/database/postgres/migrations/000049_invalidate_aws_recs_missing_on_demand_cost.up.sql
  • providers/aws/recommendations/parser_ri.go
  • providers/aws/recommendations/parser_sp.go


coderabbitai Bot commented May 6, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 1 unresolved review comment.

Files modified:

  • providers/aws/recommendations/parser_ri.go

Commit: 351bb9116b9ab96828093a4b8e6d147ee48203dc

The changes have been pushed to the fix/aws-on-demand-cost-cache-flush branch.

Time taken: 2m 52s

Fixed 1 file(s) based on 1 unresolved review comment.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

cristim commented May 6, 2026

@coderabbitai review

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
providers/aws/recommendations/parser_ri.go (1)

127-137: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing warning coverage for zero/parse-failure still hides fallback cases.

On Line 127 onward, warnings are emitted only when the pointer is nil. If parsing fails (non-nil but invalid string) or parses to 0, rec.OnDemandCost still effectively drives fallback behavior without visibility.

Proposed patch
 	if details.EstimatedMonthlyOnDemandCost != nil {
-		if onDemand, err := strconv.ParseFloat(*details.EstimatedMonthlyOnDemandCost, 64); err == nil {
+		if onDemand, err := strconv.ParseFloat(*details.EstimatedMonthlyOnDemandCost, 64); err == nil {
 			rec.OnDemandCost = onDemand
+			if onDemand == 0 {
+				log.Printf("WARNING: EstimatedMonthlyOnDemandCost is 0 for RI recommendation (service=%s, account=%s) — Effective %% may use reconstruction fallback", rec.Service, rec.Account)
+			}
+		} else {
+			log.Printf("WARNING: failed to parse EstimatedMonthlyOnDemandCost=%q for RI recommendation (service=%s, account=%s): %v", *details.EstimatedMonthlyOnDemandCost, rec.Service, rec.Account, err)
 		}
 	} else {
 		// EstimatedMonthlyOnDemandCost absent from AWS CE response — OnDemandCost
 		// will be 0 and the scheduler's nonZeroPtr will store nil, causing the
 		// frontend to fall back to the reconstruction formula. Log so operators
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@providers/aws/recommendations/parser_ri.go` around lines 127 - 137, The
current logic only logs when details.EstimatedMonthlyOnDemandCost is nil, but it
silently ignores parse errors and zero values which also trigger the
reconstruction fallback; update the block around
details.EstimatedMonthlyOnDemandCost/strconv.ParseFloat so that you log a
warning (including the raw string and parse error if any) when parsing fails or
when the parsed onDemand == 0, and only suppress the warning when parsing
succeeds and onDemand > 0; reference the existing symbols rec.OnDemandCost,
details.EstimatedMonthlyOnDemandCost, and strconv.ParseFloat so the change is
applied in that parsing branch and include context (rec.Service, rec.Account) in
the log message.
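
The three cases this comment distinguishes (field absent, value unparseable, value not positive) can be sketched as one standalone function. This is only an illustration under stated assumptions: `parseOnDemand` and its reason strings are hypothetical and do not exist in parser_ri.go, and the real code would log with rec.Service and rec.Account rather than return a string.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOnDemand is a hypothetical helper, not the repository's code. It
// returns the parsed cost plus a reason that is non-empty whenever the caller
// would end up on the reconstruction fallback.
func parseOnDemand(raw *string) (cost float64, reason string) {
	if raw == nil {
		return 0, "field absent from the response"
	}
	v, err := strconv.ParseFloat(*raw, 64)
	if err != nil {
		return 0, fmt.Sprintf("failed to parse %q: %v", *raw, err)
	}
	if !(v > 0) { // zero, negative, or NaN: none is a usable denominator
		return 0, fmt.Sprintf("parsed value %v is not positive", v)
	}
	return v, ""
}

func main() {
	bad, zero, ok := "n/a", "0.00", "131.40"
	for _, raw := range []*string{nil, &bad, &zero, &ok} {
		cost, reason := parseOnDemand(raw)
		if reason != "" {
			fmt.Println("WARNING:", reason)
			continue
		}
		fmt.Println("on-demand cost:", cost)
	}
}
```

Returning a reason instead of logging inside the helper keeps the sketch easy to unit-test, which is also the gap flagged by the Linked Issues check.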

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff2c3328-a8fe-4173-8d69-b06f775366d6

📥 Commits

Reviewing files that changed from the base of the PR and between d686131 and 351bb91.

📒 Files selected for processing (1)
  • providers/aws/recommendations/parser_ri.go

cristim merged commit 843697c into feat/multicloud-web-frontend on May 6, 2026
4 checks passed
cristim deleted the fix/aws-on-demand-cost-cache-flush branch on May 6, 2026, 13:45
cristim added a commit that referenced this pull request May 11, 2026
…irectly (closes #330) (#331)

* feat(frontend/recs): On-Demand Monthly column uses provider on_demand_cost directly (closes #330)

The On-Demand Monthly column now returns the provider-supplied
`r.on_demand_cost` directly instead of reconstructing the value from
`monthly_cost + savings + (upfront_cost / months_in_term)`. When
`on_demand_cost` is missing, undefined, or `0`, the cell renders the
existing em-dash sentinel.

PR #322 originally scoped the column to a *reconstruction-only* helper
("the column's purpose is to display the reconstructed denominator so
users can verify the formula against the raw fields visible in the
same row"). In practice, post-#277/#312/#324 every supported provider
(AWS, Azure, GCP) plumbs `on_demand_cost` through to the frontend,
and `effectiveSavingsPct` already prefers the provider value when
present (its `hasOnDemand` branch). The two columns therefore
disagreed for any row where the formula reconstruction drifted from
the provider's billed price (Azure all-upfront RIs, AWS Capacity
Reservation discounts, partial-day proration, list-price rounding).

The product decision: surface the **authoritative** number from the
provider, not a teaching-tool reconstruction. Users who want to
verify the formula can still compute `monthly_cost + savings +
upfront_cost/(term*12)` from the row's other columns. Aligning this
column with `effectiveSavingsPct`'s denominator means the two never
disagree.

Changes:

* `frontend/src/recommendations.ts::onDemandMonthly` simplified to:
  `return r.on_demand_cost > 0 ? r.on_demand_cost : null` — drops
  the `monthly_cost`/`term`/`upfront_cost` arithmetic entirely.
* Updated the function's doc comment to reflect the new contract
  (the prior comment explicitly said "does NOT use on_demand_cost"
  and is now exactly inverted).
* Updated the `SORTABLE_NUMERIC_COLUMNS.on_demand_monthly`
  preamble — the null-rendering condition is now "missing
  on_demand_cost" rather than "null monthly_cost / term=0".
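
To illustrate why the two columns could disagree, here is a small Go rendering of the old reconstruction next to the new contract. The actual implementation is TypeScript in frontend/src/recommendations.ts; the `rec` struct, the function name `reconstructed`, the reading of `term` as years, and the sample numbers below are assumptions made for this sketch.

```go
package main

import "fmt"

// rec holds only the fields named in this commit message. The pointer models
// an optional on_demand_cost that may be missing from the payload.
type rec struct {
	MonthlyCost  float64
	Savings      float64
	UpfrontCost  float64
	TermYears    int
	OnDemandCost *float64
}

// reconstructed is the pre-#330 arithmetic:
// monthly_cost + savings + upfront_cost / (term * 12).
func reconstructed(r rec) *float64 {
	if r.TermYears == 0 {
		return nil // no term, so the upfront cost cannot be amortized
	}
	v := r.MonthlyCost + r.Savings + r.UpfrontCost/float64(r.TermYears*12)
	return &v
}

// onDemandMonthly is the new contract: the provider value when positive,
// otherwise nil (rendered as the em-dash sentinel in the UI).
func onDemandMonthly(r rec) *float64 {
	if r.OnDemandCost != nil && *r.OnDemandCost > 0 {
		return r.OnDemandCost
	}
	return nil
}

func main() {
	provider := 96.5 // hypothetical provider-billed on-demand price
	r := rec{MonthlyCost: 70, Savings: 20, UpfrontCost: 120, TermYears: 1, OnDemandCost: &provider}

	fmt.Println("reconstructed:", *reconstructed(r))
	fmt.Println("provider:     ", *onDemandMonthly(r))
}
```

With these made-up inputs the reconstruction yields 100 while the provider reports 96.5, the kind of drift the commit message attributes to proration and rounding.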

Tests:

* Replaced the seven reconstruction-arithmetic tests in the
  `describe('onDemandMonthly')` block with five tests covering the
  new contract: provider value used directly, null/undefined/0
  return null, supporting fields ignored when on_demand_cost is
  positive.
* Updated three column-rendering tests in `describe('Monthly Cost +
  Effective % column rendering')` to set `on_demand_cost` on the
  fixture so the rendered value matches the new logic — the
  em-dash and filter regression tests still cover the same
  semantics, just with clearer fixture intent.

`effectiveSavingsPct` keeps its existing reconstruction fallback
unchanged — it's the canonical Effective Savings % calculation and
must stay backward-compatible with legacy cached rows that pre-date
#312/#324.

Verification:
  - `npx tsc --noEmit` clean
  - `npm test` clean (1578 passed / 0 failed across 42 suites)

* fix(frontend/recs): cover explicit-undefined + freshen filter comment (CR pass on PR #331)

Addresses both items from CodeRabbit review 4244506117 on PR #331:

1. **Actionable** — `frontend/src/__tests__/recommendations.test.ts`: the
   single "undefined / missing" test only exercised the
   delete-the-property branch. TypeScript callers can also pass
   `on_demand_cost: undefined` explicitly (strict-null safety pattern),
   and the type system distinguishes "field absent" from "field present
   with value undefined" — so a regression that uses `'on_demand_cost'
   in r` instead of `r.on_demand_cost > 0` would be caught by one but
   not the other. Split into two tests: one for explicit undefined, one
   for the deleted-property case, with comments explaining why both
   matter.

2. **Nitpick** — `frontend/src/recommendations.ts:1085-1088`: the
   parenthetical "(null monthly_cost or term=0)" described the
   pre-#330 reconstruction's null conditions. Refresh to match the
   new contract: "missing or zero on_demand_cost — see onDemandMonthly()
   for the contract". Adds a back-reference so a future reader can find
   the source of truth.

Verification:
  - `npx tsc --noEmit` clean
  - `npm test` clean (1579 passed / 0 failed across 42 suites — +1 test)