fix(recommendations/aws): Effective % still shows wrong values — stale JSONB cache lacks on_demand_cost (followup to #312) #321

@cristim

Summary

The AWS Effective % column on the Recommendations page still shows implausible / wrong percentages (e.g. ~85.9% on rows where the realistic 1y RI value is ~20–25%) for AWS rows, even after #312 / #277 landed the on_demand_cost plumbing. Root cause is a stale JSONB cache — the previous fix added the field but did not ship a cache-invalidation migration, so pre-#312 AWS rows in the DB still lack on_demand_cost and the frontend silently falls back to a known-broken reconstruction formula.

Where it's computed

  • Frontend render: frontend/src/recommendations.ts:1492-1494 (buildVariantRowMarkup → effectiveSavingsPct(rec) → pct.toFixed(1) + '%').
  • Frontend column header: frontend/src/recommendations.ts:486 (effective_savings_pct: 'Effective %').
  • Frontend formula: frontend/src/recommendations.ts::effectiveSavingsPct — prefers r.on_demand_cost when populated, else falls back to reconstruction (savings - amortized) / (monthly_cost + savings + amortized) which double-counts amortization vs. how AWS computes the savings amount.
  • Backend RI source: providers/aws/recommendations/parser_ri.go:120-138 (parseAWSCostDetails reads EstimatedMonthlyOnDemandCost → rec.OnDemandCost).
  • Backend SP source: providers/aws/recommendations/parser_sp.go:131-141, 198 (parseSavingsPlanDetail derives OnDemandCost = CurrentAverageHourlyOnDemandSpend × 730, landed in #312).
  • Plumbing: pkg/common/types.go:148 → internal/scheduler/scheduler.go:1063 (OnDemandCost: nonZeroPtr(rec.OnDemandCost)) → internal/config/types.go:291 → frontend/src/types.ts:84 & frontend/src/api/types.ts:66.

Reproduction

The codebase already pins concrete cases as passing tests (frontend/src/__tests__/recommendations.test.ts:3185, 3252-3289). The illustrative numeric example:

Azure Standard_D11_v2, 1y all-upfront, 2 instances. Inputs: savings=$29, upfront=$26, monthly_cost=$0, term=1.

  • Without on_demand_cost: reconstruction → (29 - 2.17) / (0 + 29 + 2.17) = **86%** (wrong).
  • With provider-supplied on_demand_cost=$122.64: (29 - 2.17) / 122.64 = **21.9%** (correct).
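
The two figures above can be reproduced with a minimal sketch of the formula as this issue describes it. The interface and function names below are illustrative assumptions, not the actual code in frontend/src/recommendations.ts, and the upfront payment is assumed to be amortized evenly over the term in months (26 / 12 ≈ 2.17).

```typescript
// Illustrative record shape only; the real type in frontend/src/types.ts may differ.
interface RecSketch {
  savings: number;         // monthly savings reported by the provider
  upfront: number;         // total upfront payment
  monthly_cost: number;    // recurring monthly charge
  term: number;            // commitment length in years
  on_demand_cost?: number; // provider-supplied monthly on-demand baseline
}

// Assumption: upfront is spread evenly across the term, expressed per month.
function amortizedMonthly(r: RecSketch): number {
  return r.upfront / (r.term * 12);
}

function effectiveSavingsPctSketch(r: RecSketch): number {
  const amortized = amortizedMonthly(r);
  const net = r.savings - amortized;
  // Prefer the provider baseline; otherwise reconstruct it (the path this issue calls broken).
  const baseline =
    r.on_demand_cost !== undefined && r.on_demand_cost > 0
      ? r.on_demand_cost
      : r.monthly_cost + r.savings + amortized;
  return (net / baseline) * 100;
}

const rec: RecSketch = { savings: 29, upfront: 26, monthly_cost: 0, term: 1 };
const withoutBaseline = effectiveSavingsPctSketch(rec);
const withBaseline = effectiveSavingsPctSketch({ ...rec, on_demand_cost: 122.64 });

console.log(withoutBaseline.toFixed(1) + '%'); // 86.1% (reconstruction fallback)
console.log(withBaseline.toFixed(1) + '%');    // 21.9% (provider baseline)
```

The only difference between the two calls is the presence of on_demand_cost, which is the field missing from the stale cached rows.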

For AWS specifically: when the production JSONB row lacks on_demand_cost (pre-#312 cache), the frontend renders the inflated reconstruction value.

Top hypotheses (most → least likely)

  1. Stale JSONB cache without on_demand_cost (highest). #277 (Azure, closes #274) shipped migration 000046_invalidate_monthly_cost_cache.up.sql. #312 (AWS, closes #303) did NOT ship an equivalent. Pre-#312 AWS rows in the recommendations.payload JSONB column have no on_demand_cost key, so the frontend falls back to the broken reconstruction. The daily collector eventually overwrites them, but until then AWS rows render the wrong %.
  2. Reconstruction-formula denominator wrong for AWS RIs (when the fallback fires). For AWS RIs, monthly_cost = RecurringStandardMonthlyCost is the discounted recurring charge, while AWS's EstimatedMonthlySavingsAmount (→ r.savings) is the savings delta vs. on-demand excluding upfront. Adding + amortized to reconstruct on-demand (recommendations.ts:686) double-counts amortization vs. how AWS computes savings, producing implausibly low denominators → inflated %.
  3. nonZeroPtr zeroes legitimate baselines (scheduler.go:968-974): writes nil when OnDemandCost == 0. For SP rows where AWS returns CurrentAverageHourlyOnDemandSpend == 0 (fresh accounts, dormant workloads), the SP parser produces OnDemandCost = 0 → stored as nil → frontend falls back to reconstruction.
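
Hypothesis 3 can be illustrated with a small model. This is a TypeScript sketch of the behavior described for the Go helper nonZeroPtr and the SP derivation, not the Go code itself. The function names are hypothetical, and the 730 hours-per-month factor is the one quoted above for the SP parser.

```typescript
const HOURS_PER_MONTH = 730;

// Models the SP derivation described above: hourly on-demand spend × 730.
function spOnDemandCostSketch(currentAverageHourlyOnDemandSpend: number): number {
  return currentAverageHourlyOnDemandSpend * HOURS_PER_MONTH;
}

// Models the described nonZeroPtr behavior: a zero value is stored as "absent" (null).
function nonZeroOrNull(v: number): number | null {
  return v === 0 ? null : v;
}

const active = nonZeroOrNull(spOnDemandCostSketch(0.5)); // 365
const dormant = nonZeroOrNull(spOnDemandCostSketch(0));  // null

console.log(active, dormant);
```

In this model a genuinely zero baseline is indistinguishable from a missing field, which is why a dormant workload would take the same reconstruction fallback as a stale cached row.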

Lower-likelihood hypotheses, ruled out: RI/SP overlap double-counting (effectiveSavingsPct is computed per recommendation), currency/unit mismatch (the parsers use correct units), partial-load race (the bug is stable across reloads), off-by-one in the usage window (only affects the AWS-side input).

Prior attempts (top 5)

Fix plan

  1. Confirm the cache state:
    SELECT id, payload->>'provider', payload ? 'on_demand_cost', payload->>'on_demand_cost'
    FROM recommendations
    WHERE payload->>'provider' = 'aws' LIMIT 50;
  2. Ship migration 000048_invalidate_aws_recs_missing_on_demand_cost.up.sql mirroring the pattern from #256 (closes #252), so that the frontend renders a placeholder for affected rows until the next scheduler tick repopulates them with the canonical baseline. Include the .down.sql companion.
    UPDATE recommendations
    SET payload = jsonb_set(payload, '{monthly_cost}', 'null'::jsonb)
    WHERE payload->>'provider' = 'aws' AND NOT (payload ? 'on_demand_cost');
  3. Add structured warn-logs in providers/aws/recommendations/parser_ri.go::parseAWSCostDetails (when EstimatedMonthlyOnDemandCost is nil/zero) and providers/aws/recommendations/parser_sp.go::parseSavingsPlanDetail (when CurrentAverageHourlyOnDemandSpend is nil/zero), so the gap is visible in production logs instead of being silently zeroed.
  4. Optional follow-up (decide after the migration is verified in prod): make frontend/src/recommendations.ts::effectiveSavingsPct return null when the reconstruction fallback would fire on AWS rows, so the UI renders a placeholder instead of a silently-wrong %.
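
The optional follow-up in step 4 might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual effectiveSavingsPct: the record shape, the function names, the 'n/a' placeholder string, and the choice to keep the reconstruction for non-AWS rows are all hypothetical.

```typescript
// Illustrative record shape only; field names follow those used in this issue.
interface RecSketch {
  provider: string;
  savings: number;
  upfront: number;
  monthly_cost: number;
  term: number; // years
  on_demand_cost?: number | null;
}

function effectiveSavingsPctOrNull(r: RecSketch): number | null {
  const amortized = r.upfront / (r.term * 12); // assumed even monthly amortization
  const net = r.savings - amortized;
  if (typeof r.on_demand_cost === 'number' && r.on_demand_cost > 0) {
    return (net / r.on_demand_cost) * 100; // canonical path
  }
  // Proposed change: refuse to reconstruct a baseline for AWS rows.
  if (r.provider === 'aws') return null;
  const reconstructed = r.monthly_cost + r.savings + amortized;
  return reconstructed > 0 ? (net / reconstructed) * 100 : null;
}

// Hypothetical renderer: a null result becomes a placeholder instead of a number.
function renderPct(pct: number | null): string {
  return pct === null ? 'n/a' : pct.toFixed(1) + '%';
}

const awsStale: RecSketch = { provider: 'aws', savings: 29, upfront: 26, monthly_cost: 0, term: 1 };
const awsFresh: RecSketch = { ...awsStale, on_demand_cost: 122.64 };

console.log(renderPct(effectiveSavingsPctOrNull(awsStale))); // n/a
console.log(renderPct(effectiveSavingsPctOrNull(awsFresh))); // 21.9%
```

Returning null rather than 0 keeps "unknown" distinct from "no savings", so the table can show a placeholder without implying a real percentage.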

Files most likely changed

  • internal/database/postgres/migrations/000048_*.{up,down}.sql — NEW migration
  • providers/aws/recommendations/parser_sp.go — warn-log when CurrentAverageHourlyOnDemandSpend nil
  • providers/aws/recommendations/parser_ri.go::parseAWSCostDetails — warn-log when EstimatedMonthlyOnDemandCost nil
  • frontend/src/recommendations.ts::effectiveSavingsPct — optional: return null on AWS reconstruction fallback (decide post-migration)

Acceptance criteria

Closes


Investigation by Opus agent on 2026-05-06; full evidence trail and file:line references above.
