Skip to content

External/preemptive wait benefit formula + CPU:Elapsed adjustment (#215 C6, C7)#259

Merged
erikdarlingdata merged 1 commit into
devfrom
fix/external-wait-formula-and-cpu-elapsed
Apr 22, 2026
Merged

External/preemptive wait benefit formula + CPU:Elapsed adjustment (#215 C6, C7)#259
erikdarlingdata merged 1 commit into
devfrom
fix/external-wait-formula-and-cpu-elapsed

Conversation

@erikdarlingdata
Copy link
Copy Markdown
Owner

Summary

Joe's feedback from #215, public plan `MwX36LEHJS`:

  • C6: the current parallel-wait formula uses per-thread `elapsed - cpu`, which is ≈0 for external/preemptive waits where the worker is CPU-busy in kernel (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, PREEMPTIVE_*). New path computes `(wait_ms / total_cpu_ms) * Σ max-thread-self-cpu / elapsed_ms`.
  • C7: CPU:Elapsed ratio subtracts Σ external-wait ms from CPU before dividing — those waits aren't real query CPU.

Verified

  • Joe's worked example: `(26307/45075) * (11341 + 42) / 13748 = 48.3%` — formula matches.
  • Private plan `c1-c5.sqlplan` (serial): RESERVED_MEMORY_ALLOCATION_EXT benefit → 0.5% (serial: sum-of-self-cpu ≈ stmt cpu, so formula collapses to simple ratio — no regression).
  • Regression on `20260415_1.sqlplan` (parallel batch-mode): CXPACKET 99.2%, ASYNC_NETWORK_IO 93.5% unchanged; MEMORY_ALLOCATION_EXT stays at 0.1%.

Implementation notes

  • `BenefitScorer.IsExternalWait` classifier (public — ResultMapper needs it).
  • `OperatorWaitProfile` grows `MaxThreadCpuMs`, populated via new `PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs` which mirrors the self-elapsed helper: per-thread self for parallel, operator-minus-children for serial. Crucial — using cumulative CPU would double-count in serial plans.
  • `QueryTimeResult` gains `ExternalWaitMs` populated during mapping.
  • CPU:Elapsed renderers updated in HtmlExporter runtime card, Index.razor runtime card, PlanViewerControl DOP efficiency calc.

Test plan

  • Formula matches Joe's expected 48.3% by math
  • Private + public reference plans show expected numbers
  • Regression on existing parallel plan
  • Verify against raw XML of MwX36LEHJS when available
  • Web viewer visual post-deploy

Version 1.7.3 → 1.7.4.

🤖 Generated with Claude Code

…nt (#215 C6, C7)

C6 — external-wait benefit formula:
The existing parallel-wait formula uses per-thread wait = max(0, elapsed - cpu).
External / preemptive waits (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT,
PREEMPTIVE_*) are CPU-busy in kernel, so elapsed ≈ cpu for those threads and the
wait barely shows in that math. Joe's formula:

  wait_cpu_share = wait_ms / total_cpu_ms
  sum_max_cpu    = Σ max-thread-self-cpu across operators
  benefit_ms     = wait_cpu_share * sum_max_cpu

Verified against Joe's worked example on MwX36LEHJS:
  (26307 / 45075) * (11341 + 42) / 13748 = 48.3% ✓

Implementation:
- New IsExternalWait classifier (public so mapping can see it) covering
  MEMORY_ALLOCATION* and PREEMPTIVE_* prefixes.
- OperatorWaitProfile grows MaxThreadCpuMs — per-thread self-CPU
  (non-cumulative), taken from new PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs
  which mirrors GetOperatorOwnElapsedMs. Uses self-CPU so summing across
  operators in a serial plan doesn't double-count cumulative CPU, and
  sums to approximate critical-path CPU in parallel plans.
- ScoreWaitStats routes external waits to CalculateExternalWaitBenefit.
- Serial plans with external waits: sum-of-self-cpu ≈ statement cpu, so
  the formula collapses to wait/elapsed — same as before, no regression.

C7 — CPU:Elapsed numerator:
External waits inflate the CPU number without representing real query CPU,
so the CPU:Elapsed ratio (used in web runtime card, HTML export runtime
card, and Avalonia DOP efficiency) subtracts Σ external-wait ms from CPU
before dividing by elapsed.

- AnalysisResult.QueryTimeResult gains ExternalWaitMs, populated in
  ResultMapper by summing wait stats matching IsExternalWait.
- HtmlExporter.WriteRuntimeCard, Index.razor runtime card, and
  PlanViewerControl DOP-efficiency calc all use (cpu - externalWaitMs).

Version bump 1.7.3 -> 1.7.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@erikdarlingdata erikdarlingdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this PR does

Addresses #215 C6 and C7. External/preemptive waits (MEMORY_ALLOCATION_*, PREEMPTIVE_*) are CPU-busy in kernel, so the existing per-thread elapsed − cpu benefit formula scores them near zero. This PR:

  • Adds BenefitScorer.IsExternalWait and a dedicated CalculateExternalWaitBenefit using Joe's (wait_ms / cpu_ms) * Σ max-thread-self-cpu / elapsed_ms.
  • Adds PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs (the CPU analogue of GetPerThreadOwnElapsed) to feed the numerator.
  • Surfaces QueryTimeResult.ExternalWaitMs and subtracts it from CPU in the CPU:Elapsed ratio across HtmlExporter, Index.razor, and the DOP-efficiency calc in PlanViewerControl.
  • Bumps version 1.7.3 → 1.7.4. Base branch is dev, correct.

What's good

  • Formula is derivation-consistent with the (elapsed - cpu) path for the non-external case; serial plans collapse to the old simple ratio so there's no regression path (confirmed by your c1-c5 verification).
  • Single classifier (IsExternalWait) used in all four consumers — no drift risk as long as it stays authoritative.
  • GetOperatorMaxThreadOwnCpuMs mirrors GetPerThreadOwnElapsed's exchange look-through pattern; serial branch clamps with Max(0, ...) for the known pass-through / timing-skew cases.

Needs attention

  1. Cross-repo rule-set drift. This modifies BenefitScorer.ScoreWaitStats and adds a new PlanAnalyzer helper — both are part of the analysis rule set shared with PerformanceMonitor's Dashboard and Lite copies. The PR body doesn't mention sync'ing them. If those are meant to stay in lockstep with Studio, companion PRs need to land there, otherwise the same plan will score differently in each product.

  2. No tests. tests/PlanViewer.Core.Tests/ has nothing asserting on benefit percentages for external waits, and nothing that would catch a regression in Joe's 48.3% worked example. Inline comment on ResultMapper.cs has suggested coverage.

  3. Minor, inline:

    • IsExternalWait uses Contains("MEMORY_ALLOCATION") — works, but broader than needed.
    • CollectOperatorWaitProfiles now walks for every statement with a root node, not just parallel — cheap per-plan but adds up in CLI bulk runs; gate-on-need is straightforward if it matters.
    • Doc-comment on CalculateExternalWaitBenefit reads parallel-only, but the function is intentionally also the serial path — worth a line noting that.
  4. Open TODOs in the PR body (raw XML for MwX36LEHJS, web visual post-deploy) — not blocking review, just flagging they're still unchecked.

No Avalonia concerns (no new controls, no XAML styles, no MVVM creep, no contrast-range brushes introduced).


Generated by Claude Code

if (string.IsNullOrEmpty(waitType)) return false;
var wt = waitType.ToUpperInvariant();
return wt.Contains("MEMORY_ALLOCATION")
|| wt.StartsWith("PREEMPTIVE_");
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contains("MEMORY_ALLOCATION") catches both MEMORY_ALLOCATION_EXT and RESERVED_MEMORY_ALLOCATION_EXT — that's the intent — but it's substring-wide, so any future wait type embedding that token anywhere in the name would be (mis)classified as CPU-busy. Tighter alternative:

return wt == "MEMORY_ALLOCATION_EXT"
    || wt == "RESERVED_MEMORY_ALLOCATION_EXT"
    || wt.StartsWith("PREEMPTIVE_");

Not a bug today, just a robustness nit. Same classifier is reused from ResultMapper and PlanViewerControl, so whatever shape it takes here is authoritative.


Generated by Claude Code

{
var childNode = child;
if (child.PhysicalOp == "Parallelism" && child.Children.Count > 0)
childNode = child.Children.OrderByDescending(c => c.ActualCPUMs).First();
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exchange look-through mirrors GetPerThreadOwnElapsed at 1573-1574 — good, pattern is consistent. One subtle thing: on the parent side (parentByThread), thread IDs come from the operator itself; on the child side (childSumByThread), they come from the grandchild when the direct child is Parallelism. Producer-side threads below a parallelism exchange are a different thread set from consumer-side threads above it, so childSumByThread lookups keyed by parent's thread IDs will often miss and contribute 0. That's actually the right answer (you don't want to subtract producer CPU from consumer thread self-CPU), but it means in those cases this collapses to "parent thread cpu" as self. Worth a one-line comment noting that's intentional.


Generated by Claude Code

// in kernel and inflate the ratio without representing real query work.
long externalWaitMs = 0;
foreach (var w in statement.WaitStats)
if (BenefitScorer.IsExternalWait(w.WaitType))
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResultMapper already derives QueryTimeResult.ExternalWaitMs using the same classifier (ResultMapper.cs:115-119). This code-behind works against the raw PlanStatement, not the mapped StatementResult, so the re-sum is necessary — just flagging that the classifier (BenefitScorer.IsExternalWait) is the single source of truth across all three sites (here, ResultMapper, HtmlExporter/Index.razor via the mapped field). If the classifier ever drifts, this site would need to drift with it.


Generated by Claude Code

// uses sum-of-max-thread-cpu across operators and works for both.
var operatorProfiles = new List<OperatorWaitProfile>();
if (stmt.RootNode != null)
CollectOperatorWaitProfiles(stmt.RootNode, operatorProfiles);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the tree walk was skipped for serial plans. Now CollectOperatorWaitProfiles runs on every statement that has a root node, including serial plans that have no external waits. For statements with many operators and no MEMORY_ALLOCATION_*/PREEMPTIVE_* waits this is pure overhead. If it shows up in CLI/bulk-scan profiling, gate the walk on stmt.WaitStats.Any(w => IsExternalWait(w.WaitType)) || isParallel.


Generated by Claude Code

// is CPU-busy in kernel, so operator elapsed ≈ operator cpu and the wait
// barely shows in the per-thread (elapsed - cpu) calculation. Joe's formula:
// benefit = (wait_ms / total_cpu_ms) * Σ max_thread_cpu_per_operator / elapsed
benefitPct = CalculateExternalWaitBenefit(wait, operatorProfiles, stmt.QueryTimeStats!.CpuTimeMs, elapsedMs);
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The external-wait path is taken for serial plans too, not just parallel. On a serial plan with per-operator CPU stats, sumMaxCpu ≈ stmt.CpuTimeMs and the formula collapses to wait_ms / elapsed_ms — which is exactly the old serial behavior, so no regression (matches your c1-c5.sqlplan verification). Worth noting in the doc-comment above (Σ max_thread_cpu_per_operator reads as parallel-only at a glance).


Generated by Claude Code

CpuTimeMs = stmt.QueryTimeStats.CpuTimeMs,
ElapsedTimeMs = stmt.QueryTimeStats.ElapsedTimeMs
ElapsedTimeMs = stmt.QueryTimeStats.ElapsedTimeMs,
ExternalWaitMs = externalWaitMs
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test coverage landing with this PR. tests/PlanViewer.Core.Tests/PlanAnalyzerTests.cs has wait-stat tests (Rule 11 severity gating, CPU vs I/O waits) but nothing asserting on benefit % values. A small fixture covering:

  • serial plan with a RESERVED_MEMORY_ALLOCATION_EXT wait → asserts ExternalWaitMs is populated and benefit% is reasonable;
  • a plan matching Joe's 48.3% worked example → locks that math in so a future refactor can't silently move the number.

Would catch any future drift in either IsExternalWait or the formula. Not blocking, but flagging per the review checklist.


Generated by Claude Code

@erikdarlingdata erikdarlingdata merged commit 6b61d79 into dev Apr 22, 2026
2 checks passed
@erikdarlingdata erikdarlingdata deleted the fix/external-wait-formula-and-cpu-elapsed branch April 22, 2026 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant