External/preemptive wait benefit formula + CPU:Elapsed adjustment (#215 C6, C7)#259
Conversation
…nt (#215 C6, C7) C6 — external-wait benefit formula: The existing parallel-wait formula uses per-thread wait = max(0, elapsed - cpu). External / preemptive waits (MEMORY_ALLOCATION_EXT, RESERVED_MEMORY_ALLOCATION_EXT, PREEMPTIVE_*) are CPU-busy in kernel, so elapsed ≈ cpu for those threads and the wait barely shows in that math. Joe's formula: wait_cpu_share = wait_ms / total_cpu_ms sum_max_cpu = Σ max-thread-self-cpu across operators benefit_ms = wait_cpu_share * sum_max_cpu Verified against Joe's worked example on MwX36LEHJS: (26307 / 45075) * (11341 + 42) / 13748 = 48.3% ✓ Implementation: - New IsExternalWait classifier (public so mapping can see it) covering MEMORY_ALLOCATION* and PREEMPTIVE_* prefixes. - OperatorWaitProfile grows MaxThreadCpuMs — per-thread self-CPU (non-cumulative), taken from new PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs which mirrors GetOperatorOwnElapsedMs. Uses self-CPU so summing across operators in a serial plan doesn't double-count cumulative CPU, and sums to approximate critical-path CPU in parallel plans. - ScoreWaitStats routes external waits to CalculateExternalWaitBenefit. - Serial plans with external waits: sum-of-self-cpu ≈ statement cpu, so the formula collapses to wait/elapsed — same as before, no regression. C7 — CPU:Elapsed numerator: External waits inflate the CPU number without representing real query CPU, so the CPU:Elapsed ratio (used in web runtime card, HTML export runtime card, and Avalonia DOP efficiency) subtracts Σ external-wait ms from CPU before dividing by elapsed. - AnalysisResult.QueryTimeResult gains ExternalWaitMs, populated in ResultMapper by summing wait stats matching IsExternalWait. - HtmlExporter.WriteRuntimeCard, Index.razor runtime card, and PlanViewerControl DOP-efficiency calc all use (cpu - externalWaitMs). Version bump 1.7.3 -> 1.7.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erikdarlingdata
left a comment
There was a problem hiding this comment.
What this PR does
Addresses #215 C6 and C7. External/preemptive waits (MEMORY_ALLOCATION_*, PREEMPTIVE_*) are CPU-busy in kernel, so the existing per-thread elapsed − cpu benefit formula scores them near zero. This PR:
- Adds
BenefitScorer.IsExternalWaitand a dedicatedCalculateExternalWaitBenefitusing Joe's(wait_ms / cpu_ms) * Σ max-thread-self-cpu / elapsed_ms. - Adds
PlanAnalyzer.GetOperatorMaxThreadOwnCpuMs(the CPU analogue ofGetPerThreadOwnElapsed) to feed the numerator. - Surfaces
QueryTimeResult.ExternalWaitMsand subtracts it from CPU in the CPU:Elapsed ratio acrossHtmlExporter,Index.razor, and the DOP-efficiency calc inPlanViewerControl. - Bumps version 1.7.3 → 1.7.4. Base branch is
dev, correct.
What's good
- Formula is derivation-consistent with the
(elapsed - cpu)path for the non-external case; serial plans collapse to the old simple ratio so there's no regression path (confirmed by your c1-c5 verification). - Single classifier (
IsExternalWait) used in all four consumers — no drift risk as long as it stays authoritative. GetOperatorMaxThreadOwnCpuMsmirrorsGetPerThreadOwnElapsed's exchange look-through pattern; serial branch clamps withMax(0, ...)for the known pass-through / timing-skew cases.
Needs attention
-
Cross-repo rule-set drift. This modifies
BenefitScorer.ScoreWaitStatsand adds a newPlanAnalyzerhelper — both are part of the analysis rule set shared with PerformanceMonitor's Dashboard and Lite copies. The PR body doesn't mention sync'ing them. If those are meant to stay in lockstep with Studio, companion PRs need to land there, otherwise the same plan will score differently in each product. -
No tests.
tests/PlanViewer.Core.Tests/has nothing asserting on benefit percentages for external waits, and nothing that would catch a regression in Joe's 48.3% worked example. Inline comment onResultMapper.cshas suggested coverage. -
Minor, inline:
IsExternalWaitusesContains("MEMORY_ALLOCATION")— works, but broader than needed.CollectOperatorWaitProfilesnow walks for every statement with a root node, not just parallel — cheap per-plan but adds up in CLI bulk runs; gate-on-need is straightforward if it matters.- Doc-comment on
CalculateExternalWaitBenefitreads parallel-only, but the function is intentionally also the serial path — worth a line noting that.
-
Open TODOs in the PR body (raw XML for
MwX36LEHJS, web visual post-deploy) — not blocking review, just flagging they're still unchecked.
No Avalonia concerns (no new controls, no XAML styles, no MVVM creep, no contrast-range brushes introduced).
Generated by Claude Code
| if (string.IsNullOrEmpty(waitType)) return false; | ||
| var wt = waitType.ToUpperInvariant(); | ||
| return wt.Contains("MEMORY_ALLOCATION") | ||
| || wt.StartsWith("PREEMPTIVE_"); |
There was a problem hiding this comment.
Contains("MEMORY_ALLOCATION") catches both MEMORY_ALLOCATION_EXT and RESERVED_MEMORY_ALLOCATION_EXT — that's the intent — but it's substring-wide, so any future wait type embedding that token anywhere in the name would be (mis)classified as CPU-busy. Tighter alternative:
return wt == "MEMORY_ALLOCATION_EXT"
|| wt == "RESERVED_MEMORY_ALLOCATION_EXT"
|| wt.StartsWith("PREEMPTIVE_");Not a bug today, just a robustness nit. Same classifier is reused from ResultMapper and PlanViewerControl, so whatever shape it takes here is authoritative.
Generated by Claude Code
| { | ||
| var childNode = child; | ||
| if (child.PhysicalOp == "Parallelism" && child.Children.Count > 0) | ||
| childNode = child.Children.OrderByDescending(c => c.ActualCPUMs).First(); |
There was a problem hiding this comment.
Exchange look-through mirrors GetPerThreadOwnElapsed at 1573-1574 — good, pattern is consistent. One subtle thing: on the parent side (parentByThread), thread IDs come from the operator itself; on the child side (childSumByThread), they come from the grandchild when the direct child is Parallelism. Producer-side threads below a parallelism exchange are a different thread set from consumer-side threads above it, so childSumByThread lookups keyed by parent's thread IDs will often miss and contribute 0. That's actually the right answer (you don't want to subtract producer CPU from consumer thread self-CPU), but it means in those cases this collapses to "parent thread cpu" as self. Worth a one-line comment noting that's intentional.
Generated by Claude Code
| // in kernel and inflate the ratio without representing real query work. | ||
| long externalWaitMs = 0; | ||
| foreach (var w in statement.WaitStats) | ||
| if (BenefitScorer.IsExternalWait(w.WaitType)) |
There was a problem hiding this comment.
ResultMapper already derives QueryTimeResult.ExternalWaitMs using the same classifier (ResultMapper.cs:115-119). This code-behind works against the raw PlanStatement, not the mapped StatementResult, so the re-sum is necessary — just flagging that the classifier (BenefitScorer.IsExternalWait) is the single source of truth across all three sites (here, ResultMapper, HtmlExporter/Index.razor via the mapped field). If the classifier ever drifts, this site would need to drift with it.
Generated by Claude Code
| // uses sum-of-max-thread-cpu across operators and works for both. | ||
| var operatorProfiles = new List<OperatorWaitProfile>(); | ||
| if (stmt.RootNode != null) | ||
| CollectOperatorWaitProfiles(stmt.RootNode, operatorProfiles); |
There was a problem hiding this comment.
Previously the tree walk was skipped for serial plans. Now CollectOperatorWaitProfiles runs on every statement that has a root node, including serial plans that have no external waits. For statements with many operators and no MEMORY_ALLOCATION_*/PREEMPTIVE_* waits this is pure overhead. If it shows up in CLI/bulk-scan profiling, gate the walk on stmt.WaitStats.Any(w => IsExternalWait(w.WaitType)) || isParallel.
Generated by Claude Code
| // is CPU-busy in kernel, so operator elapsed ≈ operator cpu and the wait | ||
| // barely shows in the per-thread (elapsed - cpu) calculation. Joe's formula: | ||
| // benefit = (wait_ms / total_cpu_ms) * Σ max_thread_cpu_per_operator / elapsed | ||
| benefitPct = CalculateExternalWaitBenefit(wait, operatorProfiles, stmt.QueryTimeStats!.CpuTimeMs, elapsedMs); |
There was a problem hiding this comment.
The external-wait path is taken for serial plans too, not just parallel. On a serial plan with per-operator CPU stats, sumMaxCpu ≈ stmt.CpuTimeMs and the formula collapses to wait_ms / elapsed_ms — which is exactly the old serial behavior, so no regression (matches your c1-c5.sqlplan verification). Worth noting in the doc-comment above (Σ max_thread_cpu_per_operator reads as parallel-only at a glance).
Generated by Claude Code
| CpuTimeMs = stmt.QueryTimeStats.CpuTimeMs, | ||
| ElapsedTimeMs = stmt.QueryTimeStats.ElapsedTimeMs | ||
| ElapsedTimeMs = stmt.QueryTimeStats.ElapsedTimeMs, | ||
| ExternalWaitMs = externalWaitMs |
There was a problem hiding this comment.
No test coverage landing with this PR. tests/PlanViewer.Core.Tests/PlanAnalyzerTests.cs has wait-stat tests (Rule 11 severity gating, CPU vs I/O waits) but nothing asserting on benefit % values. A small fixture covering:
- serial plan with a
RESERVED_MEMORY_ALLOCATION_EXTwait → assertsExternalWaitMsis populated and benefit% is reasonable; - a plan matching Joe's 48.3% worked example → locks that math in so a future refactor can't silently move the number.
Would catch any future drift in either IsExternalWait or the formula. Not blocking, but flagging per the review checklist.
Generated by Claude Code
Summary
Joe's feedback from #215, public plan `MwX36LEHJS`:
Verified
Implementation notes
Test plan
Version 1.7.3 → 1.7.4.
🤖 Generated with Claude Code