From 5067481d24e30f0b14d82fabc8d988b9ff4a69c7 Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Thu, 23 Apr 2026 22:07:06 +0200 Subject: [PATCH] docs: v0.7.0 Track A commit-2 design (hierarchical IRQ-aware RTA) Captures the design for the next Track A commit after PR #145 lands. Two-tier analysis: ISR layer preempts all tasks, residual capacity feeds classical task-priority RTA; Tindell-Clark jitter terms fold into the fixed-point recurrence; Compute_Execution_Time's Time_Range consumed as (BCET, WCET) with a new ResponseBand diagnostic. Includes: current rta.rs walkthrough with line refs, gap analysis, algorithm pseudocode, new diagnostics table, file-by-file change scope with LOC estimates, 10-unit-test plan + 3 fixture tests, out-of-scope list (PIP/PCP, multi-processor ISR migration), commit message template, and an executable agent brief ready to hand off when #145 merges. Non-regression is the critical gate: models with no Spar_Timing::* properties must produce byte-identical RTA output. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/designs/v0.7.0-hierarchical-rta.md | 319 ++++++++++++++++++++++++ 1 file changed, 319 insertions(+) create mode 100644 docs/designs/v0.7.0-hierarchical-rta.md diff --git a/docs/designs/v0.7.0-hierarchical-rta.md b/docs/designs/v0.7.0-hierarchical-rta.md new file mode 100644 index 0000000..6ddf181 --- /dev/null +++ b/docs/designs/v0.7.0-hierarchical-rta.md @@ -0,0 +1,319 @@ +# Design: Hierarchical IRQ-aware RTA (v0.7.0 Track A, commit 2) + +Status: **proposed** — implementation waits on PR #145 merge. +Last update: 2026-04-23. +Preceded by: PR #145 (property set surface — `Spar_Timing::*`, `Spar_Trace::*`). +Followed by: commit 3 (`Spar_Proofs/Scheduling/RTAJittered.lean`), commit 4 (integration +tests + COMPLIANCE). + +## Context: what RTA does today + +`crates/spar-analysis/src/rta.rs` implements exact response-time analysis +for fixed-priority preemptive scheduling (`RtaAnalysis` at line 34). For +each processor, threads are sorted by priority and the fixed-point + +```text +R(0) = C_i +R(n+1) = C_i + Σ_j ⌈R(n)/T_j⌉ × C_j (all higher-priority threads j) +``` + +is computed by the Lean4-verified +`scheduling_verified::compute_response_time(exec_ps, deadline_ps, &higher_priority)` +(called at `rta.rs:121`). Priority ordering: explicit +`Deployment_Properties::Priority` first (lower = higher), then shorter +period (RM ordering). If response time exceeds `Deadline` (falls back to +`Period`), an `Error` diagnostic is emitted; otherwise `Info`. + +Property accessors already available in +`crates/spar-analysis/src/property_accessors.rs`: + +| Function | Returns | Line | +|---|---|---| +| `get_timing_property(props, name)` | `Option` picoseconds | 195 | +| `get_execution_time(props)` | `Option` — WCET from range | 214 | +| `get_execution_time_range(props)` | `Option<(u64, u64)>` — **BCET, WCET** | 240 | +| `get_processor_binding(props)` | `Option` | existing | + +The BCET/WCET split accessor exists but is not used by `rta.rs` today — +only the WCET path is exercised. + +## The gap: what classical RTA misses for IRQ-driven systems + +1. **No interrupt priority layer.** Every thread is treated as if the + highest task-priority level were the system's top priority. ISRs that + run at higher-than-task priority (all of them, on any real kernel) + are invisible to RTA — their CPU steal is silently ignored. +2. **`Dispatch_Jitter` is parsed but never used** in the recurrence. +3. **Time_Range collapsed to WCET.** We lose BCET entirely, which means + we can't compute slack band (WCET − BCET) or best-case response + time needed for some safety arguments. +4. **No probe/trace awareness.** Today RTA can't say "this is the + response chain the user wants to verify with trace data in v0.8.0". + +## Target design: two-tier analysis + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Tier 1 — ISR layer (preempts everything) │ +│ For each processor p: │ +│ U_isr(p) = Σ_{ISR i on p} ISR_exec_i / Period_i │ +│ If U_isr(p) ≥ threshold → IsrOverloadedCpu diagnostic │ +│ Residual_capacity(p) = 1 − U_isr(p) │ +├─────────────────────────────────────────────────────────────┤ +│ Tier 2 — Task RTA on inflated interference │ +│ For each thread i bound to processor p: │ +│ J_i = Dispatch_Jitter(i) (0 if unset) │ +│ R_i(0) = C_i + J_i │ +│ R_i(n+1) = C_i + J_i │ +│ + Σ_j∈HP ⌈(R_i(n) + J_j) / T_j⌉ × C_j │ +│ + IsrInterference(p, R_i(n)) │ +│ where IsrInterference(p, R) = Σ_{ISR k on p} ⌈R/T_k⌉×I_k│ +├─────────────────────────────────────────────────────────────┤ +│ Chain view — for Sporadic-dispatched threads with an ISR │ +│ upstream: │ +│ Total_response = Interrupt_Latency_Bound │ +│ + ISR_Execution_Time │ +│ + Dispatch_overhead │ +│ + R_handler (from Tier 2 above) │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Why this shape + +Classical RTA assumes a single priority order. Hierarchical scheduling +models the ISR layer as a higher-priority *server* that preempts tasks +unconditionally — the cleanest extension of the existing fixed-point +that preserves convergence properties. Adding ISR interference as an +extra term inside the recurrence is equivalent to treating ISRs as +highest-priority pseudo-tasks, for which the Lean convergence proof in +`proofs/Proofs/Scheduling/RTA.lean` still holds (the proof is monotonic +in the interference term). + +### Jitter handling + +The jittered response-time recurrence `R = C + J + Σ⌈(R+J)/T⌉·C` is the +standard Tindell-Clark extension. Each jitter term inflates a task's +own release window *and* every interfering task's release window. The +Lean convergence proof needs a small extension (monotonic addition of J +preserves least-fixed-point existence); sketched in commit 3 as +`RTAJittered.lean`. + +### BCET/WCET split use + +For the response-time recurrence we continue to use WCET — BCET cannot +produce a safe upper bound. However, two new informational outputs use +both: + +- `Slack_Band = Deadline − R_wcet` (how much headroom over WCET path). +- `Response_Band = (R_bcet, R_wcet)` where `R_bcet` uses BCET in place + of WCET for the target thread only (keeping interference at WCET). + +These feed v0.8.0's trace verification: observed response times outside +`[R_bcet, R_wcet]` indicate the model's execution-time range is wrong. + +## New diagnostics + +Added to `crate::AnalysisDiagnostic` via new `DiagnosticKind` values or +inline variants — follow whatever pattern `diagnostics.rs` uses today +(the agent should read first). + +| Kind | Severity | When emitted | +|---|---|---| +| `IrqResponseBudget { event_port, predicted, deadline, slack }` | Info | Per Sporadic-dispatched thread with an ISR source — one per IRQ chain | +| `IrqBudgetViolated { event_port, predicted, deadline }` | Error | IRQ chain total response > deadline | +| `IsrOverloadedCpu { processor, utilization_pct, threshold_pct }` | Error | Per processor where U_isr ≥ threshold (default 0.3) — ISR saturation regardless of task RTA | +| `MissingBottomHalfServer { isr_component }` | Warning | `ISR_Execution_Time` is set but no `Bottom_Half_Server` reference | +| `ResponseBand { thread, bcet_response, wcet_response }` | Info | Per thread when BCET is available from Time_Range | + +Existing `response time <= deadline` Info diagnostic stays — now reflects +the *residual-capacity-adjusted* RTA. + +## What counts as an ISR + +A component instance is treated as an ISR iff +`Spar_Timing::ISR_Priority` is set. Two sub-cases: + +- ISR is a *thread* with dispatch protocol `Sporadic`: the thread body + is the ISR body, `ISR_Execution_Time` supersedes + `Compute_Execution_Time` for the tier-1 utilization calculation. +- ISR is a *device* with `ISR_Execution_Time` set: the device models + the ISR; if `Bottom_Half_Server` references a thread, that thread is + the bottom-half handler whose classical RTA runs on residual capacity. + +The `Interrupt_Latency_Bound` is a **processor** property: the +hardware + kernel ack latency. Added to the chain view regardless of +per-ISR specifics. + +## Files touched (commit 2 scope) + +| File | Change | Approx LOC | +|---|---|---| +| `crates/spar-analysis/src/rta.rs` | Split analyze loop into tier-1 (ISR collection + utilization) and tier-2 (task RTA with IsrInterference term + jitter). Add diagnostic helpers. | +250 / −30 | +| `crates/spar-analysis/src/property_accessors.rs` | Add `get_isr_priority`, `get_isr_execution_time`, `get_interrupt_latency_bound`, `get_bottom_half_server`, `get_dispatch_jitter` typed-first accessors. | +80 | +| `crates/spar-analysis/src/scheduling_verified.rs` | Extend `compute_response_time` with an optional `isr_interference: u64` additive constant OR add `compute_response_time_jittered(…, jitter: u64, isr_overhead: u64)`. Prefer the latter: less disruptive to callers. | +30 | +| `crates/spar-analysis/src/diagnostics.rs` | Add new `DiagnosticKind` variants (see table above). | +40 | +| `crates/spar-analysis/src/rta.rs` (tests) | 8–10 new unit tests covering each algorithm branch; 3 fixture tests demonstrating the two-tier vs classical gap. | +300 | + +No changes outside `spar-analysis/`. No HIR changes needed — property +surface already landed in #145. + +## Test plan + +**Unit tests in rta.rs**: + +1. `single_isr_reduces_task_capacity` — 1 CPU, 1 ISR (5% util), 1 task + (90% util): classical RTA passes, new RTA passes (task fits in 95% + residual); assert response time is *larger* than classical-RTA + result by the correct interference term. +2. `overloaded_isr_fires_diagnostic` — 1 CPU, 3 ISRs totaling 35% util: + `IsrOverloadedCpu` fires at default threshold 30%. +3. `dispatch_jitter_inflates_response` — Task with 50µs jitter on a + competing 1ms-period higher-priority thread produces response ≈ C + + ⌈(R+50µs)/1ms⌉·C'; compare to no-jitter baseline. +4. `bcet_wcet_response_band` — Time_Range `Compute_Execution_Time => + 50us .. 200us` produces a `ResponseBand` diagnostic with both values. +5. `irq_chain_total_response` — Sporadic thread with upstream ISR and + `Interrupt_Latency_Bound = 10us`, `ISR_Execution_Time = 20us..30us`: + total response = 10µs (latency) + 30µs (ISR) + R_handler. +6. `missing_bottom_half_server_warning` — ISR_Execution_Time set + without Bottom_Half_Server: warning fires. +7. `no_isrs_matches_classical_rta` — Regression: in models with no + `Spar_Timing::*` properties, new RTA output is byte-for-byte the + same as old RTA output. This is the critical non-regression gate. +8. `multi_processor_isolation` — ISR on CPU1 does not inflate task + response on CPU2. +9. `zero_jitter_matches_unjittered` — J=0 recurrence equals the + non-jittered recurrence. +10. `isr_priority_above_all_tasks` — ISR with priority 99 cannot be + preempted by task with priority 0 (lower number). + +**Fixture tests** under `crates/spar-analysis/tests/fixtures/`: + +- `irq_brake_handler.aadl` — Representative: device triggers an event + port, ISR (20..30µs) fires, bottom-half handler thread (50..200µs + WCET) must complete within 1ms deadline. Expected outputs committed + as `irq_brake_handler.expected.json`. +- `multi_isr_same_cpu.aadl` — Three interrupt sources on one CPU, + differing periods, showing total ISR utilization computation. +- `jittered_chain.aadl` — Sporadic chain with MINT + jitter, + demonstrating Tindell-Clark extension effect. + +## Out of scope for commit 2 + +- **Priority Inheritance Protocol / Priority Ceiling Protocol**. The + no-blocking assumption stays. PIP/PCP is v0.7.1. A TODO comment in + `rta.rs` cites AS5506D §5.4.4 and notes `Locking_Protocol` is not + yet modeled. +- **Multi-processor ISR migration**. Each ISR is bound to one CPU. +- **Cache-aware interference inflation.** Research topic; left for + v1.0+. +- **Rate monotonic vs earliest deadline first split at tier 1.** Tier 1 + is always FP (standard for ISRs). Tier 2 RTA uses whatever priority + assignment the model declares. + +## Commit message template + +``` +feat(rta): hierarchical IRQ-aware RTA with jitter and BCET/WCET split + +Extends response-time analysis with an ISR tier above the classical +task-priority fixed-point. ISRs identified by Spar_Timing::ISR_Priority +steal CPU capacity first; residual capacity feeds task RTA. Tindell- +Clark jitter terms (Dispatch_Jitter) inflate the recurrence in the +standard way. Compute_Execution_Time's Time_Range is consumed as +(BCET, WCET) where only WCET enters the recurrence but both emerge in +a ResponseBand diagnostic for v0.8.0 trace comparison. + +New diagnostics: IrqResponseBudget, IrqBudgetViolated, +IsrOverloadedCpu, MissingBottomHalfServer, ResponseBand. + +Non-regression: models with no Spar_Timing::* properties produce +byte-identical RTA output to the prior implementation. + +Out of scope (v0.7.1): priority inheritance / ceiling protocols; +multi-processor ISR migration. + +Closes (partial) the IRQ-RTA acceptance criteria from the v0.7.0 +plan. Track A commit 2 of 4. + +Refs: Spar_Timing property surface from #145. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +## Agent brief (executable section) + +When PR #145 merges, launch one agent with the following instructions. + +**Scope:** strictly this commit. Do not touch `spar-hir-def/`. Do not +touch `spar-cli/`. Only `spar-analysis/`. + +**Steps:** + +1. Read `crates/spar-analysis/src/rta.rs` in full. Understand the + current flow. +2. Read `crates/spar-analysis/src/scheduling_verified.rs` in full. + Decide: add an overload (`compute_response_time_jittered`) or + parameterize (`compute_response_time(..., jitter, isr_interference)`) + — the design note prefers the overload. +3. Read `crates/spar-analysis/src/diagnostics.rs` to learn the + diagnostic-kind pattern. Add the five new variants. +4. Read `crates/spar-analysis/src/property_accessors.rs` around lines + 195–290 to see the typed-first / string-fallback idiom. Add the + new typed accessors following the same pattern. +5. Modify `rta.rs`: + - Split `analyze()` into `collect_isrs_per_cpu()` and + `run_task_rta_per_cpu()`. + - Compute U_isr per CPU; emit `IsrOverloadedCpu` as needed. + - In task loop: fold jitter into the recurrence call; add + ISR interference term. + - For threads that are `Sporadic` and have an ISR upstream: emit + `IrqResponseBudget` or `IrqBudgetViolated`. + - Emit `ResponseBand` if BCET available. + - Preserve existing Info/Error "response time" diagnostic. +6. Write 10 unit tests per the test plan. Each test should fit the + existing `TestBuilder` idiom in `rta.rs:240`. +7. Add three `.aadl` fixtures + `.expected.json` under a new + `tests/fixtures/rta/` directory; register them in an integration + test file that loads each fixture, runs RTA, and diffs against + `.expected.json`. +8. **Run the non-regression test first.** If existing RTA tests pass + before any Spar_Timing-using tests are added, you have the flow + right. If any existing test behavior changes, the default path + has regressed — fix before proceeding. +9. Update `COMPLIANCE.md`: + - In "In progress / v0.7.0" add: "IRQ-aware RTA: hierarchical + two-tier analysis with Dispatch_Jitter and BCET/WCET split + landed; PIP/PCP blocking deferred to v0.7.1." +10. Add rivet verification entries linking the new tests to + `REQ-TIMING-IRQ-{001,002,003}` from #145. +11. Quality gates — all must pass before push: + - `cargo build --workspace` + - `cargo test -p spar-analysis` + - `cargo clippy --workspace --all-targets -- -D warnings` + - `cargo fmt --all -- --check` + - `rivet validate` +12. Push `feat/v0.7.0-hierarchical-rta`; open PR using the commit + message above as the title. + +**Guardrails:** + +- Do not change the Lean theorems in `proofs/Proofs/Scheduling/`. + Commit 3 extends them separately. +- Do not add MBPTA or probabilistic WCET anywhere. That is v1.0+. +- Do not touch `spar-trace`, `spar-insight`, `spar-variants` — these + crates do not yet exist and are not part of this commit. +- If a test fails because the model is ambiguous, write a + diagnostic, not a panic. The Lean-verified `compute_response_time` + already handles the diverged case; follow its pattern. + +## References + +- Tindell and Clark, *Holistic schedulability analysis for distributed + hard real-time systems*, Microprocessing and Microprogramming + 40(2–3), 1994. (The jitter extension.) +- AS5506D §5.4.4 *Thread Scheduling Properties*. (What AADL says about + priority and dispatch; what it doesn't say about IRQs.) +- spar Lean proof `proofs/Proofs/Scheduling/RTA.lean` — fixed-point + convergence of the classical recurrence. +- PR #145 — property surface this commit consumes.