pulseengine · avrabe · Apr 24, 2026 · Apr 23, 2026
diff --git a/docs/designs/v0.7.0-hierarchical-rta.md b/docs/designs/v0.7.0-hierarchical-rta.md
@@ -0,0 +1,319 @@
+# Design: Hierarchical IRQ-aware RTA (v0.7.0 Track A, commit 2)
+
+Status: **proposed** — implementation waits on PR #145 merge.
+Last update: 2026-04-23.
+Preceded by: PR #145 (property set surface — `Spar_Timing::*`, `Spar_Trace::*`).
+Followed by: commit 3 (`Spar_Proofs/Scheduling/RTAJittered.lean`), commit 4 (integration
+tests + COMPLIANCE).
+
+## Context: what RTA does today
+
+`crates/spar-analysis/src/rta.rs` implements exact response-time analysis
+for fixed-priority preemptive scheduling (`RtaAnalysis` at line 34). For
+each processor, threads are sorted by priority and the fixed-point
+
+```text
+R(0) = C_i
+R(n+1) = C_i + Σ_j ⌈R(n)/T_j⌉ × C_j    (all higher-priority threads j)
+```
+
+is computed by the Lean4-verified
+`scheduling_verified::compute_response_time(exec_ps, deadline_ps, &higher_priority)`
+(called at `rta.rs:121`). Priority ordering: explicit
+`Deployment_Properties::Priority` first (lower = higher), then shorter
+period (RM ordering). If response time exceeds `Deadline` (falls back to
+`Period`), an `Error` diagnostic is emitted; otherwise `Info`.
+
+Property accessors already available in
+`crates/spar-analysis/src/property_accessors.rs`:
+
+| Function | Returns | Line |
+|---|---|---|
+| `get_timing_property(props, name)` | `Option<u64>` picoseconds | 195 |
+| `get_execution_time(props)` | `Option<u64>` — WCET from range | 214 |
+| `get_execution_time_range(props)` | `Option<(u64, u64)>` — **BCET, WCET** | 240 |
+| `get_processor_binding(props)` | `Option<String>` | existing |
+
+The BCET/WCET split accessor exists but is not used by `rta.rs` today —
+only the WCET path is exercised.
+
+## The gap: what classical RTA misses for IRQ-driven systems
+
+1. **No interrupt priority layer.** Every thread is treated as if the
+   highest task-priority level were the system's top priority. ISRs that
+   run at higher-than-task priority (all of them, on any real kernel)
+   are invisible to RTA — their CPU steal is silently ignored.
+2. **`Dispatch_Jitter` is parsed but never used** in the recurrence.
+3. **Time_Range collapsed to WCET.** We lose BCET entirely, which means
+   we can't compute slack band (WCET − BCET) or best-case response
+   time needed for some safety arguments.
+4. **No probe/trace awareness.** Today RTA can't say "this is the
+   response chain the user wants to verify with trace data in v0.8.0".
+
+## Target design: two-tier analysis
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Tier 1 — ISR layer (preempts everything)                    │
+│   For each processor p:                                     │
+│     U_isr(p) = Σ_{ISR i on p} ISR_exec_i / Period_i         │
+│     If U_isr(p) ≥ threshold → IsrOverloadedCpu diagnostic   │
+│   Residual_capacity(p) = 1 − U_isr(p)                       │
+├─────────────────────────────────────────────────────────────┤
+│ Tier 2 — Task RTA on inflated interference                  │
+│   For each thread i bound to processor p:                   │
+│     J_i = Dispatch_Jitter(i)   (0 if unset)                 │
+│     R_i(0) = C_i + J_i                                      │
+│     R_i(n+1) = C_i + J_i                                    │
+│              + Σ_j∈HP ⌈(R_i(n) + J_j) / T_j⌉ × C_j          │
+│              + IsrInterference(p, R_i(n))                   │
+│     where IsrInterference(p, R) = Σ_{ISR k on p} ⌈R/T_k⌉×I_k│
+├─────────────────────────────────────────────────────────────┤
+│ Chain view — for Sporadic-dispatched threads with an ISR    │
+│ upstream:                                                   │
+│   Total_response = Interrupt_Latency_Bound                  │
+│                  + ISR_Execution_Time                       │
+│                  + Dispatch_overhead                        │
+│                  + R_handler   (from Tier 2 above)          │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Why this shape
+
+Classical RTA assumes a single priority order. Hierarchical scheduling
+models the ISR layer as a higher-priority *server* that preempts tasks
+unconditionally — the cleanest extension of the existing fixed-point
+that preserves convergence properties. Adding ISR interference as an
+extra term inside the recurrence is equivalent to treating ISRs as
+highest-priority pseudo-tasks, for which the Lean convergence proof in
+`proofs/Proofs/Scheduling/RTA.lean` still holds (the proof is monotonic
+in the interference term).
+
+### Jitter handling
+
+The jittered response-time recurrence `R = C + J + Σ⌈(R+J)/T⌉·C` is the
+standard Tindell-Clark extension. Each jitter term inflates a task's
+own release window *and* every interfering task's release window. The
+Lean convergence proof needs a small extension (monotonic addition of J
+preserves least-fixed-point existence); sketched in commit 3 as
+`RTAJittered.lean`.
+
+### BCET/WCET split use
+
+For the response-time recurrence we continue to use WCET — BCET cannot
+produce a safe upper bound. However, two new informational outputs use
+both:
+
+- `Slack_Band = Deadline − R_wcet` (how much headroom over WCET path).
+- `Response_Band = (R_bcet, R_wcet)` where `R_bcet` uses BCET in place
+  of WCET for the target thread only (keeping interference at WCET).
+
+These feed v0.8.0's trace verification: observed response times outside
+`[R_bcet, R_wcet]` indicate the model's execution-time range is wrong.
+
+## New diagnostics
+
+Added to `crate::AnalysisDiagnostic` via new `DiagnosticKind` values or
+inline variants — follow whatever pattern `diagnostics.rs` uses today
+(the agent should read first).
+
+| Kind | Severity | When emitted |
+|---|---|---|
+| `IrqResponseBudget { event_port, predicted, deadline, slack }` | Info | Per Sporadic-dispatched thread with an ISR source — one per IRQ chain |
+| `IrqBudgetViolated { event_port, predicted, deadline }` | Error | IRQ chain total response > deadline |
+| `IsrOverloadedCpu { processor, utilization_pct, threshold_pct }` | Error | Per processor where U_isr ≥ threshold (default 0.3) — ISR saturation regardless of task RTA |
+| `MissingBottomHalfServer { isr_component }` | Warning | `ISR_Execution_Time` is set but no `Bottom_Half_Server` reference |
+| `ResponseBand { thread, bcet_response, wcet_response }` | Info | Per thread when BCET is available from Time_Range |
+
+Existing `response time <= deadline` Info diagnostic stays — now reflects
+the *residual-capacity-adjusted* RTA.
+
+## What counts as an ISR
+
+A component instance is treated as an ISR iff
+`Spar_Timing::ISR_Priority` is set. Two sub-cases:
+
+- ISR is a *thread* with dispatch protocol `Sporadic`: the thread body
+  is the ISR body, `ISR_Execution_Time` supersedes
+  `Compute_Execution_Time` for the tier-1 utilization calculation.
+- ISR is a *device* with `ISR_Execution_Time` set: the device models
+  the ISR; if `Bottom_Half_Server` references a thread, that thread is
+  the bottom-half handler whose classical RTA runs on residual capacity.
+
+The `Interrupt_Latency_Bound` is a **processor** property: the
+hardware + kernel ack latency. Added to the chain view regardless of
+per-ISR specifics.
+
+## Files touched (commit 2 scope)
+
+| File | Change | Approx LOC |
+|---|---|---|
+| `crates/spar-analysis/src/rta.rs` | Split analyze loop into tier-1 (ISR collection + utilization) and tier-2 (task RTA with IsrInterference term + jitter). Add diagnostic helpers. | +250 / −30 |
+| `crates/spar-analysis/src/property_accessors.rs` | Add `get_isr_priority`, `get_isr_execution_time`, `get_interrupt_latency_bound`, `get_bottom_half_server`, `get_dispatch_jitter` typed-first accessors. | +80 |
+| `crates/spar-analysis/src/scheduling_verified.rs` | Extend `compute_response_time` with an optional `isr_interference: u64` additive constant OR add `compute_response_time_jittered(…, jitter: u64, isr_overhead: u64)`. Prefer the latter: less disruptive to callers. | +30 |
+| `crates/spar-analysis/src/diagnostics.rs` | Add new `DiagnosticKind` variants (see table above). | +40 |
+| `crates/spar-analysis/src/rta.rs` (tests) | 8–10 new unit tests covering each algorithm branch; 3 fixture tests demonstrating the two-tier vs classical gap. | +300 |
+
+No changes outside `spar-analysis/`. No HIR changes needed — property
+surface already landed in #145.
+
+## Test plan
+
+**Unit tests in rta.rs**:
+
+1. `single_isr_reduces_task_capacity` — 1 CPU, 1 ISR (5% util), 1 task
+   (90% util): classical RTA passes, new RTA passes (task fits in 95%
+   residual); assert response time is *larger* than classical-RTA
+   result by the correct interference term.
+2. `overloaded_isr_fires_diagnostic` — 1 CPU, 3 ISRs totaling 35% util:
+   `IsrOverloadedCpu` fires at default threshold 30%.
+3. `dispatch_jitter_inflates_response` — Task with 50µs jitter on a
+   competing 1ms-period higher-priority thread produces response ≈ C +
+   ⌈(R+50µs)/1ms⌉·C'; compare to no-jitter baseline.
+4. `bcet_wcet_response_band` — Time_Range `Compute_Execution_Time =>
+   50us .. 200us` produces a `ResponseBand` diagnostic with both values.
+5. `irq_chain_total_response` — Sporadic thread with upstream ISR and
+   `Interrupt_Latency_Bound = 10us`, `ISR_Execution_Time = 20us..30us`:
+   total response = 10µs (latency) + 30µs (ISR) + R_handler.
+6. `missing_bottom_half_server_warning` — ISR_Execution_Time set
+   without Bottom_Half_Server: warning fires.
+7. `no_isrs_matches_classical_rta` — Regression: in models with no
+   `Spar_Timing::*` properties, new RTA output is byte-for-byte the
+   same as old RTA output. This is the critical non-regression gate.
+8. `multi_processor_isolation` — ISR on CPU1 does not inflate task
+   response on CPU2.
+9. `zero_jitter_matches_unjittered` — J=0 recurrence equals the
+   non-jittered recurrence.
+10. `isr_priority_above_all_tasks` — ISR with priority 99 cannot be
+    preempted by task with priority 0 (lower number).
+
+**Fixture tests** under `crates/spar-analysis/tests/fixtures/`:
+
+- `irq_brake_handler.aadl` — Representative: device triggers an event
+  port, ISR (20..30µs) fires, bottom-half handler thread (50..200µs
+  WCET) must complete within 1ms deadline. Expected outputs committed
+  as `irq_brake_handler.expected.json`.
+- `multi_isr_same_cpu.aadl` — Three interrupt sources on one CPU,
+  differing periods, showing total ISR utilization computation.
+- `jittered_chain.aadl` — Sporadic chain with MINT + jitter,
+  demonstrating Tindell-Clark extension effect.
+
+## Out of scope for commit 2
+
+- **Priority Inheritance Protocol / Priority Ceiling Protocol**. The
+  no-blocking assumption stays. PIP/PCP is v0.7.1. A TODO comment in
+  `rta.rs` cites AS5506D §5.4.4 and notes `Locking_Protocol` is not
+  yet modeled.
+- **Multi-processor ISR migration**. Each ISR is bound to one CPU.
+- **Cache-aware interference inflation.** Research topic; left for
+  v1.0+.
+- **Rate monotonic vs earliest deadline first split at tier 1.** Tier 1
+  is always FP (standard for ISRs). Tier 2 RTA uses whatever priority
+  assignment the model declares.
+
+## Commit message template
+
+```
+feat(rta): hierarchical IRQ-aware RTA with jitter and BCET/WCET split
+
+Extends response-time analysis with an ISR tier above the classical
+task-priority fixed-point. ISRs identified by Spar_Timing::ISR_Priority
+steal CPU capacity first; residual capacity feeds task RTA. Tindell-
+Clark jitter terms (Dispatch_Jitter) inflate the recurrence in the
+standard way. Compute_Execution_Time's Time_Range is consumed as
+(BCET, WCET) where only WCET enters the recurrence but both emerge in
+a ResponseBand diagnostic for v0.8.0 trace comparison.
+
+New diagnostics: IrqResponseBudget, IrqBudgetViolated,
+IsrOverloadedCpu, MissingBottomHalfServer, ResponseBand.
+
+Non-regression: models with no Spar_Timing::* properties produce
+byte-identical RTA output to the prior implementation.
+
+Out of scope (v0.7.1): priority inheritance / ceiling protocols;
+multi-processor ISR migration.
+
+Closes (partial) the IRQ-RTA acceptance criteria from the v0.7.0
+plan. Track A commit 2 of 4.
+
+Refs: Spar_Timing property surface from #145.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+```
+
+## Agent brief (executable section)
+
+When PR #145 merges, launch one agent with the following instructions.
+
+**Scope:** strictly this commit. Do not touch `spar-hir-def/`. Do not
+touch `spar-cli/`. Only `spar-analysis/`.
+
+**Steps:**
+
+1. Read `crates/spar-analysis/src/rta.rs` in full. Understand the
+   current flow.
+2. Read `crates/spar-analysis/src/scheduling_verified.rs` in full.
+   Decide: add an overload (`compute_response_time_jittered`) or
+   parameterize (`compute_response_time(..., jitter, isr_interference)`)
+   — the design note prefers the overload.
+3. Read `crates/spar-analysis/src/diagnostics.rs` to learn the
+   diagnostic-kind pattern. Add the five new variants.
+4. Read `crates/spar-analysis/src/property_accessors.rs` around lines
+   195–290 to see the typed-first / string-fallback idiom. Add the
+   new typed accessors following the same pattern.
+5. Modify `rta.rs`:
+   - Split `analyze()` into `collect_isrs_per_cpu()` and
+     `run_task_rta_per_cpu()`.
+   - Compute U_isr per CPU; emit `IsrOverloadedCpu` as needed.
+   - In task loop: fold jitter into the recurrence call; add
+     ISR interference term.
+   - For threads that are `Sporadic` and have an ISR upstream: emit
+     `IrqResponseBudget` or `IrqBudgetViolated`.
+   - Emit `ResponseBand` if BCET available.
+   - Preserve existing Info/Error "response time" diagnostic.
+6. Write 10 unit tests per the test plan. Each test should fit the
+   existing `TestBuilder` idiom in `rta.rs:240`.
+7. Add three `.aadl` fixtures + `.expected.json` under a new
+   `tests/fixtures/rta/` directory; register them in an integration
+   test file that loads each fixture, runs RTA, and diffs against
+   `.expected.json`.
+8. **Run the non-regression test first.** If existing RTA tests pass
+   before any Spar_Timing-using tests are added, you have the flow
+   right. If any existing test behavior changes, the default path
+   has regressed — fix before proceeding.
+9. Update `COMPLIANCE.md`:
+   - In "In progress / v0.7.0" add: "IRQ-aware RTA: hierarchical
+     two-tier analysis with Dispatch_Jitter and BCET/WCET split
+     landed; PIP/PCP blocking deferred to v0.7.1."
+10. Add rivet verification entries linking the new tests to
+    `REQ-TIMING-IRQ-{001,002,003}` from #145.
+11. Quality gates — all must pass before push:
+    - `cargo build --workspace`
+    - `cargo test -p spar-analysis`
+    - `cargo clippy --workspace --all-targets -- -D warnings`
+    - `cargo fmt --all -- --check`
+    - `rivet validate`
+12. Push `feat/v0.7.0-hierarchical-rta`; open PR using the commit
+    message above as the title.
+
+**Guardrails:**
+
+- Do not change the Lean theorems in `proofs/Proofs/Scheduling/`.
+  Commit 3 extends them separately.
+- Do not add MBPTA or probabilistic WCET anywhere. That is v1.0+.
+- Do not touch `spar-trace`, `spar-insight`, `spar-variants` — these
+  crates do not yet exist and are not part of this commit.
+- If a test fails because the model is ambiguous, write a
+  diagnostic, not a panic. The Lean-verified `compute_response_time`
+  already handles the diverged case; follow its pattern.
+
+## References
+
+- Tindell and Clark, *Holistic schedulability analysis for distributed
+  hard real-time systems*, Microprocessing and Microprogramming
+  40(2–3), 1994. (The jitter extension.)
+- AS5506D §5.4.4 *Thread Scheduling Properties*. (What AADL says about
+  priority and dispatch; what it doesn't say about IRQs.)
+- spar Lean proof `proofs/Proofs/Scheduling/RTA.lean` — fixed-point
+  convergence of the classical recurrence.
+- PR #145 — property surface this commit consumes.