Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
319 changes: 319 additions & 0 deletions docs/designs/v0.7.0-hierarchical-rta.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,319 @@
# Design: Hierarchical IRQ-aware RTA (v0.7.0 Track A, commit 2)

Status: **proposed** — implementation waits on PR #145 merge.
Last update: 2026-04-23.
Preceded by: PR #145 (property set surface — `Spar_Timing::*`, `Spar_Trace::*`).
Followed by: commit 3 (`Spar_Proofs/Scheduling/RTAJittered.lean`), commit 4 (integration
tests + COMPLIANCE).

## Context: what RTA does today

`crates/spar-analysis/src/rta.rs` implements exact response-time analysis
for fixed-priority preemptive scheduling (`RtaAnalysis` at line 34). For
each processor, threads are sorted by priority and the fixed-point

```text
R(0) = C_i
R(n+1) = C_i + Σ_j ⌈R(n)/T_j⌉ × C_j (all higher-priority threads j)
```

is computed by the Lean4-verified
`scheduling_verified::compute_response_time(exec_ps, deadline_ps, &higher_priority)`
(called at `rta.rs:121`). Priority ordering: explicit
`Deployment_Properties::Priority` first (lower = higher), then shorter
period (RM ordering). If response time exceeds `Deadline` (falls back to
`Period`), an `Error` diagnostic is emitted; otherwise `Info`.

Property accessors already available in
`crates/spar-analysis/src/property_accessors.rs`:

| Function | Returns | Line |
|---|---|---|
| `get_timing_property(props, name)` | `Option<u64>` picoseconds | 195 |
| `get_execution_time(props)` | `Option<u64>` — WCET from range | 214 |
| `get_execution_time_range(props)` | `Option<(u64, u64)>` — **BCET, WCET** | 240 |
| `get_processor_binding(props)` | `Option<String>` | existing |

The BCET/WCET split accessor exists but is not used by `rta.rs` today —
only the WCET path is exercised.

## The gap: what classical RTA misses for IRQ-driven systems

1. **No interrupt priority layer.** Every thread is treated as if the
highest task-priority level were the system's top priority. ISRs that
run at higher-than-task priority (all of them, on any real kernel)
are invisible to RTA — their CPU steal is silently ignored.
2. **`Dispatch_Jitter` is parsed but never used** in the recurrence.
3. **Time_Range collapsed to WCET.** We lose BCET entirely, which means
we can't compute slack band (WCET − BCET) or best-case response
time needed for some safety arguments.
4. **No probe/trace awareness.** Today RTA can't say "this is the
response chain the user wants to verify with trace data in v0.8.0".

## Target design: two-tier analysis

```
┌─────────────────────────────────────────────────────────────┐
│ Tier 1 — ISR layer (preempts everything) │
│ For each processor p: │
│ U_isr(p) = Σ_{ISR i on p} ISR_exec_i / Period_i │
│ If U_isr(p) ≥ threshold → IsrOverloadedCpu diagnostic │
│ Residual_capacity(p) = 1 − U_isr(p) │
├─────────────────────────────────────────────────────────────┤
│ Tier 2 — Task RTA on inflated interference │
│ For each thread i bound to processor p: │
│ J_i = Dispatch_Jitter(i) (0 if unset) │
│ R_i(0) = C_i + J_i │
│ R_i(n+1) = C_i + J_i │
│ + Σ_j∈HP ⌈(R_i(n) + J_j) / T_j⌉ × C_j │
│ + IsrInterference(p, R_i(n)) │
│ where IsrInterference(p, R) = Σ_{ISR k on p} ⌈R/T_k⌉×I_k│
├─────────────────────────────────────────────────────────────┤
│ Chain view — for Sporadic-dispatched threads with an ISR │
│ upstream: │
│ Total_response = Interrupt_Latency_Bound │
│ + ISR_Execution_Time │
│ + Dispatch_overhead │
│ + R_handler (from Tier 2 above) │
└─────────────────────────────────────────────────────────────┘
```

### Why this shape

Classical RTA assumes a single priority order. Hierarchical scheduling
models the ISR layer as a higher-priority *server* that preempts tasks
unconditionally — the cleanest extension of the existing fixed-point
that preserves convergence properties. Adding ISR interference as an
extra term inside the recurrence is equivalent to treating ISRs as
highest-priority pseudo-tasks, for which the Lean convergence proof in
`proofs/Proofs/Scheduling/RTA.lean` still holds (the proof is monotonic
in the interference term).

### Jitter handling

The jittered response-time recurrence `R = C + J + Σ⌈(R+J)/T⌉·C` is the
standard Tindell-Clark extension. Each jitter term inflates a task's
own release window *and* every interfering task's release window. The
Lean convergence proof needs a small extension (monotonic addition of J
preserves least-fixed-point existence); sketched in commit 3 as
`RTAJittered.lean`.

### BCET/WCET split use

For the response-time recurrence we continue to use WCET — BCET cannot
produce a safe upper bound. However, two new informational outputs use
both:

- `Slack_Band = Deadline − R_wcet` (how much headroom over WCET path).
- `Response_Band = (R_bcet, R_wcet)` where `R_bcet` uses BCET in place
of WCET for the target thread only (keeping interference at WCET).

These feed v0.8.0's trace verification: observed response times outside
`[R_bcet, R_wcet]` indicate the model's execution-time range is wrong.

## New diagnostics

Added to `crate::AnalysisDiagnostic` via new `DiagnosticKind` values or
inline variants — follow whatever pattern `diagnostics.rs` uses today
(the agent should read first).

| Kind | Severity | When emitted |
|---|---|---|
| `IrqResponseBudget { event_port, predicted, deadline, slack }` | Info | Per Sporadic-dispatched thread with an ISR source — one per IRQ chain |
| `IrqBudgetViolated { event_port, predicted, deadline }` | Error | IRQ chain total response > deadline |
| `IsrOverloadedCpu { processor, utilization_pct, threshold_pct }` | Error | Per processor where U_isr ≥ threshold (default 0.3) — ISR saturation regardless of task RTA |
| `MissingBottomHalfServer { isr_component }` | Warning | `ISR_Execution_Time` is set but no `Bottom_Half_Server` reference |
| `ResponseBand { thread, bcet_response, wcet_response }` | Info | Per thread when BCET is available from Time_Range |

Existing `response time <= deadline` Info diagnostic stays — now reflects
the *residual-capacity-adjusted* RTA.

## What counts as an ISR

A component instance is treated as an ISR iff
`Spar_Timing::ISR_Priority` is set. Two sub-cases:

- ISR is a *thread* with dispatch protocol `Sporadic`: the thread body
is the ISR body, `ISR_Execution_Time` supersedes
`Compute_Execution_Time` for the tier-1 utilization calculation.
- ISR is a *device* with `ISR_Execution_Time` set: the device models
the ISR; if `Bottom_Half_Server` references a thread, that thread is
the bottom-half handler whose classical RTA runs on residual capacity.

The `Interrupt_Latency_Bound` is a **processor** property: the
hardware + kernel ack latency. Added to the chain view regardless of
per-ISR specifics.

## Files touched (commit 2 scope)

| File | Change | Approx LOC |
|---|---|---|
| `crates/spar-analysis/src/rta.rs` | Split analyze loop into tier-1 (ISR collection + utilization) and tier-2 (task RTA with IsrInterference term + jitter). Add diagnostic helpers. | +250 / −30 |
| `crates/spar-analysis/src/property_accessors.rs` | Add `get_isr_priority`, `get_isr_execution_time`, `get_interrupt_latency_bound`, `get_bottom_half_server`, `get_dispatch_jitter` typed-first accessors. | +80 |
| `crates/spar-analysis/src/scheduling_verified.rs` | Extend `compute_response_time` with an optional `isr_interference: u64` additive constant OR add `compute_response_time_jittered(…, jitter: u64, isr_overhead: u64)`. Prefer the latter: less disruptive to callers. | +30 |
| `crates/spar-analysis/src/diagnostics.rs` | Add new `DiagnosticKind` variants (see table above). | +40 |
| `crates/spar-analysis/src/rta.rs` (tests) | 8–10 new unit tests covering each algorithm branch; 3 fixture tests demonstrating the two-tier vs classical gap. | +300 |

No changes outside `spar-analysis/`. No HIR changes needed — property
surface already landed in #145.

## Test plan

**Unit tests in rta.rs**:

1. `single_isr_reduces_task_capacity` — 1 CPU, 1 ISR (5% util), 1 task
(90% util): classical RTA passes, new RTA passes (task fits in 95%
residual); assert response time is *larger* than classical-RTA
result by the correct interference term.
2. `overloaded_isr_fires_diagnostic` — 1 CPU, 3 ISRs totaling 35% util:
`IsrOverloadedCpu` fires at default threshold 30%.
3. `dispatch_jitter_inflates_response` — Task with 50µs jitter on a
competing 1ms-period higher-priority thread produces response ≈ C +
⌈(R+50µs)/1ms⌉·C'; compare to no-jitter baseline.
4. `bcet_wcet_response_band` — Time_Range `Compute_Execution_Time =>
50us .. 200us` produces a `ResponseBand` diagnostic with both values.
5. `irq_chain_total_response` — Sporadic thread with upstream ISR and
`Interrupt_Latency_Bound = 10us`, `ISR_Execution_Time = 20us..30us`:
total response = 10µs (latency) + 30µs (ISR) + R_handler.
6. `missing_bottom_half_server_warning` — ISR_Execution_Time set
without Bottom_Half_Server: warning fires.
7. `no_isrs_matches_classical_rta` — Regression: in models with no
`Spar_Timing::*` properties, new RTA output is byte-for-byte the
same as old RTA output. This is the critical non-regression gate.
8. `multi_processor_isolation` — ISR on CPU1 does not inflate task
response on CPU2.
9. `zero_jitter_matches_unjittered` — J=0 recurrence equals the
non-jittered recurrence.
10. `isr_priority_above_all_tasks` — ISR with priority 99 cannot be
preempted by task with priority 0 (lower number).

**Fixture tests** under `crates/spar-analysis/tests/fixtures/`:

- `irq_brake_handler.aadl` — Representative: device triggers an event
port, ISR (20..30µs) fires, bottom-half handler thread (50..200µs
WCET) must complete within 1ms deadline. Expected outputs committed
as `irq_brake_handler.expected.json`.
- `multi_isr_same_cpu.aadl` — Three interrupt sources on one CPU,
differing periods, showing total ISR utilization computation.
- `jittered_chain.aadl` — Sporadic chain with MINT + jitter,
demonstrating Tindell-Clark extension effect.

## Out of scope for commit 2

- **Priority Inheritance Protocol / Priority Ceiling Protocol**. The
no-blocking assumption stays. PIP/PCP is v0.7.1. A TODO comment in
`rta.rs` cites AS5506D §5.4.4 and notes `Locking_Protocol` is not
yet modeled.
- **Multi-processor ISR migration**. Each ISR is bound to one CPU.
- **Cache-aware interference inflation.** Research topic; left for
v1.0+.
- **Rate monotonic vs earliest deadline first split at tier 1.** Tier 1
is always FP (standard for ISRs). Tier 2 RTA uses whatever priority
assignment the model declares.

## Commit message template

```
feat(rta): hierarchical IRQ-aware RTA with jitter and BCET/WCET split

Extends response-time analysis with an ISR tier above the classical
task-priority fixed-point. ISRs identified by Spar_Timing::ISR_Priority
steal CPU capacity first; residual capacity feeds task RTA. Tindell-
Clark jitter terms (Dispatch_Jitter) inflate the recurrence in the
standard way. Compute_Execution_Time's Time_Range is consumed as
(BCET, WCET) where only WCET enters the recurrence but both emerge in
a ResponseBand diagnostic for v0.8.0 trace comparison.

New diagnostics: IrqResponseBudget, IrqBudgetViolated,
IsrOverloadedCpu, MissingBottomHalfServer, ResponseBand.

Non-regression: models with no Spar_Timing::* properties produce
byte-identical RTA output to the prior implementation.

Out of scope (v0.7.1): priority inheritance / ceiling protocols;
multi-processor ISR migration.

Closes (partial) the IRQ-RTA acceptance criteria from the v0.7.0
plan. Track A commit 2 of 4.

Refs: Spar_Timing property surface from #145.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```

## Agent brief (executable section)

When PR #145 merges, launch one agent with the following instructions.

**Scope:** strictly this commit. Do not touch `spar-hir-def/`. Do not
touch `spar-cli/`. Only `spar-analysis/`.

**Steps:**

1. Read `crates/spar-analysis/src/rta.rs` in full. Understand the
current flow.
2. Read `crates/spar-analysis/src/scheduling_verified.rs` in full.
Decide: add an overload (`compute_response_time_jittered`) or
parameterize (`compute_response_time(..., jitter, isr_interference)`)
— the design note prefers the overload.
3. Read `crates/spar-analysis/src/diagnostics.rs` to learn the
diagnostic-kind pattern. Add the five new variants.
4. Read `crates/spar-analysis/src/property_accessors.rs` around lines
195–290 to see the typed-first / string-fallback idiom. Add the
new typed accessors following the same pattern.
5. Modify `rta.rs`:
- Split `analyze()` into `collect_isrs_per_cpu()` and
`run_task_rta_per_cpu()`.
- Compute U_isr per CPU; emit `IsrOverloadedCpu` as needed.
- In task loop: fold jitter into the recurrence call; add
ISR interference term.
- For threads that are `Sporadic` and have an ISR upstream: emit
`IrqResponseBudget` or `IrqBudgetViolated`.
- Emit `ResponseBand` if BCET available.
- Preserve existing Info/Error "response time" diagnostic.
6. Write 10 unit tests per the test plan. Each test should fit the
existing `TestBuilder` idiom in `rta.rs:240`.
7. Add three `.aadl` fixtures + `.expected.json` under a new
`tests/fixtures/rta/` directory; register them in an integration
test file that loads each fixture, runs RTA, and diffs against
`.expected.json`.
8. **Run the non-regression test first.** If existing RTA tests pass
before any Spar_Timing-using tests are added, you have the flow
right. If any existing test behavior changes, the default path
has regressed — fix before proceeding.
9. Update `COMPLIANCE.md`:
- In "In progress / v0.7.0" add: "IRQ-aware RTA: hierarchical
two-tier analysis with Dispatch_Jitter and BCET/WCET split
landed; PIP/PCP blocking deferred to v0.7.1."
10. Add rivet verification entries linking the new tests to
`REQ-TIMING-IRQ-{001,002,003}` from #145.
11. Quality gates — all must pass before push:
- `cargo build --workspace`
- `cargo test -p spar-analysis`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo fmt --all -- --check`
- `rivet validate`
12. Push `feat/v0.7.0-hierarchical-rta`; open PR using the commit
message above as the title.

**Guardrails:**

- Do not change the Lean theorems in `proofs/Proofs/Scheduling/`.
Commit 3 extends them separately.
- Do not add MBPTA or probabilistic WCET anywhere. That is v1.0+.
- Do not touch `spar-trace`, `spar-insight`, `spar-variants` — these
crates do not yet exist and are not part of this commit.
- If a test fails because the model is ambiguous, write a
diagnostic, not a panic. The Lean-verified `compute_response_time`
already handles the diverged case; follow its pattern.

## References

- Tindell and Clark, *Holistic schedulability analysis for distributed
hard real-time systems*, Microprocessing and Microprogramming
40(2–3), 1994. (The jitter extension.)
- AS5506D §5.4.4 *Thread Scheduling Properties*. (What AADL says about
priority and dispatch; what it doesn't say about IRQs.)
- spar Lean proof `proofs/Proofs/Scheduling/RTA.lean` — fixed-point
convergence of the classical recurrence.
- PR #145 — property surface this commit consumes.
Loading