Merged
30 commits
1e83310 Add comprehensive roadmap and technical documentation (claude, Jan 2, 2026)
80e672a Add implementation plan and testing strategy documentation (claude, Jan 2, 2026)
80a7296 Add WebSocket bidirectional streaming for persistent GPU kernels (claude, Jan 2, 2026)
35a46cd Add WgpuPersistentHandle for emulated GPU persistence (claude, Jan 2, 2026)
9c395a4 Implement Metal backend KernelHandleInner and K2K integration (claude, Jan 2, 2026)
e5e7da7 Add Metal K2K device-side structures for threadgroup messaging (claude, Jan 2, 2026)
58ca87a Add ringkernel-ir crate for unified GPU code generation (claude, Jan 2, 2026)
bc264fb Add CUDA lowering pass for IR to CUDA code generation (claude, Jan 2, 2026)
4335d6a Add WGSL lowering pass for IR to WebGPU code generation (claude, Jan 2, 2026)
c1fd7b2 Add MSL lowering pass for IR to Metal code generation (claude, Jan 2, 2026)
95c7081 Add multi-backend GPU kernel proc macro with capability checking (claude, Jan 2, 2026)
3ee8172 Add kernel checkpointing infrastructure for persistent state snapshot… (claude, Jan 3, 2026)
4934120 Add GPU topology discovery, cross-GPU K2K routing, and kernel migration (claude, Jan 3, 2026)
2c899d8 Add observability infrastructure with OpenTelemetry, Prometheus, and … (claude, Jan 3, 2026)
1ef877c Add health monitoring and resilience infrastructure (claude, Jan 3, 2026)
dd23e0f Add checkpoint-migration integration and enhanced error handling (claude, Jan 3, 2026)
75b784c Add unified configuration system for enterprise features (claude, Jan 3, 2026)
7e23318 Add RuntimeBuilder and RingKernelContext for unified runtime management (claude, Jan 3, 2026)
7238e40 Add lifecycle management and kernel migration on device unregister (claude, Jan 3, 2026)
103d10c Add enterprise features example and integration tests (claude, Jan 3, 2026)
3c4904f Rename RuntimeMetrics to ContextMetrics to avoid naming conflict (claude, Jan 3, 2026)
215a8c7 Update CLAUDE.md with enterprise features documentation (claude, Jan 3, 2026)
1b93ad5 Add enterprise enhancements: async monitoring, config files, ecosyste… (claude, Jan 3, 2026)
6a1c713 Add ringkernel-cli scaffolding tool and update roadmap tracking (claude, Jan 3, 2026)
7c53286 Add GraphQL subscriptions and WebGPU batched dispatch optimization (claude, Jan 3, 2026)
2d8c036 Update ROADMAP: mark multi-backend proc macro features as complete (claude, Jan 3, 2026)
eb6589d Add optimization passes, fuzzing, SIMD, subgroup ops, and GPU profile… (claude, Jan 5, 2026)
c6c8485 Add interactive tutorials and Metal K2K halo exchange (claude, Jan 5, 2026)
d1a0e9f Add CI GPU testing, mock module, GPU memory dashboard, and hot reload (claude, Jan 5, 2026)
c993eea Complete roadmap to 100%: security, ML bridges, and developer tools (claude, Jan 5, 2026)
189 changes: 189 additions & 0 deletions .github/workflows/gpu-tests.yml
@@ -0,0 +1,189 @@
name: GPU Tests

on:
  # Manual trigger for GPU tests
  workflow_dispatch:
    inputs:
      backend:
        description: 'GPU backend to test'
        required: true
        default: 'all'
        type: choice
        options:
          - all
          - cuda
          - wgpu
          - metal
  # Run on PRs with GPU label
  pull_request:
    types: [labeled]

env:
  CARGO_TERM_COLOR: always
  RUST_BACKTRACE: 1

jobs:
  # CUDA GPU Tests - requires self-hosted runner with NVIDIA GPU
  cuda-tests:
    name: CUDA Tests
    if: |
      github.event_name == 'workflow_dispatch' &&
      (github.event.inputs.backend == 'all' || github.event.inputs.backend == 'cuda')
      || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'gpu-test'))
    runs-on: [self-hosted, gpu, cuda]
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Check CUDA availability
        run: |
          nvidia-smi
          nvcc --version

      - name: Cache cargo
        uses: Swatinem/rust-cache@v2
        with:
          shared-key: "gpu-cuda"

      - name: Run CUDA codegen tests
        run: cargo test -p ringkernel-cuda-codegen --all-features

      - name: Run CUDA backend tests
        run: cargo test -p ringkernel-cuda --features cuda

      - name: Run GPU execution verification tests
        run: cargo test -p ringkernel-cuda --test gpu_execution_verify --features cuda

      - name: Run WaveSim3D GPU benchmark
        run: |
          cargo run -p ringkernel-wavesim3d --bin wavesim3d-benchmark --release --features cuda-codegen -- --quick
        continue-on-error: true

      - name: Run TxMon GPU benchmark
        run: |
          cargo run -p ringkernel-txmon --bin txmon-benchmark --release --features cuda-codegen -- --quick
        continue-on-error: true

  # WebGPU Tests - can run on any runner with Vulkan/DX12/Metal support
  wgpu-tests:
    name: WebGPU Tests
    if: |
      github.event_name == 'workflow_dispatch' &&
      (github.event.inputs.backend == 'all' || github.event.inputs.backend == 'wgpu')
      || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'gpu-test'))
    runs-on: [self-hosted, gpu]
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo
        uses: Swatinem/rust-cache@v2
        with:
          shared-key: "gpu-wgpu"

      - name: Run WGSL codegen tests
        run: cargo test -p ringkernel-wgpu-codegen --all-features

      - name: Run WebGPU backend tests
        run: cargo test -p ringkernel-wgpu --features wgpu-tests -- --ignored
        continue-on-error: true

  # Metal Tests - macOS only
  metal-tests:
    name: Metal Tests
    if: |
      github.event_name == 'workflow_dispatch' &&
      (github.event.inputs.backend == 'all' || github.event.inputs.backend == 'metal')
      || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'gpu-test'))
    runs-on: macos-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo
        uses: Swatinem/rust-cache@v2
        with:
          shared-key: "gpu-metal"

      - name: Check Metal availability
        run: |
          system_profiler SPDisplaysDataType | grep -i metal || echo "Metal info not available"

      - name: Run Metal backend tests
        run: cargo test -p ringkernel-metal --features metal
        continue-on-error: true

      - name: Build Metal examples
        run: cargo build -p ringkernel --examples --features metal
        continue-on-error: true

  # CPU Backend GPU Mock Tests - runs on all platforms
  cpu-mock-tests:
    name: CPU Mock GPU Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo
        uses: Swatinem/rust-cache@v2

      - name: Run CPU backend tests (GPU mock)
        run: cargo test -p ringkernel-cpu --all-features

      - name: Run core tests with CPU backend
        run: cargo test -p ringkernel-core --all-features

      - name: Run ecosystem tests with CPU mock
        run: cargo test -p ringkernel-ecosystem --features "persistent,actix,tower,axum,grpc"

  # Performance baseline on CPU
  benchmark-baseline:
    name: Performance Baseline
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch'
    steps:
      - uses: actions/checkout@v4

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Cache cargo
        uses: Swatinem/rust-cache@v2

      - name: Run CPU benchmarks
        run: cargo bench --package ringkernel -- --noplot --quick
        continue-on-error: true

      - name: Run WaveSim CPU benchmark
        run: cargo run -p ringkernel-wavesim --example benchmark --release -- --quick
        continue-on-error: true

  # Summary report
  summary:
    name: Test Summary
    needs: [cuda-tests, wgpu-tests, metal-tests, cpu-mock-tests]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Report Status
        run: |
          echo "## GPU Test Results" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "| Backend | Status |" >> $GITHUB_STEP_SUMMARY
          echo "|---------|--------|" >> $GITHUB_STEP_SUMMARY
          echo "| CUDA | ${{ needs.cuda-tests.result }} |" >> $GITHUB_STEP_SUMMARY
          echo "| WebGPU | ${{ needs.wgpu-tests.result }} |" >> $GITHUB_STEP_SUMMARY
          echo "| Metal | ${{ needs.metal-tests.result }} |" >> $GITHUB_STEP_SUMMARY
          echo "| CPU Mock | ${{ needs.cpu-mock-tests.result }} |" >> $GITHUB_STEP_SUMMARY
59 changes: 59 additions & 0 deletions CLAUDE.md
@@ -76,6 +76,12 @@ cargo test -p ringkernel-ecosystem --features "persistent,actix,tower,axum,grpc"

# Run ecosystem example (Axum REST API)
cargo run -p ringkernel-ecosystem --example axum_persistent_api --features "axum,persistent"

# RingKernel CLI tool
cargo run -p ringkernel-cli -- new my-app --template persistent-actor
cargo run -p ringkernel-cli -- codegen src/kernels/mod.rs --backend cuda,wgsl
cargo run -p ringkernel-cli -- check --backends all
cargo run -p ringkernel-cli -- init --backends cuda
```

## Architecture
@@ -95,6 +101,7 @@ The project is a Cargo workspace with these crates:
- **`ringkernel-cuda-codegen`** - Rust-to-CUDA transpiler for writing GPU kernels in Rust DSL
- **`ringkernel-wgpu-codegen`** - Rust-to-WGSL transpiler for writing GPU kernels in Rust DSL (WebGPU backend)
- **`ringkernel-ecosystem`** - Ecosystem integrations with **persistent GPU actor support** (Actix `GpuPersistentActor`, Axum REST/SSE, Tower `PersistentKernelService`, gRPC streaming)
- **`ringkernel-cli`** - CLI tool for project scaffolding, kernel code generation, and compatibility checking
- **`ringkernel-audio-fft`** - Example application: GPU-accelerated audio FFT processing
- **`ringkernel-wavesim`** - Example application: 2D acoustic wave simulation with GPU-accelerated FDTD, tile-based ring kernel actors, and educational simulation modes
- **`ringkernel-wavesim3d`** - Example application: 3D acoustic wave simulation with binaural audio, **persistent GPU actors** (H2K/K2H messaging, K2K halo exchange, cooperative groups), and volumetric ray marching visualization
@@ -114,6 +121,58 @@ The project is a Cargo workspace with these crates:
- **`K2KBroker`/`K2KEndpoint`** - Kernel-to-kernel direct messaging
- **`PubSubBroker`** - Topic-based publish/subscribe with wildcards

### Enterprise Features (in ringkernel-core)

The following features provide runtime infrastructure for production deployments:

- **`RingKernelContext`** - Unified runtime managing all enterprise features
- **`RuntimeBuilder`** - Fluent builder with `development()`, `production()`, `high_performance()` presets
- **`ConfigBuilder`** - Unified configuration system with nested builders

**Health & Resilience:**
- **`HealthChecker`** - Liveness/readiness probes with async health checks
- **`CircuitBreaker`** - Fault tolerance with automatic recovery
- **`DegradationManager`** - Graceful degradation with 5 levels (Normal → Critical)
- **`KernelWatchdog`** - Stale kernel detection with heartbeat monitoring
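As a rough illustration of heartbeat-based stale-kernel detection, the sketch below records the last heartbeat time per kernel and reports every kernel whose heartbeat is older than a timeout. `HeartbeatWatchdog`, `beat`, and `stale` are hypothetical names, not the `KernelWatchdog` API; timestamps are passed in explicitly so the logic stays deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical heartbeat tracker; names are illustrative, not the ringkernel-core API.
struct HeartbeatWatchdog {
    timeout: Duration,
    last_seen: HashMap<String, Instant>,
}

impl HeartbeatWatchdog {
    fn new(timeout: Duration) -> Self {
        Self { timeout, last_seen: HashMap::new() }
    }

    /// Records a heartbeat from `kernel_id` at time `now`.
    fn beat(&mut self, kernel_id: &str, now: Instant) {
        self.last_seen.insert(kernel_id.to_string(), now);
    }

    /// Returns the ids (sorted) of kernels whose last heartbeat is older than `timeout`.
    fn stale(&self, now: Instant) -> Vec<String> {
        let mut ids: Vec<String> = self
            .last_seen
            .iter()
            // `duration_since` saturates to zero if a heartbeat is newer than `now`.
            .filter(|&(_, &seen)| now.duration_since(seen) > self.timeout)
            .map(|(id, _)| id.clone())
            .collect();
        ids.sort();
        ids
    }
}
```

Passing `now` in, rather than calling `Instant::now()` internally, keeps the check testable and lets a monitoring loop evaluate all kernels against a single timestamp.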

**Observability:**
- **`PrometheusExporter`** - Prometheus metrics export
- **`ObservabilityContext`** - Distributed tracing with spans

**Multi-GPU:**
- **`MultiGpuCoordinator`** - Device selection with load balancing strategies
- **`KernelMigrator`** - Live kernel migration between GPUs using checkpoints
- **`GpuTopology`** - NVLink/PCIe topology discovery
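One common load-balancing strategy is least-loaded selection. The sketch below assumes a hypothetical `DeviceLoad` snapshot holding an active-kernel count and free memory per device, and picks the least busy device that can fit an allocation. It illustrates the idea only; it is not `MultiGpuCoordinator`'s actual set of strategies.

```rust
/// Hypothetical per-device snapshot; field names are illustrative only.
#[derive(Debug, Clone)]
struct DeviceLoad {
    id: usize,
    active_kernels: u32,
    free_memory_bytes: u64,
}

/// Least-loaded selection: among devices with at least `required_bytes` free,
/// pick the one with the fewest active kernels, breaking ties by lower id.
/// Returns `None` if no device has enough free memory.
fn select_least_loaded(devices: &[DeviceLoad], required_bytes: u64) -> Option<usize> {
    devices
        .iter()
        .filter(|d| d.free_memory_bytes >= required_bytes)
        .min_by_key(|d| (d.active_kernels, d.id))
        .map(|d| d.id)
}
```

Filtering on memory before ranking by load avoids choosing an idle device that cannot actually host the kernel.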

**Lifecycle:**
- **`LifecycleState`** - Initializing → Running → Draining → ShuttingDown → Stopped
- **`ShutdownReport`** - Final statistics on graceful shutdown
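The lifecycle order above can be modeled as a linear state machine. This sketch assumes transitions only move one step forward and that only the running phase accepts new work (the usual meaning of a draining phase). `Phase`, `next`, `transition`, and `accepts_new_work` are hypothetical names, not the `LifecycleState` API.

```rust
/// Hypothetical mirror of the documented lifecycle order; not the crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Initializing,
    Running,
    Draining,
    ShuttingDown,
    Stopped,
}

impl Phase {
    /// The next phase in the linear lifecycle, or `None` once stopped.
    fn next(self) -> Option<Phase> {
        match self {
            Phase::Initializing => Some(Phase::Running),
            Phase::Running => Some(Phase::Draining),
            Phase::Draining => Some(Phase::ShuttingDown),
            Phase::ShuttingDown => Some(Phase::Stopped),
            Phase::Stopped => None,
        }
    }

    /// Assumption: only `Running` admits new work; `Draining` lets in-flight work finish.
    fn accepts_new_work(self) -> bool {
        self == Phase::Running
    }

    /// Assumption: only single forward steps are legal; skips and reversals are rejected.
    fn transition(self, to: Phase) -> Result<Phase, String> {
        if self.next() == Some(to) {
            Ok(to)
        } else {
            Err(format!("illegal transition {:?} -> {:?}", self, to))
        }
    }
}
```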

```rust
// Enterprise runtime usage
use ringkernel_core::prelude::*;

// Assumes the crate's error types implement `std::error::Error`, so `?` can box them.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let runtime = RuntimeBuilder::new()
        .production() // or .development() or .high_performance()
        .build()?;

    runtime.start()?; // Transition to Running state

    // Run one health monitoring cycle
    let result = runtime.run_health_check_cycle();
    println!("Health: {:?}, Circuit: {:?}", result.status, result.circuit_state);

    // Circuit breaker protection
    let guard = CircuitGuard::new(&runtime, "operation");
    guard.execute(|| { /* protected operation */ })?;

    // Graceful shutdown
    let report = runtime.complete_shutdown()?;
    println!("Uptime: {:?}", report.total_uptime);

    Ok(())
}
```

Run the enterprise demo: `cargo run -p ringkernel --example enterprise_runtime`
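The `CircuitGuard` call in the example above relies on a circuit breaker. A minimal, standard-library-only sketch of the usual closed/open/half-open state machine follows. It assumes a consecutive-failure threshold and a fixed cooldown; `Breaker`, `BreakerState`, and their parameters are hypothetical and not necessarily how `CircuitBreaker` is configured.

```rust
use std::time::{Duration, Instant};

/// Hypothetical breaker states; not the ringkernel-core API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BreakerState {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

struct Breaker {
    state: BreakerState,
    consecutive_failures: u32,
    failure_threshold: u32, // assumed: failures in a row before opening
    cooldown: Duration,     // assumed: time spent open before trial calls
}

impl Breaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { state: BreakerState::Closed, consecutive_failures: 0, failure_threshold, cooldown }
    }

    /// Returns true if a call may proceed at time `now`.
    fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            BreakerState::Closed | BreakerState::HalfOpen => true,
            BreakerState::Open { since } => {
                if now.duration_since(since) >= self.cooldown {
                    // Cooldown elapsed: move to half-open and allow trial calls.
                    self.state = BreakerState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// A success resets the failure count and closes the circuit (automatic recovery).
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = BreakerState::Closed;
    }

    /// A failed trial call, or reaching the threshold, opens the circuit.
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.state == BreakerState::HalfOpen
            || self.consecutive_failures >= self.failure_threshold
        {
            self.state = BreakerState::Open { since: now };
        }
    }
}
```

The half-open state is what provides automatic recovery: after the cooldown, a single success closes the circuit, while a single failure reopens it without waiting for the full threshold again.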

### Backend System

Backends implement `RingKernelRuntime` trait. Selection via features: