Prerequisites
Feature Description
Optimization Brief: AMD Strix Halo (RDNA 3.5) Hardware Enhancements
Now that the Vulkan Sparse Allocator is stable and the upstream Qwen3.5 caching bug is resolved, we have an unconstrained dual-model engine. The next phase is optimizing llama.cpp to fully exploit the RDNA 3.5 Unified Memory Architecture (UMA) for our agent harness.
Please review these four hardware-level enhancements, ranked by implementation effort, and prepare to integrate them.
4. Transition to the Windows HIP (ROCm) SDK
The Target: Vulkan is a graphics API running compute shaders translated through SPIR-V. AMD’s native compute language is HIP. Compiling llama.cpp on Windows with the HIP SDK links the engine directly to hipBLAS/rocBLAS, whose hand-tuned kernels are compiled natively for the RDNA 3.5 ISA, giving far tighter register allocation and scheduling than the translated shader path.
Motivation
Effort: Very High (9/10)
Expected Improvement: 15% - 25% overall token generation throughput uplift (target: ~46-50 t/s).
Possible Implementation
The Implementation:
This requires swapping the build toolchain entirely: omit the Vulkan CMake options and configure the build exclusively for the HIP backend.
Prerequisites for the environment (a quick PATH sanity check follows this list):
- Install AMD HIP SDK for Windows.
- Install Strawberry Perl (required for the hipcc compiler wrapper).
- Ensure MSVC and Ninja are in the PATH.
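Before configuring, a short PowerShell sanity check can confirm these pieces are reachable. This is a sketch: it assumes the default HIP SDK 5.7 install path used in the build script below.
# Confirm the HIP SDK clang toolchain exists at the path the build script expects
Test-Path "C:\Program Files\AMD\ROCm\5.7\bin\clang.exe"
Test-Path "C:\Program Files\AMD\ROCm\5.7\bin\clang++.exe"
# Confirm Strawberry Perl, MSVC (cl), and Ninja all resolve on PATH
Get-Command perl, cl, ninja | Format-Table Name, Source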
The Build Script:
# Set compiler paths explicitly to the HIP SDK LLVM toolchain
$env:CC  = "C:\Program Files\AMD\ROCm\5.7\bin\clang.exe"
$env:CXX = "C:\Program Files\AMD\ROCm\5.7\bin\clang++.exe"
$env:HIP_PATH = "C:\Program Files\AMD\ROCm\5.7"

# Note: gfx1151 is the specific microarchitecture target for Strix Halo / RDNA 3.5.
# LLAMA_HIPBLAS is the HIP switch on this tree; newer llama.cpp trees rename it to GGML_HIP.
# With the Ninja Multi-Config generator the configuration is chosen at build time via
# --config, so no CMAKE_BUILD_TYPE is needed here.
cmake -G "Ninja Multi-Config" -S . -B build_hip `
    -DLLAMA_HIPBLAS=ON `
    -DAMDGPU_TARGETS="gfx1151" `
    -DCMAKE_C_COMPILER="$env:CC" `
    -DCMAKE_CXX_COMPILER="$env:CXX"

cmake --build build_hip --config Release
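To verify the resulting binary actually engages the HIP backend rather than silently falling back to CPU, a quick llama-bench run can be used. This is a sketch: the binary path assumes the Ninja Multi-Config output layout, and the model path is a placeholder to adjust for your tree.
# Ninja Multi-Config places binaries under bin\<Config>; adjust if your layout differs.
# -ngl 99 offloads all layers to the GPU; the backend column in the output should
# report ROCm/HIP rather than Vulkan or CPU.
.\build_hip\bin\Release\llama-bench.exe -m .\models\your-model.gguf -ngl 99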