HIP/ROCm fork of llama.cpp optimized for the AMD gfx1030/RDNA2 architecture, with support for PrismML's Bonsai Q1_0_G128 '1-bit' models, TurboQuant TQ3_0 KV cache, RotorQuant (iso and planar), and EAGLE3 speculative decoding.
ROCm/HIP fork of SGLang with TurboQuant tq2/tq3/tq4 KV cache, RotorQuant (iso and planar), Triton and radix-cache serving, EAGLE3 speculative decoding, P-EAGLE checkpoint support, and PrismML Bonsai 1-bit GGUF compatibility on gfx1030/RDNA2.