Sandermage / genesis-vllm-patches (47 stars)

Runtime patches for vLLM: Qwen3.6 (27B int4 / 35B-A3B FP8) on consumer Ampere. 50+ patches: TurboQuant KV, MTP / DFlash / ngram spec-decode, FULL cudagraph, 256K-320K context. v7.64: P67 non-pow-2 GQA, Cliff 1 fix, 6 docs (FAQ/HARDWARE/CONFIGS/CLIFFS), Genesis Compat Layer.

Topics: cuda, nvidia, moe, gdn, ampere, structured-output, long-context, fp8, vllm, llm-inference, qwen, speculative-decoding, tool-calling, qwen3, rtx-3090, runtime-patches, dflash, turboquant, ampere-sm86, rtx-a5000

Updated May 2, 2026. Language: Python.