From ec1768578f33a5bcc821f70775e1e1a288314660 Mon Sep 17 00:00:00 2001
From: functionstackx <47992694+functionstackx@users.noreply.github.com>
Date: Mon, 19 Jan 2026 16:32:49 -0500
Subject: [PATCH] initial experimental folder code

---
 experimental/.gitignore          | 1 +
 experimental/README.md           | 5 +++++
 experimental/multiturn/README.md | 14 ++++++++++++++
 3 files changed, 20 insertions(+)
 create mode 100644 experimental/.gitignore
 create mode 100644 experimental/README.md
 create mode 100644 experimental/multiturn/README.md

diff --git a/experimental/.gitignore b/experimental/.gitignore
new file mode 100644
index 000000000..735d7060f
--- /dev/null
+++ b/experimental/.gitignore
@@ -0,0 +1 @@
+rocm-libraries/
\ No newline at end of file
diff --git a/experimental/README.md b/experimental/README.md
new file mode 100644
index 000000000..f39dfc4af
--- /dev/null
+++ b/experimental/README.md
@@ -0,0 +1,5 @@
+# Experimental
+
+This folder contains experimental, work-in-progress code that is mostly generated by Claude Code.
+
+**Warning:** Code in this directory is very basic and likely contains errors or incomplete implementations. It is not intended for production use or as part of the official InferenceMAX results.
diff --git a/experimental/multiturn/README.md b/experimental/multiturn/README.md
new file mode 100644
index 000000000..358b53991
--- /dev/null
+++ b/experimental/multiturn/README.md
@@ -0,0 +1,14 @@
+## Experimental WIP: Multi-turn with/without CPU KV Cache Offloading
+
+Literature review:
+- https://lmsys.org/blog/2025-09-10-sglang-hicache/
+- SGLang refers to GPU HBM as L1 and CPU DRAM as L2
+- https://lmsys.org/images/blog/hicache/mooncake_benchmark.png
+- single-turn long-context Q&A: https://arxiv.org/abs/2311.04939 (reads more like a shared-prefix workload, similar to cascade attention, a precursor to SGLang radix attention: https://flashinfer.ai/2024/02/02/cascade-inference.html)
+- production Alibaba multi-turn dataset: https://arxiv.org/abs/2506.02634 (does not appear to provide the actual prompts and outputs, though; mostly just prompt lengths, output lengths, etc.)
+- SGLang synthetic multi-turn benchmark script: https://github.com/sgl-project/sglang/tree/main/benchmark/hicache
+- interestingly, the SGLang blog simulates P-D disaggregation by simply setting the OSL to 1:
+```bash
+python3 benchmark/hicache/bench_multiturn.py --model-path $MODEL_PATH --disable-random-sample \
+--output-length 1 --request-length 2048  # simulate P-D disaggregation
+```
\ No newline at end of file
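The multi-turn workload the patch's README describes can be sketched as a simple trace generator (illustrative only, not from the patch; `build_turns` and its parameters are hypothetical names, loosely mirroring the `--request-length` / `--output-length` knobs of the SGLang benchmark). Each turn's prompt contains the full prior conversation, so the shared prefix between consecutive turns grows; that shared prefix is exactly the KV cache that GPU HBM (L1) and CPU DRAM (L2) tiers can reuse.

```python
# Hypothetical sketch: per-turn token accounting for a synthetic multi-turn
# conversation, in the spirit of sglang's bench_multiturn.py. Not real
# benchmark code; names and parameters are assumptions for illustration.

def build_turns(num_turns, request_len, output_len):
    """Return (prompt_len, output_len, reusable_prefix_len) per turn.

    Each turn appends `request_len` fresh user tokens to the full history;
    the history (prior prompts + prior outputs) is the reusable KV prefix.
    """
    history = 0  # tokens carried over from earlier turns
    turns = []
    for _ in range(num_turns):
        prompt_len = history + request_len
        turns.append((prompt_len, output_len, history))
        history = prompt_len + output_len  # next turn also sees this output
    return turns

if __name__ == "__main__":
    # output_len=1 mirrors the blog's "OSL = 1" trick to approximate
    # prefill-decode disaggregation (decode cost becomes negligible).
    for i, (p, o, s) in enumerate(build_turns(4, request_len=2048, output_len=1)):
        print(f"turn {i}: prompt={p} output={o} reusable_prefix={s}")
```

With `output_len=1` the reusable prefix still grows by roughly `request_len` per turn, which is why CPU KV cache offloading can pay off even in this prefill-dominated setting.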