From ec1768578f33a5bcc821f70775e1e1a288314660 Mon Sep 17 00:00:00 2001
From: functionstackx <47992694+functionstackx@users.noreply.github.com>
Date: Mon, 19 Jan 2026 16:32:49 -0500
Subject: [PATCH] initial experimental folder code

---
 experimental/.gitignore          | 1 +
 experimental/README.md           | 5 +++++
 experimental/multiturn/README.md | 14 ++++++++++++++
 3 files changed, 20 insertions(+)
 create mode 100644 experimental/.gitignore
 create mode 100644 experimental/README.md
 create mode 100644 experimental/multiturn/README.md

diff --git a/experimental/.gitignore b/experimental/.gitignore
new file mode 100644
index 000000000..735d7060f
--- /dev/null
+++ b/experimental/.gitignore
@@ -0,0 +1 @@
+rocm-libraries/
\ No newline at end of file
diff --git a/experimental/README.md b/experimental/README.md
new file mode 100644
index 000000000..f39dfc4af
--- /dev/null
+++ b/experimental/README.md
@@ -0,0 +1,5 @@
+# Experimental
+
+This folder contains experimental, work-in-progress code that is mostly generated by Claude Code.
+
+**Warning:** Code in this directory is very basic and likely contains errors or incomplete implementations. It is not intended for production use or as part of the official InferenceMAX results.
diff --git a/experimental/multiturn/README.md b/experimental/multiturn/README.md
new file mode 100644
index 000000000..358b53991
--- /dev/null
+++ b/experimental/multiturn/README.md
@@ -0,0 +1,14 @@
+## Experimental WIP: Multi-turn with/without CPU KV Cache Offloading
+
+Literature review:
+- https://lmsys.org/blog/2025-09-10-sglang-hicache/
+- SGLang refers to GPU HBM as L1 and CPU DRAM as L2
+- https://lmsys.org/images/blog/hicache/mooncake_benchmark.png
+- single-turn long-context Q&A: https://arxiv.org/abs/2311.04939 (reads more like a shared-prefix workload, similar to cascade attention, a precursor to SGLang radix attention: https://flashinfer.ai/2024/02/02/cascade-inference.html)
+- production Alibaba multi-turn dataset: https://arxiv.org/abs/2506.02634 (does not appear to provide the actual prompts and outputs, though; mostly just prompt lengths, output lengths, etc.)
+- SGLang synthetic multi-turn benchmark script: https://github.com/sgl-project/sglang/tree/main/benchmark/hicache
+- interestingly, the SGLang blog simulates P-D disaggregation by simply setting the OSL to 1:
+```bash
+python3 benchmark/hicache/bench_multiturn.py --model-path $MODEL_PATH --disable-random-sample \
+--output-length 1 --request-length 2048  # simulate P-D disaggregation
+```
\ No newline at end of file
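The multi-turn workload the patch's README describes can be sketched as a simple trace generator (illustrative only, not from the patch; `build_turns` and its parameters are hypothetical names, loosely mirroring the `--request-length` / `--output-length` knobs of the SGLang benchmark). Each turn's prompt contains the full prior conversation, so the shared prefix between consecutive turns grows; that shared prefix is exactly the KV cache that GPU HBM (L1) and CPU DRAM (L2) tiers can reuse.

```python
# Hypothetical sketch: per-turn token accounting for a synthetic multi-turn
# conversation, in the spirit of sglang's bench_multiturn.py. Not real
# benchmark code; names and parameters are assumptions for illustration.

def build_turns(num_turns, request_len, output_len):
    """Return (prompt_len, output_len, reusable_prefix_len) per turn.

    Each turn appends `request_len` fresh user tokens to the full history;
    the history (prior prompts + prior outputs) is the reusable KV prefix.
    """
    history = 0  # tokens carried over from earlier turns
    turns = []
    for _ in range(num_turns):
        prompt_len = history + request_len
        turns.append((prompt_len, output_len, history))
        history = prompt_len + output_len  # next turn also sees this output
    return turns

if __name__ == "__main__":
    # output_len=1 mirrors the blog's "OSL = 1" trick to approximate
    # prefill-decode disaggregation (decode cost becomes negligible).
    for i, (p, o, s) in enumerate(build_turns(4, request_len=2048, output_len=1)):
        print(f"turn {i}: prompt={p} output={o} reusable_prefix={s}")
```

With `output_len=1` the reusable prefix still grows by roughly `request_len` per turn, which is why CPU KV cache offloading can pay off even in this prefill-dominated setting.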