Adds a Jupyter notebook tutorial #213
Conversation
Pull request overview
Adds a ROCm-focused Jupyter notebook tutorial to demonstrate using amd-flashinfer (module flashinfer) for runtime validation (hip_utils), AITER-backed prefill attention, and logits_processor pipelines, plus a helper script to launch JupyterLab from the repo.
Changes:
- Document the new tutorial notebook and Jupyter launcher in the README examples list.
- Add `examples/run_jupyter_server.sh` to start JupyterLab from the repo root and auto-install `jupyterlab` if missing.
- Add `examples/amd_flashinfer_rocm_tutorial.ipynb` with end-to-end runnable tutorial cells (runtime checks, prefill, logits processing).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| README.md | Adds new “Available examples” entries for the tutorial notebook and Jupyter launcher. |
| examples/run_jupyter_server.sh | Introduces a convenience script to launch JupyterLab from the repository root. |
| examples/amd_flashinfer_rocm_tutorial.ipynb | New tutorial notebook covering ROCm environment verification, AITER-backed prefill, and LogitsPipe usage. |
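For context while reading the review, a minimal sketch of the logits-processing style of cell the notebook covers is shown below. It assumes the amd-flashinfer build exposes the same `flashinfer.logits_processor` API (`LogitsPipe`, `Temperature`, `Softmax`, `TopP`, `Sample`) as upstream FlashInfer; the batch size, vocabulary size, and sampling parameters are illustrative placeholders, not values taken from the notebook.

```python
# Hedged sketch (not copied from the notebook): build and run a LogitsPipe,
# assuming amd-flashinfer keeps the upstream flashinfer.logits_processor API.
import torch
from flashinfer.logits_processor import LogitsPipe, Temperature, Softmax, TopP, Sample

batch_size, vocab_size = 4, 32000                              # placeholder sizes
logits = torch.randn(batch_size, vocab_size, device="cuda")    # ROCm GPUs appear as "cuda" in torch

# Pipeline: temperature scaling -> softmax -> top-p filtering -> sampling
pipe = LogitsPipe([Temperature(), Softmax(), TopP(), Sample()])
token_ids = pipe(logits, temperature=0.8, top_p=0.9)
print(token_ids.shape)                                         # one sampled token id per batch entry
```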
| PORT="${JUPYTER_PORT:-8888}" | ||
| IP="${JUPYTER_IP:-0.0.0.0}" | ||
|
|
||
| echo "Starting JupyterLab from: $ROOT" | ||
| echo " URL: http://127.0.0.1:${PORT}/lab (use SSH -L if remote)" | ||
| echo " Stop: Ctrl+C" |
The script defaults to `--ip=0.0.0.0`, which binds JupyterLab on all network interfaces. That is risky on shared machines or when containers run with `--network=host`, because it can expose the server beyond localhost. Prefer defaulting to `127.0.0.1` (i.e. `IP="${JUPYTER_IP:-127.0.0.1}"`) and require users to explicitly set `JUPYTER_IP=0.0.0.0` when they intend remote access over SSH port-forwarding, or at least add a prominent warning.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| "def reconstruct_seq_from_paged_nhd(kv_tensor, kv_ip, kv_lpl, seq_idx, kv_slot):\n", | ||
| " chunks = []\n", | ||
| " start = int(kv_ip[seq_idx].item())\n", | ||
| " end = int(kv_ip[seq_idx + 1].item())\n", | ||
| " last_tokens = int(kv_lpl[seq_idx].item())\n", | ||
| " for p in range(start, end - 1):\n", | ||
| " chunks.append(kv_tensor[p, kv_slot, :, :, :].reshape(-1, num_kv_heads, head_dim))\n", | ||
| " p_last = end - 1\n", | ||
| " chunks.append(\n", | ||
| " kv_tensor[p_last, kv_slot, :last_tokens, :, :].reshape(-1, num_kv_heads, head_dim)\n", | ||
| " )\n", | ||
| " return torch.cat(chunks, dim=0)\n", | ||
| "\n", | ||
| "\n", | ||
| "k0 = reconstruct_seq_from_paged_nhd(kv_data, kv_indptr, kv_last_page_len, 0, 0)\n", | ||
| "v0 = reconstruct_seq_from_paged_nhd(kv_data, kv_indptr, kv_last_page_len, 0, 1)\n", |
`reconstruct_seq_from_paged_nhd` reconstructs pages using the raw page index `p` from `kv_indptr`, but it ignores `kv_indices` (the indirection table). This will reconstruct the wrong sequence whenever `kv_indices` is not the identity mapping, which is the common case in real paged KV caches. Consider passing `kv_indices` into this helper and using it to map per-sequence page slots to global page IDs.
| "def reconstruct_seq_from_paged_nhd(kv_tensor, kv_ip, kv_lpl, seq_idx, kv_slot):\n", | |
| " chunks = []\n", | |
| " start = int(kv_ip[seq_idx].item())\n", | |
| " end = int(kv_ip[seq_idx + 1].item())\n", | |
| " last_tokens = int(kv_lpl[seq_idx].item())\n", | |
| " for p in range(start, end - 1):\n", | |
| " chunks.append(kv_tensor[p, kv_slot, :, :, :].reshape(-1, num_kv_heads, head_dim))\n", | |
| " p_last = end - 1\n", | |
| " chunks.append(\n", | |
| " kv_tensor[p_last, kv_slot, :last_tokens, :, :].reshape(-1, num_kv_heads, head_dim)\n", | |
| " )\n", | |
| " return torch.cat(chunks, dim=0)\n", | |
| "\n", | |
| "\n", | |
| "k0 = reconstruct_seq_from_paged_nhd(kv_data, kv_indptr, kv_last_page_len, 0, 0)\n", | |
| "v0 = reconstruct_seq_from_paged_nhd(kv_data, kv_indptr, kv_last_page_len, 0, 1)\n", | |
| "def reconstruct_seq_from_paged_nhd(kv_tensor, kv_ip, kv_indices, kv_lpl, seq_idx, kv_slot):\n", | |
| " chunks = []\n", | |
| " start = int(kv_ip[seq_idx].item())\n", | |
| " end = int(kv_ip[seq_idx + 1].item())\n", | |
| " last_tokens = int(kv_lpl[seq_idx].item())\n", | |
| " for p in range(start, end - 1):\n", | |
| " page_id = int(kv_indices[p].item())\n", | |
| " chunks.append(kv_tensor[page_id, kv_slot, :, :, :].reshape(-1, num_kv_heads, head_dim))\n", | |
| " p_last = end - 1\n", | |
| " page_id_last = int(kv_indices[p_last].item())\n", | |
| " chunks.append(\n", | |
| " kv_tensor[page_id_last, kv_slot, :last_tokens, :, :].reshape(-1, num_kv_heads, head_dim)\n", | |
| " )\n", | |
| " return torch.cat(chunks, dim=0)\n", | |
| "\n", | |
| "\n", | |
| "k0 = reconstruct_seq_from_paged_nhd(kv_data, kv_indptr, kv_indices, kv_last_page_len, 0, 0)\n", | |
| "v0 = reconstruct_seq_from_paged_nhd(kv_data, kv_indptr, kv_indices, kv_last_page_len, 0, 1)\n", |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Diptorup Deb <diptorup@cs.unc.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Diptorup Deb <diptorup@cs.unc.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Diptorup Deb <diptorup@cs.unc.edu>
Force-pushed from 031670f to d6ee47f.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| "execution_count": 1, | ||
| "id": "0ebe68e6", | ||
| "metadata": { | ||
| "execution": { | ||
| "iopub.execute_input": "2026-04-13T19:55:15.549091Z", | ||
| "iopub.status.busy": "2026-04-13T19:55:15.548883Z", | ||
| "iopub.status.idle": "2026-04-13T19:55:19.421222Z", | ||
| "shell.execute_reply": "2026-04-13T19:55:19.420842Z" | ||
| } | ||
| }, | ||
| "outputs": [ | ||
| { | ||
| "name": "stderr", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "[aiter] import [module_aiter_enum] under /home/AMD/diptodeb/micromamba/envs/flashinfer-rocm-devel/lib/python3.12/site-packages/aiter/jit/module_aiter_enum.so\n" | ||
| ] | ||
| }, | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "flashinfer: 0.5.3+amd.1.dev9\n", | ||
| "torch: 2.9.1+rocm7.2.0.git7e1940d4\n", | ||
| "PyTorch HIP / ROCm build: 7.2.26015-fc0010cf6a\n", | ||
| "Detected system ROCm version: 7.2.0\n", | ||
| "Architectures with AMD FlashInfer ports: gfx942, gfx950\n", | ||
| "GPU count (torch): 1\n", | ||
| "Device indices FlashInfer treats as supported Instinct (rocminfo): (0,)\n", | ||
| "Using device: cuda:0\n" | ||
| ] | ||
| } | ||
| ], |
This notebook is committed with cell outputs and execution metadata (timestamps, stderr logs, absolute paths, etc.). That makes diffs noisy and can unintentionally leak machine-specific information. Please clear all outputs and reset execution counts/metadata before committing (keep only the source/markdown).
| "execution_count": 1, | |
| "id": "0ebe68e6", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-13T19:55:15.549091Z", | |
| "iopub.status.busy": "2026-04-13T19:55:15.548883Z", | |
| "iopub.status.idle": "2026-04-13T19:55:19.421222Z", | |
| "shell.execute_reply": "2026-04-13T19:55:19.420842Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "[aiter] import [module_aiter_enum] under /home/AMD/diptodeb/micromamba/envs/flashinfer-rocm-devel/lib/python3.12/site-packages/aiter/jit/module_aiter_enum.so\n" | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "flashinfer: 0.5.3+amd.1.dev9\n", | |
| "torch: 2.9.1+rocm7.2.0.git7e1940d4\n", | |
| "PyTorch HIP / ROCm build: 7.2.26015-fc0010cf6a\n", | |
| "Detected system ROCm version: 7.2.0\n", | |
| "Architectures with AMD FlashInfer ports: gfx942, gfx950\n", | |
| "GPU count (torch): 1\n", | |
| "Device indices FlashInfer treats as supported Instinct (rocminfo): (0,)\n", | |
| "Using device: cuda:0\n" | |
| ] | |
| } | |
| ], | |
| "execution_count": null, | |
| "id": "0ebe68e6", | |
| "metadata": {}, | |
| "outputs": [], |
A Jupyter notebook demo on how to use amd-flashinfer