From 88c1a16405dc0aa5d2bb06db75ea16cc28f744ba Mon Sep 17 00:00:00 2001
From: "claude[bot]" <41898282+claude[bot]@users.noreply.github.com>
Date: Tue, 17 Feb 2026 04:10:11 +0000
Subject: [PATCH] Add STP/MTP terminology definitions to agent docs

Define STP (Single Token Prediction) and MTP (Multi-Token Prediction) in
AGENTS.md and workflow prompt configs so agents understand that STP means
standard autoregressive decoding with no speculative decoding or MTP.

Co-authored-by: functionstackx
---
 .github/workflows/claude-pr-review.yml | 4 ++++
 .github/workflows/claude.yml           | 2 ++
 AGENTS.md                              | 5 +++++
 3 files changed, 11 insertions(+)

diff --git a/.github/workflows/claude-pr-review.yml b/.github/workflows/claude-pr-review.yml
index a21b2afe6..1b5e3f96e 100644
--- a/.github/workflows/claude-pr-review.yml
+++ b/.github/workflows/claude-pr-review.yml
@@ -125,6 +125,10 @@ jobs:
 - The perf-changelog entry should document what changed in the config and include the PR link
 - Format: "Master config files were modified but `perf-changelog.yaml` was not updated. When changing `.github/configs/amd-master.yaml` or `.github/configs/nvidia-master.yaml`, you must add a corresponding entry to `perf-changelog.yaml` documenting the changes."
 
+ ## Terminology:
+ - **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
+ - **MTP (Multi-Token Prediction)**: Predicts multiple tokens per forward pass using speculative decoding (e.g., EAGLE, NEXTN).
+
 Remember: Silence is golden. No comment is better than a low-value comment.
 
 ## Container Image Accessibility Validation:
diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml
index 4614556d5..8b4f81663 100644
--- a/.github/workflows/claude.yml
+++ b/.github/workflows/claude.yml
@@ -227,4 +227,6 @@ jobs:
 
 ### Additional Knowledge
 - MI355 is gfx950 not gfx1201
+ - **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
+ - **MTP (Multi-Token Prediction)**: Predicts multiple tokens per forward pass using speculative decoding (e.g., EAGLE, NEXTN).
 
diff --git a/AGENTS.md b/AGENTS.md
index ecc1862f8..8ed144c81 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -46,6 +46,11 @@ InferenceMAX is an open-source, automated benchmarking system that continuously
 └── perf-changelog.yaml # Triggers benchmarks on changes
 ```
 
+## Terminology
+
+- **STP (Single Token Prediction)**: Standard autoregressive decoding where one token is generated per forward pass. No speculative decoding or MTP (Multi-Token Prediction) is used. When a benchmark is labeled "STP only", it means vanilla decoding without any speculation.
+- **MTP (Multi-Token Prediction)**: A technique where the model predicts multiple tokens per forward pass, typically using speculative decoding methods like EAGLE or NEXTN.
+
 ## Key Technologies
 
 - **Python 3.13**: Core automation and config generation
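To make the STP/MTP distinction in these definitions concrete, here is a minimal Python sketch, assuming toy deterministic "models" that map a token prefix to a greedy next token. Every name in it (`stp_decode`, `speculative_decode`, `toy_target`, `toy_draft`) is hypothetical and not part of InferenceMAX or any inference engine. STP spends one target forward pass per generated token; the speculative path lets a cheap draft propose up to `k` tokens and verifies them in one target pass, keeping the agreeing prefix plus one target-chosen token. Real engines score all drafted positions in a single batched forward pass; the sketch only simulates that with per-prefix calls and counts one pass per verification round.

```python
from typing import Callable

# Hypothetical model interface: given a token prefix, return the greedy next token.
Model = Callable[[list[int]], int]


def stp_decode(target: Model, prompt: list[int], n_new: int) -> tuple[list[int], int]:
    """STP: plain autoregressive decoding, one target forward pass per token."""
    tokens = list(prompt)
    passes = 0
    for _ in range(n_new):
        tokens.append(target(tokens))
        passes += 1
    return tokens, passes


def speculative_decode(target: Model, draft: Model, prompt: list[int],
                       n_new: int, k: int = 4) -> tuple[list[int], int]:
    """Greedy draft-and-verify decoding: up to k+1 tokens per target pass."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(tokens) < goal:
        # Draft phase: the cheap model proposes up to k tokens autoregressively.
        proposal: list[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # Verify phase: counted as ONE target pass (a real engine would score
        # all proposed positions in a single batched forward pass).
        passes += 1
        accepted: list[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if tok != expected:
                break  # first disagreement: stop after this correction token
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens, passes


def toy_target(prefix: list[int]) -> int:
    """Hypothetical deterministic 'large' model over a 50-token vocabulary."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def toy_draft(prefix: list[int]) -> int:
    """Hypothetical draft that agrees with the target except at every 5th position."""
    guess = toy_target(prefix)
    return guess if len(prefix) % 5 else (guess + 1) % 50


prompt = [1, 2, 3]
stp_out, stp_passes = stp_decode(toy_target, prompt, n_new=12)
spec_out, spec_passes = speculative_decode(toy_target, toy_draft, prompt, n_new=12, k=4)
assert spec_out == stp_out  # greedy verification preserves the STP output exactly
print(f"STP passes: {stp_passes}, speculative passes: {spec_passes}")  # STP passes: 12, speculative passes: 3
```

Because verification here is greedy, the speculative output is token-for-token identical to STP; only the number of target passes drops (12 vs. 3 in this toy run). Methods such as EAGLE or NEXTN differ in how draft tokens are produced and verified, which this sketch does not model.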