SemiAnalysisAI · functionstackx · Feb 17, 2026 · Feb 17, 2026
diff --git a/.github/workflows/claude-pr-review.yml b/.github/workflows/claude-pr-review.yml
@@ -125,6 +125,10 @@ jobs:
             - The perf-changelog entry should document what changed in the config and include the PR link
             - Format: "Master config files were modified but `perf-changelog.yaml` was not updated. When changing `.github/configs/amd-master.yaml` or `.github/configs/nvidia-master.yaml`, you must add a corresponding entry to `perf-changelog.yaml` documenting the changes."
 
+            ## Terminology:
+            - **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
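+            - **MTP (Multi-Token Prediction)**: Speculative decoding, where a draft module (e.g., EAGLE, NEXTN) proposes several tokens and the target model verifies them in one forward pass, so more than one token can be emitted per pass.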
+
             Remember: Silence is golden. No comment is better than a low-value comment.
 
             ## Container Image Accessibility Validation:

diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml
@@ -227,4 +227,6 @@ jobs:
 
             ### Additional Knowledge
             - MI355 is gfx950 not gfx1201
+            - **STP (Single Token Prediction)**: Standard autoregressive decoding — one token per forward pass. No speculative decoding or MTP. Benchmarks labeled "STP only" use vanilla decoding.
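+            - **MTP (Multi-Token Prediction)**: Speculative decoding, where a draft module (e.g., EAGLE, NEXTN) proposes several tokens and the target model verifies them in one forward pass, so more than one token can be emitted per pass.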
 
diff --git a/AGENTS.md b/AGENTS.md
@@ -46,6 +46,11 @@ InferenceMAX is an open-source, automated benchmarking system that continuously
 └── perf-changelog.yaml      # Triggers benchmarks on changes
 ```
 
+## Terminology
+
+- **STP (Single Token Prediction)**: Standard autoregressive decoding where one token is generated per forward pass. No speculative decoding or MTP (Multi-Token Prediction) is used. When a benchmark is labeled "STP only", it means vanilla decoding without any speculation.
+- **MTP (Multi-Token Prediction)**: A technique that emits more than one token per target-model forward pass, typically via speculative decoding: a draft module (e.g., EAGLE or NEXTN) proposes several tokens, and the target model verifies them in a single pass.
+
 ## Key Technologies
 
 - **Python 3.13**: Core automation and config generation
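The STP/MTP distinction in the terminology above comes down to how many tokens each target-model forward pass yields. The following is a minimal Python sketch under loudly stated assumptions: it uses toy deterministic "models" and generic greedy draft-and-verify speculative decoding, not the actual EAGLE or NEXTN algorithms and not any InferenceMAX code. The names `stp_decode`, `mtp_decode`, `target_model`, and `draft_model` are hypothetical, chosen only for illustration.

```python
from collections.abc import Callable

# Hypothetical toy "model": maps a token prefix to its greedy next token.
NextToken = Callable[[list[int]], int]


def target_model(seq: list[int]) -> int:
    # Hypothetical deterministic stand-in for the large target model.
    return (sum(seq) * 7 + 3) % 10


def draft_model(seq: list[int]) -> int:
    # Hypothetical cheap draft: agrees with the target except when the
    # context length is a multiple of 4, where it guesses wrong.
    guess = target_model(seq)
    return (guess + 1) % 10 if len(seq) % 4 == 0 else guess


def stp_decode(target: NextToken, prompt: list[int], n_new: int) -> tuple[list[int], int]:
    """STP: exactly one target forward pass per generated token."""
    seq = list(prompt)
    passes = 0
    for _ in range(n_new):
        seq.append(target(seq))
        passes += 1
    return seq, passes


def mtp_decode(target: NextToken, draft: NextToken, prompt: list[int],
               n_new: int, k: int = 3) -> tuple[list[int], int]:
    """Greedy draft-and-verify speculative decoding (generic sketch)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        # 1. The draft proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verification counts as ONE target pass. A real engine scores all
        #    k positions in a single batched forward pass; this sketch loops.
        passes += 1
        accepted: list[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own token
            if tok != expected:
                break  # first mismatch: discard the rest of the proposal
        else:
            # All k drafts accepted: the same pass also yields a bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    # The last round may overshoot; trim to the requested length.
    return seq[:goal], passes


if __name__ == "__main__":
    stp_seq, stp_passes = stp_decode(target_model, [1, 2], 8)
    mtp_seq, mtp_passes = mtp_decode(target_model, draft_model, [1, 2], 8, k=3)
    assert stp_seq == mtp_seq  # greedy verification preserves the STP output
    print(stp_passes, mtp_passes)  # 8 3
```

With greedy verification the generated tokens are identical to STP; only the number of target forward passes changes (8 versus 3 here). In practice the savings depend on how often the draft's proposals are accepted, which is why STP and MTP results are labeled separately.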