An independent reproduction study of "Attention Is Not What You Need" (arXiv 2512.19428).
This repository contains a reproduction of Grassmann flow layers for sequence modeling. The original paper claims performance "within 10-15% of size-matched Transformers" on Wikitext-2. Our reproduction measures a 22.6% perplexity gap, significantly larger than claimed.
| Model | Parameters | Test PPL |
|---|---|---|
| Grassmann (paper arch) | 17.70M | 242.94 |
| Transformer | 17.67M | 198.17 |
Gap: 22.6% (vs claimed 10-15%)
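The reported gap is the relative difference in test perplexity between the two size-matched models. A quick check against the table above (perplexity itself is just `exp` of the mean per-token cross-entropy):

```python
import math

# Values from the results table above
grassmann_ppl = 242.94
transformer_ppl = 198.17

# Relative perplexity gap
gap = (grassmann_ppl - transformer_ppl) / transformer_ppl
print(f"{gap:.1%}")  # → 22.6%

# Sanity check: PPL = exp(mean negative log-likelihood per token)
assert math.isclose(math.exp(math.log(transformer_ppl)), transformer_ppl)
```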
Custom CUDA kernels provide a 2.0x inference speedup:
| Metric | PyTorch | CUDA | Speedup |
|---|---|---|---|
| Full model inference | 9.16 ms | 4.53 ms | 2.0x |
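The timings above can be reproduced with a simple warmup-then-average harness; the sketch below is a generic helper (`bench` is hypothetical, not the repo's actual benchmark script), and for GPU kernels you would additionally call `torch.cuda.synchronize()` before reading the clock:

```python
import time

def bench(fn, warmup=10, iters=100):
    """Average wall-clock milliseconds per call, after warmup iterations.

    For CUDA workloads, synchronize the device before each timestamp so
    asynchronous kernel launches are actually included in the measurement.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3

# Speedup is the ratio of the two averages, e.g. 9.16 ms / 4.53 ms ≈ 2.0x
```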
Full analysis and discussion: blog.md
```shell
# Install dependencies
pip install torch datasets transformers tqdm

# Run reproduction
python train_wikitext2.py --model both --epochs 20

# Build CUDA kernels (optional)
cd src/cuda && python setup.py install
```

Repository layout:

- `train_wikitext2.py` - Training script
- `src/models/grassmann_v4.py` - Paper-exact implementation
- `src/cuda/` - CUDA kernel implementation
- `blog.md` - Full reproduction report
- `technical.md` - Technical details
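The optional `setup.py install` step builds the custom kernels as a PyTorch C++/CUDA extension. A typical `setup.py` for this follows the pattern below (a generic sketch; the package and source file names are hypothetical, not taken from this repo):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="grassmann_cuda",  # hypothetical package name
    ext_modules=[
        CUDAExtension(
            name="grassmann_cuda",
            # hypothetical source files: CUDA kernels plus pybind11 bindings
            sources=["grassmann_kernels.cu", "bindings.cpp"],
        )
    ],
    # BuildExtension handles nvcc/host-compiler flags for mixed .cu/.cpp builds
    cmdclass={"build_ext": BuildExtension},
)
```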
Experiments run on NVIDIA H100 SXM5 80GB (Voltage Park Cloud).
```bibtex
@article{arledge2025grassmann,
  title={Grassmann Flows for Sequence Modeling: An Independent Reproduction Study},
  author={Arledge, Elliot},
  year={2025},
  month={December},
  url={https://github.com/Infatoshi/grassmann-flows}
}
```

License: MIT