Expand rotary, attention, and embedding layer tests into parametrized suites #504
Merged
jlamypoirier merged 4 commits into main on May 1, 2026
Conversation
test_rotary.py:
- Rewrites the procedural test as a parametrized, config-driven suite covering default, llama3, yarn, and 2D rotary across multiple head sizes and sequence lengths.
- Independent 1D/2D reference implementations in plain PyTorch (no Fast-LLM kernel calls) make the expected values auditable and kernel-agnostic.
- Clones `query` before calling forward: `triton_rotary_` writes results in-place, corrupting the reference if both share storage.

test_attention.py:
- Replaces the per-feature test stubs with a single combined test that exercises an independent einsum reference, packing equivalence (packed == per-sequence forward and backward), and flash equivalence.
- Covers causal, non-causal, sliding-window, MQA (1 KV head), MHA (head_groups == heads), and default-rotary configurations.
- TF32 is disabled for the reference check to keep summation-order differences below 1e-7; the packing and flash checks use looser tolerances.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
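For context, a kernel-free 1D rotary reference along these lines fits in a few lines of plain PyTorch. The sketch below is illustrative rather than the actual test helper; the interleaved-pair convention, argument names, and default theta are assumptions.

```python
import torch


def rotary_reference_1d(query: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    # query: (batch, seq, heads, head_size), head_size even.
    batch, seq_len, heads, head_size = query.shape
    # One frequency per channel pair.
    inv_freq = theta ** (-torch.arange(0, head_size, 2, dtype=torch.float32, device=query.device) / head_size)
    angles = torch.arange(seq_len, dtype=torch.float32, device=query.device)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    # Rotate each (even, odd) channel pair by its position-dependent angle.
    pairs = query.float().reshape(batch, seq_len, heads, head_size // 2, 2)
    even, odd = pairs[..., 0], pairs[..., 1]
    rotated = torch.stack((even * cos - odd * sin, even * sin + odd * cos), dim=-1)
    return rotated.reshape(batch, seq_len, heads, head_size).to(query.dtype)
```

The test then compares a reference like this against the layer's output on a cloned input, since `triton_rotary_` mutates its argument in place.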
test_embedding.py:
- Covers padding (masked tokens), position embeddings, bfloat16, and full-precision residual across all combinations.
- Reference implementation in plain PyTorch, independent of Fast-LLM embedding internals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
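A plain-PyTorch embedding reference of the kind described could look roughly like this; the signature, the mask-after-positions ordering, and the dtype handling are illustrative assumptions, not the actual test code.

```python
import torch


def embedding_reference(
    token_ids: torch.Tensor,                          # (batch, seq), int64
    word_embeddings: torch.Tensor,                    # (vocab, hidden)
    position_embeddings: torch.Tensor | None = None,  # (max_seq, hidden)
    token_mask: torch.Tensor | None = None,           # (batch, seq), True = real token
    dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    hidden = word_embeddings[token_ids]
    if position_embeddings is not None:
        hidden = hidden + position_embeddings[: token_ids.shape[1]]
    if token_mask is not None:
        # Zero out padded positions. Whether this happens before or after adding
        # positions is exactly the ordering the padding+position case pins down.
        hidden = hidden * token_mask.unsqueeze(-1).to(hidden.dtype)
    return hidden.to(dtype)
```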
- Drop redundant `_test_attention_impl` wrapper; collapse the `_no_tf32` context into `test_attention` directly.
- Gate the per-sequence backward in `_run_per_seq_reference` behind a `with_backward` flag; the bf16 reference doesn't consume gradients.
- Parametrize rotary theta on `AttentionTestConfig` so the reference and the attention layer can't desync on theta.
- Switch `Assert.rms_close` (flash) and `torch.testing.assert_close` (embedding) to `Assert.rms_close_relative` for consistency with the other layer tests.
- Drop redundant comments restating obvious code; trim the stale comment about Triton CPU failures (q/k norm is not exercised here).
- Use `float32` as the default-variant name in test_embedding so the empty-string special case can be removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
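The theta change amounts to making the test config the single source of truth. A hypothetical sketch (field names are not the real `AttentionTestConfig`):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class AttentionTestConfig:
    heads: int = 8
    head_groups: int = 8
    head_size: int = 64
    causal: bool = True
    window_size: int | None = None
    use_rotary: bool = False
    # Single source of truth: both the attention layer's rotary config and the
    # plain-PyTorch reference read theta from here, so they cannot drift apart.
    rotary_theta: float = 10000.0
```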
3f75028 to 4c0da7a
…ding+position case
- Add @pytest.mark.slow to test_attention and test_rotary (consistent with test_embedding and test_ssm).
- Fix the misleading comment on `_attention_rotary_cases`: packing equivalence does run for single-doc inputs.
- Replace the side-effectful `_add_configs` helper with a declarative list comprehension.
- Add a padding+position_embeddings base case to test_embedding to catch masking/position ordering bugs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
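The declarative parametrization described above has roughly the following shape; the case and variant names mirror the summary below, but the helper names are otherwise illustrative.

```python
import pytest

_BASE_CASES = ("default", "padding", "position_embeddings", "padding_and_position_embeddings")
_VARIANTS = ("float32", "bfloat16", "full_precision_residual")

# Cross product built as a plain list comprehension instead of an append-in-a-loop helper.
_EMBEDDING_CASES = [
    pytest.param(base, variant, id=f"{base}-{variant}")
    for base in _BASE_CASES
    for variant in _VARIANTS
]


@pytest.mark.slow
@pytest.mark.parametrize(("base_case", "variant"), _EMBEDDING_CASES)
def test_embedding(base_case: str, variant: str):
    ...  # build the config for (base_case, variant), run the layer, compare to the reference
```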
Summary
test_rotary.py: rewrites the procedural test as a parametrized, config-driven suite covering default, llama3, yarn, and 2D rotary across multiple head sizes and sequence lengths. Independent 1D/2D reference implementations in plain PyTorch (no Fast-LLM kernel calls). Clones `query` before calling forward: `triton_rotary_` writes results in-place and would corrupt the reference otherwise.

test_attention.py: replaces sparse per-feature stubs with a single combined test. Each configuration exercises three checks: an independent reference (`F.linear` + per-head matmul, no Fast-LLM internals), packing equivalence (packed == per-sequence forward and backward), and flash equivalence. Configurations: causal, non-causal, sliding-window, MQA (1 KV head), MHA (head_groups == heads), default-rotary.
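A self-contained attention reference of this kind, following the einsum formulation from the commit message, could be sketched as below. Shapes and handling of MQA/GQA are assumptions; this is not the actual test helper.

```python
import math

import torch


def attention_reference(query, key, value, causal=True, window_size=None):
    # query: (batch, seq, heads, head_size); key/value: (batch, seq, head_groups, head_size).
    batch, seq, heads, head_size = query.shape
    head_groups = key.shape[2]
    # Expand KV heads so MQA/GQA reduce to plain multi-head attention.
    key = key.repeat_interleave(heads // head_groups, dim=2)
    value = value.repeat_interleave(heads // head_groups, dim=2)
    scores = torch.einsum("bqhd,bkhd->bhqk", query, key) / math.sqrt(head_size)
    positions = torch.arange(seq, device=query.device)
    mask = torch.ones(seq, seq, dtype=torch.bool, device=query.device)
    if causal:
        mask &= positions[None, :] <= positions[:, None]  # key index <= query index
    if window_size is not None:
        # Limit how far back each query can look.
        mask &= positions[:, None] - positions[None, :] < window_size
    scores = scores.masked_fill(~mask, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum("bhqk,bkhd->bqhd", probs, value)
```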
test_embedding.py (new): parametrized coverage for `LanguageModelEmbedding`. Base cases (default, padding, position embeddings) × variants (float32, bfloat16, full-precision residual). Reference in plain PyTorch, independent of Fast-LLM embedding internals.

Test plan
pytest -v tests/layers/test_rotary.py tests/layers/test_attention.py tests/layers/test_embedding.py