
docs: update cuDNN sliding window attention support #2624

Open
sbhavani wants to merge 2 commits into NVIDIA:main from sbhavani:docs/update-cudnn-swa-support

Conversation

@sbhavani (Collaborator)

Description

Update documentation to reflect that cuDNN supports causal sliding window attention (SWA) starting with version 9.2.

Type of change

  • [x] Documentation change (change only to the documentation, either a fix or new content)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Infra/Build change
  • [ ] Code refactoring

Changes

  • Updated backend support matrix table to show cuDNN supports SWA (cuDNN 9.2+, causal masks only)
  • Added SWA comparison between flash-attention and cuDNN in section 1.3
  • Added clarifying note in cp_ag_thd_dpa_jax_deep_dive.ipynb that cuDNN supports SWA but not all striping patterns for context parallelism

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Update documentation to reflect that cuDNN now supports causal sliding
window attention (SWA) starting from version 9.2+.

Changes:
- Updated backend support matrix table to show cuDNN supports SWA
  (cuDNN 9.2+, causal masks only)
- Added SWA comparison between flash-attention and cuDNN in section 1.3
- Added clarifying note in cp_ag_thd_dpa_jax_deep_dive.ipynb that cuDNN
  supports SWA but not all striping patterns for context parallelism

Technical details:
- cuDNN 9.2+: Supports causal SWA with window_size=(left, 0)
- cuDNN 9.6+: Enhanced support for asymmetric windows (left, right)
- Constraints: Requires dropout=0.0 and bias_type="no_bias"
- Only works with causal mask types

Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
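
The constraints listed in the commit message above map directly onto how SWA is requested at the API level. Below is a minimal sketch using Transformer Engine's PyTorch `DotProductAttention`; the argument names follow the public API, but the concrete values (window width, tensor shapes) are illustrative assumptions, not taken from this PR.

```python
# Hedged sketch, not from this PR: requesting causal SWA through Transformer
# Engine's PyTorch DotProductAttention. Assumes cuDNN >= 9.2 and a recent
# Transformer Engine build; shapes and the window width are made up.
import torch
from transformer_engine.pytorch import DotProductAttention

seq_len, batch, heads, head_dim = 2048, 2, 16, 64

attn = DotProductAttention(
    num_attention_heads=heads,
    kv_channels=head_dim,
    attn_mask_type="causal",   # cuDNN SWA requires a causal mask type
    window_size=(128, 0),      # (left, 0): causal sliding window, cuDNN 9.2+
    attention_dropout=0.0,     # cuDNN SWA requires dropout == 0.0
)

# Default qkv_format is "sbhd": (sequence, batch, heads, head_dim).
q = torch.randn(seq_len, batch, heads, head_dim,
                dtype=torch.bfloat16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

out = attn(q, k, v)  # bias defaults to "no_bias", as cuDNN SWA requires
```

Per the technical details above, on cuDNN 9.6+ asymmetric windows also become available, i.e. `window_size=(left, right)` with a nonzero right half.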
@sbhavani sbhavani requested a review from pggPL January 26, 2026 18:57
@greptile-apps
Contributor

greptile-apps bot commented Jan 26, 2026

Greptile Summary

Documentation-only PR that updates the attention documentation to reflect cuDNN's sliding window attention (SWA) support starting from version 9.2+.

  • Updated the backend support matrix table in attention.ipynb to show cuDNN supports SWA with the qualifier "cuDNN 9.2+, causal masks only" (previously listed as "No")
  • Added a new bullet point in Section 1.3 comparing SWA support between flash-attention (full support) and cuDNN (causal-only, no dropout, no bias)
  • Added a clarifying note in cp_ag_thd_dpa_jax_deep_dive.ipynb explaining that cuDNN supports SWA but not all striping patterns for context parallelism

All documented constraints (dropout=0.0, bias_type="no_bias", causal masks, cuDNN 9.2+ version requirement) are verified against the actual validation logic in fused_attn.cpp and utils.py.

Confidence Score: 5/5

  • This PR is safe to merge — it contains only documentation updates with no code changes.
  • All changes are documentation-only (Jupyter notebook markdown cells). The documented constraints (cuDNN 9.2+, causal masks, dropout=0.0, no_bias) are verified against the actual C++ and Python validation logic in the codebase. No code behavior is affected.
  • No files require special attention.

Important Files Changed

  • docs/examples/attention/attention.ipynb: Updated support matrix table (SWA column from "No" to "Yes (cuDNN 9.2+, causal masks only)") and added SWA comparison bullet in Section 1.3. All claims verified against code.
  • docs/examples/attention/cp_ag_thd_dpa_jax_deep_dive.ipynb: Added a clarifying note that cuDNN supports SWA (9.2+, causal masks) but not all striping patterns for context parallelism. Accurate and helpful context.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SWA Request] --> B{cuDNN Version?}
    B -->|< 9.2| C[SWA Not Supported\nFallback to flash-attention]
    B -->|>= 9.2| D{Mask Type?}
    D -->|causal / causal_bottom_right| E{dropout == 0.0?}
    D -->|non-causal| C
    E -->|Yes| F{bias_type == no_bias?}
    E -->|No| C
    F -->|Yes| G[cuDNN SWA Enabled]
    F -->|No| C
```
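For readers who find the flowchart easier to follow as code, here is a plain-Python restatement of the same dispatch decision. It is an illustrative sketch only; the real validation lives in fused_attn.cpp and utils.py and is more involved.

```python
# Illustrative restatement of the flowchart above; not the actual
# validation code from fused_attn.cpp / utils.py.
def cudnn_swa_supported(cudnn_version: tuple[int, int],
                        mask_type: str,
                        dropout: float,
                        bias_type: str) -> bool:
    """Return True if cuDNN can serve this SWA request;
    otherwise fall back to flash-attention."""
    if cudnn_version < (9, 2):
        return False  # pre-9.2 cuDNN has no SWA support
    if mask_type not in ("causal", "causal_bottom_right"):
        return False  # non-causal masks are unsupported
    if dropout != 0.0:
        return False  # cuDNN SWA requires dropout == 0.0
    return bias_type == "no_bias"  # and no attention bias

assert cudnn_swa_supported((9, 2), "causal", 0.0, "no_bias")
assert not cudnn_swa_supported((9, 1), "causal", 0.0, "no_bias")
assert not cudnn_swa_supported((9, 6), "padding", 0.0, "no_bias")
```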

Last reviewed commit: 12be7aa

@greptile-apps greptile-apps bot (Contributor) left a comment

No files reviewed, no comments

@greptile-apps greptile-apps bot (Contributor) left a comment

2 files reviewed, no comments

@pggPL pggPL (Collaborator) left a comment

LGTM
