Roadmap
Plan to support more attention patterns on more devices.
Tilelang Kernel Template
Tilelang kernel templates for the attention patterns below.
Example (MHA forward & backward): attention_engine/core/template/tl_template/attn/attn_tl.py
A reference sketch for the block-sparse items follows this list.
- MLA decode (@smallscientist1)
- GQA forward & backward
- Varlen attention forward & backward & decode
- Block-sparse mask for attention bwd & decode
- Block-sparse indices for attention fwd & bwd & decode
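For reference, here is a minimal PyTorch sketch of the semantics the two block-sparse items target. Everything in it (the function name, the block_idx layout, the -1 padding convention) is an illustrative assumption, not the engine's API:

```python
import torch

def block_sparse_attention(q, k, v, block_idx, block_size):
    """Reference semantics only: each query block attends to the KV
    blocks listed for it in block_idx (padded with -1). Assumes q_len
    is a multiple of block_size.

    q:         [heads, q_len, dim]
    k, v:      [heads, kv_len, dim]
    block_idx: [num_q_blocks, max_blocks] KV block indices (long).
    """
    heads, q_len, dim = q.shape
    scale = dim ** -0.5
    out = torch.empty_like(q)
    for qb in range(q_len // block_size):
        rows = slice(qb * block_size, (qb + 1) * block_size)
        idx = block_idx[qb]
        idx = idx[idx >= 0]  # drop padding slots
        # Expand the selected block indices into the KV positions they cover.
        cols = (idx[:, None] * block_size
                + torch.arange(block_size)).reshape(-1)
        scores = torch.einsum("hqd,hkd->hqk", q[:, rows], k[:, cols]) * scale
        probs = torch.softmax(scores, dim=-1)
        out[:, rows] = torch.einsum("hqk,hkd->hqd", probs, v[:, cols])
    return out
```

A mask-based variant (the block-sparse mask item) would instead keep the dense score matrix and fill the non-selected blocks with -inf before the softmax.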
Lowering
Lower customized code (such as score_mod, online_func, and mask_mod) into the kernel.
Example: attention_engine/core/lower/lower.py
A sketch of these hooks' semantics follows this list.
- Customized attention (sigmoid, relu, ...) decode (assigned to @smallscientist1)
- RetNet backward
- Dynamic max seqlen support (assigned to @smallscientist1)
- Support MHA prefill & backward with dynamic seqlen
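As a semantic reference for what the lowering pass customizes, here is a minimal PyTorch sketch. Only the hook names (score_mod, mask_mod, online_func) come from the roadmap above; the function name and signatures are assumptions for illustration:

```python
import torch

def attention_with_hooks(q, k, v, score_mod=None, mask_mod=None,
                         online_func=lambda s: torch.softmax(s, dim=-1)):
    """Reference semantics for the customization hooks (illustrative).

    score_mod:   elementwise rewrite of the raw scores (e.g. a bias or relu).
    mask_mod:    (q_idx, kv_idx) -> bool, True where attending is allowed.
    online_func: row-wise normalization, computed online in the real kernel;
                 softmax by default, replaced for customized attention.
    """
    scores = torch.einsum("hqd,hkd->hqk", q, k) / q.shape[-1] ** 0.5
    if score_mod is not None:
        scores = score_mod(scores)
    if mask_mod is not None:
        q_idx = torch.arange(q.shape[1])[:, None]
        kv_idx = torch.arange(k.shape[1])[None, :]
        scores = scores.masked_fill(~mask_mod(q_idx, kv_idx), float("-inf"))
    return online_func(scores) @ v
```

Under these assumed semantics, sigmoid attention decode corresponds to `attention_with_hooks(q, k, v, online_func=torch.sigmoid)`, and a causal pattern to `mask_mod=lambda qi, ki: ki <= qi`.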
Device
- AMD MI300 kernel template & lowering (@smallscientist1)
- NVIDIA devices (RTX4090, A100, ...): hardware config (an illustrative config sketch follows this list)
- NVIDIA devices (RTX4090, A100, ...): performance tuning for more templates (assigned to @smallscientist1)
- Implement MHA fwd & bwd autotune in attention_engine/core/template/tl_template/attn/attn_tl.py
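To make the hardware-config item concrete, here is a hypothetical sketch of the per-device parameters such a config might record. The schema is an assumption; the numbers are the published CUDA limits for these GPUs:

```python
from dataclasses import dataclass

@dataclass
class HardwareConfig:
    # Hypothetical schema; the engine's actual config format may differ.
    name: str
    num_sms: int                 # streaming multiprocessors
    max_smem_per_block: int      # opt-in shared memory per block, bytes
    registers_per_sm: int        # 32-bit registers per SM

# Published limits for RTX4090 (sm_89) and A100 (sm_80).
RTX4090 = HardwareConfig("RTX4090", num_sms=128,
                         max_smem_per_block=99 * 1024,
                         registers_per_sm=65536)
A100 = HardwareConfig("A100", num_sms=108,
                      max_smem_per_block=163 * 1024,
                      registers_per_sm=65536)
```

An autotuner can then prune tile-size candidates whose shared-memory footprint exceeds max_smem_per_block on the target device.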