### Describe the feature

Hi, I'm training LLaMA with SP and found that SP cannot be used together with FlashAttention. When will SP be compatible with FlashAttention?