Remove attention slicing from docs #4487

@patrickvonplaten

Description

Describe the bug

The attention slicing section described in "Sliced attention for additional memory savings" under https://huggingface.co/docs/diffusers/optimization/fp16 is very outdated, and we should probably remove it from the optimization docs.

90% of users use either xFormers or PyTorch 2.0 with SDPA. In that case the attention computation is dispatched to flash attention, which is already very memory efficient, so attention slicing:
a) makes no sense, because the attention computation is no longer the memory bottleneck, and
b) can lead to serious slowdowns.

Only in edge cases does it make sense to keep using attention slicing.
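As a minimal sketch of the recommended setup (the checkpoint id `runwayml/stable-diffusion-v1-5` and the prompt are just illustrative, and a CUDA device is assumed):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# On PyTorch 2.0+, diffusers dispatches attention to
# torch.nn.functional.scaled_dot_product_attention by default, so no extra call is needed.
# On older PyTorch, memory-efficient attention can come from xFormers instead:
# pipe.enable_xformers_memory_efficient_attention()

# Not recommended on top of SDPA/xFormers: slicing the already memory-efficient
# attention adds overhead without saving memory.
# pipe.enable_attention_slicing()

image = pipe("an astronaut riding a horse", num_inference_steps=25).images[0]
```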

Reproduction

N/A

Logs

No response

System Info

N/A

Who can help?

@stevhliu @sayakpaul can we remove attention slicing from this highly read doc page and, in general, add a disclaimer to the function stating that it should not be used in combination with SDPA or xFormers?

I'm opening this issue because a user on Discord stumbled upon it.
