Skip to content

[kernels] exception handling for fa kernels#43549

Open
MekkCyber wants to merge 2 commits intomainfrom
better-fa-handling
Open

[kernels] exception handling for fa kernels#43549
MekkCyber wants to merge 2 commits intomainfrom
better-fa-handling

Conversation

@MekkCyber
Copy link
Copy Markdown
Contributor

What does this PR do?

Before we were just silently skipping parameters that are passed by the user like s_aux in case they are not supported by the attention backend specified, it would be better to raise an exception instead.

cc @danieldk

Copy link
Copy Markdown
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I am not very familiar with this code, so someone else should review it too.

user_kwargs = {
"dropout_p": dropout,
"window_size": sliding_window,
"deterministic": deterministic,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure about this one. If someone requests deterministic output, it might still be ok to emit non-deterministic output with a warning?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it depends I think! I will wait for @ArthurZucker and @vasqu inputs since they maintain the fa implementation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't feel super confident about this

  • dropout is fair
  • sliding window might still unintentionally work if key_length <= sliding_window
  • deterministic has the env variable as well
  • s_aux is only supported in gpt oss and we check that people use it
    if value and "flash" in value and value.removeprefix("paged|") != "kernels-community/vllm-flash-attn3":

Additionally, I'm also for being more lenient at least on deterministic (warning)

TL;DR: The conditions below should be counted in when we raise the error + a small test would be nice

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants