
Fix incorrect comments about atten mask for pytorch backend#18728

Merged
sgugger merged 3 commits into huggingface:main from lygztq:fix-comment
Sep 23, 2022

Conversation

@lygztq (Contributor) commented Aug 23, 2022

What does this PR do?

I found that these comments do not match the actual behavior: the comments say we use -10000.0 to mask out elements before the softmax, but the value actually used is torch.finfo(dtype).min.
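The difference can be illustrated with a minimal sketch of the additive-mask pattern the comments describe (the helper name `extended_attention_mask` is hypothetical; this is not the actual transformers code):

```python
import torch

def extended_attention_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # attention_mask holds 1.0 for positions to attend to, 0.0 for masked ones.
    mask = attention_mask.to(dtype=dtype)  # fp16/bf16 compatibility
    # Masked positions become the most negative representable value for this
    # dtype (not a hard-coded -10000.0), so they vanish after the softmax.
    return (1.0 - mask) * torch.finfo(dtype).min

mask = torch.tensor([[1.0, 1.0, 0.0]])      # third position is masked
scores = torch.tensor([[2.0, 1.0, 5.0]])    # raw attention scores
probs = torch.softmax(scores + extended_attention_mask(mask, torch.float32), dim=-1)
# The masked (third) position receives ~0 probability.
```

Adding the mask to the scores rather than indexing keeps the operation shape-preserving and fully vectorized.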

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented Aug 23, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh (Collaborator) commented Aug 23, 2022

@lygztq Great catch. This inconsistency was introduced in (my) PR #17306, where the docstrings were not updated - my bad.

@ydshieh (Collaborator) commented Aug 23, 2022

I think it's clearer to change "... the smallest value ..." to "... the dtype's smallest value ...". WDYT?

@lygztq (Contributor, Author) commented Aug 23, 2022

I think it's more clear to change from ... the smallest value ... to ... the dtype's smallest value .... WDYT?

I agree, this is better.
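For context, a quick standalone check (not from the PR) of what "the dtype's smallest value" resolves to for the dtypes involved:

```python
import torch

# The masking value now depends on the dtype rather than being a fixed -10000.0:
print(torch.finfo(torch.float32).min)  # most negative float32, about -3.4e38
print(torch.finfo(torch.float16).min)  # -65504.0, the most negative float16
```

This is why the docstring wording matters: in float16, -10000.0 is representable but far from the minimum, while torch.finfo(dtype).min always picks the most negative value the dtype can hold.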

@huggingface huggingface deleted a comment from github-actions Bot Sep 22, 2022
  # effectively the same as removing these entirely.
  attention_mask = attention_mask.to(dtype=self.dtype)  # fp16 compatibility
- attention_mask = (1.0 - attention_mask) * -10000.0
+ attention_mask = (1.0 - attention_mask) * torch.finfo(self.dtype).min
@ydshieh (Collaborator) commented on the diff, Sep 23, 2022

CodeGen was added after #17306. So update it here along with the comment.

@ydshieh (Collaborator) left a comment

Hi @lygztq, sorry for being so late on this PR. I updated the PR with the main branch and added the same fix for the CodeGen model. Once the tests pass, we are ready to merge. Thank you for the fix 🤗 !

@ydshieh ydshieh requested a review from sgugger September 23, 2022 17:42
@sgugger (Collaborator) left a comment

Thanks a lot for fixing!

@sgugger sgugger merged commit ece7624 into huggingface:main Sep 23, 2022

4 participants