
Skip M4T test_retain_grad_hidden_states_attentions #28060

Merged
ylacombe merged 2 commits into huggingface:main from ylacombe:fix-grad-test-m4t
Dec 15, 2023

Conversation

@ylacombe
Contributor

What does this PR do?

After investigating the reasons for the flaky test_retain_grad_hidden_states_attentions failure, I realized that the speech encoder's attentions can be None with non-zero probability when training=True. Skipping the test is the fastest fix.
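A skip of this kind can be sketched as below; the class name and skip message are illustrative stand-ins, not the exact ones from the commit:

```python
import unittest


class SeamlessM4TModelWithSpeechInputTest(unittest.TestCase):
    # Hypothetical sketch: the real test class lives in the transformers test suite.
    @unittest.skip(
        reason="The speech encoder randomly drops layers in training mode, so "
        "attention weights can be None and their gradients cannot be retained."
    )
    def test_retain_grad_hidden_states_attentions(self):
        pass
```

Because the decorator marks the test as skipped at collection time, the flaky body never runs, so CI no longer fails intermittently.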

Fixes #28036

cc @gante @amyeroberts @ydshieh

Collaborator

@ArthurZucker ArthurZucker left a comment


Alright, thanks 🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts
Contributor

Thanks for fixing!

If training is allowed on the model but can fail, e.g. with attentions being None, could you open an issue to track this? Training should either be prevented with an exception or made fully supported (probably the former, then the latter).

@ylacombe
Contributor Author

ylacombe commented Dec 15, 2023

Hey @amyeroberts, in theory training is supported for the tasks that translate inputs (text or audio) into text, since the model is a classic LLM trained with a classic objective.
To improve training, the model randomly skips layers in the speech encoder block (leaving None as those layers' attention weights), but this doesn't break training when it happens.
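The layer-skipping behaviour described above (commonly called LayerDrop) can be sketched with a toy module; this is a minimal stand-in under assumed names, not the actual SeamlessM4T encoder:

```python
import torch
from torch import nn


class ToyLayerDropEncoder(nn.Module):
    """Minimal stand-in for an encoder with LayerDrop (not the real M4T code)."""

    def __init__(self, num_layers: int = 4, hidden: int = 8, layerdrop: float = 0.5):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))
        self.layerdrop = layerdrop

    def forward(self, x: torch.Tensor):
        attentions = []
        for layer in self.layers:
            # In training mode each layer is skipped with probability `layerdrop`;
            # a skipped layer produces no attention weights, hence None.
            if self.training and torch.rand(()).item() < self.layerdrop:
                attentions.append(None)
                continue
            x = layer(x)
            attentions.append(torch.full((1,), 1.0))  # placeholder for real weights
        return x, attentions
```

With layerdrop > 0 and the module in train() mode, some entries of `attentions` are None with non-zero probability, which is what made the gradient-retention test flaky; in eval() mode no layer is dropped and every entry is present.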

@ylacombe ylacombe merged commit deb72cb into huggingface:main Dec 15, 2023
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* skip test from SpeechInput

* refine description of skip

Development

Successfully merging this pull request may close these issues.

SeamlessM4T: test_retain_grad_hidden_states_attentions is flaky
