Fix max_length criteria when using inputs_embeds #28994
amyeroberts merged 18 commits into huggingface:main
Conversation
Technically fulfils the main request of the GH issue, but I'd like for us to go one step further!
In the test you wrote, we check self.assertEqual(out_gen.shape[-1], input_len + out_gen_embeds.shape[-1] - 1). Ideally, the final -1 shouldn't be there: we initialize input_ids with decoder_start_id, causing the additional length, and we probably shouldn't. As such, we can add an additional condition in _prepare_decoder_input_ids_for_generation: in this specific case, input_ids should be empty.
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
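For concreteness, a minimal standalone sketch of the difference between seeding input_ids with a start token and the empty init suggested above (illustration only with assumed values, not the actual generate internals):

```python
import torch

batch_size, decoder_start_id = 2, 0  # assumed values for illustration

# Seeded init: input_ids starts as shape (batch_size, 1), which adds one extra
# token to the sequence length seen by the max_length stopping criteria.
seeded = torch.full((batch_size, 1), decoder_start_id, dtype=torch.long)

# Suggested init when only inputs_embeds is passed: an empty (batch_size, 0)
# tensor, so nothing is prepended to the newly generated ids.
empty = torch.ones((batch_size, 0), dtype=torch.long)

print(seeded.shape, empty.shape)  # torch.Size([2, 1]) torch.Size([2, 0])
```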
Oh, I see. Added a new fix and checked that creating an empty tensor does not break anything.
gante
left a comment
Perfect! Thank you for iterating 🤗
Regarding failing CI: it seems unrelated to this PR and main does not have this failure. Therefore, it will likely be solved by rebasing with main and then force-pushing.
amyeroberts
left a comment
Thanks for fixing this! Very nice and clean PR :)
Just some outstanding questions so I can understand what's happening here before approving
def test_max_length_if_input_embeds(self):
    # PT-only test: TF doesn't have StoppingCriteria
    article = "Hey, are you conscious?"
Can we use a different phrase here? Talking about consciousness with these LLMs isn't ideal
input_len = input_ids.shape[-1]
out_gen = model.generate(input_ids=input_ids, max_length=max_length)
out_gen_embeds = model.generate(inputs_embeds=inputs_embeds, max_length=max_length)
self.assertEqual(out_gen.shape[-1], input_len + out_gen_embeds.shape[-1])
For my own understanding - why is the returned generation, when passing in input_ids, a concatenation of the input and newly generated tokens, but for embeds we only return the new tokens?
The addition of input_len here is needed because generation with inputs_embeds returns only the newly generated text, while with input_ids it returns the whole text, including the prompt. So we are just making sure the lengths of both are equal.
Right, but why is the behaviour different for embeddings and input_ids?
If I understand the question correctly, the lengths here differ because we return the whole text (prompt + new) when the user passes ids. But we cannot recover the prompt text from input_embeds, so we just return the newly generated part.
As @zucchini-nlp wrote.
There is no mismatch if the user passes input_ids and inputs_embeds, as generate continues populating input_ids. But passing both kinda defeats the point of feeding inputs_embeds, which is used mostly for experimental purposes, and thus the shape difference when only inputs_embeds is set. Although we can technically recover input_ids from inputs_embeds (reverse lookup search) in most cases to make the shapes consistent, it's probably not a good use of our engineering time :D
@zucchini-nlp @gante Thanks for the explanation!
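A small usage sketch of the behaviour described in this thread (the checkpoint and prompt are placeholders; the final relation mirrors the assertion in the new test):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # placeholder decoder-only model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

input_ids = tokenizer("Today is a sunny day and", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

# With input_ids: the output contains the prompt ids plus the newly generated ids.
out_ids = model.generate(input_ids=input_ids, max_length=20)

# With only inputs_embeds: the prompt ids cannot be recovered, so the output
# contains just the newly generated ids.
out_embeds = model.generate(inputs_embeds=inputs_embeds, max_length=20)

# Expected relation (same as the new test's assertion): both paths cover the
# same total length.
assert out_ids.shape[-1] == input_ids.shape[-1] + out_embeds.shape[-1]
```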
break

if "inputs_embeds" in model_kwargs:
    return torch.ones((batch_size, 0), dtype=torch.long, device=self.device)
For my own understanding - am I correct in understanding that when using input_embeds we don't use any initialization, and this is just an empty placeholder?
Yep, when we initialized with size 1 filled with BOS tokens, that ruined max_length by one token. We want the final generation to be a continuation of input_embeds and not start with BOS.
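Put into rough numbers (assumed, for illustration only):

```python
# Assumed numbers. "budget" = new tokens allowed before the max_length criteria
# stops generation (ignoring the separate prompt-length adjustment discussed
# elsewhere in this PR).
budget = 10

with_bos_seed = budget - 1   # the seeded BOS token eats one slot, and the output
                             # starts with BOS rather than continuing inputs_embeds
with_empty_init = budget     # every slot goes to genuinely new tokens
```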
)
generation_config.max_length = generation_config.max_new_tokens + input_ids_length

# adjust max_length when using `input_embeds` in decoder-only models
Rather than saying what this is doing (we can tell from the code) it would be useful for the comment to explain why we need to do this.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
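A hedged illustration of the "why" behind that comment (numbers assumed; this is not the library's actual code):

```python
# The max_length stopping criteria compares against the running input_ids, but
# when only inputs_embeds is passed to a decoder-only model, input_ids starts
# empty and never contains the prompt.
max_length = 20
prompt_len = 4  # length of the inputs_embeds prompt

# input_ids path: the prompt is part of the running sequence, so generation
# stops after 20 - 4 = 16 new tokens.
new_tokens_ids_path = max_length - prompt_len

# inputs_embeds path without the adjustment: the running sequence holds only
# new tokens, so the criteria would allow 20 of them. Accounting for the
# embeds length restores the same 16-new-token budget as the input_ids path.
new_tokens_embeds_unadjusted = max_length             # 20, inconsistent
new_tokens_embeds_adjusted = max_length - prompt_len  # 16, consistent
```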
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
amyeroberts
left a comment
Looks great - thanks for iterating!
@amyeroberts unrelated CI failures, I believe this can be merged 🤗
@zucchini-nlp Can you try rebasing? Fixes should have been merged into main which resolve the currently failing tests.
@amyeroberts thanks, now it's all green and can be merged
What does this PR do?
Fixes #28953. StoppingCriteria with max_length behaves differently when provided input_ids or inputs_embeds; this happens only on decoder-only models. The PR fixes it so that the criteria accounts for the length of inputs_embeds when generating.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante