
Fix max_length criteria when using inputs_embeds#28994

Merged
amyeroberts merged 18 commits into huggingface:main from zucchini-nlp:fix/max_length_generation
Feb 16, 2024

Conversation

@zucchini-nlp
Member

What does this PR do?

Fixes #28953. StoppingCriteria with max_length behaves differently when provided input_ids versus inputs_embeds; this happens only in decoder-only models. The PR fixes it so that the criteria accounts for the length of inputs_embeds when generating.
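The core idea of the fix can be sketched as follows. This is a simplified, hypothetical helper (`effective_max_length` is not a real transformers function): the max-length stopping criterion counts tokens in input_ids only, so when a decoder-only model generates from inputs_embeds alone, the limit has to be shifted by the embeds' sequence length.

```python
def effective_max_length(max_length: int, inputs_embeds_len: int, only_embeds: bool) -> int:
    # Hypothetical sketch: the stopping criterion counts tokens in
    # input_ids only. When generation starts from inputs_embeds alone
    # (decoder-only), input_ids holds none of the prompt, so the limit
    # is reduced by the embeds' sequence length to keep max_length
    # meaning "prompt + new tokens".
    if only_embeds:
        return max_length - inputs_embeds_len
    return max_length

# with a 5-token prompt and max_length=20, generation from embeds
# should stop after 15 new tokens so the total still matches 20
print(effective_max_length(20, 5, True))   # 15
print(effective_max_length(20, 5, False))  # 20
```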

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@gante

Comment thread src/transformers/generation/utils.py Outdated
Contributor

@gante gante left a comment

Technically fulfils the main request of the GH issue, but I'd like for us to go one step further!

In the test you wrote, we check self.assertEqual(out_gen.shape[-1], input_len + out_gen_embeds.shape[-1] - 1). Ideally, the final -1 shouldn't be there: we initialize input_ids with decoder_start_id, causing the additional length, and we probably shouldn't. As such, we can add an additional condition in _prepare_decoder_input_ids_for_generation: in this specific case, input_ids should be empty.
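The off-by-one described above can be modeled with a toy count (hypothetical helper, plain integers instead of tensors): the seed is how many tokens input_ids is pre-filled with before generation starts.

```python
def embeds_output_len(max_length: int, prompt_len: int, seed_len: int) -> int:
    # Toy model (hypothetical names): seed_len is how many tokens
    # input_ids is pre-filled with (1 for decoder_start_id, 0 for an
    # empty placeholder). The criterion allows max_length - prompt_len
    # new tokens, and the seed tokens also appear in the returned ids.
    return seed_len + (max_length - prompt_len)

input_len, max_length = 4, 20
out_gen_len = max_length  # the input_ids path returns prompt + new tokens
# seeded with decoder_start_id: the extra token is the "- 1" in the test
assert out_gen_len == input_len + embeds_output_len(max_length, input_len, 1) - 1
# empty placeholder: the lengths line up with no correction
assert out_gen_len == input_len + embeds_output_len(max_length, input_len, 0)
```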

Comment thread src/transformers/generation/utils.py Outdated
@zucchini-nlp
Member Author

Oh, I see. I added a new fix and checked that creating an empty tensor does not break anything.

Contributor

@gante gante left a comment

Perfect! Thank you for iterating 🤗

Regarding failing CI: it seems unrelated to this PR and main does not have this failure. Therefore, it will likely be solved by rebasing with main and then force-pushing.

@gante gante requested a review from amyeroberts February 14, 2024 10:37
Contributor

@amyeroberts amyeroberts left a comment

Thanks for fixing this! Very nice and clean PR :)

Just some outstanding questions so I can understand what's happening here before approving

Comment thread tests/generation/test_utils.py Outdated

def test_max_length_if_input_embeds(self):
# PT-only test: TF doesn't have StoppingCriteria
article = "Hey, are you conscious?"
Contributor

Can we use a different phrase here? Talking about consciousness with these LLMs isn't ideal

input_len = input_ids.shape[-1]
out_gen = model.generate(input_ids=input_ids, max_length=max_length)
out_gen_embeds = model.generate(inputs_embeds=inputs_embeds, max_length=max_length)
self.assertEqual(out_gen.shape[-1], input_len + out_gen_embeds.shape[-1])
Contributor

For my own understanding - why is the returned generation a concatenation of the input and newly generated tokens when passing in input_ids, but for embeds we only return the newly generated tokens?

Member Author

The addition of input_len here is needed because generation with inputs_embeds returns only the newly generated text, while generation with input_ids returns the whole text, including the prompt. So we are just making sure the lengths of both are equal.

Contributor

Right, but why is the behaviour different for embeddings and input_ids?

Member Author

If I understand the question correctly, the lengths differ because we return the whole text (prompt + new tokens) when the user passes ids. But we cannot recover the prompt text from inputs_embeds, so we just return the newly generated part.

Contributor

@gante gante Feb 14, 2024

As @zucchini-nlp wrote.

There is no mismatch if the user passes both input_ids and inputs_embeds, as generate continues populating input_ids. But passing both kinda defeats the point of feeding inputs_embeds, which is used mostly for experimental purposes, and thus the shape difference when only inputs_embeds is set. Although we could technically recover input_ids from inputs_embeds (reverse lookup search) in most cases to make the shapes consistent, it's probably not a good use of our engineering time :D
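For illustration, the "reverse lookup search" mentioned here could in principle look like this toy sketch (plain Python, a made-up 3-token vocabulary; nothing like this exists in the library):

```python
def recover_ids(embeds, embedding_matrix):
    # Purely illustrative reverse lookup: map each embedding vector back
    # to the index of the nearest embedding-matrix row, using squared
    # Euclidean distance.
    return [
        min(range(len(embedding_matrix)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(embedding_matrix[i], vec)))
        for vec in embeds
    ]

vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy 3-token embedding matrix
assert recover_ids([[1.0, 0.0], [0.0, 1.0]], vocab) == [1, 2]
```

In practice this only works reliably when the embeddings were produced by an exact lookup into the same matrix, which is part of why it isn't worth implementing.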

Contributor

@zucchini-nlp @gante Thanks for the explanation!

Comment thread src/transformers/generation/utils.py Outdated
break

if "inputs_embeds" in model_kwargs:
return torch.ones((batch_size, 0), dtype=torch.long, device=self.device)
Contributor

For my own understanding - am I correct that when using inputs_embeds we don't use any initialization, and this is just an empty placeholder?

Member Author

Yep, when we initialized it with size 1 filled with BOS tokens, that threw max_length off by one token. We want the final generation to be a continuation of inputs_embeds, not to start with BOS.
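The effect of the seed on the returned sequence, as a toy sketch (plain lists instead of tensors, hypothetical token values):

```python
def returned_sequence(seed_ids, new_ids):
    # Toy sketch: the returned ids are the seed input_ids followed by
    # the newly generated tokens. An empty seed makes the output a pure
    # continuation of the inputs_embeds prompt.
    return seed_ids + new_ids

BOS = 0  # placeholder decoder_start_id for illustration
assert returned_sequence([], [7, 8, 9]) == [7, 8, 9]        # empty placeholder
assert returned_sequence([BOS], [7, 8, 9]) == [0, 7, 8, 9]  # BOS-seeded: one extra leading token
```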

Comment thread src/transformers/generation/utils.py Outdated
)
generation_config.max_length = generation_config.max_new_tokens + input_ids_length

# adjust max_length when using `input_embeds` in decoder-only models
Contributor

Rather than saying what this is doing (we can tell from the code) it would be useful for the comment to explain why we need to do this.

Member Author

@amyeroberts done for all comments

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@amyeroberts amyeroberts left a comment

Looks great - thanks for iterating!

@gante
Contributor

gante commented Feb 15, 2024

@amyeroberts unrelated CI failures, I believe this can be merged 🤗

@amyeroberts
Contributor

@zucchini-nlp Can you try rebasing? Fixes that resolve the currently failing tests should have been merged into main.

@zucchini-nlp
Member Author

@amyeroberts thanks, now it's all green and can be merged

@amyeroberts amyeroberts merged commit aee11fe into huggingface:main Feb 16, 2024


Development

Successfully merging this pull request may close these issues.

Report inconsistent output length from decoder-only model generate with input_ids and inputs_embeds

4 participants