Fix: Jamba batched generation #32914
Conversation
Passed locally without the higher rtol/atol. Will see if the CI agrees.
Seems like it does :D
Keeping it open for visibility: Left padding works fine now, it was an issue of how padding has been handled in general (for mamba-related models).
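For illustration, the padding problem for mamba-related models boils down to pad tokens leaking into the recurrent state. A minimal, hypothetical sketch in plain Python (not the actual transformers implementation) of zeroing hidden states at padded positions so padding cannot influence the state:

```python
# Hypothetical sketch: zero out hidden states wherever the attention mask
# is 0 (left padding), before they feed into a recurrent/conv state update.
def mask_padded_states(hidden_states, attention_mask):
    """hidden_states: per-token feature vectors; attention_mask: 0/1 per token."""
    return [
        [x * m for x in token]
        for token, m in zip(hidden_states, attention_mask)
    ]

states = [[0.5, -1.0], [2.0, 3.0], [1.0, 1.0]]
mask = [0, 1, 1]  # first position is left padding
print(mask_padded_states(states, mask))
```

In the real model this is done on tensors (e.g. multiplying by the mask before the conv/scan step), but the idea is the same: padded positions must contribute nothing to the state.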
CI failure seems unrelated to the PR, some import issues from another model.
ArthurZucker
left a comment
What I would find weird is if this does not improve / change the results. Especially for batched generation! The model is tiny random, would be nice if we can run this with the big one 👀
Yea, I think it should definitely improve batched generation. Too GPU poor to run the Jamba models though, iirc they require at least an 80GB VRAM GPU 😢 Maybe we could notify the guys behind Jamba? I doubt they are aware of this issue.
Force-pushed from cfc73d9 to e2c2341: … batch gen (with todo on logits comp)
with torch.no_grad():
    logits = self.model(input_ids=inputs["input_ids"]).logits

# TODO fix logits
For more visibility so that I don't forget about it.
# No need for zeroing states when
# 1. Cached forward
# 2. Attending to all inputs
if cache_position[0] > 0 or (attention_mask is not None and torch.all(attention_mask == 1)):
I suspect this line will fail at compilation time (data-dependent conditional branch). Can you confirm, i.e. try running a compiled forward pass?
If it fails, we can add a compile guard, i.e. start the if with not is_torchdynamo_compiling()
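A hedged sketch of what that guard would look like, with the branch logic pulled into a plain function for clarity (`is_torchdynamo_compiling` is assumed to come from `transformers.utils` and to return False in eager mode; the flag is passed in here so the sketch stays self-contained):

```python
# Sketch of the suggested compile guard: skip the data-dependent branch
# while dynamo is tracing, so the conditional never reaches the compiler.
def should_zero_states(compiling, cache_position_0, attention_mask_all_ones):
    # Mirrors the condition in the PR: no zeroing during a cached forward
    # (cache_position[0] > 0), when attending to all inputs, or under compile.
    if compiling or cache_position_0 > 0 or attention_mask_all_ones:
        return False
    return True

print(should_zero_states(False, 0, False))  # eager prefill with padding -> True
print(should_zero_states(True, 0, False))   # compiling: branch skipped -> False
```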
Tested via the following mini script:

import torch
from transformers import JambaForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-tiny-random"
model = JambaForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True).to("cuda")
model = torch.compile(model)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# tested on both, batched or non-batched input
#input = tokenizer(["Hey how are you doing on this lovely evening?", "What is the purpose of life?"], padding=True, return_tensors="pt").to("cuda")
input = tokenizer(["What is the purpose of life?"], padding=True, return_tensors="pt").to("cuda")

# tested on both, forward call or generate
out = model(**input)
#out = model.generate(**input, do_sample=False, max_new_tokens=10)

Haven't encountered any compilation errors locally, so it seems to be fine. Is this what you had in mind to test compilation?
Yes, that's it!
Perfect, thank you for confirming :)
ArthurZucker
left a comment
LGTM thanks again @vasqu for your great contributions!
* init fix
* fix mask during cached forward, move mask related stuff to own function
* adjust tests as left padding does not change logits as much anymore + batch gen (with todo on logits comp)
* revert overwriting new integration tests
* move some comments to docstring
What does this PR do?
Basically a continuation of #32677, which implements the fixes for Jamba this time. The batched generation tests might need to be changed, especially the expected logits, but I'm not sure how to proceed there as the logits are hardware-dependent.
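One common way around hardware-dependent logits is to compare with a tolerance instead of exact equality. A minimal sketch of the rtol/atol idea (mirroring the `atol + rtol * |expected|` criterion that `torch.testing.assert_close` uses; the helper name and values here are made up for illustration):

```python
# Tolerance-based comparison: two logit lists count as equal if each pair
# differs by at most atol + rtol * |expected|.
def logits_close(observed, expected, rtol=1e-3, atol=1e-3):
    return all(
        abs(o - e) <= atol + rtol * abs(e)
        for o, e in zip(observed, expected)
    )

expected = [0.1234, -2.5001, 7.8125]
observed = [0.1235, -2.5003, 7.8121]  # e.g. same model on a different GPU
print(logits_close(observed, expected))  # -> True
```

In the actual tests this would be a `torch.testing.assert_close(logits, expected_logits, rtol=..., atol=...)` call; the open question is what tolerances are loose enough across GPUs but still tight enough to catch regressions.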
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@molbap @ArthurZucker