Assisted decoding multi-gpu #35116
Conversation
        dim=0,
    )

    candidate_input_ids = candidate_input_ids.to(self.device)
In a previous PR (https://github.com/keyboardAnt/transformers/pull/4/files) I proposed doing this just after `get_candidates`, at the same place where we move `candidate_logits` to `self.device`.
ok, it might be better to move it there for easier readability :)
ArthurZucker
left a comment
Thanks! Regarding multi-GPU, we don't have fast tests, but we could add them with GitHub Actions.
@zucchini-nlp @ArthurZucker

@jmamou yes, feel free to add it to the docs in a subsequent PR if relevant
What does this PR do?
Fixes #35099. We need to manually move inputs to the correct device, since the assistant and target models can be initialized on different devices. We already move the input to the assistant's device in `get_candidate_inputs`, but we don't do the same before the target model's `forward`.

Btw, it seems like we now have more trouble with certain generation techniques or the cache not working in a multi-GPU setting. We don't have many tests for multiple GPUs, so I am not sure whether that's intended or whether we can add more integration tests, and whether those tests would run as part of the pull-request CI. cc @gante for this question
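To illustrate the failure mode, here is a minimal, self-contained sketch (not the actual `transformers` code) of the assisted-decoding device mismatch. `FakeTensor`, `AssistantModel`, and `TargetModel` are hypothetical stand-ins; the device names and method names are assumptions made only for this example. The point is that candidates produced on the assistant's device must be moved back to the target's device before the target forward pass.

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that only tracks data and device."""
    def __init__(self, data, device):
        self.data = data
        self.device = device

    def to(self, device):
        # Like torch.Tensor.to: returns a copy placed on the given device.
        return FakeTensor(list(self.data), device)


class AssistantModel:
    device = "cuda:1"  # assumption: assistant lives on a second GPU

    def get_candidates(self, input_ids):
        # Candidate tokens come out on the assistant's device.
        return FakeTensor(input_ids.data + [7], self.device)


class TargetModel:
    device = "cuda:0"

    def forward(self, input_ids):
        # A real model would raise a device-mismatch error here.
        assert input_ids.device == self.device, "inputs on wrong device"
        return input_ids


assistant, target = AssistantModel(), TargetModel()
prompt = FakeTensor([1, 2, 3], target.device)

# Candidates are generated on the assistant's device...
candidate_input_ids = assistant.get_candidates(prompt.to(assistant.device))
# ...so they must be moved back before calling the target model,
# which is the kind of explicit move this PR adds.
candidate_input_ids = candidate_input_ids.to(target.device)
out = target.forward(candidate_input_ids)
```

Without the explicit `.to(target.device)` line, the final `forward` call would fail the device check, which mirrors the crash reported in #35099.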
cc @jmamou