support copies #32159
What kind of copying are we talking about here? Like cache.copy? |
|
@amyeroberts On main, without the fix, copying the cache fails. Cache copying is needed to reuse the cache from the prompt, e.g. to run new prompts on top of the system prompt without spending compute on the system prompt again. |
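To make the use case concrete, here is a minimal sketch of why a copy is needed. It does not use the Transformers cache classes: `ToyCache` is a hypothetical stand-in that, like a real KV cache, grows in place along the sequence dimension. The tensor shapes are arbitrary assumptions.

```python
import copy

import torch


class ToyCache:
    """Hypothetical stand-in for a KV cache: one key/value tensor per layer."""

    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v, layer):
        # Append new tokens along the sequence dimension (dim=-2), in place.
        if layer == len(self.keys):
            self.keys.append(k)
            self.values.append(v)
        else:
            self.keys[layer] = torch.cat([self.keys[layer], k], dim=-2)
            self.values[layer] = torch.cat([self.values[layer], v], dim=-2)

    def seq_len(self):
        return self.keys[0].shape[-2] if self.keys else 0


# "Prefill" the shared system prompt once (5 tokens, 2 heads, head dim 4).
prefix = ToyCache()
prefix.update(torch.randn(1, 2, 5, 4), torch.randn(1, 2, 5, 4), layer=0)

# Each new prompt extends its own copy, so the prefix stays untouched.
lengths = []
for n_new in (3, 6):
    cache = copy.deepcopy(prefix)
    cache.update(torch.randn(1, 2, n_new, 4), torch.randn(1, 2, n_new, 4), layer=0)
    lengths.append(cache.seq_len())
```

Without the `copy.deepcopy`, the second prompt would be appended on top of the first one instead of on top of the system prompt alone.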
|
I'm sorry if this is not the right place to ask, but in llama.cpp it's trivial to save and load state to/from disk to maintain the cache between sessions. Is this currently possible with Transformers, and if so, could you please provide a minimal example or point to the docs? Cheers, |
|
@vladfaust yes it is possible, but it requires custom code (i.e. you would need to store and restore the cache's tensors). We will add a user-friendly API for that in the future :) |
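A rough sketch of what "store and restore the cache's tensors" could look like, using only `torch.save` and `torch.load`. This is not an official Transformers API. It assumes the cache has already been flattened into a list of per-layer `(key, value)` tensor pairs; `save_kv`, `load_kv`, and the file name are hypothetical, and the random tensors only stand in for a real cache.

```python
import os
import tempfile

import torch


def save_kv(layers, path):
    # layers: list of (key, value) tensor pairs, one pair per layer.
    # Move to CPU so the file can be loaded on a machine without a GPU.
    cpu_layers = [(k.detach().cpu(), v.detach().cpu()) for k, v in layers]
    torch.save(cpu_layers, path)


def load_kv(path, device="cpu"):
    # Returns the same list-of-pairs structure, placed on `device`.
    layers = torch.load(path, map_location=device)
    return [(k, v) for k, v in layers]


# Toy stand-in for a 2-layer cache: (batch, heads, seq_len, head_dim).
layers = [(torch.randn(1, 4, 7, 8), torch.randn(1, 4, 7, 8)) for _ in range(2)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kv_cache.pt")
    save_kv(layers, path)
    restored = load_kv(path)
```

How the restored pairs are put back into a cache object depends on the cache class and library version, so that step is left out here.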
|
PS: this was actually already merged in #32168, so I'll close this one! |
What does this PR do?
We can't copy the cache 😢. Inheriting from Module fixes this easily.
This leaves us unable to reuse prompts or the system prompt like this: