Conversation
The 128 MB was too optimistic. Too bad it is not dynamically computed.
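For context on "not dynamically computed": llama.cpp in this era sized its scratch buffers from hard-coded per-model tables rather than measuring the compute graph, so an optimistic constant meant crashes at long context. A minimal sketch of that shape, assuming names like e_model and MEM_REQ_SCRATCH0 from the tree of this period; the exact entries and sizes here are illustrative, not the real values:

```cpp
// Sketch only: hard-coded per-model scratch sizes, as opposed to
// computing the requirement from the graph at load time.
#include <cstddef>
#include <cstdio>
#include <map>

enum e_model { MODEL_3B, MODEL_7B, MODEL_13B };

static const size_t MB = 1024ull * 1024;

static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0() {
    static const std::map<e_model, size_t> k_sizes = {
        { MODEL_3B,  256 * MB },  // the value this thread settles on
        { MODEL_7B,  512 * MB },  // illustrative, not the real entry
        { MODEL_13B, 512 * MB },  // illustrative, not the real entry
    };
    return k_sizes;
}

int main() {
    printf("3B scratch: %zu MiB\n", MEM_REQ_SCRATCH0().at(MODEL_3B) / MB);
}
```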
|
@LostRuins, you suggested increasing to 256MB. Is that going to be enough? What is the best way to test it?
|
It seems to run for me with both Q4_0 and Q5_1 at context 2048 and batch size 512 with only 128MB.
|
Hi @SlyEcho, I guess the best way to test it is to download and run your OpenLLaMA 3B ggml quant (which I don't know if I am allowed to link here). Running it as q4_0 with a 256MB scratch at batch size 512 and 2048 context, it seems to work for me, though I don't know if there is some boundary parameter that could still fail. 128MB crashes for me at around the 1.5k token mark.
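One way to answer "what is the best way to test it" without guessing at boundary parameters is to track the scratch buffer's high-water mark across a full-context run and size the buffer from the measurement. The sketch below is a hypothetical stand-alone tracker, not llama.cpp's actual API, and the per-eval costs are invented numbers purely to show the shape of the measurement:

```cpp
// Hypothetical high-water-mark tracker: record the peak scratch usage
// for a given context/batch configuration, then size the buffer from it.
#include <algorithm>
#include <cstddef>
#include <cstdio>

struct scratch_meter {
    size_t used = 0;  // bytes currently handed out this eval
    size_t peak = 0;  // high-water mark across the whole run

    void alloc(size_t bytes) {
        used += bytes;
        peak = std::max(peak, used);
    }
    void reset() { used = 0; }  // called at the start of each eval
};

int main() {
    scratch_meter m;
    for (int i = 0; i < 2048; ++i) {
        m.reset();
        // Attention temporaries grow with the number of past tokens,
        // which is why a too-small buffer holds until ~1.5k context
        // and only then fails. Both costs below are made-up numbers.
        m.alloc((size_t)(i + 1) * 80 * 1024);  // per-token attention cost
        m.alloc(32u * 1024 * 1024);            // fixed FFN cost
    }
    printf("peak scratch: %zu MiB\n", m.peak / (1024 * 1024));
}
```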
|
OK, I will merge it. Memory use is probably also dependent on the user's system and build.
Yes, you are :). OpenLLaMA is an open-source reproduction, licensed under the Apache 2.0 license.
Ref: #1588 (comment)