Conversation
The 128 MB was too optimistic. Too bad it is not dynamically computed.
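For context on "not dynamically computed": llama.cpp in this era sized its scratch buffers from hard-coded per-model tables rather than measuring the compute graph, so an optimistic constant meant crashes at long context. A minimal sketch of that shape, assuming names like e_model and MEM_REQ_SCRATCH0 from the tree of this period; the exact entries and sizes here are illustrative, not the real values:

```cpp
// Sketch only: hard-coded per-model scratch sizes, as opposed to
// computing the requirement from the graph at load time.
#include <cstddef>
#include <cstdio>
#include <map>

enum e_model { MODEL_3B, MODEL_7B, MODEL_13B };

static const size_t MB = 1024ull * 1024;

static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0() {
    static const std::map<e_model, size_t> k_sizes = {
        { MODEL_3B,  256 * MB },  // the value this thread settles on
        { MODEL_7B,  512 * MB },  // illustrative, not the real entry
        { MODEL_13B, 512 * MB },  // illustrative, not the real entry
    };
    return k_sizes;
}

int main() {
    printf("3B scratch: %zu MiB\n", MEM_REQ_SCRATCH0().at(MODEL_3B) / MB);
}
```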
|
@LostRuins, you suggested increasing to 256MB. Is that going to be enough? What is the best way to test it?
|
It seems to run for me with both Q4_0 and Q5_1 at context 2048 and batch size 512 with only 128MB.
|
Hi @SlyEcho, I guess the best way to test it is to download and run your OpenLLaMA 3B ggml quant (which I don't know if I am allowed to link here). Running it as q4_0 with a 256MB scratch at batch size 512 and 2048 context, it seems to work for me, though I don't know if there is some boundary parameter that could still fail. 128MB crashes for me at around the 1.5k token mark.
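One way to answer "what is the best way to test it" without guessing at boundary parameters is to track the scratch buffer's high-water mark across a full-context run and size the buffer from the measurement. The sketch below is a hypothetical stand-alone tracker, not llama.cpp's actual API, and the per-eval costs are invented numbers purely to show the shape of the measurement:

```cpp
// Hypothetical high-water-mark tracker: record the peak scratch usage
// for a given context/batch configuration, then size the buffer from it.
#include <algorithm>
#include <cstddef>
#include <cstdio>

struct scratch_meter {
    size_t used = 0;  // bytes currently handed out this eval
    size_t peak = 0;  // high-water mark across the whole run

    void alloc(size_t bytes) {
        used += bytes;
        peak = std::max(peak, used);
    }
    void reset() { used = 0; }  // called at the start of each eval
};

int main() {
    scratch_meter m;
    for (int i = 0; i < 2048; ++i) {
        m.reset();
        // Attention temporaries grow with the number of past tokens,
        // which is why a too-small buffer holds until ~1.5k context
        // and only then fails. Both costs below are made-up numbers.
        m.alloc((size_t)(i + 1) * 80 * 1024);  // per-token attention cost
        m.alloc(32u * 1024 * 1024);            // fixed FFN cost
    }
    printf("peak scratch: %zu MiB\n", m.peak / (1024 * 1024));
}
```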
|
OK, I will merge it. Memory use is probably also dependent on the user's system and build.
Yes, you are :). OpenLLaMA is an open-source reproduction, licensed under the Apache 2.0 license.
Ref: #1588 (comment)