
llama: max ctx by default, fix fit magic number #18567

Closed
JohannesGaessler wants to merge 3 commits into ggml-org:master from JohannesGaessler:llama-fp-fix-ctx-magic

Conversation

@JohannesGaessler
Contributor

Fixes #18376.

The intent of llama_params_fit is to change only those parameters that the user has not set explicitly. However, if a user explicitly sets -c 0, that value is currently still changed by llama_params_fit. I would propose the following changes to fix this:

  • Change the data type of llama_context_params::n_ctx from uint32_t to int32_t with all values <= 0 being interpreted to mean that the full context should be used.
  • Change the default value set by llama_context_default_params from 512 to -1. With the llama C API, the model's full context would then be used by default, consistent with the common API.
  • In llama_params_fit, check the context size against the new default value instead of against 0.
  • For the CLI, add the strings "auto" and "full" for --ctx-size; both use the full context by default but differ in whether llama_params_fit may change them.
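The proposed semantics can be sketched as follows (a minimal illustration, not the actual llama.cpp code; the helper names resolve_n_ctx and model_n_ctx_train are hypothetical stand-ins):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the model's training context size
// (llama_model_n_ctx_train in the real API).
static int32_t model_n_ctx_train() { return 4096; }

// Proposed semantics: any n_ctx <= 0 means "use the model's full context".
// With a default of -1, llama_params_fit can distinguish "user left the
// default" (-1) from "user explicitly requested the full context" (0),
// and only adjust the former.
static int32_t resolve_n_ctx(int32_t n_ctx) {
    if (n_ctx <= 0) {
        return model_n_ctx_train();
    }
    return n_ctx;
}
```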

An alternative approach would be to keep the changes entirely in the common API.

@CISC
Member

CISC commented Jan 3, 2026

Hmmm, what's with the sign disparity between common and llama params?

@JohannesGaessler
Contributor Author

I think the biggest reason is simply that no one invested the effort to make things consistent. Going forward, I would suggest that for non-negative quantities like the context size we store the values internally as unsigned, but use signed values in llama_model_params and llama_context_params in order to encode special values.
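The suggested convention could look like this (a sketch under assumed names; params_public and state_internal are illustrative, not real llama.cpp types):

```cpp
#include <cassert>
#include <cstdint>

// Public-facing params use a signed field so that special values
// (here: anything <= 0 meaning "full context") can be encoded.
struct params_public {
    int32_t n_ctx;
};

// Internal state stores only concrete, non-negative sizes as unsigned.
struct state_internal {
    uint32_t n_ctx;
};

// Translate the signed public value into a concrete unsigned size,
// resolving the sentinel against the model's training context size.
static state_internal apply_params(const params_public & p, uint32_t n_ctx_train) {
    state_internal s;
    s.n_ctx = p.n_ctx <= 0 ? n_ctx_train : static_cast<uint32_t>(p.n_ctx);
    return s;
}
```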

Member

@ggerganov left a comment


An alternative approach would be to keep the changes entirely in the common API.

This seems like the better approach.

Comment thread: include/llama.h

     // https://github.com/ggml-org/llama.cpp/pull/7544
     struct llama_context_params {
-        uint32_t n_ctx; // text context, 0 = from model
+        int32_t  n_ctx; // context size in tokens, use llama_model_n_ctx_train for values <= 0
Member


Hm, I don't see why this change is needed. It looks like it allows negative values with the same functionality as n_ctx == 0, so user code can simply use params.n_ctx = max(0, n_ctx); to achieve the same.
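The suggested workaround can be written as a standalone sketch (the helper name clamp_n_ctx is illustrative):

```cpp
#include <algorithm>
#include <cstdint>

// With the existing uint32_t field, user code can map any negative
// "use full context" sentinel onto 0 before passing it to the API,
// since 0 already means "from model".
static uint32_t clamp_n_ctx(int32_t requested) {
    return static_cast<uint32_t>(std::max<int32_t>(0, requested));
}
```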

@JohannesGaessler
Contributor Author

Superseded by #19070.


Labels

examples · testing (Everything test related)


Development

Successfully merging this pull request may close these issues.

Misc. bug: Maximum Context Size
