
Remove split metadata when quantize model shards#6591

Merged
ggerganov merged 4 commits into ggml-org:master from zj040045:jiez/rm-split-metadata
Apr 12, 2024

Conversation

@zj040045
Contributor

  • Add gguf_remove_key to remove a key from a gguf context
  • Remove split metadata when quantizing.

Comment thread ggml.c Outdated
Comment on lines +21155 to +21156
for (int i = idx; i < n_kv; ++i)
ctx->kv[i] = ctx->kv[i+1];
Member


Shouldn't this loop be up to n_kv-1? The body of the loop should also be in brackets.

Contributor Author


Fixed
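The fix the review converged on can be sketched in isolation. This is a minimal standalone illustration of the shift-delete pattern with the corrected loop bound (`n - 1`, so the shift never reads past the end of the array) and braces around the loop body; `struct kv` and `remove_key` here are illustrative stand-ins, not the actual gguf_context/gguf_kv types in ggml.c:

```c
#include <string.h>

/* Illustrative stand-in for a gguf-style key/value entry. */
struct kv {
    const char * key;
    int          value;
};

/* Remove the entry matching `key`, shifting the tail left by one.
 * The inner loop runs to n - 1 (not n), avoiding the off-by-one
 * read past the end of the array flagged in review.
 * Returns the new element count. */
static int remove_key(struct kv * arr, int n, const char * key) {
    for (int idx = 0; idx < n; ++idx) {
        if (strcmp(arr[idx].key, key) == 0) {
            for (int i = idx; i < n - 1; ++i) {
                arr[i] = arr[i + 1];
            }
            return n - 1;
        }
    }
    return n; /* key not found, nothing removed */
}
```

Note that in the real implementation the removed entry's heap-allocated key/value data must also be freed before shifting, which a later commit in this PR addresses ("Free kv memory").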

Comment thread ggml.c
Comment thread llama.cpp Outdated
gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION);
gguf_set_val_u32(ctx_out, "general.file_type", ftype);
// Remove split metadata
gguf_remove_key(ctx_out, "split.no");
Collaborator


There are constants for those keys: LLM_KV_SPLIT*

Contributor Author


Done
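The suggestion above amounts to deriving the metadata key names from an enum instead of repeating string literals at each call site. A minimal sketch of that pattern, assuming enum values and a name-lookup function that are stand-ins for (not copies of) llama.cpp's actual LLM_KV_SPLIT* definitions; the "split.*" strings are the GGUF split metadata keys quoted in this thread:

```c
#include <stddef.h>

/* Illustrative enum for the split metadata keys; the real LLM_KV_*
 * enum in llama.cpp covers many more keys. */
enum llm_kv {
    LLM_KV_SPLIT_NO,
    LLM_KV_SPLIT_COUNT,
    LLM_KV_SPLIT_TENSORS_COUNT,
};

/* Single source of truth mapping enum values to key strings, so
 * call sites like gguf_remove_key never hard-code "split.no". */
static const char * llm_kv_name(enum llm_kv kv) {
    switch (kv) {
        case LLM_KV_SPLIT_NO:            return "split.no";
        case LLM_KV_SPLIT_COUNT:         return "split.count";
        case LLM_KV_SPLIT_TENSORS_COUNT: return "split.tensors.count";
    }
    return NULL;
}
```

A call site would then read something like `gguf_remove_key(ctx_out, llm_kv_name(LLM_KV_SPLIT_NO))`, keeping the literal in one place.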


@phymbert
Collaborator

@phymbert left a comment


Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.
That can be done later on.
Please merge after @ggerganov's approval

@ggerganov ggerganov merged commit 91c7360 into ggml-org:master Apr 12, 2024
@zj040045
Contributor Author

zj040045 commented Apr 12, 2024

Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.

@phymbert I am checking it. Would it be good to add "--split-max-*" options so quantize can re-split models? Or does it only need an option like "--keep-split" to generate the same number of splits?

@phymbert
Collaborator

phymbert commented Apr 12, 2024

Would it be good to add "--split-max-*" options so quantize can re-split models? Or does it only need an option like "--keep-split" to generate the same number of splits?

Yes, let's keep it simple for the moment, with the same distribution of tensors per file as the original.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…-org#6591)

* Remove split metadata when quantize model shards

* Find metadata key by enum

* Correct loop range for gguf_remove_key and code format

* Free kv memory

---------

Co-authored-by: z5269887 <z5269887@unsw.edu.au>
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

4 participants