Remove split metadata when quantize model shards #6591
ggerganov merged 4 commits into ggml-org:master from
| for (int i = idx; i < n_kv; ++i) | ||
| ctx->kv[i] = ctx->kv[i+1]; |
Shouldn't this loop be up to n_kv-1? The body of the loop should also be in brackets.
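To illustrate the point about the loop bound, here is a minimal sketch of removing an entry from an array by shifting the tail left. It is not the code from this PR: `struct kv_list`, `kv_find` and `kv_remove` are hypothetical stand-ins that model only the key strings, and the real `gguf_remove_key` also has to release the memory owned by the removed entry.

```c
#include <assert.h>
#include <string.h>

#define MAX_KV 8

// Hypothetical stand-in for the gguf context: only the keys are modelled.
struct kv_list {
    const char * key[MAX_KV];
    int          n_kv;
};

// Returns the index of `key`, or -1 if it is absent.
static int kv_find(const struct kv_list * l, const char * key) {
    for (int i = 0; i < l->n_kv; ++i) {
        if (strcmp(l->key[i], key) == 0) {
            return i;
        }
    }
    return -1;
}

// Removes `key` by shifting every later entry one slot to the left.
// The loop stops at n_kv - 1 so that key[i + 1] never reads past the end.
static void kv_remove(struct kv_list * l, const char * key) {
    const int idx = kv_find(l, key);
    if (idx < 0) {
        return; // nothing to remove
    }
    for (int i = idx; i < l->n_kv - 1; ++i) {
        l->key[i] = l->key[i + 1];
    }
    l->n_kv--;
}
```

With the bound `i < n_kv`, the last iteration would copy from index `n_kv`, which is one past the final valid entry.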
| gguf_set_val_u32(ctx_out, "general.quantization_version", GGML_QNT_VERSION); | ||
| gguf_set_val_u32(ctx_out, "general.file_type", ftype); | ||
| // Remove split metadata | ||
| gguf_remove_key(ctx_out, "split.no"); |
There are constants for those keys: LLM_KV_SPLIT*
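As a rough sketch of what looking keys up through constants rather than string literals could look like: the enum, the table and both helpers below are hypothetical and are not the actual `LLM_KV_SPLIT*` definitions. Only the `"split.no"` string appears in this PR; the other two names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical enum standing in for the LLM_KV_SPLIT* constants.
enum split_kv {
    SPLIT_KV_NO,
    SPLIT_KV_COUNT,
    SPLIT_KV_TENSORS_COUNT,
    SPLIT_KV_N, // number of entries, not a key
};

// Single place where the key strings are spelled out.
// Only "split.no" is taken from the PR; the rest are assumed names.
static const char * const SPLIT_KV_NAMES[SPLIT_KV_N] = {
    [SPLIT_KV_NO]            = "split.no",
    [SPLIT_KV_COUNT]         = "split.count",
    [SPLIT_KV_TENSORS_COUNT] = "split.tensors.count",
};

// Maps an enum value to its key string, or NULL if out of range.
static const char * split_kv_name(enum split_kv kv) {
    if ((int) kv < 0 || kv >= SPLIT_KV_N) {
        return NULL;
    }
    return SPLIT_KV_NAMES[kv];
}

// Returns true if `key` is one of the split metadata keys.
static bool is_split_key(const char * key) {
    for (int i = 0; i < SPLIT_KV_N; ++i) {
        if (strcmp(SPLIT_KV_NAMES[i], key) == 0) {
            return true;
        }
    }
    return false;
}
```

A caller would then pass `split_kv_name(SPLIT_KV_NO)` where it previously passed the literal, so a typo in a key becomes a compile error instead of a silent no-op.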
phymbert
left a comment
Thanks, although I believe the right approach would be to generate splits in quantize if the input model is split.
It can be done later on.
Please merge after @ggerganov approval
@phymbert I am checking it. Is it good to add "--split-max-*" for |
Yes, let's keep it simple at the moment, with the same distribution of tensors per file as the original |
…-org#6591) * Remove split metadata when quantize model shards * Find metadata key by enum * Correct loop range for gguf_remove_key and code format * Free kv memory --------- Co-authored-by: z5269887 <z5269887@unsw.edu.au>
gguf_remove_key to remove key from gguf_remove_key