
Implement '--keep-split' to quantize model into several shards #6688

Merged
ggerganov merged 6 commits into ggml-org:master from zj040045:jiez/quantize-keep-split on Apr 25, 2024

Conversation

@zj040045 (Contributor)

Fix #6548
--keep-split allows quantize to write its output as shards instead of a single merged model file; the number of output shards matches the number of input model files. A sketch of the resulting shard naming is shown below.
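
For illustration, a minimal sketch (not code from this PR) of what the shard file names look like under the gguf-split naming convention, using the llama_split_path helper that llama.h already exposes; the prefix ggml-model-Q4_K_M and the shard count of 3 are made-up examples:

// Minimal sketch, assuming the gguf-split naming convention; the prefix
// and shard count below are invented for illustration.
#include <cstdio>
#include "llama.h"

int main() {
    char split_path[1024];
    const int n_split = 3; // assumption: the input model came in 3 shards
    for (int i = 0; i < n_split; ++i) {
        llama_split_path(split_path, sizeof(split_path), "ggml-model-Q4_K_M", i, n_split);
        printf("%s\n", split_path); // e.g. ggml-model-Q4_K_M-00001-of-00003.gguf
    }
    return 0;
}

Quantizing without --keep-split would instead merge everything into a single output file.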

@phymbert (Collaborator)

Thanks. Would you mind adding a tests.sh, as we did in #6655?

@phymbert added the split (GGUF split model sharding) label on Apr 17, 2024
@zj040045 (Contributor, Author)

@phymbert Done

@phymbert requested a review from ggerganov on Apr 18, 2024 at 14:25
Review thread on llama.cpp (outdated):
LLAMA_LOG_INFO("%s: meta size = %zu bytes\n", __func__, meta_size);
auto weight = ml.get_weight(i);
struct ggml_tensor * tensor = weight->tensor;
// With --keep-split, open a new output shard whenever the tensor's source
// split index no longer matches the most recently created output context.
if (weight->idx != (ctx_outs.size() - 1) && params->keep_split) {
Collaborator:
This does not feel safe for future evolution, since it assumes the tensors are written in the same order they were read. Could we simply check whether weight->idx is not yet present in ctx_outs, and retrieve the matching ctx_out for the tensor?

Contributor (Author):

You are right. The model-split writing should then follow this logic to support cases like "0 0 0 2 2 1 1". Besides, do you think a non-contiguous order like "0 0 0 2 1 2 1" should also be handled?
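
To make the discussed approach concrete, here is a minimal self-contained sketch (not the PR's final code) of keying the output contexts by split index, so tensors arriving in any of the orders above still land in the right shard; gguf_context and make_output_ctx are stand-ins invented for this sketch:

#include <cstdint>
#include <cstdio>
#include <map>

// Stand-in for ggml's opaque gguf_context, just for this sketch.
struct gguf_context { int shard; };

// Hypothetical helper: opens a fresh output context for one shard.
static gguf_context * make_output_ctx(uint16_t idx) {
    return new gguf_context{idx};
}

// Output contexts keyed by the tensor's source split index, so neither the
// read order nor the contiguity of shard ids matters.
static std::map<uint16_t, gguf_context *> ctx_outs;

static gguf_context * ctx_out_for(uint16_t idx) {
    auto it = ctx_outs.find(idx);
    if (it == ctx_outs.end()) {
        // First tensor seen for this shard: create its output context.
        it = ctx_outs.emplace(idx, make_output_ctx(idx)).first;
    }
    return it->second;
}

int main() {
    const uint16_t order[] = {0, 0, 0, 2, 1, 2, 1}; // out-of-order shard ids
    for (uint16_t idx : order) {
        printf("tensor -> shard %d\n", ctx_out_for(idx)->shard);
    }
    for (auto & kv : ctx_outs) delete kv.second;
    return 0;
}

The merged PR resolved the thread along these lines, as reflected by the "Split model correctly even if tensor id is out-of-order" commit listed below.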

Review thread on llama.h (outdated)
Review thread on examples/quantize/quantize.cpp (outdated)
ggerganov merged commit 1966eb2 into ggml-org:master on Apr 25, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…#6688)

* Implement '--keep-split' to quantize model into several shards

* Add test script

* Update examples/quantize/quantize.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Split model correctly even if tensor id is out-of-order

* Update llama_model_quantize_params

* Fix preci failures

---------

Co-authored-by: z5269887 <z5269887@unsw.edu.au>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

Labels

split (GGUF split model sharding)


Development

Successfully merging this pull request may close these issues.

Re-quantization of a split gguf file produces "invalid split file"
