
ggml : remove old quantization functions #5942

Merged: ggerganov merged 6 commits into master from gg/simplify-quant-api on Mar 9, 2024


@ggerganov (Member) commented Mar 8, 2024

Remove the per-type ggml_quantize_ API (e.g. ggml_quantize_q4_0, ggml_quantize_q8_0) in favor of ggml_quantize_chunk

Histograms are no longer computed, but this can be re-introduced later

@slaren (Member) commented Mar 8, 2024

We could probably simplify ggml_quantize_chunk by using from_float and blck_size from type_traits instead of duplicating the same code for every type; there is a lot of code duplication there.

@ggerganov (Member, Author) replied:

The imatrix makes this complicated. We would probably have to either extend type_traits with a new from_float_imatrix function, or extend the from_float signature with an imatrix argument and assign the quantize_ functions to it instead of the quantize_row_ functions.

ggerganov force-pushed the gg/simplify-quant-api branch from 4b0ddcd to 5460dcc, March 9, 2024 10:48
ggerganov requested a review from slaren, March 9, 2024 11:36
Comment thread on ggml.c (outdated). A Member commented:

I see that the code to collect the histograms has been removed. Are there any plans to reintroduce it? Otherwise, this parameter could be removed.

ggerganov force-pushed the gg/simplify-quant-api branch from 1c37d3e to 95ea0ff, March 9, 2024 12:58
Comment thread on ggml-vulkan.cpp (outdated), on lines +4105 to +4107:
```diff
 std::vector<int64_t> hist_cur(1 << 4, 0);

-switch(quant) {
-    case GGML_TYPE_F32:
-        memcpy(to, from, sizeof(float) * ne);
-        break;
-    case GGML_TYPE_Q4_0:
-        ggml_quantize_q4_0(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q4_1:
-        ggml_quantize_q4_1(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q5_0:
-        ggml_quantize_q5_0(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q5_1:
-        ggml_quantize_q5_1(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q8_0:
-        ggml_quantize_q8_0(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q2_K:
-        ggml_quantize_q2_K(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q3_K:
-        ggml_quantize_q3_K(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q4_K:
-        ggml_quantize_q4_K(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q5_K:
-        ggml_quantize_q5_K(from, to, ne, ne, hist_cur.data());
-        break;
-    case GGML_TYPE_Q6_K:
-        ggml_quantize_q6_K(from, to, ne, ne, hist_cur.data());
-        break;
-    default:
-        GGML_ASSERT(false);
-}
+gml_quantize_chunk(quant, from, to, 0, 1, ne, hist_cur.data(), nullptr);
```
A Member commented:

The hist argument should be removed here too, and there is a typo in the function name (gml_quantize_chunk instead of ggml_quantize_chunk).

ggerganov merged commit 5b09797 into master, Mar 9, 2024
ggerganov deleted the gg/simplify-quant-api branch, March 9, 2024 13:54
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
* ggml : remove old quantization functions

ggml-ci

* ggml : simplify ggml_quantize_chunk

ggml-ci

* ggml : restrict correctness

ggml-ci

* ggml : remove hist data from the quantization API

ggml-ci

* tests : remove hist usage in test-backend-ops

ggml-ci

* vulkan : remove hist and fix typo
NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
2 participants