Fix integer overflow in GGUF tensor parsing #18674
alexanderkent wants to merge 2 commits into ggml-org:master
Conversation
This PR addresses a heap buffer overflow vulnerability caused by integer overflow in ggml_nbytes during GGUF tensor parsing.

Changes:
- ggml/src/ggml.c: Added ggml_nbytes_safe() with checked arithmetic that returns SIZE_MAX on overflow.
- ggml/src/gguf.cpp: Added strict validation in gguf_init_from_file_impl to reject tensors whose byte size overflows.
- ggml/include/ggml.h: Declared the ggml_nbytes_safe() API.

Impact: Prevents a heap-based buffer overflow where ggml_nbytes wraps around due to integer overflow. Mitigates potential RCE via malicious GGUF files.
How would one do that? Was "AI" used for this PR? |
|
Let's not be alarmist here. It would be helpful to publish the malformed GGUF example. This is a buffer overflow, not an RCE (the R stands for Remote), and turning it into a genuine ACE (Arbitrary Code Execution, i.e. a local exploit) would require significant expertise and bypassing modern OS protections (ASLR, DEP/NX, stack canaries, etc.). So let's fix this local buffer overflow in a minimalist way :) I tested the patch: no regressions, but I think we can make it simpler. |
JohannesGaessler
left a comment
This PR is needlessly complicated, we already have a check that the number of elements is representable as int64_t, it's enough to just do an equivalent check afterwards for the size in bytes and size_t. Also add a corresponding test to test-gguf.cpp. I don't see how a potential overflow in ggml_nbytes could be exploited in terms of security.
Removed redundant inline overflow checks from stride calculations. The ggml_nbytes_safe() call before allocation handles all overflow scenarios, making the earlier checks unnecessary.
|
Like this one?

(root|~/llama.cpp.pascal) git diff 86ec8a55964de893229209b320367a461d03eb86
diff --git a/ggml/src/ggml.c b/ggml/src/ggml.c
index 09b8eb466..7845c2cb2 100644
--- a/ggml/src/ggml.c
+++ b/ggml/src/ggml.c
@@ -1235,6 +1235,26 @@ int64_t ggml_nrows(const struct ggml_tensor * tensor) {
return tensor->ne[1]*tensor->ne[2]*tensor->ne[3];
}
+static inline bool ggml_size_add_overflow(size_t a, size_t b, size_t * result) {
+ if (a > SIZE_MAX - b) {
+ return true;
+ }
+ *result = a + b;
+ return false;
+}
+
+static inline bool ggml_size_mul_overflow(size_t a, size_t b, size_t * result) {
+ if (a == 0 || b == 0) {
+ *result = 0;
+ return false;
+ }
+ if (a > SIZE_MAX / b) {
+ return true;
+ }
+ *result = a * b;
+ return false;
+}
+
size_t ggml_nbytes(const struct ggml_tensor * tensor) {
for (int i = 0; i < GGML_MAX_DIMS; ++i) {
if (tensor->ne[i] <= 0) {
@@ -1247,13 +1267,25 @@ size_t ggml_nbytes(const struct ggml_tensor * tensor) {
if (blck_size == 1) {
nbytes = ggml_type_size(tensor->type);
for (int i = 0; i < GGML_MAX_DIMS; ++i) {
- nbytes += (tensor->ne[i] - 1)*tensor->nb[i];
+ size_t add;
+ if (ggml_size_mul_overflow((size_t) (tensor->ne[i] - 1), tensor->nb[i], &add) ||
+ ggml_size_add_overflow(nbytes, add, &nbytes)) {
+ GGML_ABORT("%s: tensor byte size overflow", __func__);
+ }
}
}
else {
- nbytes = tensor->ne[0]*tensor->nb[0]/blck_size;
+ size_t base;
+ if (ggml_size_mul_overflow((size_t) tensor->ne[0], tensor->nb[0], &base)) {
+ GGML_ABORT("%s: tensor byte size overflow", __func__);
+ }
+ nbytes = base / blck_size;
for (int i = 1; i < GGML_MAX_DIMS; ++i) {
- nbytes += (tensor->ne[i] - 1)*tensor->nb[i];
+ size_t add;
+ if (ggml_size_mul_overflow((size_t) (tensor->ne[i] - 1), tensor->nb[i], &add) ||
+ ggml_size_add_overflow(nbytes, add, &nbytes)) {
+ GGML_ABORT("%s: tensor byte size overflow", __func__);
+ }
}
}
This patch just adds two inline helpers that check whether a multiplication or addition would overflow before performing it, then aborts cleanly if it would. It keeps the change surgical, confined to ggml_nbytes() without touching anything else. |
|
I'm checking; I think we can make it even simpler, earlier in the loader! |
|
We can do it here, with the other checks:

diff --git a/ggml/src/gguf.cpp b/ggml/src/gguf.cpp
index b165d8bdc..5cd11ba46 100644
--- a/ggml/src/gguf.cpp
+++ b/ggml/src/gguf.cpp
@@ -585,6 +585,16 @@ struct gguf_context * gguf_init_from_file_impl(FILE * file, struct gguf_init_par
break;
}
+ // check that total size in bytes fits in size_t
+ const uint64_t ne_total = (uint64_t)info.t.ne[0] * info.t.ne[1] * info.t.ne[2] * info.t.ne[3];
+ const uint64_t bytes_total = ne_total * type_size / blck_size;
+ if (bytes_total > SIZE_MAX) {
+ GGML_LOG_ERROR("%s: tensor '%s' size overflow (%" PRIu64 " bytes > SIZE_MAX)\n",
+ __func__, info.t.name, bytes_total);
+ ok = false;
+ break;
+ }
+
// calculate byte offsets given the tensor shape and type
info.t.nb[0] = type_size;
info.t.nb[1] = info.t.nb[0]*(info.t.ne[0]/blck_size); |
|
Just check whether |
|
Yes! Divide before multiplying, to avoid intermediate overflow in the calculation itself: |
has offset 320868864, expected 5383208929597152
|
|
OK, so I need to patch a model with a 3D or 4D tensor? |
|
Even MoE models only have 3D tensors; I haven't noticed an LLM with 4D tensors. On master without the patch it's a clean exit, no buffer overflow, but this clean exit does not invalidate the arithmetic issue itself. My repro only patches an existing GGUF and therefore fails early on offset validation; reaching the ggml_nbytes overflow would likely require generating a fully coherent GGUF from scratch.
Could you provide a working PoC for such attack? |
Worst-case scenario (realistic): |
|
Repro input would really help here. For GGUF parsing (or any file/param input) issues, I think we should require at least a minimal crashing sample (DoS-level, shared privately with maintainers?). That allows us to add a regression test and validate a minimal fix (crash before -> no crash after) instead of debating theoretical exploitability. I can test any provided samples in a hardened VM on a dedicated machine, no problem! |
|
It's interesting now: |
|
With the minimized patch: BEFORE:
AFTER:
The 9-line fix catches the overflow early in the loader, prevents the segfault in llama-gguf, and clarifies the error message in llama-cli
|
A working Python generator to document/test it, and the tested patch: ServeurpersoCom@c503651 |
|
@alexanderkent I don't think the PoC is valid as-is, because you are intentionally using When running with I don't believe this is a valid vulnerability in the GGUF or GGML library. It's just that the downstream code misses the boundary check. llama.cpp has such a check, but |
|
Closing because this is not a valid bug in GGML or libllama |
|
I would consider it a valid bug in |
|
I can reproduce it with llama-quantize, though I understand there may be debate about whether this tool qualifies as a production utility or primarily a debugging tool. With my patch from two days ago: Do you want me to test the PR patch? It seems unnecessarily large. |
|
|