Fix ggml_tensor_extra_gpu memory leak #2146

eugen-ajechiloae-clearml wants to merge 1 commit into ggml-org:master from
Conversation
Allocate the `extra` member of `ggml_tensor` statically
JohannesGaessler left a comment:
Would be fine with me but ggml changes will need to be approved by @ggerganov.
#2099 uses

Not a good change - the CUDA implementation should clean-up the allocated memory
Hi @ggerganov! Is there any reason why you allocate the memory dynamically? Cleaning up the memory seems very difficult. Looks like the same memory region is used by multiple instances of
The goal is to keep

Is it really so difficult? I guess we could pre-allocate a single large enough buffer in

If it is indeed too difficult we can think of some alternative approaches, but the main requirement is to not couple
AFAIK, there are two types of

Since the lifecycle of them is tied to

All temp allocations are made in

```c
typedef void (*offload_func_t)(void * offload_ctx, struct ggml_tensor * tensor);

int llama_eval_internal(...) {
    ctx0 = ggml_init(...);

#ifdef GGML_USE_CUBLAS
    void * offload_ctx = ggml_cuda_init_offload_ctx(...);
#endif

    // Now the offload_func will have to be called like this
    offload_func(offload_ctx, tensor);

#ifdef GGML_USE_CUBLAS
    ggml_cuda_free_offload_ctx(offload_ctx);
#endif

    ggml_free(ctx0);
}
```

It's a bit more verbose but we correctly track the life cycle of those temp allocations.
The
Memory management seems a bit too complicated for me in this case, so I just wanted to let you know that I will not open another PR for this issue. Thank you guys for your input!
Fixes #2145. Allocate the `extra` member of `ggml_tensor` statically