eval-callback: Example how to use eval callback for debugging#6576
Conversation
Pretty cool. Some notes:
```c
ggml_tensor * t;
char * data; // tensor data
for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
    for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
        for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
            for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0*t->nb[0];
                float v = *(float *)(data + i);
            }
        }
    }
}
```
Looks better, cool!
@slaren @ggerganov Can I merge?
ggerganov left a comment
Nice tool! I've found that printing the total sum of the elements in a tensor is sometimes also a useful metric to look at when debugging.
Merge after slaren approves
Not a big deal, but I think that
```c
printf("%s: %24s = (%s) %10s(%s{%s}, %s}) = {%s}\n", __func__,
       t->name, ggml_type_name(t->type), ggml_op_name(t->op),
```
I forgot to mention that you can use `ggml_op_desc` instead of `ggml_op_name` to get proper names for unary ops too, instead of just `UNARY`.
I'm not sure, but it looks like this PR is behind the currently failing CI builds on OSX, and I'm not currently smart enough to figure out why. Example: https://github.com/ggerganov/llama.cpp/actions/runs/8651988935/job/23723903645#step:5:4773 Also, I could be wrong, but this might have snuck a libcurl dependency in at a level that we aren't comfortable with; again, I don't have a full handle on this yet either.
Maybe adding
…rg#6576)

* gguf-debug: Example how to use ggml callback for debugging
* gguf-debug: no mutex, verify type, fix stride.
* llama: cv eval: move cb eval field in common gpt_params
* ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random.
* ggml_debug: ci: add tests
* ggml_debug: EOL in CMakeLists.txt
* ggml_debug: Remove unused param n_batch, no batching here
* ggml_debug: fix trailing spaces
* ggml_debug: fix trailing spaces
* common: fix cb_eval and user data not initialized
* ci: build revert label
* ggml_debug: add main test label
* doc: add a model: add a link to ggml-debug
* ggml-debug: add to make toolchain
* ggml-debug: tests add the main label
* ggml-debug: ci add test curl label
* common: allow the warmup to be disabled in llama_init_from_gpt_params
* ci: add curl test
* ggml-debug: better tensor type support
* gitignore : ggml-debug
* ggml-debug: printing also the sum of each tensor
* ggml-debug: remove block size
* eval-callback: renamed from ggml-debug
* eval-callback: fix make toolchain

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Motivation
It can be useful to debug the inference graph, for example to compare it with the original PyTorch version.
One may also need to retrieve intermediate tensor data, for example for imatrix or advanced NLP use cases.
This example shows how to use the inference callback.
Suggestions from @slaren, thanks:
DbrxForCausalLM#6515 (comment)

Changes

* New example `ggml-debug` that prints each operation and output tensors' data.

Example
Output
```
ggml_debug: inp_embd = (f32) GET_ROWS(token_embd.weight{2560, 51200, 1, 1}, inp_tokens{1, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ -0.0181, 0.0272, 0.0272, ...], ], ]
ggml_debug: norm-0 = (f32) NORM(CUDA0#inp_embd#0{2560, 1, 1, 1}, }) = {2560, 1, 1, 1} [ [ [ -0.6989, 1.0636, 1.0636, ...], ], ]
ggml_debug: norm_w-0 = (f32) MUL(norm-0{2560, 1, 1, 1}, blk.0.attn_norm.weight{2560, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ -0.1800, 0.2817, 0.2632, ...], ], ]
ggml_debug: attn_norm-0 = (f32) ADD(norm_w-0{2560, 1, 1, 1}, blk.0.attn_norm.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: wqkv-0 = (f32) MUL_MAT(blk.0.attn_qkv.weight{2560, 7680, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {7680, 1, 1, 1} # redacted
ggml_debug: bqkv-0 = (f32) ADD(wqkv-0{7680, 1, 1, 1}, blk.0.attn_qkv.bias{7680, 1, 1, 1}}) = {7680, 1, 1, 1} # redacted
ggml_debug: bqkv-0 (view) = (f32) VIEW(bqkv-0{7680, 1, 1, 1}, }) = {2560, 1, 1, 1} # redacted
ggml_debug: Qcur-0 = (f32) CONT(bqkv-0 (view){2560, 1, 1, 1}, }) = {2560, 1, 1, 1} # redacted
ggml_debug: Qcur-0 (reshaped) = (f32) RESHAPE(Qcur-0{2560, 1, 1, 1}, }) = {80, 32, 1, 1} # redacted
ggml_debug: Qcur-0 = (f32) ROPE(Qcur-0 (reshaped){80, 32, 1, 1}, CUDA0#inp_pos#0{1, 1, 1, 1}}) = {80, 32, 1, 1} [ [ [ -1.1135, 1.4604, -1.9226, ...], [ -0.3608, 0.5076, -1.8866, ...], [ 1.7643, 0.0273, -2.1065, ...], ... ], ]
...
ggml_debug: kq_soft_max_ext-0 = (f32) SOFT_MAX(kq-0{32, 1, 32, 1}, CUDA0#KQ_mask#0{32, 1, 1, 1}}) = {32, 1, 32, 1} [ [ [ 1.0000, 0.0000, 0.0000, ...], ], [ [ 1.0000, 0.0000, 0.0000, ...], ], [ [ 1.0000, 0.0000, 0.0000, ...], ], ... ]
ggml_debug: kqv-0 = (f32) MUL_MAT(v-0{32, 80, 32, 1}, kq_soft_max_ext-0{32, 1, 32, 1}}) = {80, 1, 32, 1} [ [ [ -0.2136, -0.2137, 0.3335, ...], ], [ [ -0.2139, 0.2949, -0.0338, ...], ], [ [ -0.4204, -0.0442, -0.6392, ...], ], ... ]
ggml_debug: kqv_merged-0 = (f32) PERMUTE(kqv-0{80, 1, 32, 1}, }) = {80, 32, 1, 1} [ [ [ -0.2136, -0.2137, 0.3335, ...], [ -0.2139, 0.2949, -0.0338, ...], [ -0.4204, -0.0442, -0.6392, ...], ... ], ]
ggml_debug: kqv_merged_cont-0 = (f32) CONT(kqv_merged-0{80, 32, 1, 1}, }) = {2560, 1, 1, 1} # redacted
ggml_debug: kqv_wo-0 = (f32) MUL_MAT(blk.0.attn_output.weight{2560, 2560, 1, 1}, kqv_merged_cont-0{2560, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: kqv_out-0 = (f32) ADD(kqv_wo-0{2560, 1, 1, 1}, blk.0.attn_output.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: ffn_up-0 = (f32) MUL_MAT(blk.0.ffn_up.weight{2560, 10240, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {10240, 1, 1, 1} # redacted
ggml_debug: ffn_up_b-0 = (f32) ADD(ffn_up-0{10240, 1, 1, 1}, blk.0.ffn_up.bias{10240, 1, 1, 1}}) = {10240, 1, 1, 1} # redacted
ggml_debug: ffn_gelu-0 = (f32) UNARY(ffn_up_b-0{10240, 1, 1, 1}, }) = {10240, 1, 1, 1} # redacted
ggml_debug: ffn_down-0 = (f32) MUL_MAT(blk.0.ffn_down.weight{10240, 2560, 1, 1}, ffn_gelu-0{10240, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: ffn_out-0 = (f32) ADD(ffn_down-0{2560, 1, 1, 1}, blk.0.ffn_down.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ 0.3379, -0.0695, -0.1994, ...], ], ]
# redacted
ggml_debug: node_1218 = (f32) GET_ROWS(l_out-30{2560, 1, 1, 1}, CUDA0#inp_out_ids#0{1, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ 5.1697, 0.5750, 2.1958, ...], ], ]
ggml_debug: l_out-31 = (f32) ADD(l_out-31{2560, 1, 1, 1}, node_1218{2560, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ 5.3609, 0.9790, 2.5628, ...], ], ]
# redacted
ggml_debug: result_output_no_bias = (f32) MUL_MAT(output.weight{2560, 51200, 1, 1}, result_norm{2560, 1, 1, 1}}) = {51200, 1, 1, 1} # redacted
ggml_debug: result_output = (f32) ADD(result_output_no_bias{51200, 1, 1, 1}, output.bias{51200, 1, 1, 1}}) = {51200, 1, 1, 1} [ [ [ 10.6199, 8.5389, 11.3658, ...], ],
```