eval-callback: Example how to use eval callback for debugging#6576
Conversation
Pretty cool. Some notes:
```c
ggml_tensor * t;
char * data; // tensor data
for (int64_t i3 = 0; i3 < t->ne[3]; i3++) {
    for (int64_t i2 = 0; i2 < t->ne[2]; i2++) {
        for (int64_t i1 = 0; i1 < t->ne[1]; i1++) {
            for (int64_t i0 = 0; i0 < t->ne[0]; i0++) {
                size_t i = i3*t->nb[3] + i2*t->nb[2] + i1*t->nb[1] + i0*t->nb[0];
                float v = *(float *)(data + i);
            }
        }
    }
}
```
Looks better, cool!
@slaren @ggerganov Can I merge?
ggerganov left a comment
Nice tool! I've found that printing the total sum of the elements in a tensor is sometimes also a useful metric to look at when debugging.
Merge after slaren approves
Not a big deal, but I think that
```c
printf("%s: %24s = (%s) %10s(%s{%s}, %s}) = {%s}\n", __func__,
       t->name, ggml_type_name(t->type), ggml_op_name(t->op),
```
I forgot to mention that you can use `ggml_op_desc` instead of `ggml_op_name` to get proper names for unary ops too, instead of just `UNARY`.
I'm not sure, but it looks like this PR is behind the currently failing CI builds on OSX, and I'm not currently smart enough to figure out why. Example: https://github.com/ggerganov/llama.cpp/actions/runs/8651988935/job/23723903645#step:5:4773 Also, I could be wrong, but this might have snuck a libcurl dependency in at a level that we aren't comfortable with; again, I don't have a full handle on this yet either.
Maybe adding
…rg#6576)

* gguf-debug: Example how to use ggml callback for debugging
* gguf-debug: no mutex, verify type, fix stride.
* llama: cv eval: move cb eval field in common gpt_params
* ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random.
* ggml_debug: ci: add tests
* ggml_debug: EOL in CMakeLists.txt
* ggml_debug: Remove unused param n_batch, no batching here
* ggml_debug: fix trailing spaces
* ggml_debug: fix trailing spaces
* common: fix cb_eval and user data not initialized
* ci: build revert label
* ggml_debug: add main test label
* doc: add a model: add a link to ggml-debug
* ggml-debug: add to make toolchain
* ggml-debug: tests add the main label
* ggml-debug: ci add test curl label
* common: allow the warmup to be disabled in llama_init_from_gpt_params
* ci: add curl test
* ggml-debug: better tensor type support
* gitignore : ggml-debug
* ggml-debug: printing also the sum of each tensor
* ggml-debug: remove block size
* eval-callback: renamed from ggml-debug
* eval-callback: fix make toolchain

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Motivation
It can be useful to debug the inference graph, for example to compare it with the original PyTorch version.
One may also need to retrieve intermediate tensor data, for example for imatrix or advanced NLP use cases.
This example shows how to use the inference callback.
Suggestions from @slaren, thanks:
DbrxForCausalLM#6515 (comment)

Changes

* New example `ggml-debug` that prints each operation and output tensors' data.

Example
Output
```
ggml_debug: inp_embd = (f32) GET_ROWS(token_embd.weight{2560, 51200, 1, 1}, inp_tokens{1, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ -0.0181, 0.0272, 0.0272, ...], ], ]
ggml_debug: norm-0 = (f32) NORM(CUDA0#inp_embd#0{2560, 1, 1, 1}, }) = {2560, 1, 1, 1} [ [ [ -0.6989, 1.0636, 1.0636, ...], ], ]
ggml_debug: norm_w-0 = (f32) MUL(norm-0{2560, 1, 1, 1}, blk.0.attn_norm.weight{2560, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ -0.1800, 0.2817, 0.2632, ...], ], ]
ggml_debug: attn_norm-0 = (f32) ADD(norm_w-0{2560, 1, 1, 1}, blk.0.attn_norm.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: wqkv-0 = (f32) MUL_MAT(blk.0.attn_qkv.weight{2560, 7680, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {7680, 1, 1, 1} # redacted
ggml_debug: bqkv-0 = (f32) ADD(wqkv-0{7680, 1, 1, 1}, blk.0.attn_qkv.bias{7680, 1, 1, 1}}) = {7680, 1, 1, 1} # redacted
ggml_debug: bqkv-0 (view) = (f32) VIEW(bqkv-0{7680, 1, 1, 1}, }) = {2560, 1, 1, 1} # redacted
ggml_debug: Qcur-0 = (f32) CONT(bqkv-0 (view){2560, 1, 1, 1}, }) = {2560, 1, 1, 1} # redacted
ggml_debug: Qcur-0 (reshaped) = (f32) RESHAPE(Qcur-0{2560, 1, 1, 1}, }) = {80, 32, 1, 1} # redacted
ggml_debug: Qcur-0 = (f32) ROPE(Qcur-0 (reshaped){80, 32, 1, 1}, CUDA0#inp_pos#0{1, 1, 1, 1}}) = {80, 32, 1, 1} [ [ [ -1.1135, 1.4604, -1.9226, ...], [ -0.3608, 0.5076, -1.8866, ...], [ 1.7643, 0.0273, -2.1065, ...], ... ], ]
...
ggml_debug: kq_soft_max_ext-0 = (f32) SOFT_MAX(kq-0{32, 1, 32, 1}, CUDA0#KQ_mask#0{32, 1, 1, 1}}) = {32, 1, 32, 1} [ [ [ 1.0000, 0.0000, 0.0000, ...], ], [ [ 1.0000, 0.0000, 0.0000, ...], ], [ [ 1.0000, 0.0000, 0.0000, ...], ], ... ]
ggml_debug: kqv-0 = (f32) MUL_MAT(v-0{32, 80, 32, 1}, kq_soft_max_ext-0{32, 1, 32, 1}}) = {80, 1, 32, 1} [ [ [ -0.2136, -0.2137, 0.3335, ...], ], [ [ -0.2139, 0.2949, -0.0338, ...], ], [ [ -0.4204, -0.0442, -0.6392, ...], ], ... ]
ggml_debug: kqv_merged-0 = (f32) PERMUTE(kqv-0{80, 1, 32, 1}, }) = {80, 32, 1, 1} [ [ [ -0.2136, -0.2137, 0.3335, ...], [ -0.2139, 0.2949, -0.0338, ...], [ -0.4204, -0.0442, -0.6392, ...], ... ], ]
ggml_debug: kqv_merged_cont-0 = (f32) CONT(kqv_merged-0{80, 32, 1, 1}, }) = {2560, 1, 1, 1} # redacted
ggml_debug: kqv_wo-0 = (f32) MUL_MAT(blk.0.attn_output.weight{2560, 2560, 1, 1}, kqv_merged_cont-0{2560, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: kqv_out-0 = (f32) ADD(kqv_wo-0{2560, 1, 1, 1}, blk.0.attn_output.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: ffn_up-0 = (f32) MUL_MAT(blk.0.ffn_up.weight{2560, 10240, 1, 1}, attn_norm-0{2560, 1, 1, 1}}) = {10240, 1, 1, 1} # redacted
ggml_debug: ffn_up_b-0 = (f32) ADD(ffn_up-0{10240, 1, 1, 1}, blk.0.ffn_up.bias{10240, 1, 1, 1}}) = {10240, 1, 1, 1} # redacted
ggml_debug: ffn_gelu-0 = (f32) UNARY(ffn_up_b-0{10240, 1, 1, 1}, }) = {10240, 1, 1, 1} # redacted
ggml_debug: ffn_down-0 = (f32) MUL_MAT(blk.0.ffn_down.weight{10240, 2560, 1, 1}, ffn_gelu-0{10240, 1, 1, 1}}) = {2560, 1, 1, 1} # redacted
ggml_debug: ffn_out-0 = (f32) ADD(ffn_down-0{2560, 1, 1, 1}, blk.0.ffn_down.bias{2560, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ 0.3379, -0.0695, -0.1994, ...], ], ]
# redacted
ggml_debug: node_1218 = (f32) GET_ROWS(l_out-30{2560, 1, 1, 1}, CUDA0#inp_out_ids#0{1, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ 5.1697, 0.5750, 2.1958, ...], ], ]
ggml_debug: l_out-31 = (f32) ADD(l_out-31{2560, 1, 1, 1}, node_1218{2560, 1, 1, 1}}) = {2560, 1, 1, 1} [ [ [ 5.3609, 0.9790, 2.5628, ...], ], ]
# redacted
ggml_debug: result_output_no_bias = (f32) MUL_MAT(output.weight{2560, 51200, 1, 1}, result_norm{2560, 1, 1, 1}}) = {51200, 1, 1, 1} # redacted
ggml_debug: result_output = (f32) ADD(result_output_no_bias{51200, 1, 1, 1}, output.bias{51200, 1, 1, 1}}) = {51200, 1, 1, 1} [ [ [ 10.6199, 8.5389, 11.3658, ...], ],
```