Implement backward passes for llama with small training llama from scratch example by xaedes · Pull Request #1360 · ggml-org/llama.cpp

xaedes · 2023-05-07T21:36:41Z

Training a llama directly with ggml would be really nice.

For this I implemented the backward passes required for the llama model, tested them with test_grad0 from the ggml repo, and trained a small llama from scratch to output a sine wave.

Also see the more detailed discussion in ggml-org/ggml#8 (comment)

List of all new operations that I had to add:

GGML_OP_ADD1 : Could be replaced with add(X,repeat(Y,X)) but it is faster when repeat can be avoided.
GGML_OP_ACC : Necessary for view backward pass. This adds src1 to view(src0, nb, offset) and returns tensor of shape src0.
GGML_OP_SET : (this is new) Necessary for propagating gradients through kv cache. Instead of copying to kv cache this function sets the values in the kv cache viewed with offsets and strides and returns a tensor representing the modified kv cache. This can also be inplace returning a view to the modified kv cache.
GGML_OP_LOG : (this is new) Necessary for cross entropy loss
GGML_OP_SUM_ROWS : Necessary for repeat backward pass: Reduces rows by summing them. shape[a,b,c,d] -> shape[1,b,c,d]
GGML_OP_SILU_BACK : Necessary for silu backward pass
GGML_OP_RMS_NORM_BACK : Could also be implemented using primitives, at the cost of performance.
GGML_OP_GET_ROWS_BACK : Necessary for get_rows backward pass: Adds src0[i] rows to opt0[src1[i]] rows, returning a tensor of shape opt0.
GGML_OP_DIAG : Necessary for softmax backward pass, alternative would have been to implement SOFTMAX_BACK directly, but DIAG is at least usable for other stuff. It turns rows into diagonal matrices.
GGML_OP_DIAG_MASK_ZERO : Necessary for diag_mask_inf backward pass
GGML_OP_ROPE_BACK : Necessary for rope backward pass.
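
As one illustration of the operations listed above, the scatter-add behind GGML_OP_GET_ROWS_BACK can be sketched in plain C without ggml. This is only a sketch under assumptions: `get_rows_back_sketch` is a hypothetical helper, not ggml's API, it works on flat row-major float arrays, and it starts from a zeroed result rather than adding into an existing opt0 tensor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not ggml's API.
 * Forward gather:  out[i, :] = table[idx[i], :]
 * Backward (here): grad_table[idx[i], :] += grad_out[i, :]
 * All arrays are flat, row-major, with n_cols values per row. */
void get_rows_back_sketch(const float *grad_out, const int *idx, size_t n_idx,
                          float *grad_table, size_t n_rows, size_t n_cols) {
    /* Assumption: start from zeros, so rows that were never gathered get no gradient. */
    for (size_t k = 0; k < n_rows * n_cols; ++k) {
        grad_table[k] = 0.0f;
    }
    for (size_t i = 0; i < n_idx; ++i) {
        assert(idx[i] >= 0 && (size_t) idx[i] < n_rows);
        const size_t r = (size_t) idx[i];
        for (size_t c = 0; c < n_cols; ++c) {
            /* accumulate (+=) so that repeated indices sum their gradients */
            grad_table[r * n_cols + c] += grad_out[i * n_cols + c];
        }
    }
}
```

The accumulation matters when the same row is gathered more than once, which is the normal case for token embeddings.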

Notable other changes:

add inplace and non-inplace variants for scale, diag_mask_inf, soft_max and rope
fix sub, mul and div functions to work correctly with transposed tensors, using the same logic as in add
fix ggml_forward_add functions to work correctly with transposed tensors. this uses the same logic as in ggml_compute_forward_add_q_f32, but makes it consistent across all ggml_compute_forward_add_... functions. it also slightly changes the memory access pattern of the different threads to work as in ggml_compute_forward_add_q_f32.
de-duplicate ggml_forward_dup code taking care of contiguous tensors of same type. with this we can duplicate tensors of any type as long as they are contiguous. the function is used in dup, get_rows_back and diag_mask (when not inplace).
there are some possibly too verbose comments, including step-by-step derivations of gradients, that could (or should?) be cleaned up.
(this is new) I added 1d and 4d functions for ggml_reshape and ggml_view.

The performance of various parts of the training could be improved. In particular, a fast ggml_out_prod could help speed up the matrix multiplication backward pass.

There are two additional test files, one for testing gradients taken from ggml repo and one small test for testing optimization in general.

An example of training a small llama model is given in the self-contained baby-llama example, where it is trained to output a sine wave.

Some notes on first training tests:

the lbfgs optimizer is faster and trains better than adam
target logits should represent the NEXT token, not the current one
target logits -1 & +1 work much better than 0 & +1
I think adding a BOS token also helped improve the training
training with cross entropy loss gave worse generation results than training with summed squared logit difference loss
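
The target construction described in these notes can be sketched as follows. This is a hedged illustration, not the code from the baby-llama example: `next_token_targets` and `summed_squared_error` are hypothetical names, and the flat `[position][vocab]` layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for each position i, the target row marks the NEXT
 * token (tokens[i + 1]) with +1 and every other vocabulary entry with -1.
 * targets holds (n_tokens - 1) * n_vocab floats, row-major. */
void next_token_targets(const int *tokens, size_t n_tokens, size_t n_vocab,
                        float *targets) {
    assert(n_tokens >= 2);
    for (size_t i = 0; i + 1 < n_tokens; ++i) {
        assert(tokens[i + 1] >= 0 && (size_t) tokens[i + 1] < n_vocab);
        for (size_t v = 0; v < n_vocab; ++v) {
            targets[i * n_vocab + v] = ((size_t) tokens[i + 1] == v) ? 1.0f : -1.0f;
        }
    }
}

/* Summed squared difference between logits and targets over n values. */
float summed_squared_error(const float *logits, const float *targets, size_t n) {
    float e = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float d = logits[i] - targets[i];
        e += d * d;
    }
    return e;
}
```

The last position has no next token, so it gets no target row in this sketch.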

A parallel batched forward function would probably be a good improvement. Training on multiple examples in (parallel) batch really seems to improve the training, but currently I can only do that by calling the forward function multiple times with different input data, which costs a lot of nodes in the computation graph, especially since backward pass is necessary as well.
The batches could be stored in another dimension of the tensors. It would probably just require some reshapes and view operations to make it work.

I did not look into training a LoRA finetune yet, but all the necessary machinery for that seems to be working.

ggerganov · 2023-05-08T18:19:31Z

After fixing the vDSP_vsub argument order, it works now

xaedes · 2023-05-11T17:55:59Z

I got a batched forward function working. With this, a greater number of parallel batches can be trained with ease.

This should also be useful to implement beam search sampling.

ggerganov · 2023-05-11T21:14:47Z

Ha, that is interesting. I somehow thought that to support batched inference some changes in ggml.c would be needed.
Well done!

> This should also be useful to implement beam search sampling.

Yes, this is needed for the beam-search decoding in whisper.cpp
Also, another project that can greatly benefit from batched inference is bert.cpp

Edit: now that #1405 has been merged, this PR is the highest priority for merging. I will look into it in more detail in the following days.

ggerganov · 2023-05-12T20:07:50Z

+                    struct ggml_tensor* F08 = ggml_transpose (ctx, F07);
+                    struct ggml_tensor* F09 = ggml_cont      (ctx, F08);
+                    struct ggml_tensor* F10 = ggml_reshape   (ctx, F09, src0->grad);
+


Wow!

Btw, would it make sense to have GGML_OP_REPEAT_BACK that implements this in a kernel?
Maybe if this is some sort of bottleneck. Otherwise, keep it like this.

Yea I shuddered myself at this^^ I think one of the conts is not even necessary.
I also think it would make sense to implement this as an extra operation, should not be that difficult.
Most of the reshaping etc is only necessary to use sum_rows which could be done better in a special operation.

ggerganov · 2023-05-13T07:20:13Z

Hmm, what's wrong with the CI?
It was successful before my changes, then I made some minor edits (fix warnings, indentation) and now it is failing.

/home/runner/work/llama.cpp/llama.cpp/ggml.c: In function ‘ggml_compute_forward_add1’:
/home/runner/work/llama.cpp/llama.cpp/ggml.c:7398:14: error: ‘GGML_TYPE_Q4_2’ undeclared (first use in this function); did you mean ‘GGML_TYPE_Q4_1’?
 7398 |         case GGML_TYPE_Q4_2:
      |              ^~~~~~~~~~~~~~
      |              GGML_TYPE_Q4_1

Did it check out the wrong branch? 🤔

Edit: I see now - it is actually smart enough to merge origin/master into the branch

ggerganov · 2023-05-13T07:51:00Z

@xaedes Is it ok if I merge latest master into your branch? In case you have some pending changes, I can wait for you to do it yourself so I don't mess up your flow

xaedes · 2023-05-13T11:39:49Z

> Is it ok if I merged latest master into your branch? In case you have some pending changes, I can wait for you to do it yourself so I don't mess up your flow

@ggerganov Sure

github-actions

clang-tidy made some suggestions

ggerganov

This is a pretty big addition to ggml - I hope it will open up some applications for training in the future

The tests will be moved to the ggml repo after we merge this PR
The batched forward example is very valuable. Have to apply it to whisper.cpp and bert.cpp
After merging this, it would be a good time for some refactoring passes in ggml.c to try and reduce code duplication and overall code size

Thanks @xaedes - excellent effort!

RonanKMcGovern · 2023-09-21T16:56:50Z

@xaedes has there been further progress in this direction since June? In particular, how far are things from being able to train LoRA adapters?

xaedes · 2023-09-22T10:37:58Z

@RonanKMcGovern
Yep, finetuning LORA adapters on LLAMA models works: #2632

…scratch example (ggml-org#1360)

* implement 8 of 14 missing backward pass operations used by llama
  - GGML_OP_ADD_AT
  - GGML_OP_CPY
  - GGML_OP_MUL_MAT (src0.grad)
  - GGML_OP_PERMUTE
  - GGML_OP_RESHAPE
  - GGML_OP_SCALE
  - GGML_OP_TRANSPOSE
  - GGML_OP_VIEW
  implement additional ggml operation GGML_OP_ADD_AT, which is necessary for backward pass of GGML_OP_VIEW. this operation adds src1 to src0 with data offset, i.e. to view(src0, ..., offset). the values are returned in a tensor the size of src0. values outside of [data+offset:data+offset+nbytes(src1)] are just the original values from src0.
  still missing backward passes for llama:
  - GGML_OP_DIAG_MASK_INF
  - GGML_OP_GET_ROWS
  - GGML_OP_RMS_NORM
  - GGML_OP_ROPE
  - GGML_OP_SILU
  - GGML_OP_SOFT_MAX
* implement 5 of 6 missing backward pass operations used by llama
  - GGML_OP_DIAG_MASK_INF
  - GGML_OP_GET_ROWS
  - GGML_OP_RMS_NORM
  - GGML_OP_SILU
  - GGML_OP_SOFT_MAX
  add necessary ggml operations GGML_OP_ADD1, GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK, GGML_OP_DIAG_MASK_ZERO, and GGML_OP_ROPE_BACK
  GGML_OP_ADD1 is necessary to add a scalar value in the backward pass of GGML_OP_SOFT_MAX. GGML_OP_ADD1 could also be replaced by using GGML_OP_ADD and GGML_OP_REPEAT, but the performance would be worse. additionally GGML_OP_REPEAT will return an unexpected value when the input to GGML_OP_SOFT_MAX contains only a single scalar. in this case GGML_OP_REPEAT will not return the value that should be repeated (src1) but the value whose shape the result should take (src0). So in this case it can not replace GGML_OP_ADD1.
  GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK and GGML_OP_ROPE_BACK are necessary for backward pass of GGML_OP_SILU, GGML_OP_RMS_NORM and GGML_OP_ROPE. The backward pass for these functions cannot be easily composed of existing operations. Since the backward pass builds a computation graph, we need operations that are forward pass implementations of the required backward passes. Sounds a bit confusing at first, I know...
  GGML_OP_DIAG_MASK_ZERO is necessary for backward pass of GGML_OP_DIAG_MASK_INF.
  Some operations were previously inplace-only. for backward pass there need to be non-inplace variants. staying consistent with other operations that have non-inplace and inplace variants, the operations are changed to non-inplace and functions with "_inplace" are added which are inplace. in llama we need to call the inplace variants so that it is implemented as before. for llama backward pass we need to use the non-inplace variants.
  still not completely implemented backward passes for llama:
  - GGML_OP_ROPE: needs forward pass for GGML_OP_ROPE_BACK
  - GGML_OP_GET_ROWS: only necessary for tokenizer
* norm & rms_norm can not be threaded: after investigating rms norm for quite some time I came to the conclusion that neither norm nor rms_norm can be threaded, because we need the mean over all items, not just of the slices each thread sees.
* remove already resolved TODO
* implement backward pass of ggml_rope and ggml_rope_back
* implement backward pass for ggml_get_rows and for new operation ggml_get_rows_back
* add test-grad0.c
* use GGML_PRINT_DEBUG for debug messages which will otherwise flood the console
* test both gradients of mul_mat
* disable graph dot export as it floods console
* bug fixes for silu_back
* successfully test silu backward
* bug fix for scale backward pass
  use sum instead of mean for gradient of scalar scale parameter
* successfully test scale backward
* improve performance of sum backward pass
  use add1(x,y) instead of add(x,repeat(y,x))
* improve performance of sqr backward pass
  use scale(x,y) instead of mul(x,repeat(y,x))
* successfully test rope backward
* bug fix for cpy backward pass
* successfully test cpy backward
* bug fix for reshape backward pass
* successfully test reshape backward
* add test-opt.c
  this uses ggml_opt to train a,b for minimal e=sum(sqr(c - a*b)) for random initial a,b,c
* correctly implement softmax backward pass using new operation ggml_diag
  ggml_diag constructs diagonal matrices with entries. ggml_diag(shape[a,1,c,d]) -> shape[a,a,c,d]
* successfully test soft_max backward
* align shape annotations
* add shape annotations for llama
* de-duplicate ggml_forward_dup code taking care of contiguous tensors of same type. with this we can duplicate tensor of any type as long as they are contiguous.
* fix ggml_compute_forward_dup_same_cont for when nelements < nthreads
  when more threads are used than elements exist ie1 was less than ie0, resulting in invalid negative byte count argument in memcpy
* bug fix for add_at forward
  required for view backward pass. src0 values must be copied to dst, because during addition we don't touch all dst elements in contrast to the normal add function.
* successfully test view backward
* minor code format improvement
* fix ggml_forward_add functions to work correctly with transposed tensors
  uses the same logic as in ggml_compute_forward_add_q_f32, but make it consistent across all ggml_compute_forward_add_... functions. this also slightly changes the mem access pattern of the different threads to work as in ggml_compute_forward_add_q_f32.
* fix ggml_forward_add1 functions to work correctly with transposed tensors
  uses the same logic as in ggml_compute_forward_add1_q_f32, but make it consistent across all ggml_compute_forward_add1_... functions. this also slightly changes the mem access pattern of the different threads to work as in ggml_compute_forward_add1_q_f32.
* test-grad0.c : add print_elements to help with debugging
* successfully test permute backward
* some minor test-grad0 fixes
* fix sub, mul and div functions to work correctly with transposed tensors
  uses the same logic as in add
* implement ggml_cont backward pass
* successfully test transpose backward and permute for all permutations
  also test sub, mul and div up to max n_dims
* test-grad0.c add TODO for view_2d and view_3d
  add_at (required for view backward pass) is a bit tricky for n_dims > 1.
* fix comments
* successfully test diag_mask_inf and diag_mask_zero backward
* test-grad0 : fix test for div
  nargs and ndims were swapped, corrupting the stack
* fix diag_mask to work with non-inplace input
* move dup call into the actual add_at functions
* fix get rows backward pass
* successfully test get_rows backward
* fix view backward pass
  add nb parameters to add_at like in view. together with offset they define how to view dst and src0 during the add_at operation.
* successfully test backward pass of view_1d, view_2d and view_3d
* fix backward pass for rms_norm
  I would have used formulas from other frameworks, but they differed so I could not decide which is correct. Instead it was derived here in comment using manual forward-backward automatic differentiation of rms_norm and simplification.
* successfully test backward pass of rms_norm
  some tests may fail when gradients are large. could not find a satisfying configuration to check for abs error and relative error that passes all tests while still actually testing the results with tight enough error bounds. when looking at the values the "failed" tests look actually ok. for example:
  rms_norm: ndims=2, i=0, k=2, x0=0.000153, xm=0.000053, xp=0.000253, f0=0.278594, f1=0.086213, g0=961.905457, g1=966.064941, eps=0.000100, error_abs=4.159485, error_rel=0.004324
  it is due to the test logic in check_gradients that they fail.
* add todos for llama backward pass
  - implementation for ADD1 backward pass should probably use sum instead of mean (but this backward pass is not required)
  - repeat is not yet tested and looks like it only works for single element src0 inputs.
* add operation ggml_sum_rows
  ggml_sum_rows(shape[a,b,c,d]) -> shape[1,b,c,d]
* add missing GGML_OP_SUM_ROWS
* fix backward pass for repeat
  requires ggml_sum_rows
* successfully test backward pass of repeat
* update quantization types in switch-case of add_at and add1
* add baby-llama example training a very small llama model from scratch to output a sinusoidal wave.
  had to increase maximum number of optimization parameters to train from scratch.
* fix softmax in baby-llama example
* switching from training with adam to lbfgs produces much better results in the baby-llama example
* train with two examples, creating new tensors each time..
* fix bug when using ggml_opt to optimize params in one context and use a renewable context for eval and opt
  when not keeping gradients of model parameters they are overwritten by tensors created by opt, which may be invalid after opt context is renewed. so we need to keep the original gradients and make dups for opt
* train on multiple examples, generate & print tokens with trained model afterwards
  ctx0 for evaluation and optimization is renewed for each sample
* add ggml_reshape_1d, ggml_reshape_4d and ggml_view_4d
* fix soft_max backward pass for input->ne[1] != 1
* add ggml_log operation
  necessary for cross entropy loss
* add test for ggml_log gradients
* implement backward pass for ggml_sum_rows, necessary for cross entropy loss
* implement ggml_repeat support for rank > 2 tensors
* add test for ggml_sum_rows gradients
* fix training get_example_targets
  predict the next token, not the current token!
* add square_error_loss and cross_entropy_loss functions
* optimize loss over multiple samples
  this increases computation graph, need parallel batched forward for more efficiency.
* fix backward pass for add_at and change arguments to have same order as in view
* add ggml_set(ctx, a, b) to set b in view of a and return modified a
  necessary to set values into kv_self cache and properly propagate the gradients
* fix kv_self gradients for training
  use ggml_set instead of ggml_cpy to set kv_self cache with properly propagating gradients
* replace inplace operations for training with copying operations to allow gradient propagation
* add GGML_ASSERT to catch ggml_rope and back value errors
* add trainable lora-only model with all big matrices C split into A,B with A*B=C
  this is not a lora-finetune, but the whole model changed to have only low-rank "lora" matrices. training this instead of the normal model resulted in much worse results though...
* vastly improve training results
  instead of logit targets 0 and 1 use -1 and +1.
* shorten code using a variable
* change name of GGML_OP_ADD_AT to GGML_OP_ACC
* smaller default values for baby llama model parameters
* update static assert of GGML_OP_COUNT
* remove shape annotations in llama_eval_internal
* revert disabling of threading for rms_norm and norm
* rename print functions in baby-llama example
* fix call to ggml_set_name
* add missing include for strcmp, etc
* remove trailing whitespace
* reduce number of test-grad0 iterations
  avoid exceeding timeout of automated tests
* remove busy loop that was used as sleep for slower sine wave generation
* disable slow tests grad0 and opt to avoid exceeding timeouts
* c++ in baby-llama example
  use c++ includes instead of c includes
  use std::min, std::max instead of MIN, MAX macros
* c++ in baby-llama example
  use c++ includes instead of c includes
  use std::min, std::max instead of MIN, MAX macros
* ggml : fix compiler warnings + cosmetic changes
* ggml : fix nullptr derefs in GGML_OP_CONT and GGML_OP_RESHAPE back
* swap arguments to vDSP_vdiv call
  documentation for vDSP_vdiv states: "Note that B comes before A!"
* swap arguments to vDSP_vdiv call
  documentation for vDSP_vdiv states: "Note that B comes before A!"
* ggml : swap vDSP_vsub args as per documentation
* add parallel batched forward function for baby-llama training
* cleanup code for batched training
* remove trailing whitespace
* minor : fix compiler warnings + indentation style
* ggml : fix null ptr deref in backward pass
* ggml : remove Q4_2 remnants
* ggml : fix clang-tidy warnings
* baby-llama : couple of clang-tidy warnings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

xaedes added 30 commits May 1, 2023 02:41

norm & rms_norm can not be threaded:

b908007

after investigating rms norm for quite some time I came to the conclusion that neither norm nor rms_norm can be threaded, because we need the mean over all items, not just of the slices each thread sees.

remove already resolved TODO

36d8a05

implement backward pass of ggml_rope and ggml_rope_back

488decf

implement backward pass for ggml_get_rows and for new operation ggml_get_rows_back

4e1f81d

add test-grad0.c

0da2675

use GGML_PRINT_DEBUG for debug messages which will otherwise flood the console

20e3c1d

test both gradients of mul_mat

9345f4c

disable graph dot export as it floods console

9d6fc28

bug fixes for silu_back

6fb08b4

successfully test silu backward

671e592

bug fix for scale backward pass

a367eb9

use sum instead of mean for gradient of scalar scale parameter
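
Why the sum is the right reduction here: for y_i = s * x_i with a single scalar s, every output element depends on s, so by the chain rule dL/ds = sum_i (dL/dy_i * x_i). A plain C sketch of that reduction, with `scale_grad_sum` as a hypothetical helper that is not part of ggml:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: gradient of a loss L with respect to the scalar s
 * in y[i] = s * x[i], given grad_y[i] = dL/dy[i].
 * The contributions of all elements are summed; taking the mean instead
 * would shrink the gradient by a factor of n. */
float scale_grad_sum(const float *x, const float *grad_y, size_t n) {
    assert(x != NULL && grad_y != NULL);
    float g = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        g += grad_y[i] * x[i];
    }
    return g;
}
```

With x = {1, 2, 3} and an upstream gradient of all ones, the sum gives 6, whereas a mean would give 2.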

successfully test scale backward

0197bcb

improve performance of sum backward pass

bfe5072

use add1(x,y) instead of add(x,repeat(y,x))

improve performance of sqr backward pass

b583136

use scale(x,y) instead of mul(x,repeat(y,x))

successfully test rope backward

7571147

bug fix for cpy backward pass

0ea8201

successfully test cpy backward

b2bd822

bug fix for reshape backward pass

c483a7d

successfully test reshape backward

ecf949b

add test-opt.c

54ab300

this uses ggml_opt to train a,b for minimal e=sum(sqr(c - a*b)) for random initial a,b,c
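
The objective can be illustrated without ggml. The sketch below assumes a*b is an elementwise product (the test itself may use a different product) and uses plain gradient descent instead of ggml_opt; `sq_loss` and `gd_step` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* e = sum_i (c[i] - a[i]*b[i])^2, assuming an elementwise product. */
double sq_loss(const double *a, const double *b, const double *c, size_t n) {
    double e = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double r = c[i] - a[i] * b[i];
        e += r * r;
    }
    return e;
}

/* One plain gradient descent step on a and b (not what ggml_opt does internally).
 * de/da[i] = -2 * r * b[i],  de/db[i] = -2 * r * a[i],  r = c[i] - a[i]*b[i]. */
void gd_step(double *a, double *b, const double *c, size_t n, double lr) {
    assert(lr > 0.0);
    for (size_t i = 0; i < n; ++i) {
        const double r  = c[i] - a[i] * b[i];
        const double ga = -2.0 * r * b[i]; /* both gradients use the old a[i], b[i] */
        const double gb = -2.0 * r * a[i];
        a[i] -= lr * ga;
        b[i] -= lr * gb;
    }
}
```

Calling `gd_step` repeatedly with a small learning rate should drive `sq_loss` toward zero for this toy problem.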

correctly implement softmax backward pass using new operation ggml_diag

1a80e9a

ggml_diag constructs diagonal matrices with entries. ggml_diag(shape[a,1,c,d]) -> shape[a,a,c,d]
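
To show where a diagonal matrix enters the softmax backward pass: for y = softmax(x), the Jacobian is J = diag(y) - y*y^T, and the input gradient is dx = J * dy. The plain C sketch below builds J explicitly for a single row. It is an illustration of the math only, not the graph the PR constructs; `softmax_back_sketch` and `SKETCH_MAX_N` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_N 16 /* hypothetical upper bound for this sketch */

/* Given y = softmax(x) and dy = dL/dy for one row of length n,
 * compute dx = (diag(y) - y*y^T) * dy. */
void softmax_back_sketch(const float *y, const float *dy, size_t n, float *dx) {
    assert(n > 0 && n <= SKETCH_MAX_N);
    float J[SKETCH_MAX_N][SKETCH_MAX_N];
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* diag(y) contributes y[i] on the diagonal only */
            J[i][j] = (i == j ? y[i] : 0.0f) - y[i] * y[j];
        }
    }
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; ++j) {
            acc += J[i][j] * dy[j];
        }
        dx[i] = acc;
    }
}
```

The same result can be computed without the n-by-n matrix as dx[i] = y[i] * (dy[i] - dot(y, dy)), which is what a dedicated backward kernel would typically do.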

successfully test soft_max backward

fea42be

align shape annotations

9310650

add shape annotations for llama

38675e5

de-duplicate ggml_forward_dup code taking care of contiguous tensors of same type. with this we can duplicate tensor of any type as long as they are contiguous.

c1a8893

fix ggml_compute_forward_dup_same_cont for when nelements < nthreads

83fa6b3

when more threads are used than elements exist ie1 was less than ie0, resulting in invalid negative byte count argument in memcpy
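
The failure mode can be reproduced with a typical ceil-division work split. The sketch below assumes that kind of partitioning (the exact formula in ggml_compute_forward_dup_same_cont is not shown in this PR page) and clamps both ends of the range so the count is never negative; `thread_chunk` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: split n elements among nth threads.
 * Writes the first element index for thread ith to *start and returns how
 * many elements that thread should process (possibly 0, never negative). */
int thread_chunk(int n, int ith, int nth, int *start) {
    assert(n >= 0 && nth > 0 && ith >= 0 && ith < nth);
    const int per = (n + nth - 1) / nth; /* elements per thread, rounded up */
    int i0 = per * ith;
    int i1 = i0 + per;
    /* Clamp BOTH ends: with n < nth, i0 alone can exceed n, and clamping
     * only i1 would give i1 < i0, i.e. a negative count. */
    if (i0 > n) { i0 = n; }
    if (i1 > n) { i1 = n; }
    *start = i0;
    return i1 - i0;
}
```

For n = 2 and nth = 4, threads 0 and 1 get one element each and threads 2 and 3 get zero, so a memcpy sized from the returned count stays valid.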

bug fix for add_at forward

cecd6c7

required for view backward pass src0 values must be copied to dst, because during addition we don't touch all dst elements in contrast to the normal add function.
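
A minimal sketch of this accumulate-into-a-view behaviour on flat arrays, assuming a 1-d view and an offset counted in elements (ggml itself works with byte offsets and strides); `acc_sketch` is a hypothetical helper, not the ggml function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: dst = src0, then dst[offset .. offset+n1) += src1.
 * The copy comes first because the addition only touches the viewed
 * region; every other element of dst must still hold the src0 value. */
void acc_sketch(const float *src0, size_t n0,
                const float *src1, size_t n1,
                size_t offset, float *dst) {
    assert(offset + n1 <= n0);
    memcpy(dst, src0, n0 * sizeof(float));
    for (size_t i = 0; i < n1; ++i) {
        dst[offset + i] += src1[i];
    }
}
```

Skipping the initial copy would leave the elements outside the viewed region uninitialized, which is the bug the commit message describes.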

successfully test view backward

124fdca

ggml : swap vDSP_vsub args as per documentation

6ca682b

xaedes added 2 commits May 11, 2023 19:31

add parallel batched forward function for baby-llama training

3e3ed95

cleanup code for batched training

581e5eb

remove trailing whitespace

b9ef08c

ggerganov reviewed May 12, 2023

minor : fix compiler warnings + indentation style

f977243

ggml : fix null ptr deref in backward pass

33034cf

ggerganov added 2 commits May 13, 2023 15:20

Merge remote-tracking branch 'origin/master' into HEAD

092913e

ggml : remove Q4_2 remnants

95a487a

github-actions Bot reviewed May 13, 2023

ggerganov added 2 commits May 13, 2023 15:34

ggml : fix clang-tidy warnings

ef3d42a

baby-llama : couple of clang-tidy warnings

dae6ba2

ggerganov approved these changes May 13, 2023

ggerganov merged commit f954edd into ggml-org:master May 13, 2023

Green-Sky mentioned this pull request May 15, 2023

How do we finetune the model with new data? #466

Closed

xaedes mentioned this pull request May 30, 2023

Train Text from scratch #1652

Merged

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

ggerganov mentioned this pull request Apr 3, 2026

ggml : deprecate GGML_OP_ADD1 #21363

Merged
