Add cvector-generator example #7514
Conversation
Could you add a quick usage summary - do you just run the example? Also, I tried implementing PCA using the standard library.
Hi @christianazinn and thanks for your response. We'll move the discussion here. Quick explanation: my code takes a pair of positive + negative prompts, calculates embeddings for each layer, and then subtracts them to get the diff. In the end, for each layer, we have one diff matrix. It is not urgent, so take your time. And feel free to let me know if you have other questions. Thank you.
Looking into the PCA implementation, I realize we have the problem that we're not actually getting square matrices from the layer embeddings. The matrices we receive are usually tall and skinny. SciPy's original implementation indicates that in this case, the problem is best handled by SVD via the covariance matrix. We may care to implement this after everything else works. I also don't have push permissions to this branch, so whatever changes I make, I'll fork the branch and PR into it.
@christianazinn Thanks for the explanation. Yes, I was also wondering how we can turn the embedding vectors into a square matrix. It's all clear to me now. I'll have a look during the weekend. In the meantime, I invited you to my forked repo. You can push directly onto this branch, or you can work on your own PR if you want. Feel free to tag me if you have questions. Thank you!
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.
Thank you, I have pushed an implementation with primitives/stdlib. It currently assumes the Mistral architecture. Currently, however, it outputs gibberish when inferencing.
Added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed.
Notes follow. I have implemented basic command-line arguments. I've left a few comments about what needs to be fixed in my shoddy implementation, and other things we need to deal with, such as the prompt-parsing issue mentioned. It appears we do just parse the individual positive/negative prompts - @ngxson, can you confirm? We will likely want to change this to provide a larger sample space; the blog post and Python implementation provide a reference. However, I am seeing promising results with "funny" vs. "boring": Llama2 Q8_0, prompt (for completion) "Here's a funny joke: ". Llama2 was used because #5970 indicates support has not been implemented for architectures other than Llama, but that is probably outdated. Control vector -1:
@christianazinn Wow, this is awesome. I quickly had a look at the code, looks good to me. I'll try when I get back home.
I started with a single pair of pos-neg for simplification. But yes, eventually we will allow multiple pairs of pos-neg. The Python implementation does that by calculating the mean value of the outputs. We can allow the program to take as input 2 files of prompts (one prompt per line), so we have 2 files: neg.txt and pos.txt for example. I can implement this quickly if needed.
Very promising result. Even I (a human) sometimes struggle to control my own funny/boring vector.
Thank you! Take your time, I will keep testing in the meantime. Other results are varied: a test on happy/sad generates complete gibberish, and another control vector for funny/boring is ineffective.
Just to make sure we are on the same page, because there are two places where multiple pairs might be needed. We will also want to implement multiple sentiment pairs (i.e. happy/sad and funny/boring), but what I referred to was having multiple prompts generated from the same sentiment pair run through the tokenizer, as in the second code block here. Currently we appear to just tokenize the bare term. I think we want to be able to do that preprocessing in C++, so the user inputs the positive/negative sentiments and we create the template, format it, and pass it on. I believe the great variance in my results may be due to only having one sample token sequence per sentiment, and therefore high variability in the resulting vectors between runs, hence my concern over this topic. However, more runs of PCA would slow the already slow stdlib implementation to the point of unusability, so that is left for the GGML implementation.
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
It appears the Python implementation handles concatenating the matrices from the different prompt callbacks by stacking them, so e.g. if each callback returned a 4096x2 matrix, then using 1024 test prompts would yield a 4096x2048 matrix. Intuitively, because rank(AA^T) = rank(A), this allows for more degrees of freedom/less dependency on each individual callback in each layer's overall matrix, and since the result will be 4096x4096 regardless of the other dimension, this should not change much in the PCA. Will try to implement this. (Strictly, it stacks vertically, but that doesn't matter since we multiply by the transpose anyway.)
I updated this PR with 2 small changes (feel free to test / adapt it if you want):
@christianazinn I'm having a problem:
@ngxson I'll take a look, thanks - not sure how I didn't think to check that; it would explain why I was getting gibberish on 9/10 tests. My code is very patchwork at the moment, so there are likely to be a lot of these fixes. Thanks for the progress so far.
Strangely, the matrices returned by the callback don't match what's printed to stdout. UPDATE: Am I misunderstanding these lines (I assumed this means we get a 4096x2x1x1 matrix)? UPDATE 2: I had my numbers backward with zero/nonzero. Even more confused now.
Thinking about it further, this isn't even true. I would still like to know how the dimensions are stored (image above). Is it a flattened matrix, and in which order are the dimensions laid out? Frankly, this whole headache could probably be avoided if we just wrote the GGML implementation, but I don't know how.
fixed it... one liner... ugh |
    printf("\n");
}

static int ctrlvec_params_parse_ex(int argc, char ** argv, ctrl_params & params) {
Let's merge ctrl_params into gpt_params so that we have consistent handling of CLI args in all examples.
I hadn't noticed that gpt_params had been refactored. It's way easier to work with now!
I moved ctrl_params into gpt_params. Please have a look at 679f513. Thanks!
This should be fine - just needs testing. With what you mention below, that should work much better. I think that's actually what the Python implementation does, but I'm not certain. Feel free to try it if you like, or if you think the current outputs are acceptable, we can add it in a later PR. (We should compile a list of future improvements for this.) I'll add my review for the code itself in a moment, and will test the generated control vectors when I get the chance.
Actually, I updated the description of this PR with a list. Feel free to let me know if you have other ideas to add.
Nice. Thanks for taking the time to develop and review this PR!
I am very excited for control vectors and I have been routinely testing this PR. I got it working yesterday with only a couple of issues.
I fixed 1 and 2 in a PR on the fork, ngxson#6. Issue 2 is fixed by adding a command-line flag that combines all of the prompt lines into one prompt.
@calvin-laurenson Thanks for testing it out. Regarding the ability to have multi-line prompts, I prefer to add support for escaped newlines instead.
Edit: CUDA backend does not support GGML_OP_SQRT |
ggerganov
left a comment
I haven't done tests, but I'm sure people will play with this and if there are any issues we can resolve them from master
options.push_back({ "control-vector" });
options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
options.push_back({ "cvector", "--positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
options.push_back({ "cvector", "--negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
options.push_back({ "cvector", "--completions-file FNAME","completions file (default: '%s')", params.cvector_completions_file.c_str() });
options.push_back({ "cvector", "--completions N", "number of lines of completions file to use (default: %d)", params.n_completions });
options.push_back({ "cvector", "--batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
options.push_back({ "cvector", "--iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
The whitespace padding should be kept so that the arguments are vertically aligned when the help is printed:
-options.push_back({ "control-vector" });
-options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
-options.push_back({ "cvector", "--positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
-options.push_back({ "cvector", "--negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
-options.push_back({ "cvector", "--completions-file FNAME","completions file (default: '%s')", params.cvector_completions_file.c_str() });
-options.push_back({ "cvector", "--completions N", "number of lines of completions file to use (default: %d)", params.n_completions });
-options.push_back({ "cvector", "--batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
-options.push_back({ "cvector", "--iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
+options.push_back({ "control-vector" });
+options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
+options.push_back({ "cvector", " --positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
+options.push_back({ "cvector", " --negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
+options.push_back({ "cvector", " --completions-file FNAME",
+                    "completions file (default: '%s')", params.cvector_completions_file.c_str() });
+options.push_back({ "cvector", " --completions N", "number of lines from the completions file to use (default: %d)", params.n_completions });
+options.push_back({ "cvector", " --batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
+options.push_back({ "cvector", " --iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
FYI, I also changed the example name + binary name to llama-cvector-generator
```
<|im_start|>system\nAct like a person who is extremely happy.<|im_end|>
<|im_start|>system\nYou are in a very good mood today<|im_end|>
```
@calvin-laurenson I ended up enabling newline escaping by default, which should be more convenient for most users.
Title changed from "control-vector-generator example" to "cvector-generator example"
FYI, the help text still refers to the old name. Also, if the completion portion bails out because the number of positive prompts != the number of negative prompts, PCA still tries to run (see log).
* add control-vector-generator
* calc diff
* add comments
* proof-of-concept stdlib implementation
  Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.
* param parsing, refactor, comments
  Added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed.
* example template completions
  Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
* add multi prompts, multi-thread for PCA
* fix mem error
* add debugs
* fix matrix transpose multiplication
  you have got to be kidding me
* preliminary template/multiprompt support
  model is running out of context and that ought to be fixed (segfaulting) but other than that it looks goodish
* fix zero output & param parsing, functional templating
  fixed a bug where the output file had no tensor data/was all zero; fixed a bug where single-hyphen flags were not being correctly parsed; implements creation of templated prompts from input (still need to adapt based on model)
* fix square_diff matmul index range and CRLF->LF line endings
  fixed a logic error where square_diff would not multiply all rows; fixed a formatting error where the provided completions.txt had CRLF line endings
* add command-line args for num threads, num completions file lines, always reload model
  refactored a few things and did what the commit message says on the tin
* code aestheticization
* fix compiler warnings
* in-series multithreading for prompt embedding?
  added commented-out code to attempt to start implementing multithreading for embedding in main
* remove unnecessary multithreading
* interim fix memory leak
* translated everything but PCA (I think)
* tentatively translate the rest
* fix ggml errors and make new ones
  at least it compiles and runs
* fix cb_eval
* temporary commit while I move dev environments
  it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
* update debug statements
* pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped
* update comments
* (wip) refactor
* clean up PCA ggml implementation
* fix shape of v_diff_original
* add n_batch for pca
* working version
* remember to copy back the last_eigenvector
* fix n_completions
* bring back n_completions
* default n_pca_batch to 20
* fix macos build
* add to makefile all targets
* use ggml_format_name
* add readme
* fix .editorconfig
* use ggml_backend_tensor_copy
* attempt to fix compile problem on mac
* fix compile warn
* reuse allocr
* move param parser to common
* better error handling
* clean up a bit
* add print_usage
* shorten help msg
* beautify help msg
* escape prompt by default
* change compile target to llama-cvector-generator
* typo
* disable GPU for PCA
* code style

Co-authored-by: Christian Zhou-Zheng <christianzhouzheng@gmail.com>
🎉
Resolve #6880
Result from last working version: #7514 (comment)
TODO in next PRs:
* cvector-generator example (#7514 (comment))
* llama_decode