Add cvector-generator example #7514
Conversation
Could you add a quick usage summary - do you just run the example? Also, I tried implementing PCA using the standard library.
Hi @christianazinn and thanks for your response. We'll move the discussion here. Quick explanation: my code takes a pair of positive + negative prompts, calculates embeddings for each layer, and then subtracts them to get the diff. In the end, for each layer, we have one diff matrix. It is not urgent, so take your time. And feel free to let me know if you have other questions. Thank you.
Looking into the PCA implementation, I realize we have the problem that we're not actually getting square matrices from the layer embeddings. The matrices we receive are usually tall and skinny. SciPy's original implementation indicates that in this case, the problem is best handled by SVD via the covariance matrix. We may care to implement this after everything else works. I also don't have push permissions to this branch, so whatever changes I make, I'll fork the branch and PR into it.
@christianazinn Thanks for the explanation. Yes, I was also wondering how we can turn the embedding vectors into a square matrix. It's all clear to me now. I'll have a look during the weekend. In the meantime, I invited you to my forked repo. You can push directly onto this branch, or you can work on your own PR if you want. Feel free to tag me if you have questions. Thank you!
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.
Thank you, I have pushed an implementation with primitives/stdlib. It currently assumes the Mistral architecture. Currently, however, it outputs gibberish when inferencing.
Added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed.
Notes follow. I have implemented basic command-line arguments. I've left a few comments about what needs to be fixed in my shoddy implementation, and other things we need to deal with, such as the prompt-parsing issue mentioned. It appears we do just parse the individual positive/negative prompts - @ngxson, can you confirm? We will likely want to change this to provide a larger sample space; the blog post and Python implementation provide a reference. However, I am seeing promising results with "funny" vs. "boring": Llama2 Q8_0, prompt (for completion) "Here's a funny joke: ". Llama2 was used because #5970 indicates support has not been implemented for architectures other than Llama, but that is probably outdated. Control vector -1:
@christianazinn Wow, this is awesome. I quickly had a look at the code, looks good to me. I'll try when I get back home.
I started with a single pair of pos-neg for simplification. But yes, eventually we will allow multiple pairs of pos-neg. The Python implementation does that by calculating the mean value of the outputs. We can allow the program to take as input 2 files of prompts (one prompt per line), so we have 2 files: neg.txt and pos.txt for example. I can implement this quickly if needed.
Very promising result. Even I (a human) sometimes struggle to control my own funny/boring vector.
Thank you! Take your time, I will keep testing in the meantime. Other results are varied: a test on happy/sad generates complete gibberish, and another control vector for funny/boring is ineffective.
Just to make sure we are on the same page, because there are two places where multiple pairs might be needed. We will also want to implement multiple sentiment pairs (i.e. happy/sad and funny/boring), but what I referred to was having multiple prompts generated from the same sentiment pair run through the tokenizer, as in the second code block here. Currently we appear to just tokenize the bare term. I think we want to be able to do that preprocessing in C++, so the user inputs the positive/negative sentiments and we create the template, format it, and pass it on. I believe the great variance in my results may be due to only having one sample token sequence per sentiment, and therefore high variability in the resulting vectors between runs, hence my concern over this topic. However, more runs of PCA would slow the already slow stdlib implementation to the point of unusability, so that is left for the GGML implementation.
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
It appears the Python implementation handles concatenating the matrices from the different prompt callbacks by stacking them, so e.g. if each callback returned a 4096x2 matrix, then using 1024 test prompts would yield a 4096x2048 matrix. Intuitively, because rank(AA^T) = rank(A), this allows for more degrees of freedom/less dependency on each individual callback in each layer's overall matrix, and since the result will be 4096x4096 regardless of the other dimension, this should not change much in the PCA. Will try to implement this. (Strictly, it stacks vertically, but that doesn't matter since we multiply by the transpose anyway.)
I updated this PR with 2 small changes (feel free to test / adapt it if you want):
@christianazinn I'm having a problem:
@ngxson I'll take a look, thanks - not sure how I didn't think to check that; it would explain why I was getting gibberish on 9/10 tests. My code is very patchwork at the moment, so there are likely to be a lot of these fixes. Thanks for the progress so far.
Strangely, the matrices returned by the callback don't match what's printed to stdout. UPDATE: Am I misunderstanding these lines (I assumed this means we get a 4096x2x1x1 matrix)? UPDATE 2: I had my numbers backward with zero/nonzero. Even more confused now.
Thinking about it further, this isn't even true. I would still like to know how the dimensions are stored (image above). Is it a flattened matrix, and in which order are the dimensions laid out? Frankly, this whole headache could probably be avoided if we just wrote the GGML implementation, but I don't know how.
fixed it... one liner... ugh |
    printf("\n");
}

static int ctrlvec_params_parse_ex(int argc, char ** argv, ctrl_params & params) {
Let's merge ctrl_params into gpt_params so that we have consistent handling of CLI args in all examples.
I hadn't noticed that gpt_params had been refactored. It's way easier to work with now!
I moved ctrl_params into gpt_params. Please have a look at 679f513. Thanks!
This should be fine - just needs testing. With what you mention below, that should work much better. I think that's actually what the Python implementation does, but I'm not certain. Feel free to try it if you like, or if you think the current outputs are acceptable, we can add it in a later PR. (We should compile a list of future improvements for this.) I'll add my review for the code itself in a moment, and will test the generated control vectors when I get the chance.
Actually, I updated the description of this PR with a list. Feel free to let me know if you have other ideas to add.
Nice. Thanks for taking the time to develop and review this PR!
I am very excited for control vectors and I have been routinely testing this PR. I got it working yesterday with only a couple of issues.
I fixed 1 and 2 in a PR on the fork, ngxson#6. Issue 2 is fixed by adding a command-line flag that combines all of the prompt lines into one prompt.
@calvin-laurenson Thanks for testing it out. Regarding the ability to have multi-line prompts, I prefer to add support for escaped newlines instead.
Edit: CUDA backend does not support GGML_OP_SQRT |
ggerganov
left a comment
I haven't done tests, but I'm sure people will play with this and if there are any issues we can resolve them from master
options.push_back({ "control-vector" });
options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
options.push_back({ "cvector", "--positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
options.push_back({ "cvector", "--negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
options.push_back({ "cvector", "--completions-file FNAME","completions file (default: '%s')", params.cvector_completions_file.c_str() });
options.push_back({ "cvector", "--completions N", "number of lines of completions file to use (default: %d)", params.n_completions });
options.push_back({ "cvector", "--batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
options.push_back({ "cvector", "--iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
The whitespace padding should be kept so that the arguments are vertically aligned when the help is printed:
-options.push_back({ "control-vector" });
-options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
-options.push_back({ "cvector", "--positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
-options.push_back({ "cvector", "--negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
-options.push_back({ "cvector", "--completions-file FNAME","completions file (default: '%s')", params.cvector_completions_file.c_str() });
-options.push_back({ "cvector", "--completions N", "number of lines of completions file to use (default: %d)", params.n_completions });
-options.push_back({ "cvector", "--batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
-options.push_back({ "cvector", "--iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
+options.push_back({ "control-vector" });
+options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
+options.push_back({ "cvector", " --positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
+options.push_back({ "cvector", " --negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
+options.push_back({ "cvector", " --completions-file FNAME",
+                    "completions file (default: '%s')", params.cvector_completions_file.c_str() });
+options.push_back({ "cvector", " --completions N", "number of lines from the completions file to use (default: %d)", params.n_completions });
+options.push_back({ "cvector", " --batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
+options.push_back({ "cvector", " --iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });
FYI, I also changed the example name + binary name to llama-cvector-generator
```
<|im_start|>system\nAct like a person who is extremely happy.<|im_end|>
<|im_start|>system\nYou are in a very good mood today<|im_end|>
```
@calvin-laurenson I ended up enabling newline escaping by default, which should be more convenient for most users.
Title changed from "control-vector-generator example" to "cvector-generator example"
FYI, the help text still refers to the old name. Also, if the completion portion bails out because the number of positive prompts != the number of negative prompts, PCA still tries to run (see log).
* add control-vector-generator
* calc diff
* add comments
* proof-of-concept stdlib implementation
  Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish.
* param parsing, refactor, comments
  Added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed.
* example template completions
  Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
* add multi prompts, multi-thread for PCA
* fix mem error
* add debugs
* fix matrix transpose multiplication
  you have got to be kidding me
* preliminary template/multiprompt support
  model is running out of context and that ought to be fixed (segfaulting) but other than that it looks goodish
* fix zero output & param parsing, functional templating
  fixed a bug where the output file had no tensor data/was all zero; fixed a bug where single-hyphen flags were not being correctly parsed; implements creation of templated prompts from input (still need to adapt based on model)
* fix square_diff matmul index range and CRLF->LF line endings
  fixed a logic error where square_diff would not multiply all rows; fixed a formatting error where the provided completions.txt had CRLF line endings
* add command-line args for num threads, num completions file lines, always reload model
  refactored a few things and did what the commit message says on the tin
* code aestheticization
* fix compiler warnings
* in-series multithreading for prompt embedding?
  added commented-out code to attempt to start implementing multithreading for embedding in main
* remove unnecessary multithreading
* interim fix memory leak
* translated everything but PCA (I think)
* tentatively translate the rest
* fix ggml errors and make new ones
  at least it compiles and runs
* fix cb_eval
* temporary commit while I move dev environments
  it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent
* update debug statements
* pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped
* update comments
* (wip) refactor
* clean up PCA ggml implementation
* fix shape of v_diff_original
* add n_batch for pca
* working version
* remember to copy back the last_eigenvector
* fix n_completions
* bring back n_completions
* default n_pca_batch to 20
* fix macos build
* add to makefile all targets
* use ggml_format_name
* add readme
* fix .editorconfig
* use ggml_backend_tensor_copy
* attempt to fix compile problem on mac
* fix compile warn
* reuse allocr
* move param parser to common
* better error handling
* clean up a bit
* add print_usage
* shorten help msg
* beautify help msg
* escape prompt by default
* change compile target to llama-cvector-generator
* typo
* disable GPU for PCA
* code style

Co-authored-by: Christian Zhou-Zheng <christianzhouzheng@gmail.com>
🎉
Resolve #6880
Result from last working version: #7514 (comment)
TODO in next PRs:
* cvector-generator example (#7514 (comment))
* llama_decode