Import the function:

```python
from Sentence_invert import run_sentence_invert
```

Run the function:

```python
best_sentence, intermediate_sents, best_sent_norm_attention_wghts = run_sentence_invert(
    model, tokenizer, user_prompt='User:'
)
```
## Additional Information
The model and tokenizer need to be loaded first and provided to the function. If the LLM is fine-tuned to use a specific prompting structure (e.g., `User:`/`Assistant:`), pass the portion corresponding to the user prompt via the `user_prompt` parameter. By default, `user_prompt` is set to `'User:'`.
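To illustrate what the `user_prompt` parameter corresponds to, here is a minimal sketch of a chat-style prompt template. The `build_prompt` helper and the `Assistant:` tag are hypothetical illustrations, not part of this repository's API:

```python
# Hypothetical illustration of a chat-style prompting structure.
# The user_prompt prefix (e.g. "User:") marks the human turn; the
# trainable sentence is optimized in the slot that follows it.
# Template details here are assumptions, not taken from the paper.
def build_prompt(user_prompt: str, sentence: str, assistant_tag: str = "Assistant:") -> str:
    return f"{user_prompt} {sentence}\n{assistant_tag}"

print(build_prompt("User:", "<trainable sentence>"))
# → User: <trainable sentence>
#   Assistant:
```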
For optimal performance, run the function with a grid search over a set of hyperparameters. Varying `alpha_2` and `alpha_3` should be sufficient, as this accommodates differences in dictionary size and attention mechanisms between LLMs. The remaining hyperparameters do not require adjustment. Recommended values for the hyperparameters are provided in the paper.
- `alpha_2`: Weight of the diversity loss. Defaults to 0.5.
- `alpha_3`: Weight of the attention loss. Defaults to 0.5.
- `len_opt`: Number of tokens updated in each optimization iteration. Defaults to 50.
- `num_iterations`: Total number of optimization iterations. Defaults to 200.
- `len_seq`: Overall number of tokens in the trainable sentence. Defaults to 200.
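The recommended grid search over `alpha_2` and `alpha_3` can be sketched as follows. The `grid_search` helper, the grid values, and the `score_fn` callback (standing in for one call to `run_sentence_invert` plus whatever selection metric the paper uses) are all assumptions for illustration:

```python
import itertools

# Hypothetical grid search over alpha_2 (diversity loss weight) and
# alpha_3 (attention loss weight). score_fn is a stand-in for running
# run_sentence_invert with the given weights and scoring the result;
# the grids and the scoring rule are assumptions, not from the paper.
def grid_search(score_fn, alpha_2_grid, alpha_3_grid):
    best = None
    for a2, a3 in itertools.product(alpha_2_grid, alpha_3_grid):
        score, result = score_fn(alpha_2=a2, alpha_3=a3)
        if best is None or score > best[0]:
            best = (score, result, {"alpha_2": a2, "alpha_3": a3})
    return best

# Usage with a dummy score function peaking at (0.5, 1.0):
def dummy_score(alpha_2, alpha_3):
    return -((alpha_2 - 0.5) ** 2) - ((alpha_3 - 1.0) ** 2), None

score, result, params = grid_search(dummy_score, [0.1, 0.5, 1.0], [0.5, 1.0, 2.0])
print(params)
# → {'alpha_2': 0.5, 'alpha_3': 1.0}
```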
## About
This repository contains the code for the paper titled "GenLLMGuard: Detecting Backdoors in LLMs for Open-Ended Text Generation Through Trigger Inversion".