[Proposal] Expand quantization model support #684

@miguel-kjh

Description

Why does Transformer Lens only support quantized LLaMA models?

Hi everyone,

I'm trying to use the transformer_lens library to study the activations of a quantized Mistral 7B model (unsloth/mistral-7b-instruct-v0.2-bnb-4bit). However, when I try to load it, I encounter a problem.

This is the code I'm using:

import transformer_lens

# model is the 4-bit Mistral checkpoint with a PEFT adapter attached and
# tokenizer is its tokenizer (both loaded earlier).
# First merge the adapter into the base weights, then wrap in a HookedTransformer.
model_merged = model.merge_and_unload()
model_hooked = transformer_lens.HookedTransformer.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    hf_model=model_merged,
    hf_model_4bit=True,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
    tokenizer=tokenizer,
)
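
For context, model and tokenizer above come from a standard Hugging Face + PEFT load, roughly like this (simplified; the adapter path is a placeholder, and the exact loading code is not the issue here):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# The base checkpoint is already quantized to 4-bit with bitsandbytes
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/mistral-7b-instruct-v0.2-bnb-4bit")

# Attach my fine-tuned LoRA adapter (placeholder path)
model = PeftModel.from_pretrained(base_model, "path/to/my-lora-adapter")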

The problem is that I get an assertion error stating that only LLaMA models can be used in quantized format with this library. This is the error message I receive:

---------------------------------------------------------------------------
AssertionError  Traceback (most recent call last)
AssertionError: Quantization is only supported for Llama models
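
From the message, it looks like there is a hard check on the model family somewhere in the 4-bit loading path. My guess is a guard along these lines (illustrative only; load_in_4bit and official_model_name are placeholder names, not copied from the TransformerLens source):

# Illustrative guess at the kind of check that raises this error,
# not the actual TransformerLens code.
if load_in_4bit:
    assert "llama" in official_model_name.lower(), (
        "Quantization is only supported for Llama models"
    )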

I find it illogical and frustrating that only LLaMA models are compatible with transformer_lens in quantized form. Can anyone explain why this decision was made? Is there a technical reason behind it, or is there a way to work around this restriction so that I can use my Mistral 7B model?
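
In the meantime, the only workaround I can see is to drop quantization entirely and load a full-precision Mistral checkpoint instead. A minimal sketch, assuming mistralai/Mistral-7B-Instruct-v0.2 is in TransformerLens's supported model list and there is enough memory for float16 weights (roughly 15 GB):

import torch
import transformer_lens

# Full-precision fallback: no bitsandbytes quantization involved
model_hooked = transformer_lens.HookedTransformer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    dtype=torch.float16,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
)

Of course this gives up the memory savings that made me use the bnb-4bit checkpoint in the first place, which is why I would like proper quantized support for non-LLaMA models.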

I appreciate any guidance or solutions you can provide.

Thanks!

Metadata


    Labels

    complexity-high (Very complicated changes for people to address who are quite familiar with the code)
