[Proposal] Expand quantization model support #684

@miguel-kjh

Description

Why does Transformer Lens only support quantized LLaMA models?

Hi everyone,

I'm trying to use the transformer_lens library to study the activations of a quantized Mistral 7B model (unsloth/mistral-7b-instruct-v0.2-bnb-4bit). However, when I try to load it, I encounter a problem.

This is the code I'm using:

import transformer_lens

# model is the 4-bit Mistral checkpoint with a PEFT adapter attached and
# tokenizer is its tokenizer (both loaded earlier).
# First merge the adapter into the base weights, then wrap in a HookedTransformer.
model_merged = model.merge_and_unload()
model_hooked = transformer_lens.HookedTransformer.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    hf_model=model_merged,
    hf_model_4bit=True,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
    tokenizer=tokenizer,
)
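
For context, model and tokenizer above come from a standard Hugging Face + PEFT load, roughly like this (simplified; the adapter path is a placeholder, and the exact loading code is not the issue here):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# The base checkpoint is already quantized to 4-bit with bitsandbytes
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/mistral-7b-instruct-v0.2-bnb-4bit")

# Attach my fine-tuned LoRA adapter (placeholder path)
model = PeftModel.from_pretrained(base_model, "path/to/my-lora-adapter")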

The problem is that I get an assertion error stating that only LLaMA models can be used in quantized format with this library. This is the error message I receive:

---------------------------------------------------------------------------
AssertionError  Traceback (most recent call last)
AssertionError: Quantization is only supported for Llama models
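
From the message, it looks like there is a hard check on the model family somewhere in the 4-bit loading path. My guess is a guard along these lines (illustrative only; load_in_4bit and official_model_name are placeholder names, not copied from the TransformerLens source):

# Illustrative guess at the kind of check that raises this error,
# not the actual TransformerLens code.
if load_in_4bit:
    assert "llama" in official_model_name.lower(), (
        "Quantization is only supported for Llama models"
    )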

I find it illogical and frustrating that only LLaMA models are compatible with transformer_lens in quantized form. Can anyone explain why this decision was made? Is there a technical reason behind it, or is there a way to work around this restriction so that I can use my Mistral 7B model?
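
In the meantime, the only workaround I can see is to drop quantization entirely and load a full-precision Mistral checkpoint instead. A minimal sketch, assuming mistralai/Mistral-7B-Instruct-v0.2 is in TransformerLens's supported model list and there is enough memory for float16 weights (roughly 15 GB):

import torch
import transformer_lens

# Full-precision fallback: no bitsandbytes quantization involved
model_hooked = transformer_lens.HookedTransformer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    dtype=torch.float16,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
)

Of course this gives up the memory savings that made me use the bnb-4bit checkpoint in the first place, which is why I would like proper quantized support for non-LLaMA models.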

I appreciate any guidance or solutions you can provide.

Thanks!

Metadata


    Labels

    complexity-high (Very complicated changes for people to address who are quite familiar with the code)
