[Bug Report] Mixtral generates nonsense #570

@joelburget

Description

Describe the bug
[Screenshot (2024-05-04): sample generations showing nonsensical output]

I followed the instructions in docs/source/content/special_cases.md as best I can tell (I ran the model both in full precision and with HookedTransformer.from_pretrained_no_processing), yet the model's generations were nonsensical.

Code example

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained_no_processing(
    "mistralai/Mixtral-8x7B-v0.1",
    n_devices=4,
)

# Test that the model actually works: sample a few generations.
for i in range(5):
    print(
        model.generate(
            "Once upon a time",
            verbose=False,
            max_new_tokens=50,
        )
    )
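
For comparison, a minimal sketch of the same generation through the plain HuggingFace implementation (a diagnostic I'm adding, not part of the original report; torch.float16 and device_map="auto" are assumptions chosen to shard the model across the four GPUs). If this produces coherent text while the HookedTransformer output does not, the problem is likely in the TransformerLens loading path rather than the checkpoint itself:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reference generation with the stock HuggingFace model.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
hf_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # shard across all available GPUs
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(hf_model.device)
output = hf_model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))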

System Info
Describe the characteristics of your environment:

  • Describe how transformer_lens was installed: pip install sae-lens (see the version-check sketch after this list)
  • What OS are you using? Ubuntu 22.04 (runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04)
  • Python version: 3.10
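
Because transformer_lens was pulled in as a dependency of sae-lens, its exact version isn't pinned above. A quick way to check the installed version (a generic standard-library sketch, not from the original report):

import importlib.metadata

# Query the installed distribution version of transformer-lens.
print(importlib.metadata.version("transformer-lens"))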

Additional context
Running on a 4x A100 SXM system on Runpod.

Checklist

  • I have checked that there is no similar issue in the repo (required)
