[Bug Report] Mixtral generates nonsense #570

@joelburget

Description

Describe the bug
[Screenshot (2024-05-04): sample generations showing nonsensical output]

I followed the instructions in docs/source/content/special_cases.md as best I can tell (I ran the model both in full precision and with HookedTransformer.from_pretrained_no_processing), yet the model's generations were nonsensical.

Code example

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained_no_processing(
    "mistralai/Mixtral-8x7B-v0.1",
    n_devices=4,
)

# Test that the model actually works: sample a few generations.
for i in range(5):
    print(
        model.generate(
            "Once upon a time",
            verbose=False,
            max_new_tokens=50,
        )
    )
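
For comparison, a minimal sketch of the same generation through the plain HuggingFace implementation (a diagnostic I'm adding, not part of the original report; torch.float16 and device_map="auto" are assumptions chosen to shard the model across the four GPUs). If this produces coherent text while the HookedTransformer output does not, the problem is likely in the TransformerLens loading path rather than the checkpoint itself:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reference generation with the stock HuggingFace model.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")
hf_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # shard across all available GPUs
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(hf_model.device)
output = hf_model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))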

System Info
Describe the characteristics of your environment:

  • Describe how transformer_lens was installed: pip install sae-lens (see the version-check sketch after this list)
  • What OS are you using? Ubuntu 22.04 (runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04)
  • Python version: 3.10
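
Because transformer_lens was pulled in as a dependency of sae-lens, its exact version isn't pinned above. A quick way to check the installed version (a generic standard-library sketch, not from the original report):

import importlib.metadata

# Query the installed distribution version of transformer-lens.
print(importlib.metadata.version("transformer-lens"))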

Additional context
Running on a 4x A100 SXM system on Runpod.

Checklist

  • I have checked that there is no similar issue in the repo (required)
