### Describe the bug

I followed the instructions in `docs/source/content/special_cases.md` as best I could tell (I ran the model both in full precision and with `HookedTransformer.from_pretrained_no_processing`), yet the model's generations were nonsensical.
### Code example

```python
from transformer_lens import HookedTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = HookedTransformer.from_pretrained_no_processing(
    "mistralai/Mixtral-8x7B-v0.1",
    n_devices=4,
)

# test model actually works
for i in range(5):
    display(  # display() is provided by the IPython/Jupyter environment
        model.generate(
            "Once upon a time",
            verbose=False,
            max_new_tokens=50,
        )
    )
```
### System Info

Describe the characteristics of your environment:

- How `transformer_lens` was installed: `pip install sae-lens`
- OS: Ubuntu 22.04 (`runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04`)
- Python version: 3.10
### Additional context
Running on a 4x A100 SXM system on Runpod.
### Checklist