[Proposal] Update Benchmarks & Verify Models to support Quantized models

### Proposal 

Update our Benchmarking and Verification systems to properly load Quantized models.

### Motivation

At present, we skip all quantized models during verification and benchmarking, with the note `TransformerLens does not support quantized models at this time`. This is not strictly true: the benchmark and verification systems don't support loading quantized models, but TransformerLens itself can run them, as shown in the [LLaMA Quantized demo notebook](https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/main/demos/LLaMA2_GPU_Quantized.ipynb).

### Pitch

We need `main_benchmark` & `verify_models` to correctly identify quantized models, check that their dependencies are installed, and load the models properly. If the required dependencies are missing, it should ideally exit and tell the user which ones to install before continuing. If they are present, it should load the model the same way the LLaMA Quantized demo does and run whatever benchmarks it can for that model.
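
A minimal sketch of what the dependency pre-flight could look like, to make the pitch concrete. Everything here is an assumption for illustration: the helper names (`looks_quantized`, `missing_packages`, `quantized_preflight`) are hypothetical and not existing TransformerLens functions, the package list should be adjusted to whatever the demo notebook actually installs, and the name-based detection is only a stand-in for however `main_benchmark` and `verify_models` end up identifying quantized models.

```python
import importlib.util

# Assumption: packages a quantized load needs; adjust to match the demo notebook.
QUANTIZATION_PACKAGES = ("bitsandbytes", "accelerate")

# Assumption: a name-based heuristic. A real check might inspect the model's
# Hugging Face config or the model registry instead.
QUANTIZED_NAME_MARKERS = ("4bit", "8bit", "gptq", "awq", "bnb")


def looks_quantized(model_name):
    """Return True if the model name contains a known quantization marker."""
    lowered = model_name.lower()
    return any(marker in lowered for marker in QUANTIZED_NAME_MARKERS)


def missing_packages(packages=QUANTIZATION_PACKAGES):
    """Return the top-level packages from `packages` that cannot be imported."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def quantized_preflight(model_name, packages=QUANTIZATION_PACKAGES):
    """Return None if it is safe to proceed, otherwise a message for the user."""
    if not looks_quantized(model_name):
        return None
    missing = missing_packages(packages)
    if not missing:
        return None
    return (
        f"Cannot benchmark quantized model {model_name!r}: "
        f"install {', '.join(missing)} and retry."
    )
```

A caller could print the returned message and exit when it is not `None`, and otherwise continue to the quantized loading path used in the demo notebook.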

### Alternatives

As a stop-gap alternative, we should at minimum update the note shown for quantized models to say `TransformerLens cannot benchmark quantized models at this time`.

### Checklist

- [x] I have checked that there is no similar [issue](https://github.com/TransformerLensOrg/Transformerlens/issues) in the repo (**required**)


[Proposal] Update Benchmarks & Verify Models to support Quantized models #1275
