Run fast transformer decoders on your MacBook's GPU! This project works toward a fast reimplementation of GPT-2 and Llama-like models in MLX.
The aim is that the only dependencies are:

- mlx
- sentencepiece
- tqdm
- numpy
With an optional dev dependency of:

- transformers, for downloading and converting weights
- makemore llama reimplementation (train your own with `python train.py`!)
- BERT merged into mlx-examples
- Phi-2 merged into mlx-examples
- AdamW merged into mlx
This project will be considered complete once these goals are achieved.
- finetune BERT
- GPT-2 reimplementation and loading in MLX
- speculative decoding
- learning rate scheduling
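Speculative decoding, one of the goals above, can be sketched with toy models: a cheap draft model proposes a block of tokens, and the target model keeps the longest prefix it agrees with, so the expensive model is called fewer times. The `draft_model` and `target_model` functions below are hypothetical stand-ins, not code from this repo:

```python
# Toy sketch of greedy speculative decoding.
# draft_model / target_model are hypothetical stand-ins for a small
# and a large LM; each maps a token context to its greedy next token.

def draft_model(context):
    # Cheap model: alternates between repeating the last token and 0.
    return context[-1] if len(context) % 2 else 0

def target_model(context):
    # Expensive model: sum of the context mod 5 (arbitrary toy rule).
    return sum(context) % 5

def speculative_decode(context, n_new, k=4):
    out = list(context)
    while len(out) < len(context) + n_new:
        # 1) Draft k tokens greedily with the cheap model.
        drafted = []
        for _ in range(k):
            drafted.append(draft_model(out + drafted))
        # 2) Verify: keep the longest prefix where the target model's
        #    greedy choice matches the draft (a real implementation
        #    scores all k positions in one parallel forward pass).
        accepted = 0
        for i in range(k):
            if target_model(out + drafted[:i]) == drafted[i]:
                accepted += 1
            else:
                break
        out += drafted[:accepted]
        # 3) Always append one token from the target model, so decoding
        #    makes progress even when the draft is rejected immediately.
        out.append(target_model(out))
    # Trim any overshoot so exactly n_new tokens are returned.
    return out[: len(context) + n_new]
```

Because verification checks the target model's own greedy choice at every position, the output is identical to decoding with the target model alone; only the number of target calls changes.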
To install dependencies:

```
poetry install --no-root
```
To download and convert the model:

```
python phi2/convert.py
```

That will fill in `weights/phi-2.npz`.
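An `.npz` file is just a zip archive of named NumPy arrays, so the converted checkpoint maps parameter names to weight arrays. A minimal sketch of inspecting such a file (the tiny in-memory checkpoint and parameter names here are illustrative, not the actual Phi-2 weights):

```python
import io
import numpy as np

# Build a tiny stand-in checkpoint in memory; a real run would instead
# call np.load("weights/phi-2.npz") on the file produced by convert.py.
buf = io.BytesIO()
np.savez(
    buf,
    **{
        "embed.weight": np.zeros((8, 4), dtype=np.float16),    # hypothetical name
        "lm_head.weight": np.zeros((8, 4), dtype=np.float16),  # hypothetical name
    },
)
buf.seek(0)

ckpt = np.load(buf)
for name in ckpt.files:
    # Print each parameter's name, shape, and dtype.
    print(name, ckpt[name].shape, ckpt[name].dtype)
```

The same pattern works for spot-checking that a conversion produced the shapes and dtypes you expect before loading the weights into a model.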
🚧 (Not yet done) To run the model:

```
python phi2/generate.py
```

Some great resources: