🚀 Feature request
Add a `parallelize` method to GPT-Neo models so we can fine-tune them with model parallelism across several less expensive GPUs.
Motivation
I want to fine-tune a GPT-Neo model using model parallelism so I can do it on less expensive GPUs. This is not yet implemented for GPT-Neo, and since higher-end GPUs are very expensive, it would be preferable to distribute the model across several cheaper GPUs rather than rely on a single expensive one. It would also let us train with larger batches, which can have a big impact on model fitting.
I would be very glad if you could implement this; I think it would enable fine-tuning special-purpose GPT-Neo language models.
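For reference, GPT-2 in transformers already exposes `model.parallelize(device_map)`, where `device_map` maps each GPU index to the list of transformer-block indices it should host; the request is essentially to mirror that for GPT-Neo. As a minimal sketch (the helper name `make_device_map` is hypothetical, not part of the library), here is how a balanced device map could be built for, say, the 24 layers of GPT-Neo 1.3B spread over 4 GPUs:

```python
def make_device_map(n_layers: int, n_gpus: int) -> dict:
    """Split transformer block indices evenly across GPUs.

    Returns a dict like {0: [0, 1, ...], 1: [...], ...} in the format
    expected by transformers' parallelize(device_map) on GPT-2.
    """
    per_gpu, extra = divmod(n_layers, n_gpus)
    device_map = {}
    start = 0
    for gpu in range(n_gpus):
        # Give the first `extra` GPUs one additional layer each
        count = per_gpu + (1 if gpu < extra else 0)
        device_map[gpu] = list(range(start, start + count))
        start += count
    return device_map

# 24 layers over 4 GPUs -> 6 layers per device
print(make_device_map(24, 4))
# → {0: [0, 1, 2, 3, 4, 5], 1: [6, 7, 8, 9, 10, 11],
#    2: [12, 13, 14, 15, 16, 17], 3: [18, 19, 20, 21, 22, 23]}
```

With a `parallelize` method on GPT-Neo, this map could then be passed as `model.parallelize(make_device_map(24, 4))`, analogous to the GPT-2 API.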