The ZeRO 3 example does not run. The main problem appears to be that the `InitContext` function does not exist, even though `pretrain_gpt2.py` calls it. I tried several changes to get it running (changing the batch size, swapping the initialization function, and adjusting some of the inputs to the initialization function) but gave up after it threw the error `variable beta1 is referenced before assignment`. I suspect that error comes from something wonky in the optimizer?
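For context on the `beta1` error, here is a minimal sketch of the general Python pattern that produces a "referenced before assignment" failure. It is an assumption about the kind of bug involved, not the actual optimizer code from the example: `adam_like_step`, `try_step`, and the dict-based parameter layout are all hypothetical names made up for illustration.

```python
def adam_like_step(param_groups):
    """Hypothetical, simplified optimizer step (NOT the real optimizer code)."""
    for group in param_groups:
        for p in group["params"]:
            if p.get("grad") is None:
                continue  # parameter skipped, so beta1 is never bound on this path
            beta1, beta2 = group["betas"]
            p["value"] -= group["lr"] * p["grad"]
        # If every parameter so far was skipped, beta1 was never assigned.
        group["last_beta1"] = beta1


def try_step(param_groups):
    """Run one step and report either success or the exception raised."""
    try:
        adam_like_step(param_groups)
        return "ok"
    except UnboundLocalError as exc:
        return f"{type(exc).__name__}: {exc}"


# A group whose only parameter has no gradient triggers the failure.
no_grad = [{"params": [{"value": 1.0, "grad": None}], "betas": (0.9, 0.999), "lr": 0.1}]
print(try_step(no_grad))
```

If the real failure follows this pattern, it would suggest that no parameter reached the optimizer with a gradient, which may be a side effect of the changes I made to the initialization rather than a bug in the optimizer itself.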