I'm pretty sure we need a follow-up to the lazy weights init feature (#11471):
under ZeRO-3 the init code should be wrapped in `deepspeed.zero.GatheredParameters`, either here or inside `_init_weights`:
https://github.com/huggingface/transformers/pull/11471/files#diff-6b72b98c4c2dcfc6cc606843917733f5d858374fbc22a735ff483bbc0c1e63eaR1275-R1276
This also needs a test.
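
Roughly what I have in mind, as a sketch only and not the actual patch. The helper names (`gathered_parameters`, `init_module_weights`) and the `zero3_enabled` flag are made up for illustration; in the real code the flag would come from however transformers detects ZeRO-3. The only DeepSpeed call used is `deepspeed.zero.GatheredParameters(params, modifier_rank=0)`, and I'm assuming rank 0's initialized values are the ones we want broadcast back to the partitions:

```python
from contextlib import nullcontext


def gathered_parameters(params, zero3_enabled):
    """Return a context manager under which `params` can be modified in place.

    Hypothetical helper: without ZeRO-3 this is a no-op context.
    """
    if not zero3_enabled:
        return nullcontext()
    # Imported lazily so that DeepSpeed stays an optional dependency.
    import deepspeed

    # Gathers the partitioned params for the duration of the block;
    # modifier_rank=0 means rank 0's changes are propagated on exit.
    return deepspeed.zero.GatheredParameters(params, modifier_rank=0)


def init_module_weights(module, init_fn, zero3_enabled=False):
    """Run `init_fn(module)` with the module's own params gathered if needed.

    `init_fn` stands in for something like `model._init_weights`.
    """
    # Only this module's direct parameters, not those of its children.
    params = list(module.parameters(recurse=False))
    with gathered_parameters(params, zero3_enabled):
        init_fn(module)
```

Gathering only the direct parameters of each module should keep peak memory low, since `_init_weights` is applied module by module anyway.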