In main.py, line 193 should be changed to:

    if (i + 1) % args.save_freq == 0 and dist.get_rank() == 0:

so that only the master node (rank 0) runs the checkpoint-saving function.
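The added rank check can be seen more clearly in isolation. The minimal sketch below assumes `dist` in main.py is `torch.distributed` and pulls the condition out into a hypothetical helper, `should_save_checkpoint`, taking the rank as a plain argument so it runs without PyTorch. It also simulates several processes to show which of them would write a checkpoint.

```python
def should_save_checkpoint(i: int, save_freq: int, rank: int) -> bool:
    # Hypothetical helper mirroring the proposed condition on line 193.
    # i is the zero-based iteration index, so (i + 1) counts completed iterations.
    # Only rank 0 (the master process) passes the rank check.
    return (i + 1) % save_freq == 0 and rank == 0


def simulate(world_size: int, num_iters: int, save_freq: int) -> list:
    """Return the (rank, iteration) pairs that would write a checkpoint."""
    saves = []
    for rank in range(world_size):
        for i in range(num_iters):
            if should_save_checkpoint(i, save_freq, rank):
                saves.append((rank, i))
    return saves


if __name__ == "__main__":
    # 4 simulated processes, 10 iterations, save every 5 iterations.
    print(simulate(world_size=4, num_iters=10, save_freq=5))
    # -> [(0, 4), (0, 9)]
```

In the real training loop the `rank` argument would be `dist.get_rank()`. Without the rank check, every process would reach the save call at the same iterations and write the same checkpoint concurrently.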