Hello,
I saw that each GPU is assigned its own solver; however, I'm not sure how you average the gradients across multiple GPUs. My understanding is that at each iteration, every GPU fetches batch/GPUs images and computes the gradient on them. The root solver then retrieves the gradients from the other GPUs and averages them all together, and training moves on to the next iteration. Is that correct? Thank you.
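
In case it helps, here is the scheme I have in mind as a minimal Python sketch. This is not the library's code and I have not confirmed it matches the actual implementation. The function names (`shard_gradient`, `split_batch`, `train_step`) and the toy one-parameter model are made up purely for illustration, and the "GPUs" are simulated with a plain loop.

```python
# Toy illustration of synchronous data-parallel SGD as I understand it.
# Assumption: the batch is split evenly, so each "GPU" gets batch/GPUs samples.

def shard_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)^2 with respect to w over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def split_batch(batch, num_gpus):
    """Split the batch into num_gpus equal contiguous shards."""
    size = len(batch) // num_gpus
    return [batch[i * size:(i + 1) * size] for i in range(num_gpus)]

def train_step(w, batch, num_gpus, lr):
    """One iteration: per-GPU gradients, averaged by the root, then one update."""
    shards = split_batch(batch, num_gpus)
    grads = [shard_gradient(w, shard) for shard in shards]  # one gradient per GPU
    avg_grad = sum(grads) / num_gpus  # root solver averages the gradients
    return w - lr * avg_grad  # single weight update shared by all GPUs

# Samples of y = 2*x, so the ideal weight is 2.0.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w = 0.0
for _ in range(50):
    w = train_step(w, batch, num_gpus=2, lr=0.05)
```

If the shards are equal in size, the average of the per-GPU gradients should be the same as the gradient computed on the whole batch at once, which is why I assume the result matches single-GPU training with the full batch.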