In the TensorFlow version, a separate network computes the "conditioning augmentation", but in this PyTorch version you take the mean value computed inside the Generator and pass it to the Discriminator as the conditioning: https://github.com/hanzhanggit/StackGAN-Pytorch/blob/master/code/trainer.py#L189. In the Discriminator, you concatenate 'mu' with the encoded image features directly, without computing another "conditioning augmentation".
In the Generator, however, you compute the sampled c_code and concatenate it with the noise.
Did you do this on purpose? Does it improve the quality of the generated images, or is there another reason?
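To make the asymmetry I'm asking about concrete, here is a minimal sketch (dimensions and layer names are my own illustrative assumptions, not the repo's actual config): the Generator consumes the stochastic c_code sampled via the reparameterization trick, while the Discriminator is conditioned on the deterministic mu replicated spatially.

```python
import torch

def conditioning_augmentation(text_embedding, fc):
    # CA: predict mu and log-variance from the text embedding, then sample
    # c_code with the reparameterization trick (what the Generator uses).
    stats = fc(text_embedding)               # -> [batch, 2 * c_dim]
    mu, logvar = stats.chunk(2, dim=1)
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    c_code = mu + std * eps                  # stochastic conditioning code
    return c_code, mu, logvar

# Hypothetical sizes for illustration only.
emb_dim, c_dim, z_dim, batch = 1024, 128, 100, 4
fc = torch.nn.Linear(emb_dim, 2 * c_dim)
emb = torch.randn(batch, emb_dim)

c_code, mu, logvar = conditioning_augmentation(emb, fc)

# Generator side: noise concatenated with the sampled c_code.
z = torch.randn(batch, z_dim)
g_input = torch.cat((z, c_code), dim=1)      # [batch, z_dim + c_dim]

# Discriminator side: the deterministic mu, replicated over the spatial
# grid and concatenated with encoded image features (shapes illustrative).
img_feat = torch.randn(batch, 512, 4, 4)
mu_rep = mu.view(batch, c_dim, 1, 1).expand(-1, -1, 4, 4)
d_input = torch.cat((img_feat, mu_rep), dim=1)  # [batch, 512 + c_dim, 4, 4]

print(g_input.shape, d_input.shape)
```

So the question boils down to: the Generator sees a fresh sample around mu each time, while the Discriminator only ever sees mu itself.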