This repository was archived by the owner on Feb 7, 2025. It is now read-only.

BatchNorm for PatchDiscriminator running in DDP #451

@sRassmann

Description


Thanks for this amazing work, it helps a lot in accelerating experiments!

I tried training an AE using PatchDiscriminator and ran into the following issue when switching to DistributedDataParallel (DDP):

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 4; expected version 3 instead.

Running it with torch.autograd.set_detect_anomaly(True) gives:

UserWarning: Error detected in CudnnBatchNormBackward0

With some troubleshooting, I found that the issue is caused by the BatchNorm layers. So running

discriminator = PatchDiscriminator(**kwargs)
discriminator = torch.nn.SyncBatchNorm.convert_sync_batchnorm(discriminator)

solves it.
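
For context, here is roughly how the workaround fits into my DDP training script. This is a minimal sketch: the import path, the local_rank / disc_kwargs names, and the device handling are placeholders from my setup, and process-group initialization is omitted.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

from generative.networks.nets import PatchDiscriminator  # import path assumed


def build_discriminator(local_rank, **disc_kwargs):
    # disc_kwargs are placeholder constructor arguments for PatchDiscriminator
    discriminator = PatchDiscriminator(**disc_kwargs)

    # replace every BatchNorm layer with SyncBatchNorm before wrapping in DDP;
    # the documented usage re-assigns the returned module
    discriminator = torch.nn.SyncBatchNorm.convert_sync_batchnorm(discriminator)

    discriminator = discriminator.to(local_rank)
    return DDP(discriminator, device_ids=[local_rank])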

Might it be worthwhile to put this into the constructor of the PatchDiscriminator to avoid similar issues in the future? e.g.

class PatchDiscriminator(nn.Sequential):
    def __init__(self, **kwargs) -> None:
        super().__init__()
        [...]
        self.apply(self.initialise_weights)

        torch.nn.SyncBatchNorm.convert_sync_batchnorm(self)
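
With the conversion inside the constructor, the explicit convert_sync_batchnorm call in the sketch above would no longer be needed, e.g. (same placeholder names as above):

discriminator = PatchDiscriminator(**kwargs)  # SyncBatchNorm conversion happens internally
discriminator = DDP(discriminator.to(local_rank), device_ids=[local_rank])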
