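The code
def _nll_bernoulli(self, theta, x): return - torch.sum(x*torch.log(theta + EPS) + (1-x)*torch.log(1-theta-EPS))
may produce a NaN loss. When theta is exactly 1, the second log receives 1-theta-EPS = -EPS, and the log of a negative number is NaN.
I think it should be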
def _nll_bernoulli(self, theta, x): return - torch.sum(x*torch.log(theta + EPS) + (1-x)*torch.log(1-theta+EPS))
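
To illustrate the difference between the two versions above, here is a minimal sketch that evaluates both on an input where theta saturates at 1. The value of `EPS`, the standalone function names, and the sample tensors are assumptions made for this example; the original constant and call site are not shown in the report.

```python
import torch

EPS = 1e-8  # assumed value; the project's actual constant is not shown


def nll_bernoulli_original(theta, x):
    # Original form: subtracts EPS inside the second log,
    # so theta == 1 yields log(-EPS), which is NaN.
    return -torch.sum(
        x * torch.log(theta + EPS) + (1 - x) * torch.log(1 - theta - EPS)
    )


def nll_bernoulli_fixed(theta, x):
    # Proposed form: adds EPS inside the second log,
    # so theta == 1 yields log(EPS), which is finite.
    return -torch.sum(
        x * torch.log(theta + EPS) + (1 - x) * torch.log(1 - theta + EPS)
    )


# Hypothetical inputs: the first probability has saturated at exactly 1.
theta = torch.tensor([1.0, 0.5])
x = torch.tensor([1.0, 0.0])

loss_original = nll_bernoulli_original(theta, x)
loss_fixed = nll_bernoulli_fixed(theta, x)

print(torch.isnan(loss_original).item())   # whether the original loss is NaN
print(torch.isfinite(loss_fixed).item())   # whether the fixed loss is finite
```

Note that the NaN appears even though the target x is 1 at the saturated position: the factor (1-x) is 0 there, but 0 multiplied by NaN is still NaN, so the whole sum is poisoned.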