Description
I was debugging a network with two loss layers and wanted to disable one of them (a SoftmaxWithLoss layer), so I set its loss_weight to 0. However, this does not do what I expected at all. The clearest way to explain this is probably to show how to reproduce it.
To reproduce, one can take examples/mnist/lenet_train_test.prototxt and add a second loss layer with weight 0:
layer {
  name: "bad_loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "bad_loss"
  loss_weight: 0
}
and then run this Python script:

caffe_root = '/roaming/nanne/caffe/'  # Update this path to the correct path
import sys
sys.path.insert(0, caffe_root + 'python')
import os
os.chdir(caffe_root)

import caffe
import numpy as np

caffe.set_mode_gpu()

# Run a single training iteration so the backward pass populates the diffs.
solver = caffe.SGDSolver(caffe_root + 'examples/mnist/lenet_solver.prototxt')
solver.step(1)

# Inspect the diffs of the two automatically inserted split blobs and of ip2 itself.
print(solver.net.blobs['ip2_ip2_0_split_0'].diff.squeeze()[5:7, :])
print(solver.net.blobs['ip2_ip2_0_split_1'].diff.squeeze()[5:7, :])
print(solver.net.blobs['ip2'].diff.squeeze()[5:7, :])

The diff for the split belonging to the SoftmaxWithLoss with loss_weight 0 contains 64 (batch size) values equal to the loss (NOT the gradient) for that input, and all the other elements are 0. The other split correctly contains all the diff values (64*10) for the loss with weight 1.
However, these two splits still get combined, producing a diff for 'ip2' in which the first 64 values are not comparable to the last 576. Am I wrong in how I tried to use loss_weight, or is this a bug? (It doesn't seem to be specific to SoftmaxWithLoss, though it is most apparent for this layer.)
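
To make the combination explicit, here is a minimal sketch (not part of my original script; it assumes it is run right after solver.step(1) above and uses the split blob names printed there; which split index feeds which loss layer depends on the layer order in the prototxt):

import numpy as np

d0 = solver.net.blobs['ip2_ip2_0_split_0'].diff
d1 = solver.net.blobs['ip2_ip2_0_split_1'].diff
combined = solver.net.blobs['ip2'].diff

# The Split layer's backward sums the diffs of its top blobs into the bottom diff,
# so the non-zero entries coming from the weight-0 loss end up in the 'ip2' diff too.
print(np.allclose(combined, d0 + d1))        # True: the two split diffs are simply summed
print(np.abs(d0).max(), np.abs(d1).max())    # I expected one of these to be 0, but it is not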