
Incorrect gradient from a SoftmaxWithLossLayer with loss_weight 0 #2895

@Nanne

I was debugging a network with two loss layers and wanted to disable one of them (a SoftmaxWithLossLayer), so I set its loss_weight to 0. However, this does not do what I expected at all. The clearest way to explain this is probably with an example of how to reproduce it.

To reproduce, take examples/mnist/lenet_train_test.prototxt and add a second loss layer with weight 0:

layer {
  name: "bad_loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "bad_loss"
  loss_weight: 0
}

and then run this Python script:

caffe_root = '/roaming/nanne/caffe/' # update this path to your Caffe root
import sys
sys.path.insert(0, caffe_root + 'python')
import os
os.chdir(caffe_root)
import caffe
import numpy as np

caffe.set_mode_gpu()
solver = caffe.SGDSolver(caffe_root + 'examples/mnist/lenet_solver.prototxt')

solver.step(1)

# Diffs of the two automatic splits of 'ip2' (one split per consumer of 'ip2'):
print(solver.net.blobs['ip2_ip2_0_split_0'].diff.squeeze()[5:7, :])
print(solver.net.blobs['ip2_ip2_0_split_1'].diff.squeeze()[5:7, :])

# Combined diff that is actually backpropagated into 'ip2':
print(solver.net.blobs['ip2'].diff.squeeze()[5:7, :])

The diff for the split belonging to the SoftmaxWithLoss with loss_weight 0 contains 64 (the batch size) values equal to the loss (NOT the gradient) for that input, and all the other elements are 0. The other split correctly contains all the diff values (64*10) for the loss with weight 1.
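
To make that concrete, here is a quick check one could append to the script above (a sketch only: it reuses the solver from that script, assumes numpy is available as np, and assumes 'ip2_ip2_0_split_1' is the split feeding the zero-weight bad_loss layer; swap the two split names if the ordering differs in your net):

import numpy as np

# Split assumed here to feed the zero-weight 'bad_loss' layer.
bad = solver.net.blobs['ip2_ip2_0_split_1'].diff.squeeze()
# Split assumed here to feed the original loss layer (loss_weight 1).
good = solver.net.blobs['ip2_ip2_0_split_0'].diff.squeeze()

# With loss_weight 0 this split should be all zeros, yet it has 64
# non-zero entries (one per example), each equal to the loss value.
print((bad != 0).sum())
# The weight-1 split carries a full 64x10 gradient, as expected.
print(good.shape)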

However, these two splits still get combined, producing the diff for 'ip2', in which the first 64 values are not comparable to the remaining 576. Am I misusing loss_weight, or is this a bug? (It doesn't seem to be specific to SoftmaxWithLoss, though it's most apparent for this layer.)
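
For reference, what I would expect (again just a sketch continuing the script above, with the same assumed blob names) is that the zero-weight split contributes nothing, so the combined diff of 'ip2' should equal the diff of the weight-1 split alone. Given the behaviour described above, both checks below fail:

import numpy as np

# Expected with loss_weight 0: combined diff == weight-1 split's diff.
print(np.allclose(solver.net.blobs['ip2'].diff,
                  solver.net.blobs['ip2_ip2_0_split_0'].diff))
# Expected with loss_weight 0: the zero-weight split has a zero diff.
print(not solver.net.blobs['ip2_ip2_0_split_1'].diff.any())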
