Add missing value support to SoftmaxLossLayer #1654
Conversation
Force-pushed from b172127 to 51d63a5.
Force-pushed from 51d63a5 to b14afd3.
This is currently passing, but there were some odd nondeterministic failures in the CPU_ONLY/CMake case, see: https://s3.amazonaws.com/archive.travis-ci.org/jobs/45421159/log.txt. No idea what's up with that.
@longjon What kind of failures? Do you mean the DB tests?
Force-pushed from b14afd3 to a9a1cac.
@bhack oops, the link wasn't to the failed run; it's been updated to the passing one. I also noticed some test sketchiness while debugging this; will make another PR.
I don't think this is safe, since the parameter is optional and doesn't have a default value.
Yes, it's safe; see https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#fields and https://developers.google.com/protocol-buffers/docs/proto#optional. It might be a good idea to add a comment to make this clear.
For an int field the default value is 0.
Sergio
> In src/caffe/layers/softmax_loss_layer.cpp, #1654 (diff), @@ -17,6 +17,11 @@ void SoftmaxWithLossLayer::LayerSetUp(
>
>       softmax_top_vec_.clear();
>       softmax_top_vec_.push_back(&prob_);
>       softmax_layer_->SetUp(softmax_bottom_vec_, softmax_top_vec_);
>     +
>     + has_missing_values_ =
>     +     this->layer_param_.softmax_loss_param().has_missing_value();
>     + missing_value_ = this->layer_param_.softmax_loss_param().missing_value();
Yes, I know. has_missing_values_ gets set to false, and missing_value_ gets set to zero, but that value has no effect. Anyway I think I'll just add the unnecessary check, since it's less of a mental speed bump.
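For reference, this is roughly what the guarded read mentioned here could look like, reusing the member and field names from the quoted diff (illustrative only, not necessarily the code that was merged). For an optional int32 field with no explicit default, the generated has_missing_value() returns false and missing_value() returns 0 when the field is unset, so the guard is redundant but makes the intent explicit:

```cpp
has_missing_values_ =
    this->layer_param_.softmax_loss_param().has_missing_value();
if (has_missing_values_) {
  // Only meaningful when the user actually set the field; if it is unset,
  // the getter would return 0, but that value is never used.
  missing_value_ = this->layer_param_.softmax_loss_param().missing_value();
}
```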
@longjon I think it would be great if we could define a typical default.
Force-pushed from a9a1cac to 0fdee92.
I agree that
What do you think about renaming the new param
I agree that it makes sense to use a more generic name, but then what about the
That sounds good to me.

Sergio
With missing values (and batches of varying spatial dimension), normalizing each batch across instances can inappropriately give different instances different weights, so we give the option of simply normalizing by the batch size instead.
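A minimal sketch of the two normalization modes described in that message (the function and variable names here are illustrative, not the PR's actual code):

```cpp
// Contrast the two normalization modes: divide by the number of labels that
// actually contributed, or divide by the batch size.
template <typename Dtype>
Dtype NormalizeLoss(Dtype loss_sum, int num_valid_labels, int batch_size,
                    bool normalize) {
  if (normalize) {
    // Existing behavior: divide by the number of non-ignored labels. With
    // missing values this denominator varies from batch to batch, so the
    // effective weight of each instance varies too.
    return num_valid_labels > 0 ? loss_sum / num_valid_labels : Dtype(0);
  }
  // Otherwise divide by the batch size, so every instance keeps a fixed
  // weight regardless of how many of its labels were missing.
  return loss_sum / batch_size;
}
```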
Force-pushed from 0fdee92 to c7f63da.
Needs tests (just check loss and gradients for a toy input and truth pair) and docs, then merge.
I've added a doc comment describing the additional parameters, and simple tests that check the gradient with an ignore label set and with normalize set to false, so this should be ready pending any further comments.
@longjon the docs and gradient tests look good, but whether the ignored labels actually contribute to the loss deserves coverage. Once that test is in, let's merge.
Force-pushed from 5bd5684 to f1eada7.
Indeed, the pure gradient checks leave something out. I've added a simple test that checks that labels are being ignored.
Thanks Jon!
This PR adds an option, `missing_value`, which allows a specified label to indicate that the loss should not be computed at the corresponding location.

This creates an issue for normalization. To match the existing normalization, we should normalize by the number of labels actually present. However, this means that the weight of each label depends on the number of missing values in each batch, which may be undesirable. For this reason an extra boolean, `normalize`, is added; if `normalize` is false, normalization occurs only over the batch size.

Ideally this would be done in a way that can be used uniformly across different loss layers, but this is the form it exists in now.
Not yet well tested.
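For illustration, a usage sketch of the two parameters described above; the field names follow this PR's diff (`missing_value`, `normalize`) and may not match what is eventually merged:

```cpp
// Hypothetical configuration: skip a "void" label of 255 and normalize by
// the batch size rather than by the number of non-ignored labels.
LayerParameter param;
param.mutable_softmax_loss_param()->set_missing_value(255);
param.mutable_softmax_loss_param()->set_normalize(false);
SoftmaxWithLossLayer<float> loss_layer(param);
```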