
Add missing value support to SoftmaxLossLayer #1654

Merged
shelhamer merged 5 commits into BVLC:dev from longjon:softmax-missing-values on Jan 30, 2015

Conversation

@longjon
Contributor

@longjon longjon commented Dec 30, 2014

This PR adds an option, missing_value, which lets a specified label indicate that loss should not be computed at the corresponding location.

This creates an issue for normalization. To match the existing normalization, we should normalize by the number of labels actually present. However, this means that the weight of each label depends on the number of missing values in each batch, which may be undesirable. For this reason an extra boolean, normalize, is added; if normalize is false, the loss is normalized by the batch size instead.
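As a concrete illustration, here is a minimal sketch of the forward computation described above (not the PR's actual code; the function name, the row-major num x dim probability layout, and all parameter names are assumptions for illustration):

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <vector>

// Illustrative only: softmax loss over `num` locations with `dim` classes,
// skipping locations whose label equals `missing_value`.
template <typename Dtype>
Dtype SoftmaxLossWithMissing(const std::vector<Dtype>& prob,   // num x dim
                             const std::vector<int>& label,    // length num
                             int dim, bool has_missing_value, int missing_value,
                             bool normalize) {
  const int num = static_cast<int>(label.size());
  Dtype loss = 0;
  int count = 0;  // number of labels actually present
  for (int i = 0; i < num; ++i) {
    if (has_missing_value && label[i] == missing_value) {
      continue;  // no loss (and no gradient) at this location
    }
    loss -= std::log(std::max(prob[i * dim + label[i]], Dtype(FLT_MIN)));
    ++count;
  }
  // normalize == true: the per-label weight varies with how many labels are
  // missing in the batch; normalize == false: fixed batch-size denominator.
  return normalize ? loss / std::max(count, 1) : loss / num;
}
```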

Ideally this would be done in a way that can be used uniformly across different loss layers, but this is the form it exists in now.

Not yet well tested.

@longjon longjon force-pushed the softmax-missing-values branch from b172127 to 51d63a5 on December 30, 2014 01:35
longjon added a commit to longjon/caffe that referenced this pull request Dec 30, 2014
Add missing value support to SoftmaxLossLayer
@longjon longjon force-pushed the softmax-missing-values branch from 51d63a5 to b14afd3 on December 30, 2014 05:02
longjon added a commit to longjon/caffe that referenced this pull request Dec 30, 2014
Add missing value support to SoftmaxLossLayer
@longjon
Contributor Author

longjon commented Dec 30, 2014

This is currently passing, but there were some odd nondeterministic failures in the CPU_ONLY/CMake case, see: https://s3.amazonaws.com/archive.travis-ci.org/jobs/45421159/log.txt. No idea what's up with that.

@bhack
Contributor

bhack commented Dec 30, 2014

@longjon What kind of failures? Do you mean tests on db?

@longjon longjon force-pushed the softmax-missing-values branch from b14afd3 to a9a1cac on December 30, 2014 20:35
@longjon
Contributor Author

longjon commented Dec 30, 2014

@bhack oops, the link wasn't to the failed test; it's been updated to point to the passing one. NetTest was failing nondeterministically. This was due to a bug that should be fixed now (an uninitialized member variable).

I also noticed some test sketchiness while debugging this, will make another PR.

longjon added a commit to longjon/caffe that referenced this pull request Dec 30, 2014
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Dec 30, 2014
Add missing value support to SoftmaxLossLayer
Contributor

In src/caffe/layers/softmax_loss_layer.cpp (#1654 diff):

@@ -17,6 +17,11 @@ void SoftmaxWithLossLayer::LayerSetUp(
   softmax_top_vec_.clear();
   softmax_top_vec_.push_back(&prob_);
   softmax_layer_->SetUp(softmax_bottom_vec_, softmax_top_vec_);
+
+  has_missing_values_ =
+      this->layer_param_.softmax_loss_param().has_missing_value();
+  missing_value_ = this->layer_param_.softmax_loss_param().missing_value();

I don't think this is safe, since the parameter is optional and doesn't have a default value

Contributor Author

Yes, it's safe, see
https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#fields
and https://developers.google.com/protocol-buffers/docs/proto#optional.
It might be a good idea to add a comment to make this clear.

Contributor

For int the default value is 0.

Sergio

Contributor Author

Yes, I know. has_missing_values_ gets set to false, and missing_value_ gets set to zero, but that value has no effect. Anyway I think I'll just add the unnecessary check, since it's less of a mental speed bump.
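For reference, the check described here would look something like this (a sketch mirroring the diff under review, not necessarily the final code):

```cpp
// Guarded read of the optional parameter. proto2 already guarantees that an
// unset optional int32 reads as its type default (0), so the guard is for
// clarity rather than correctness.
has_missing_values_ =
    this->layer_param_.softmax_loss_param().has_missing_value();
if (has_missing_values_) {
  missing_value_ = this->layer_param_.softmax_loss_param().missing_value();
}
```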

@sguada
Contributor

sguada commented Dec 30, 2014

@longjon I think it would be great if we could define a typical default of -1 for missing_value (better called ignore_label), which could be used by other losses, including multi-label

@longjon longjon force-pushed the softmax-missing-values branch from a9a1cac to 0fdee92 on December 30, 2014 22:15
@longjon
Contributor Author

longjon commented Dec 30, 2014

I agree that ignore_label is a better, more descriptive name; I've updated accordingly. I don't agree that -1 labels should be ignored by default; I think it's better to ignore nothing by default, and be explicit when ignoring certain labels. Erroneously providing -1 is not terribly hard to imagine; e.g., this is exactly what happened before #1661 which I just posted. Being explicit also means that nets are better self-documented, and avoids "magic numbers".

@sguada
Contributor

sguada commented Dec 31, 2014

What do you think about renaming the new param softmax_loss_param to ignore_label_param, so that it could be used by the other loss layers? I used a fixed value to ignore labels in #523, but this will be better.

longjon added a commit to longjon/caffe that referenced this pull request Dec 31, 2014
Add missing value support to SoftmaxLossLayer
@longjon
Contributor Author

longjon commented Dec 31, 2014

I agree that it makes sense to use a more generic name, but then what about the normalize option? Adding yet another param block for that seems messy. Since that option can also apply to other loss layers, maybe both should be in a generic loss_param?
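For example, both options could then be read through one shared accessor (a hypothetical sketch; the field names are assumptions, since the message layout was still being decided at this point):

```cpp
// Hypothetical: a generic LossParameter shared across loss layers, holding
// both the ignore label and the normalization flag.
const LossParameter& loss_param = this->layer_param_.loss_param();
has_ignore_label_ = loss_param.has_ignore_label();
if (has_ignore_label_) {
  ignore_label_ = loss_param.ignore_label();
}
normalize_ = loss_param.normalize();
```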

@sguada
Contributor

sguada commented Dec 31, 2014

That sounds good to me.

Sergio

With missing values (and batches of varying spatial dimension), normalizing each batch across instances can inappropriately give different instances different weights, so we give the option of simply normalizing by the batch size instead.
@longjon longjon force-pushed the softmax-missing-values branch from 0fdee92 to c7f63da on December 31, 2014 06:01
longjon added a commit to longjon/caffe that referenced this pull request Dec 31, 2014
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Dec 31, 2014
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Jan 1, 2015
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Jan 2, 2015
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Jan 2, 2015
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Jan 3, 2015
Add missing value support to SoftmaxLossLayer
longjon added a commit to longjon/caffe that referenced this pull request Jan 3, 2015
Add missing value support to SoftmaxLossLayer
@shelhamer
Member

Needs tests (just check loss and gradients for a toy input and truth pair) and docs, then merge.

philkr added a commit to philkr/caffe that referenced this pull request Jan 25, 2015
Add missing value support to SoftmaxLossLayer
@longjon
Contributor Author

longjon commented Jan 27, 2015

I've added a doc comment describing the additional parameters, and simple tests that check the gradient with an ignore label set and with normalize set to false, so this should be ready pending any further comments.

@shelhamer
Member

@longjon the docs and gradient tests look good, but whether ignored labels actually contribute nothing to the loss also deserves coverage. Once that test is in, let's merge.

@longjon longjon force-pushed the softmax-missing-values branch from 5bd5684 to f1eada7 on January 27, 2015 23:57
@longjon
Contributor Author

longjon commented Jan 27, 2015

Indeed, the pure gradient checks leave something out. I've added a simple test that checks that labels are being ignored.
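One way such a check can work (a toy sketch reusing the illustrative SoftmaxLossWithMissing helper from the description above, not the PR's actual test; assumes labels lie in [0, dim) and that <cassert> and <cmath> are included): with normalize off, ignoring each of the dim labels in turn skips every location exactly once, so the accumulated loss should equal (dim - 1) times the full loss.

```cpp
// Full loss with nothing ignored, normalized by batch size only.
double full_loss = SoftmaxLossWithMissing(prob, label, dim,
                                          /*has_missing_value=*/false, 0,
                                          /*normalize=*/false);
// Ignore each label in turn; each location is skipped in exactly one run.
double accum_loss = 0;
for (int l = 0; l < dim; ++l) {
  accum_loss += SoftmaxLossWithMissing(prob, label, dim, true, l, false);
}
assert(std::abs((dim - 1) * full_loss - accum_loss) < 1e-4);
```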

shelhamer added a commit that referenced this pull request Jan 30, 2015
Add missing value support to SoftmaxLossLayer
@shelhamer shelhamer merged commit cff3007 into BVLC:dev Jan 30, 2015
@shelhamer
Member

Thanks Jon!
