

Allow EuclideanLossLayer to ignore labels #3677

Closed

xternalz wants to merge 4 commits into BVLC:master from xternalz:master

Conversation

@xternalz

Typecast the labels to int and ignore them if they are equal to ignore_label_.

@elezar
Contributor

elezar commented Feb 16, 2016

Since I am also looking at ignore labels in #3653, I was wondering whether it makes sense to move the ignore_label property to the LossLayer base class. Currently it is implemented separately in a number of the subclasses (AccuracyLayer, for example, and now EuclideanLossLayer).

const int label_value = static_cast<int>(label_data[i]);
if (label_value == ignore_label_) {
label_data[i] = ignore_label_;
bottom_data[i] = ignore_label_;

Doesn't this violate the contract of layers? Layers should not be modifying the bottom data in the forward pass.

@seanbell

I don't quite understand the point of this layer. This appears to be a EuclideanLossLayer, but where you imagine that both inputs are actually integer labels, and you are computing a (squared) Euclidean distance on integer labels. Can you give an example use case where you would want this?

@xternalz
Author

I have made some changes - the layer no longer modifies the bottom data. This PR would be useful in situations like this one: https://groups.google.com/d/msg/caffe-users/90UP97wcCyQ/TBRenlzdCgAJ.

@jeffdonahue
Contributor

jeffdonahue commented Feb 18, 2016

Sorry, but I don't think this makes sense. The entire float range is valid for both inputs to EuclideanLossLayer (so it doesn't make sense to reserve an integer value in that range to mark a sample as ignored), and neither input is explicitly a label or prediction -- the layer is symmetric in its inputs, and this change breaks that symmetry.
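
For reference, EuclideanLossLayer computes E = (1/(2N)) * sum_n ||bottom_0[n] - bottom_1[n]||^2, which is identical if the two bottoms are swapped.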

I'd suggest doing something like this instead, assuming you have a binary indicator vector ignore for each batch item:

from caffe import layers as L  # pycaffe NetSpec layer helpers

def euclidean_loss_with_ignore(ignore, label, pred):
    # ignore is a batch-length vector, shape (N) -- 1 for ignore, 0 for keep
    # label and pred are tops with shape (N, ...)
    not_ignore = L.Power(ignore, scale=-1, shift=1)  # not_ignore = 1 - ignore
    masked_label = L.Scale(label, ignore, axis=0)
    masked_pred = L.Scale(pred, not_ignore, axis=0)
    ignored_pred = L.Eltwise(masked_label, masked_pred)  # elementwise SUM (the default op)
    return L.EuclideanLoss(ignored_pred, label)

ignored_pred will then simply have the value of the label for every ignored instance, giving 0 loss for those instances.

Edit: come to think of it I guess this is the point of FilterLayer (you'd provide it a list of indices to keep rather than a binary indicator vector) -- using that might be more efficient than my suggestion.
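
As a concrete check of the masking arithmetic above, here is a minimal NumPy sketch (plain arrays instead of Caffe layers; the function and array names are illustrative):

import numpy as np

def masked_euclidean_loss(ignore, label, pred):
    # ignore: shape (N,), 1 = ignore, 0 = keep
    # label, pred: shape (N, D)
    not_ignore = 1.0 - ignore  # what the Power layer (scale=-1, shift=1) produces
    # Scale along axis 0, then elementwise sum, as in the snippet above
    merged = label * ignore[:, None] + pred * not_ignore[:, None]
    # merged equals label on ignored rows and pred on kept rows,
    # so ignored rows contribute nothing to the squared error
    return np.sum((merged - label) ** 2) / (2 * len(label))

ignore = np.array([1.0, 0.0])
label = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = np.array([[9.0, 9.0], [3.5, 4.5]])
print(masked_euclidean_loss(ignore, label, pred))  # 0.125 -- only the second row contributes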

@mikewang1993

mikewang1993 commented Jan 3, 2017

I think you also need to exclude the samples with ignore_label from the normalization, just as SoftmaxWithLossLayer does:

  for (int i = 0; i < outer_num_; ++i) {
    for (int j = 0; j < inner_num_; j++) {
      const int label_value = static_cast<int>(label[i * inner_num_ + j]);
      if (has_ignore_label_ && label_value == ignore_label_) {
        continue;
      }
      DCHECK_GE(label_value, 0);
      DCHECK_LT(label_value, prob_.shape(softmax_axis_));
      loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                           Dtype(FLT_MIN)));
      ++count;  // NOTE the way of computing normalization count
    }
  }
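
Translated to the Euclidean case, the idea is to divide by the number of kept items rather than the batch size; a rough NumPy sketch (not Caffe code, names are illustrative):

import numpy as np

def euclidean_loss_ignore_normalized(ignore, label, pred):
    # ignore: shape (N,), 1 = ignore, 0 = keep
    keep = 1.0 - ignore
    per_item = np.sum((pred - label) ** 2, axis=1) * keep  # zero out ignored items
    count = max(keep.sum(), 1.0)                            # count only kept items
    return per_item.sum() / (2.0 * count)                   # normalize by kept count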

@shelhamer
Member

Closing as the switch to int does not make sense and a mask approach is more general. However the motivation is right, so thanks for pointing out the need for ignore with regression @xternalz.

> I was wondering if it does not make sense to move the ignore_label property to the LossLayer class

Ideally ignoring and normalizing would be handled once and in general in the base class, but right now the subclasses are doing it separately (and in some cases differently).

@shelhamer closed this Apr 14, 2017