Conversation
xdshang commented on Nov 1, 2015
- Support N labels along the softmax axis. The final loss is the average loss over all labels (see the sketch below).
- By combining with ignore_label, it supports a variable number of labels per instance.
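A minimal sketch of what such a forward pass can look like, assuming a label blob of shape (N, K) laid out as (outer_num_, label_num_, inner_num_); the member name label_num_ and the exact indexing are assumptions, not necessarily this PR's diff:

```cpp
// Forward pass over K labels per instance (sketch, inside Forward_cpu):
Dtype loss = 0;
int count = 0;  // number of non-ignored labels, used for averaging
for (int i = 0; i < outer_num_; ++i) {
  for (int j = 0; j < inner_num_; ++j) {
    for (int k = 0; k < label_num_; ++k) {  // label_num_ == K (assumed name)
      const int label_value =
          static_cast<int>(label[(i * label_num_ + k) * inner_num_ + j]);
      if (has_ignore_label_ && label_value == ignore_label_) {
        continue;  // ignore_label entries give a variable number of labels
      }
      loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                           Dtype(FLT_MIN)));
      ++count;
    }
  }
}
// Average over all counted labels rather than over instances.
top[0]->mutable_cpu_data()[0] = loss / std::max(count, 1);
```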
/cc @mtamburrano
…abel, since the size of the label blob is doubled. When the size of the label blob is (10, 1, 2, 3), it also causes a checking failure at a checking accuracy of 5e-5.
Are you sure this is right?
Shouldn't it be something like
loss -= log(std::max(prob_data[(i * dim) + (dim/label_num_*k) + label_value * inner_num_ + j], Dtype(FLT_MIN))); ?
I'm not sure how you intend to feed bottom[0] to match the dimension of the labels; shouldn't it be larger, with a size of previous_size * label_num_?
Let's say we had single labels with 3 classes, so an INNER_PRODUCT layer with num_output: 3 was enough. Now, if for each input we have 2 labels, each with 3 classes, the INNER_PRODUCT should have num_output: 6, and you should iterate over the prob_ blob with an offset that accounts for both the number of classes and the number of labels.
Is that right, or am I missing something?
I am not addressing the multi-class problem, but the multi-label problem, where multiple labels are assigned to each instance. In your example, supposing instance i is assigned the 1st and 3rd classes, the loss for the instance is simply the average of the losses on those two classes, i.e. log(prob_[i * dim + 0 * inner_num_ + j]) and log(prob_[i * dim + 2 * inner_num_ + j]). The INNER_PRODUCT layer still has num_output: 3 in this case.
Generally, if there are M classes and each instance has K labels, the shape of the data blob is still (N, M) and the shape of the label blob is (N, K). Originally, only (N, 1) was allowed.
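In formula form (my notation): with $N$ instances, $K$ labels each, $y_{ik}$ the $k$-th label of instance $i$, and $p_{i,c}$ the softmax probability of class $c$ for instance $i$,

$$L = -\frac{1}{NK} \sum_{i=1}^{N} \sum_{k=1}^{K} \log p_{i,\,y_{ik}},$$

where entries equal to ignore_label are dropped from both the sum and the normalizer.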
OK, I get it; I assumed you were addressing multi-class problems.
Thank you
Just for reference, there was a Multi-label Data and MultiLabel Accuracy PR (#523) previously.
Playing devil's advocate:
I'm looking for multi-label functionality in Caffe. This is how I understand the task. @bhack, I've seen your comments on all the PR branches, so maybe you know more details.
@taras-sereda I got frustrated with the multiple PRs, so the other day I decided to take the time to lay out a few solutions I know of; the discussion thread is here: https://groups.google.com/forum/#!topic/caffe-users/RuT1TgwiRCo
@beniz Thanks for sharing.
@BlGene Regarding your first suggestion, I think the problem is how you get the N * K output: for each instance, supposing there are N labels and num_output is K, you need to duplicate the K-dimensional output N times. This is much more costly than simply computing N positions over the K-dimensional output. PR #523 only proposed a multilabel accuracy layer, which is for the test phase. Furthermore, a multilabel loss is often used for web tag classification, where the number of tags is huge. In our experiment, we use around 30,000 tags (classes) and there are only 20 positive tags on average for each instance, so the labels are quite sparse. In this case, a hot-encoded label vector ([1, 0, 0, 1, 0, 1]) is not appropriate.
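One way to picture the sparse layout this enables (a sketch; PackSparseLabels and the float label storage are assumptions for illustration, not code from this PR):

```cpp
#include <vector>

// Pack a variable-length list of positive tag indices into a fixed-width
// label row of K entries, padding the tail with ignore_label. With
// M ~ 30,000 classes and ~20 positive tags per instance, storing K indices
// is far cheaper than an M-dimensional hot-encoded vector.
void PackSparseLabels(const std::vector<int>& tags, int K, int ignore_label,
                      float* label_row) {
  for (int k = 0; k < K; ++k) {
    label_row[k] = (k < static_cast<int>(tags.size()))
        ? static_cast<float>(tags[k])
        : static_cast<float>(ignore_label);
  }
}
// e.g. PackSparseLabels({0, 3, 5}, 32, -1, row) writes 0, 3, 5 followed by
// 29 entries of -1.
```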
In terms of classification performance, how does this multi-label version of the Softmax loss compare with the …
How do you load the multiple labels for the data? I cannot find any examples in your project; it seems that you didn't modify the data layer.