Conversation
I do not think this is the best way of ignoring samples for EuclideanLoss (or for regression losses in general). You are also throwing away all samples whose target value merely falls close to ignore_label. A more principled approach would be some sort of masking: for example, allow a third bottom blob carrying a (binary) mask, which the backward pass can use to zero out the corresponding gradients.
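As a rough illustration of the suggested masking, here is a standalone sketch, not actual Caffe layer code; the function name, signature, and the assumption of an elementwise 0/1 mask are all hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a masked L2 residual: diff[i] is the usual
// (prediction - target) term, scaled by a 0/1 mask, so a zero mask
// entry blocks the contribution of that element.
std::vector<float> masked_euclidean_diff(const std::vector<float>& pred,
                                         const std::vector<float>& target,
                                         const std::vector<float>& mask) {
  std::vector<float> diff(pred.size());
  for (std::size_t i = 0; i < pred.size(); ++i) {
    diff[i] = (pred[i] - target[i]) * mask[i];
  }
  return diff;
}
```

Caffe's actual EuclideanLossLayer stores this residual in a diff_ blob and builds both the forward loss and the backward gradient from it, so masking the residual once would block both directions.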
That seems like a more complex solution (both to implement and to use) without much benefit. Realistically, how many data points will spuriously fall within such an extremely narrow range around the sentinel? Probably zero, or very close to it.
Closing, as a better approach to this has been suggested.
The approach jeff suggested doesn't account for the normalization issue. @Noiredd
I see. But wouldn't it be better to add a mask input to the EuclideanLossLayer? An ignore label in the case of an L2 loss not only feels confusing (we don't even have actual labels here), but also pointless: effectively it means "never learn the value x", which isn't generally useful. Taking a mask input (i.e. "do not learn from these examples") would be more universal, and it would still fulfill @matthill's original reason for implementing this. I'll reopen this for further conversation.
This PR allows me to use an "ignore_label" value in the EuclideanLoss layer. For example:
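(A sketch of the intended usage; the blob names and the sentinel value are placeholders, and this assumes ignore_label is read from loss_param, the same message SoftmaxWithLoss uses.)

```
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "predictions"
  bottom: "targets"
  top: "loss"
  loss_param {
    ignore_label: 2  # placeholder sentinel outside the normalized data range
  }
}
```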
The issue I was having is that I am training against multiple loss functions with different data, and part of my dataset is missing. I still want to use the data when I have it, but I do not want to back-propagate when those values are missing. Simply setting the missing values to zero pulls the regression towards zero, rather than teaching the network to ignore them.
SoftmaxWithLoss already has this feature; this PR adds parity for the EuclideanLoss layer.
99% of this PR is pulled from an earlier one: #3677
However, that PR was not ideal, since it simply casts the target value to an int and ignores the sample if the result matches ignore_label. My values are normalized between -1 and 1, so that wouldn't work. Instead, I check whether the float value falls within a narrow range around the configured integer. After making this change, my network's accuracy was much higher.
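A standalone sketch of that check (the helper name and the epsilon are placeholders, not values taken from the PR diff):

```cpp
#include <cmath>

// Hypothetical helper: treat a float target as ignored when it lies within
// a narrow band around the configured integer ignore_label. Comparing this
// way avoids the int cast from #3677, which truncates every value in the
// open interval (-1, 1) to 0 before the comparison.
inline bool is_ignored(float target, int ignore_label,
                       float epsilon = 1e-5f) {
  return std::fabs(target - static_cast<float>(ignore_label)) < epsilon;
}
```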