
Bugs fixed in the euclidean loss layer. #137

Closed

aravindhm wants to merge 4 commits into BVLC:dev from aravindhm:euclideanlossbugfix

Conversation

@aravindhm

I have the following concerns:

  1. The backpropagated gradient is sent to only one of the bottom blobs, bottom[0], so the other bottom blob was not getting updated in my experiments. I think the gradient should go to both blobs.
  2. Some formulations use (1/n) * sum_i (x_i - y_i)^2, where n is the size of the x and y vectors. Can this be accommodated with the scale parameter already present in the layer (default value 1)?
  3. When the euclidean loss is used in both net.prototxt and val.prototxt, no error is reported over the validation data, because the layer does not output a measure of accuracy to a layer above it (there are no output layers in the network). I propose that this layer compute the error and publish it to a blob top[0] if one exists. In other words, we need to define forward_cpu and forward_gpu so that the network in test mode outputs an error corresponding to this layer.

Please provide feedback on this pull request. I've not tested it with MKL, though I've edited the test file. I think the gradient checker hasn't been used correctly in my version; please let me know.
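
For concreteness, here is a minimal standalone sketch of the three points above. This is illustrative only, not the PR's actual layer code, and all names are placeholders: the loss is averaged over the n elements, the gradient is propagated to both inputs with opposite signs, and the scalar loss is returned so a test-time network could report it.

```cpp
#include <cstddef>
#include <vector>

// Standalone sketch (illustrative only): loss = (1/n) * sum_i (x_i - y_i)^2,
// the gradient is sent to BOTH inputs (with opposite signs), and the scalar
// loss is returned so a test-time network could report it.
double EuclideanLossSketch(const std::vector<double>& x,
                           const std::vector<double>& y,
                           std::vector<double>* grad_x,
                           std::vector<double>* grad_y) {
  const std::size_t n = x.size();
  grad_x->resize(n);
  grad_y->resize(n);
  double loss = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double diff = x[i] - y[i];
    loss += diff * diff;
    (*grad_x)[i] =  2.0 * diff / static_cast<double>(n);  // d/dx_i of (1/n) * sum (x - y)^2
    (*grad_y)[i] = -2.0 * diff / static_cast<double>(n);  // d/dy_i has the opposite sign
  }
  return loss / static_cast<double>(n);
}
```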

Contributor


use EXPECT_NEAR instead of EXPECT_LE and EXPECT_GE
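
For reference, a small self-contained gtest sketch of that suggestion; the variable names and values are illustrative, not identifiers from the PR's test file.

```cpp
#include <gtest/gtest.h>

// Illustrative only: computed_loss, expected_loss, and kMargin are placeholder
// names, not identifiers from the PR.
TEST(EuclideanLossExample, LossIsCloseToExpected) {
  const double expected_loss = 1.0;
  const double computed_loss = 1.0 + 1e-5;
  const double kMargin = 1e-4;

  // Bounding from both sides, as in the current test...
  EXPECT_LE(computed_loss, expected_loss + kMargin);
  EXPECT_GE(computed_loss, expected_loss - kMargin);

  // ...versus a single EXPECT_NEAR that expresses the same check directly.
  EXPECT_NEAR(computed_loss, expected_loss, kMargin);
}
```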

@jeffdonahue
Contributor

Thanks @aravindhm! I added a couple of minor comments above, will merge once these are addressed.

@aravindhm
Author

I've fixed the problems and just checked that the code compiles and works with the boost-eigen branch (after a few changes specific to that branch).

@jeffdonahue
Contributor

Hi @aravindhm, I compiled and ran your tests and I get numerous gradient check failures (which look well beyond a simple precision issue):

The difference between computed_gradient and estimated_gradient is 5.5232507603383505, which exceeds threshold_ * scale, where
computed_gradient evaluates to 0.098407263758736319,
estimated_gradient evaluates to 5.621658024097087, and
threshold_ * scale evaluates to 0.05621658024097087.
debug: (top_id, top_data_id, blob_id, feat_id)=-1,-1,0,497
./src/caffe/test/test_gradient_check_util.hpp:124: Failure
The difference between computed_gradient and estimated_gradient is 9.9778362607820572, which exceeds threshold_ * scale, where
computed_gradient evaluates to -0.17777421436460128,
estimated_gradient evaluates to -10.155610475146659, and
threshold_ * scale evaluates to 0.10155610475146659.
debug: (top_id, top_data_id, blob_id, feat_id)=-1,-1,0,498
./src/caffe/test/test_gradient_check_util.hpp:124: Failure
The difference between computed_gradient and estimated_gradient is 7.921049009144614, which exceeds threshold_ * scale, where
computed_gradient evaluates to -0.14112862024779446,
estimated_gradient evaluates to -8.0621776293924086, and
threshold_ * scale evaluates to 0.080621776293924086.
debug: (top_id, top_data_id, blob_id, feat_id)=-1,-1,0,499
[  FAILED  ] EuclideanLossLayerTest/1.TestGradientCPU, where TypeParam = double (111 ms)
[----------] 4 tests from EuclideanLossLayerTest/1 (114 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (782 ms total)
[  PASSED  ] 6 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] EuclideanLossLayerTest/0.TestGradientCPU, where TypeParam = float
[  FAILED  ] EuclideanLossLayerTest/1.TestGradientCPU, where TypeParam = double

If you're able to fix these errors, please also remove the duplicated difference and loss computation in the backward pass, rebase to dev, and fix the style issues that `make lint` spits out.

@sergeyk
Contributor

sergeyk commented Mar 13, 2014

@aravindhm Will you be able to make the suggested changes? We will merge as soon as tests pass.

sergeyk added the bug label Mar 13, 2014
@shelhamer
Member

@jeffdonahue could you follow up on these fixes, or fix this yourself, since this PR seems to be abandoned?

@aravindhm
Author

I can work on this over this weekend.

What are the semantics of the return value for backward_cpu and backward_gpu in the context of a loss layer type? If the loss value is also provided to top[0] then does the meaning of this return value change? I am unable to reverse engineer the meaning of these quantities from the gradient checker code.

Also, if we have multiple data points in a batch, is the gradient chosen to be the average across the batch (1/bottom[0]->num()) or the total?

Also, is it better if I create a new PR based on a checkout of the most recent dev or simply rebase this PR?

@shelhamer
Member

Thanks for finishing this up!

Note that loss is now computed in the forward pass since #209 and the backward_* methods are void, so that settles your question about return semantics.

is the gradient chosen to be the average across the batch?

Yeah, both the loss and diff are scaled by dividing through by the number of instances. See loss_layers.cpp for examples.
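
As a rough standalone illustration of that averaging convention (plain C++, not the actual loss_layers.cpp code; the extra factor of 1/2 in the loss is an assumption matching the usual Euclidean-loss convention):

```cpp
#include <cstddef>
#include <vector>

// Sketch only (not loss_layers.cpp): loss = (1 / (2 * num)) * sum_i (pred_i - label_i)^2,
// and the gradient w.r.t. the predictions carries the same 1 / num averaging.
void EuclideanLossAveraged(const std::vector<double>& pred,
                           const std::vector<double>& label,
                           std::size_t num,  // number of instances in the batch
                           double* loss,
                           std::vector<double>* grad_pred) {
  grad_pred->resize(pred.size());
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < pred.size(); ++i) {
    const double diff = pred[i] - label[i];
    sum_sq += diff * diff;
    (*grad_pred)[i] = diff / static_cast<double>(num);  // averaged across the batch
  }
  *loss = sum_sq / (2.0 * static_cast<double>(num));
}
```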

Re: how to update the PR, please rebase against the latest dev and force push (git push -f) to your branch. That keeps everything in this thread, which is easier to follow.

@shelhamer
Member

This is now irrelevant given the new backward protocol of #497.

shelhamer closed this Jun 26, 2014