
MNIST autoencoder example#330

Merged
shelhamer merged 10 commits into BVLC:dev from jeffdonahue:mnist-autoencoder-example on Apr 16, 2014

Conversation

@jeffdonahue
Contributor

This PR moves examples/lenet/ to examples/mnist/ and adds the necessary files to train an autoencoder on MNIST with the architecture of Hinton & Salakhutdinov [1], using SGD with no pre-training (e.g., via RBMs). It uses a sparse Gaussian initialization (added to filler.hpp), as suggested by [2], as a strategy for training autoencoders via SGD* without pretraining. It uses a fixed learning rate of 0.0001, which could probably be greatly improved upon, but I haven't played with it much.
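For context, the sparse initialization from [2] gives each unit only a small number of nonzero incoming weights, drawn from a zero-mean Gaussian, with the rest set to exactly zero. A minimal standalone sketch of the idea (the function name, interface, and fixed nonzero count here are illustrative, not Caffe's actual filler implementation):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Fill a (fan_out x fan_in) weight matrix, stored row-major, so that
// each output unit has exactly `num_nonzero` nonzero incoming weights
// drawn from N(0, std_dev^2); all other weights start at exactly zero.
std::vector<float> SparseGaussianInit(int fan_in, int fan_out,
                                      float std_dev, int num_nonzero) {
  std::mt19937 gen(1701);  // fixed seed for reproducibility
  std::normal_distribution<float> gauss(0.0f, std_dev);
  std::vector<float> weights(fan_out * fan_in, 0.0f);
  std::vector<int> idx(fan_in);
  std::iota(idx.begin(), idx.end(), 0);
  for (int unit = 0; unit < fan_out; ++unit) {
    // Pick `num_nonzero` random incoming connections for this unit.
    std::shuffle(idx.begin(), idx.end(), gen);
    for (int k = 0; k < num_nonzero && k < fan_in; ++k) {
      weights[unit * fan_in + idx[k]] = gauss(gen);
    }
  }
  return weights;
}
```

For a 784-to-1000 encoder layer one might call SparseGaussianInit(784, 1000, 1.0f, 15), following the 15-connections-per-unit heuristic used in [2].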

After 2 million iterations (which took a few hours on the GPU; I didn't originally intend to train it this long, but this is where it was when I came back to it), the test L2 reconstruction error was around 1.5-1.6. For a sense of scale, a reconstruction with 2 of the 784 pixels flipped from perfectly white to perfectly black (or vice versa) would have an L2 reconstruction error of 2.0.
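To make that concrete, with pixel values in $[0, 1]$ and the error taken as the sum of squared per-pixel differences over a $28 \times 28$ image:

$$E = \sum_{i=1}^{784} (x_i - \hat{x}_i)^2, \qquad E_{\text{two flipped pixels}} = 2 \cdot (1 - 0)^2 = 2.0$$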

*Actually, [2] used Nesterov's accelerated gradient, but that is not currently implemented in Caffe, and plain SGD seems to be fairly effective.
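For reference, the Nesterov momentum update from [2], with momentum coefficient $\mu$ and learning rate $\varepsilon$, is

$$v_{t+1} = \mu v_t - \varepsilon \nabla f(\theta_t + \mu v_t), \qquad \theta_{t+1} = \theta_t + v_{t+1},$$

i.e., the gradient is evaluated at the look-ahead point $\theta_t + \mu v_t$ rather than at $\theta_t$ as in classical momentum.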

[1] G. E. Hinton and R. R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks," Science, 2006. http://www.cs.toronto.edu/~hinton/science.pdf

[2] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the Importance of Initialization and Momentum in Deep Learning," ICML 2013. http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

@jeffdonahue
Contributor Author

Ready to merge, @shelhamer.

The unit test problem I mentioned having with SigmoidCrossEntropyLossLayer occurred when I changed CheckGradientSingle to CheckGradientExhaustive. The exhaustive version fails because, as in the current implementation of the Euclidean loss layer, I'm not propagating gradients down to the second input, which we assume is the "ground truth".

While I could propagate down to the 2nd input, the semantics of this layer (take the sigmoid of the first input, then compute the cross-entropy error between the sigmoidal outputs and the second input, which is assumed to already be in the 0-1 range) would make it odd to use with weights below the second input, and propagating down to the second input is wasteful if there are no weights below it. I think this should eventually be fixed somehow, e.g. by making the bool propagate_down input to Backward a vector<bool> propagate_down with propagate_down.size() == bottom.size(). Right now the propagate_down input is just hard-coded to true everywhere that Backward is called.
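A rough sketch of what that interface change might look like (hypothetical; Blob is forward-declared as a stand-in, and this is not Caffe's actual Backward signature at the time):

```cpp
#include <vector>
using std::vector;

template <typename Dtype> class Blob;  // stand-in for Caffe's Blob

template <typename Dtype>
class Layer {
 public:
  virtual ~Layer() {}
  // Proposed: one flag per bottom blob, with
  // propagate_down.size() == bottom->size(). Net::Init() could set
  // flag i to false when nothing below bottom blob i has weights,
  // so e.g. SigmoidCrossEntropyLossLayer would see
  // propagate_down[1] == false and skip the gradient for its
  // "ground truth" input instead of computing it wastefully.
  virtual void Backward(const vector<Blob<Dtype>*>& top,
                        const vector<bool>& propagate_down,
                        vector<Blob<Dtype>*>* bottom) = 0;
};
```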

@shelhamer
Member

Ok, looks good. Thanks Jeff!

> by making the bool propagate_down input to Backward a vector<bool> propagate_down with propagate_down.size() == bottom.size()

This seems like a good way to fix it to me.

Agreed with not propagating to the 2nd input for now, for the reason given. Anyone with a model where both paths have parameters could hack in a field in the loss layer like propagate_all or some such until we fix the problem properly with an actual vector of propagation flags that Net::Init() can figure out.

shelhamer added a commit that referenced this pull request on Apr 16, 2014
shelhamer merged commit 2dad9ca into BVLC:dev on Apr 16, 2014
jeffdonahue deleted the mnist-autoencoder-example branch on Apr 16, 2014 at 17:58
shelhamer mentioned this pull request on May 20, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014
@moi90 commented Jun 8, 2015

I'd be very happy about some comments in mnist_autoencoder.prototxt or an accompanying readme about how exactly the autoencoder works. In particular, the roles of the two distinct loss layers (and why one operates on flatdata and decode1 while the other operates on flatdata and decode1neuron) are not clear to me.

@liuruoze

I agree with @moi90; a readme and tutorials would be very nice for anyone who wants to use them.
If @jeffdonahue could give some instructions, that would be great.
