Adding Batch L2 Normalization Layer that L2 normalizes rows of input Tensor #260
karpathy wants to merge 4 commits into torch:master
Conversation
will give in-line feedback. this has some common patterns that we don't want in nn.
L2Normalize.lua
torch.xxx(...) usually does a malloc for the result tensor.
Mallocs during the training loop of the nn are usually quite bad for performance.
Another reason mallocs are bad is that they introduce a synchronization point in CUDA, so you essentially kill any multi-GPU code.
Instead, consider having a buffer:

```lua
self.buffer = self.buffer or input.new()
self.norms = self.norms or input.new()
self.norms:sum(self.buffer:cmul(input, input), 2):sqrt()
```
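Expanded into a full forward pass, the buffer pattern might look like the sketch below. This is a hedged illustration, not the PR's actual code: the class name `L2Normalize` is taken from the file name above, and the copy/`cdiv` step is an assumption about how the row norms get applied.

```lua
-- Sketch only: buffers are allocated once and reused across calls,
-- so the training loop incurs no per-iteration mallocs.
function L2Normalize:updateOutput(input)
   self.buffer = self.buffer or input.new()
   self.norms  = self.norms or input.new()

   self.buffer:resizeAs(input):cmul(input, input)  -- elementwise square
   self.norms:sum(self.buffer, 2):sqrt()           -- n x 1 row L2 norms
   self.output:resizeAs(input):copy(input)
   self.output:cdiv(self.norms:expandAs(input))    -- divide each row by its norm
   return self.output
end
```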
Thanks,
some of this was intentional on my part because I'm applying this in a space-limited setting (I'm constantly running out of memory on GPU), so saving all of these cached intermediates (+ at every step of my LSTM) was costly. But if this is considered preferable I'll rewrite.
EDIT: I did not appreciate the point about multi-GPU code, interesting.
Overall, the barrier to contributing is lower for nnx (nn-experimental), which is here: https://github.com/clementfarabet/lua---nnx
I made a bunch of comments, but hope they don't stop you from contributing in the future :)
Thanks for the comments! Some of these issues I was aware of, and some were just silly mistakes (e.g. reshape) - sorry about that. I did a rewrite that should now be much more time-efficient (at the cost of space efficiency, due to held caches, and of readability), but it works.
…rward pass would incorrectly compute the gradient because of in-place operations
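As a toy illustration of why in-place operations cause this (not the PR's code): modifying the input in place destroys the values the backward pass needs.

```lua
-- Toy example: gradInput depends on the original input,
-- but the in-place division has already overwritten it.
local x = torch.Tensor{3, 4}
local norm = x:norm()   -- 5
x:div(norm)             -- in-place: x is now {0.6, 0.8}
-- backward would need the original {3, 4}, which is gone;
-- writing into a separate output buffer avoids this.
```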
I think it's missing the implementation when
Squashed the commits and pushed to master via 905ea8c
yay!! By the way, suggested improvements going forward:
@karpathy, I would suggest
And thank you for the time you took to help the Torch community 😁
Some documentation would be nice.
I totally missed that this PR doesn't have documentation. Indeed, some documentation would be nice...
I'll add docs. I have a patch for
This layer L2 normalizes an n x d Tensor. I tested the implementation on both CPU/GPU and gradient checked it with jac and also with my (other) custom code. I also compared to a slower version that uses a loop, as can be seen in this gist. (This batched version is quite a lot faster.) Also, I have only worked with Torch for ~2 weeks, so exercise caution :)
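For reference, the row-wise computation and a jac-style gradient check can be sketched as follows. This is hedged: `nn.L2Normalize` is a guess based on the PR's file name, and `nn.Jacobian.testJacobian` stands in for whatever harness was actually used.

```lua
require 'nn'

-- Direct statement of what the layer computes, for an n x d input:
local input  = torch.randn(5, 10)
local norms  = torch.cmul(input, input):sum(2):sqrt()    -- n x 1 row L2 norms
local output = torch.cdiv(input, norms:expandAs(input))  -- rows now have unit L2 norm
print(output:norm(2, 2))                                 -- all ~1

-- Gradient check in the spirit described above (module name assumed):
local err = nn.Jacobian.testJacobian(nn.L2Normalize(), torch.randn(5, 10))
print(err)  -- should be tiny (numerical-precision level)
```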