Adding Batch L2 Normalization Layer that L2 normalizes rows of input Tensor #260
karpathy wants to merge 4 commits into torch:master
Conversation
will give in-line feedback. this has some common patterns that we don't want in nn.
L2Normalize.lua
torch.xxx(...) usually does a malloc for the result tensor.
Mallocs during the training loop of the nn are usually quite bad for performance.
Another reason mallocs are bad is that they introduce a synchronization point in CUDA, so you essentially kill any multi-GPU code.
Instead, consider having a buffer:

```lua
self.buffer = self.buffer or input.new()
self.norms = self.norms or input.new()
self.norms:sum(self.buffer:cmul(input, input), 2):sqrt()
```
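Expanded into a full forward pass, the buffer pattern might look like the sketch below. This is a hedged illustration, not the PR's actual code: the class name `L2Normalize` is taken from the file name above, and the copy/`cdiv` step is an assumption about how the row norms get applied.

```lua
-- Sketch only: buffers are allocated once and reused across calls,
-- so the training loop incurs no per-iteration mallocs.
function L2Normalize:updateOutput(input)
   self.buffer = self.buffer or input.new()
   self.norms  = self.norms or input.new()

   self.buffer:resizeAs(input):cmul(input, input)  -- elementwise square
   self.norms:sum(self.buffer, 2):sqrt()           -- n x 1 row L2 norms
   self.output:resizeAs(input):copy(input)
   self.output:cdiv(self.norms:expandAs(input))    -- divide each row by its norm
   return self.output
end
```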
Thanks,
some of this was intentional on my part because I'm applying this in a space-limited setting (I'm constantly running out of memory on GPU), so saving all of these cached intermediates (+ at every step of my LSTM) was costly. But if this is considered preferable I'll rewrite.
EDIT: I did not appreciate the point about multi-GPU code, interesting.
Overall, the barrier to contributing is lower for nnx (nn-experimental), which is here: https://github.com/clementfarabet/lua---nnx
I made a bunch of comments, but hope they don't stop you from contributing in the future :)
Thanks for the comments! Some of these issues I was aware of, and some were just silly mistakes (e.g. reshape) - sorry about that. I did a rewrite that should now be much more time-efficient (at the cost of space efficiency, due to held caches, and of readability), but it works.
…rward pass would incorrectly compute the gradient because of in-place operations
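As a toy illustration of why in-place operations cause this (not the PR's code): modifying the input in place destroys the values the backward pass needs.

```lua
-- Toy example: gradInput depends on the original input,
-- but the in-place division has already overwritten it.
local x = torch.Tensor{3, 4}
local norm = x:norm()   -- 5
x:div(norm)             -- in-place: x is now {0.6, 0.8}
-- backward would need the original {3, 4}, which is gone;
-- writing into a separate output buffer avoids this.
```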
I think it's missing the implementation when
Squashed the commits and pushed to master via 905ea8c
yay!! By the way, suggested improvements going forward:
@karpathy, I would suggest
And thank you for the time you took to help the Torch community 😁
Some documentation would be nice.
I totally missed that this PR doesn't have documentation. Indeed, some documentation would be nice...
I'll add docs. I have a patch for
This layer L2 normalizes an n x d Tensor. I tested the implementation on both CPU/GPU and gradient checked it with jac and also with my (other) custom code. I also compared to a slower version that uses a loop, as can be seen in this gist. (This batched version is quite a lot faster.) Also, I have only worked with Torch for ~2 weeks, so exercise caution :)
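For reference, the row-wise computation and a jac-style gradient check can be sketched as follows. This is hedged: `nn.L2Normalize` is a guess based on the PR's file name, and `nn.Jacobian.testJacobian` stands in for whatever harness was actually used.

```lua
require 'nn'

-- Direct statement of what the layer computes, for an n x d input:
local input  = torch.randn(5, 10)
local norms  = torch.cmul(input, input):sum(2):sqrt()    -- n x 1 row L2 norms
local output = torch.cdiv(input, norms:expandAs(input))  -- rows now have unit L2 norm
print(output:norm(2, 2))                                 -- all ~1

-- Gradient check in the spirit described above (module name assumed):
local err = nn.Jacobian.testJacobian(nn.L2Normalize(), torch.randn(5, 10))
print(err)  -- should be tiny (numerical-precision level)
```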