Conversation
Force-pushed from d6eb72a to 9271abf.
Rebased and ready for review. (Previously depended on gradient accumulation PR #1663.)
This layer works as a lookup table and could be renamed to LookupTable. |
Force-pushed from 287d2c1 to 69b0e8c.
(The double implementation comes from the NVIDIA dev docs; a float implementation is already provided by CUDA as atomicAdd.)
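For reference, the double-precision version from the NVIDIA docs is the usual compare-and-swap loop, reproduced here as a sketch; the architecture guard reflects that newer GPUs (compute capability 6.0+) ship a native double atomicAdd:

```cuda
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 600
// Software atomicAdd for doubles, per the CUDA C Programming Guide:
// repeatedly read the old value and try to CAS in (old + val) until no
// other thread has modified the location in between.
__device__ double atomicAdd(double* address, double val) {
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
    // Integer comparison avoids an infinite loop on NaN (NaN != NaN).
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif
```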
Force-pushed from 69b0e8c to ac9e29f.
Embed layer for lookup table of one-hot encodings
Here is an example of a typical
Understood, of course the padding is to fix the input sequence length.
(Replaces #1872)
Based on #1977 (parameter gradient accumulation). This adds EmbedLayer (should probably change the name to EmbeddingLayer for consistency with PoolingLayer etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and such.

Its computation is equivalent to an InnerProductLayer with "one-hot" vector inputs, but instead of explicitly representing the one-hot vectors (which wastes lots of memory), this assumes the input itself is the indices of the "hot" index of those one-hot vectors (like the label inputs for the categorical losses).

This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster -- this is a more lightweight change that continues the unfortunate trend of casting floats to ints as labels.
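For illustration, here is a minimal sketch of what the lookup computation amounts to, assuming a flat weight table of shape (input_dim x embed_dim); the kernel names, grid-stride style, and float-only signatures are illustrative, not the PR's actual code:

```cuda
// Forward: each input value is a row index into the weight table; copy
// that row to the output. n = (number of input elements) * embed_dim.
__global__ void embed_forward(const int n, const float* bottom_data,
                              const float* weight, const int embed_dim,
                              float* top_data) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    const int item = i / embed_dim;  // which input element
    const int d = i % embed_dim;     // which embedding dimension
    // The float-to-int cast mentioned above: inputs are stored as floats.
    const int row = static_cast<int>(bottom_data[item]);
    top_data[i] = weight[row * embed_dim + d];
  }
}

// Backward w.r.t. the weights: scatter-add the top gradient into the rows
// that were looked up.
__global__ void embed_backward(const int n, const float* bottom_data,
                               const float* top_diff, const int embed_dim,
                               float* weight_diff) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    const int item = i / embed_dim;
    const int d = i % embed_dim;
    const int row = static_cast<int>(bottom_data[item]);
    atomicAdd(&weight_diff[row * embed_dim + d], top_diff[i]);
  }
}
```

The backward kernel is also where the atomicAdd discussion above comes in: two inputs carrying the same index race on the same weight row, so the accumulation must be atomic.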