* Sample code was added. * The `slice_dim` and `slice_point` attributes were explained.
[docs] brief explanation of SLICE layer's attributes
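As a rough illustration of what these attributes control (a numpy sketch under assumed shapes, not the layer's actual code): `slice_dim` picks the axis to split along and `slice_point` lists the indices where the splits happen.

```python
import numpy as np

# Hypothetical blob of shape (N, C, H, W); the sizes here are made up.
blob = np.zeros((10, 6, 28, 28))

slice_dim = 1          # split along the channel axis
slice_point = [2, 5]   # split before channels 2 and 5

# np.split with explicit indices mirrors the intended SLICE behavior:
# three outputs with 2, 3, and 1 channels respectively.
tops = np.split(blob, slice_point, axis=slice_dim)
print([t.shape for t in tops])  # [(10, 2, 28, 28), (10, 3, 28, 28), (10, 1, 28, 28)]
```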
Next: release candidate
fix ImageNet example path
set the right rpath for tools and examples, respectively; thanks for the report @mees!
[build] fix dynamic linking of tools
… was overwritten with the symlink created at build time and installed with install(DIRECTORY ...)
… systems). This commit specifies Python 2, with which cpp_lint.py works :-)
[cmake] fix install rpath for pycaffe
num/channels/height/width indexing is valid.
from saved NetParameter. Keep the param Blob shape the layer has set, rather than necessarily adopting the one from the saved net (e.g. keep the new 1D bias shape rather than the (1 x 1 x 1 x D) shape from a legacy net).
Blobs are N-D arrays (for N not necessarily equal to 4)
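To make the shape change concrete, here is a small numpy sketch (the sizes are made up and `legacy_offset` is a hypothetical helper, not Caffe's code): a 1D bias blob holds the same data as the legacy (1 x 1 x 1 x D) blob, and the old num/channels/height/width indexing flattens to a single offset.

```python
import numpy as np

D = 64
legacy_bias = np.zeros((1, 1, 1, D))   # 4-axis shape from a saved legacy net
nd_bias = np.zeros((D,))               # 1D shape the layer now sets itself

# Same element count, so the saved data can be copied into the new shape
# without forcing the parameter blob back to the legacy 4-axis shape.
assert legacy_bias.size == nd_bias.size

def legacy_offset(n, c, h, w, C, H, W):
    # How 4-axis num/channels/height/width indexing flattens to one offset.
    return ((n * C + c) * H + h) * W + w
```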
When setting the mean, assert that it is either one pixel or an array with shape equal to the input data size.
Check shape of input mean
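A minimal sketch of the check being described, assuming a (channels, height, width) input; `check_mean` is a hypothetical helper, not pycaffe's API.

```python
import numpy as np

def check_mean(mean, input_shape):
    # Hypothetical helper sketching the described assertion.
    channels, height, width = input_shape
    if mean.ndim == 1 and mean.shape[0] == channels:
        # "One pixel": a single per-channel value, broadcast over H x W.
        return mean.reshape(channels, 1, 1)
    if mean.shape == (channels, height, width):
        # Full mean array matching the input data size.
        return mean
    raise ValueError('mean must be one pixel or match input shape %s'
                     % (input_shape,))
```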
(With layers whose backward passes accumulate gradients,) this effectively decouples the computational batch from the SGD minibatch. Each iteration accumulates gradients over iter_size batches, then the parameters are updated.
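A sketch of that decoupling (plain numpy, not the solver's actual code; the sizes, learning rate, and the averaging step are this sketch's own choices): with iter_size batches per iteration, the effective SGD minibatch is iter_size times the computational batch.

```python
import numpy as np

iter_size, lr = 4, 0.01
params = np.zeros(10)
diff = np.zeros_like(params)           # parameter gradient buffer

for _ in range(iter_size):
    batch_grad = np.random.randn(10)   # stand-in for one backward pass
    diff += batch_grad                 # backward accumulates instead of overwriting

params -= lr * diff / iter_size        # one update per iter_size batches
diff[:] = 0                            # clear the accumulated gradient
```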
(double impl from NVIDIA dev docs; float impl included in CUDA as "atomicAdd")
EmbedLayer has a blob that stores the vocabulary_size x embedding_size embeddings. During Forward/Backward, only the involved words' embeddings are used for computation, but all the embeddings (the whole blob) are updated during solving (Solver::ComputeUpdateValue, Blob::Update). Is my understanding correct?
@jzhang533 yes, that's correct; it has the same behavior as other parameter layers.
@jeffdonahue thanks for clarifying; I am trying to learn embeddings for a large vocabulary and will try to figure out a way to avoid needless computation during solving.
Based on #1486 (N-D blobs) and #1663 (parameter gradient accumulation). This adds
`EmbedLayer` (should probably change the name to `EmbeddingLayer` for consistency with `PoolingLayer` etc.), which essentially learns a lookup table for integer inputs, useful for language modeling and such. Its computation is equivalent to an `InnerProductLayer` whose inputs are "one-hot" vectors, but instead of explicitly representing the one-hot vectors (which wastes lots of memory), this assumes the input itself is the index of the "hot" element of each one-hot vector (like the label inputs for the categorical losses). This should probably be replaced with SparseInnerProduct (#937) once that's merged, assuming that's faster -- this is a more lightweight change (or at least it will be once #1486 is merged) that continues the unfortunate trend of casting floats to ints as labels.
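A numpy sketch of the equivalence described above (the shapes and values are made up, and this is not the layer's actual code): the lookup-table forward matches an inner product against explicit one-hot inputs, and the resulting weight gradient has nonzero rows only for the words that actually appeared, even though the whole parameter blob takes part in the solver update.

```python
import numpy as np

vocab_size, embed_dim, batch = 1000, 50, 4
W = np.random.randn(vocab_size, embed_dim)   # the learned lookup table
ids = np.array([3, 981, 3, 42])              # integer inputs: the "hot" indices

# Embedding-style forward: plain row lookup.
out_lookup = W[ids]

# Equivalent inner product with explicitly materialized one-hot vectors,
# which wastes a (batch x vocab_size) array.
one_hot = np.zeros((batch, vocab_size))
one_hot[np.arange(batch), ids] = 1
out_ip = one_hot.dot(W)
assert np.allclose(out_lookup, out_ip)

# Backward: the gradient w.r.t. W is a scatter-add of the top gradients,
# so only the looked-up rows are nonzero.
top_diff = np.random.randn(batch, embed_dim)
W_diff = one_hot.T.dot(top_diff)
print(np.count_nonzero(np.abs(W_diff).sum(axis=1)))  # 3 distinct rows touched
```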