
[RFC/WIP] Bilinear initializer #34

Closed

vchuravy wants to merge 3 commits into dmlc:master from vchuravy:vc/bilinear

Conversation

@vchuravy
Collaborator

This PR adds a Bilinear initializer, similar to BVLC/caffe#2213, which is useful for upsampling with deconvolution. Additionally, it allows setting different initializers for different layers.
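
For reference, a minimal sketch of the kind of kernel a bilinear initializer typically produces (following the BVLC/caffe#2213 filler); the names here are illustrative, not the code in this PR:

function bilinear_kernel(kernel_size)
  f = ceil(Int, kernel_size / 2)          # upsampling factor implied by the kernel
  c = (2 * f - 1 - f % 2) / (2 * f)       # center of the kernel
  w = zeros(kernel_size, kernel_size)
  for i in 0:kernel_size-1, j in 0:kernel_size-1
    w[i + 1, j + 1] = (1 - abs(i / f - c)) * (1 - abs(j / f - c))
  end
  return w
end

bilinear_kernel(4)  # 4x4 kernel for 2x upsampling: outer product of [0.25, 0.75, 0.75, 0.25]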

Todo

  • Documentation for BilinearInitializer
  • Tests and example
  • Documentation for initializer per layer
  • Setting the learning rate per layer (since we preinitialize to a given function, we need to turn learning off)
  • Initializing the filter correctly for multiple channels

Setting initializer per layer

using MXNet

data = mx.Variable(:data)
pool1 = mx.Pooling(data = data, kernel = (2,2), pool_type = :max, stride = (2,2))
deconv = mx.Deconvolution(data = pool1, num_filter = 2, kernel = (2,2), stride = (2,2), no_bias = true)

# name of the deconvolution weight argument, used as a key below
deconv_w = mx.list_arguments(deconv)[2]

data = zeros(64, 64, 1, 10)
dp = mx.ArrayDataProvider(data, batch_size = 1)

model = mx.FeedForward(deconv)
# per-argument initializers: :default covers all arguments not listed explicitly
mx.init_model(model, Dict(:default => mx.UniformInitializer(), deconv_w => mx.BilinearInitializer()), data = size(data))
pred = mx.predict(model, dp)

Proper upsampling

using MXNet

# scaling factor
factor = 2
@show kernel = 2factor - factor % 2
stride = factor
@show pad = ceil(Int64, (factor - 1) / 2)

data = mx.Variable(:data)
deconv = mx.Deconvolution(data = data, num_filter = 1, kernel = (kernel, kernel), stride = (stride, stride), pad = (pad, pad), no_bias=true)

deconv_w = mx.list_arguments(deconv)[2]

data = zeros(3, 3, 1, 1)
for i in 1:3
  for j in 1:3
    data[i, j, 1, 1] = i*j
  end
end

@show data

dp = mx.ArrayDataProvider(data, batch_size = 1)

model = mx.FeedForward(deconv)
mx.init_model(model, Dict(:default => mx.UniformInitializer(), deconv_w => mx.BilinearInitializer()), data=size(data))
pred = mx.predict(model, dp)

@show pred

Ref: #31

@codecov-io

Current coverage is 75.47%

Merging #34 into master will decrease coverage by 0.92% as of 85b9fd8

@@            master     #34   diff @@
======================================
  Files           20      20       
  Stmts         1449    1468    +19
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           1107    1108     +1
  Partial          0       0       
- Missed         342     360    +18

Review entire Coverage Diff as of 85b9fd8

Powered by Codecov. Updated on successful CI builds.

@vchuravy
Collaborator Author

@pluskid How would I best turn off learning for a layer?

@pluskid
Member

pluskid commented Nov 23, 2015

The API looks good to me!

By "turn of" do you mean "turn off"? There is a hacky way of turning off learning by using BlockGrad operator. It blocks gradient back propagation. With several drawbacks:

  • One need to modify the symbolic structure and insert the BlockGrad symbol.
  • Layers below it will not get gradients and therefore not get trained.
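
A minimal sketch of the workaround, assuming the BlockGrad operator is exposed in MXNet.jl as mx.BlockGrad and reusing the deconv symbol from the example above:

# BlockGrad is an identity on the forward pass but stops gradients from
# flowing back through it, so deconv's weights (and everything below) stay frozen.
blocked = mx.BlockGrad(data = deconv)
model   = mx.FeedForward(blocked)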

One nice thing we could have (as in Caffe) is a per-layer (per-operator) learning rate. Choices include:

  1. Modify the operator definition and add a new argument grad_scale (currently the Loss layers have this property).
  2. Utilize the newly added attribute interface ([SYMBOL] enable attributes in graph node, apache/mxnet#685) to attach a per-operator learning rate.
  3. Like what you did here, pass a dictionary to the fit function, optionally specifying a per-operator learning rate.

I think the 3rd option sounds best, as it requires minimal changes to the backend codebase, and it actually makes more sense since grad_scale is a property of the trainer only. The only (slight) inconvenience is that the user has to specify the learning rate separately, which means you will need to use something like what you did in your IJulia notebook

deconv_w = mx.list_arguments(deconv)[2]

to get the key to be used in the dictionary. Regarding this, the 2nd option might be a good compromise.
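
For concreteness, the user-facing side of the 3rd option might look roughly like the following; lr_multipliers is a hypothetical keyword (it does not exist in MXNet.jl today) and train_provider stands in for whatever data provider is used for training:

deconv_w = mx.list_arguments(deconv)[2]
# hypothetical keyword sketching the proposed per-operator learning rates;
# a multiplier of 0.0 would freeze the pre-initialized bilinear weights
mx.fit(model, mx.SGD(lr = 0.1), train_provider,
       lr_multipliers = Dict(deconv_w => 0.0))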

@vchuravy
Collaborator Author

Yeah, passing a dictionary in would be the least hacky, but also the most inconvenient. Maybe one could alleviate that by adding mx.weight_name so that users don't have to worry about getting the correct symbol.
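
A rough sketch of what such a helper could look like, using only the existing mx.list_arguments; the name weight_name and the reliance on the *_weight naming convention are illustrative:

# hypothetical helper: return the first argument of `sym` whose name ends in
# "_weight", so users don't need to know its position in list_arguments
function weight_name(sym)
  for arg in mx.list_arguments(sym)
    endswith(string(arg), "_weight") && return arg
  end
  return nothing
end

deconv_w = weight_name(deconv)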

@vchuravy
Collaborator Author

@pluskid So I was looking into using attributes to set the lr per layer, but the calls to the updater function https://github.com/dmlc/MXNet.jl/blob/master/src/optimizer.jl#L179-L183 only receive the NDArrays. Is there any way to get the associated symbol?

@pluskid
Member

pluskid commented Nov 24, 2015

Yes, let me think about it. When you construct a symbolic graph, the operators kind of get smashed into a single symbolic node at the end. Without looking at the libmxnet source code, I'm not even sure whether some graph re-writing happens to optimize runtime efficiency.

@tqchen Is there an easy API to inspect the original symbolic hierarchy? (Other than dumping it to JSON)

@vchuravy
Collaborator Author

Superseded by apache/mxnet#746.

@vchuravy vchuravy closed this Nov 29, 2015
vchuravy added a commit that referenced this pull request Apr 13, 2017
fixes bilinear initializer following approach in #34