make gluon rnn layers hybrid blocks #11482
Conversation
```python
def hybrid_forward(self, F, inputs, states=None, **kwargs):
    if F is ndarray:
        # imperative mode: the concrete batch size can be read off the input
        batch_size = inputs.shape[self._layout.find('N')]
    if self._input_size == 0:
```
implement infer_shape instead?
this is not possible due to the inverse shape inference in concat.
or did you mean overriding the block's infer_shape? I'm taking a look.
```python
            states = self.begin_state(batch_size, ctx=inputs.context)
        else:
            # symbolic mode: defer the batch size via a zeros placeholder
            states = self.begin_state(0, func=symbol.zeros)
        if isinstance(states, (ndarray.NDArray, symbol.Symbol)):
```
`(ndarray.NDArray, symbol.Symbol)` -> `tensor_types`
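For context, a hedged sketch of what such a `tensor_types` alias presumably looks like (its exact definition and location in `mxnet.base` are assumptions here):

```python
from mxnet import ndarray, symbol

# Assumed alias matching the reviewer's suggestion: one tuple to test
# against instead of spelling out both tensor classes at each call site.
tensor_types = (symbol.Symbol, ndarray.NDArray)

# usage: isinstance(states, tensor_types)
```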
```python
        return outputs, new_states

    def _forward_kernel(self, inputs, states):
        ...

    def __call__(self, inputs, *states):
```
implement infer_shape instead?
@piiswrong turns out I cannot do that only in infer_shape. The reason is that sometimes the block is used as a child block of other blocks, in which case infer_shape is called from the parent, bypassing the code path in the RNN infer_shape.

I see. Then could you do it without overriding `__call__`?

I can override forward, but it would be pretty much equivalent. The reason I cannot do this in hybrid_forward is that, when given a partial shape, hybrid_forward would only be invoked with symbols as part of the infer_shape pass.
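To illustrate the constraint under discussion, here is a minimal, hypothetical sketch (not this PR's actual code) of why the concrete shape has to be captured in `__call__`: once hybridized, `hybrid_forward` may only ever see symbols, while `__call__` still receives NDArrays on the first call.

```python
import mxnet as mx
from mxnet.gluon import HybridBlock

class ShapeCapturingBlock(HybridBlock):
    """Hypothetical block recording the input feature size in __call__,
    before the symbolic graph is traced."""
    def __init__(self, **kwargs):
        super(ShapeCapturingBlock, self).__init__(**kwargs)
        self._input_size = 0

    def __call__(self, inputs, *states):
        # NDArray inputs carry concrete shapes; symbols may not.
        if isinstance(inputs, mx.nd.NDArray) and self._input_size == 0:
            self._input_size = inputs.shape[-1]
        return super(ShapeCapturingBlock, self).__call__(inputs, *states)

    def hybrid_forward(self, F, inputs):
        return inputs * 2  # placeholder computation
```

Calling `ShapeCapturingBlock()(mx.nd.ones((3, 2, 7)))` would record `_input_size == 7` even after `hybridize()`, which is exactly what a symbol-only `hybrid_forward` cannot do.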
```cpp
// before:
return (*out_attrs)[0].ndim();
// after:
if ((*out_attrs)[0].ndim()) {
  return ReverseReshapeInferShape(&(*in_attrs)[0], oshape);
}
```
Should this be

```cpp
if ((*out_attrs)[0].ndim()) {
  return ReverseReshapeInferShape(&(*in_attrs)[0], oshape);
}
return false;
```

instead?
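As a rough Python analogue of the suggested control flow (names are illustrative, not the actual C++ helpers): attempt forward inference first, fall back to reverse inference only when the output shape is fully known, and otherwise report failure so the pass can be retried later.

```python
def infer_reshape(in_shape, out_shape):
    """Illustrative sketch: returns True once shapes are fully known."""
    if all(d > 0 for d in in_shape):
        return True  # forward inference already succeeded
    if out_shape and all(d > 0 for d in out_shape):
        # reverse inference: the known output element count determines
        # the single unknown input dimension
        total, known = 1, 1
        for d in out_shape:
            total *= d
        for d in in_shape:
            known *= d if d > 0 else 1
        in_shape[in_shape.index(0)] = total // known
        return True
    return False  # neither side known yet; defer
```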
```python
from common import setup_module, with_seed, teardown, assert_raises_cudnn_disabled
from test_gluon import *
from test_loss import *
from test_gluon_rnn import *
```
test_gluon_rnn has assert_raises_cudnn_disabled for test_rnn, but test_rnn doesn't raise an error. I think it should be removed from test_gluon_rnn.py.
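For readers unfamiliar with the helper, here is a hedged sketch of what `assert_raises_cudnn_disabled` presumably does (the environment flag and exact error handling are assumptions, not the verbatim implementation in common.py), which is why it has no effect on a test that never exercises cuDNN:

```python
import functools
import os

from mxnet.base import MXNetError


def assert_raises_cudnn_disabled():
    """Assumed behavior: when the environment disables cuDNN, the wrapped
    test is expected to fail with MXNetError; otherwise it runs normally."""
    def test_helper(orig_test):
        @functools.wraps(orig_test)
        def test_new(*args, **kwargs):
            if os.getenv('CUDNN_OFF_TEST_ONLY') == 'true':  # assumed flag
                try:
                    orig_test(*args, **kwargs)
                except MXNetError:
                    return  # the expected failure occurred
                raise AssertionError('expected MXNetError with cuDNN off')
            orig_test(*args, **kwargs)
        return test_new
    return test_helper
```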
@szha today is the mentioned code freeze date for the MXNet 1.3 release. Could you please check the status of this PR? Thanks!
@Roshrini We've found the root cause of the bug; the temporary workaround is the one being built and tested now.
@Roshrini this PR uncovers a weird bug which should probably be addressed before releasing.
```python
@assert_raises_cudnn_disabled()
def test_layer_fill_shape():
    layer = gluon.rnn.LSTM(10)
    layer.hybridize()
```
Removing this hybridize call should be enough. Is there any reason we are removing all hybridize calls?
The same error on test_norm still occurs if I only remove this one call, so I was trying out all possible cases.
@szha thanks for the update.
```cpp
.set_attr<FCompute>("FCompute<cpu>", ConcatGradCompute<cpu>);

NNVM_REGISTER_OP(_rnn_param_concat)
```
Please write comments for this operator. How is it different from the normal concat?
This is a custom concat op with a specialized infer_shape, which handles the case where the first one or two inputs may have unknown shapes that can be inferred from the output shape.
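A small Python sketch of that specialized inference (illustrative pseudo-logic, not the C++ implementation): when exactly one input's length along the concat axis is unknown, it can be recovered from the output length minus the known inputs.

```python
def rnn_param_concat_infer(in_lens, out_len):
    """in_lens: per-input lengths along the concat axis, 0 = unknown.
    Returns the completed list, or None if inference is impossible."""
    unknown = [i for i, n in enumerate(in_lens) if n == 0]
    if not unknown:
        return list(in_lens)
    if len(unknown) == 1 and out_len > 0:
        filled = list(in_lens)
        filled[unknown[0]] = out_len - sum(n for n in in_lens if n > 0)
        return filled
    return None  # more than one unknown, or output unknown

# e.g. a parameter of unknown size: [0, 40, 40] with out_len=120 -> [40, 40, 40]
```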
```python
from common import assert_raises_cudnn_disabled

@assert_raises_cudnn_disabled()
```
It's not calling the cuDNN kernel.
```python
check_rnn_layer_forward(layer, mx.nd.ones((3, 2, 7)))
print(layer)
# before:
assert layer.i2h_weight[0].shape[1] == 7, layer.i2h_weight[0].shape[1]
# after:
assert layer.l0_i2h_weight.shape[1] == 7, layer.l0_i2h_weight.shape[1]
```
This has not been exposed as a documented attribute before and shouldn't be considered part of the API.
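For completeness, a hedged usage sketch of the documented way to inspect these parameters, via `collect_params()` rather than generated attributes (exact parameter names depend on the block's prefix):

```python
import mxnet as mx
from mxnet import gluon

layer = gluon.rnn.LSTM(10)
layer.initialize()
layer(mx.nd.ones((3, 2, 7)))  # first call fixes the input size at 7

# Names like 'lstm0_l0_i2h_weight' are prefix-dependent; iterate instead
# of relying on attribute access.
for name, param in layer.collect_params().items():
    print(name, param.shape)
```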
* make Gluon RNN layer hybrid block
* separate gluon gpu tests
* remove excess assert_raises_cudnn_disabled usage
* add comments and refactor
* add bidirectional test
* temporarily remove hybridize in test_gluon_rnn.test_layer_fill_shape
Description
make gluon rnn layers hybrid blocks
resolves #10873
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes