Refine porting x86 NCHWc conv to AutoTVM #1993
```python
        The specific configuration.
        """
        key = (str(target), workload)
        if self._counter < len(self._records) and (key not in self._global_cfg_dict):
```
Here we might just want "if self._counter < len(self._records)". Although it's not likely to happen, if somehow the key is already in self._global_cfg_dict, we still want to use the cfg from the records and update self._global_cfg_dict.
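A minimal sketch of the suggested change, assuming the method is the dispatcher's _query_inside and that each record stores its config as record[0].config (both assumptions, not taken from the diff above):

```python
def _query_inside(self, target, workload):
    """Sketch only: always trust the record at the current counter position,
    and refresh _global_cfg_dict even if the key is already present."""
    key = (str(target), workload)
    if self._counter < len(self._records):
        cfg = self._records[self._counter][0].config  # assumed record layout
        self._counter += 1
        self._global_cfg_dict[key] = cfg
        return cfg
    return self._global_cfg_dict[key]
```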
topi/python/topi/x86/conv2d.py
Outdated
```python
new_attrs['out_layout'] = 'NCHW%dc' % oc_bn

# Store global schedule dictionary for ApplyGraphBest dispatcher
workload = _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype)
```
This change means that we also need to update existing pre-searched x86 conv2d schedules.
Got it, I updated it in the ApplyGraphBest query instead.
I like this design. Previously, for an altered operator, we would convert the altered workload back to the original workload and query the log, which resulted in strange logic. Using the update mechanism in this PR, we can make the code simpler. I can follow up with updates for other backends (cuda, arm cpu).
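For readers following along, a hedged sketch of that update mechanism (the function name and workload arguments are illustrative, not the exact PR code; DispatchContext.query and the update method added in this PR are the dispatcher calls being discussed):

```python
from tvm import autotvm

def _alter_conv2d_layout_sketch(target, workload, new_workload):
    """Illustrative only: how alter_op_layout can feed the dispatcher.

    `workload` is the original NCHW conv2d workload; `new_workload` is the
    NCHWc workload produced by the layout transform.
    """
    dispatch_ctx = autotvm.task.DispatchContext.current
    # Config tuned (and logged) for the original workload.
    cfg = dispatch_ctx.query(target, workload)
    # Register the same config under the transformed workload, so later
    # queries for the altered op hit directly instead of being mapped back
    # to the original workload.
    dispatch_ctx.update(target, new_workload, cfg)
    return cfg
```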
topi/python/topi/x86/conv2d.py
Outdated
```python
# get config here
cfg = get_config()
_create_schedule_template(cfg, data, kernel, strides, padding, layout)
_create_schedule_template(cfg, data, kernel, strides, padding, origin_layout)
```
rename to _create_tuning_space or _create_template_space?
To clarify some terminology:
template = the code in topi
schedule = template + a specific config
so a template is a function(config) -> schedule
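To make that concrete, here is a small generic template in the style of the AutoTVM tutorials (a matmul, not the conv2d template from this PR): the decorated function only defines the tuning space, and applying a specific config to it yields one concrete schedule.

```python
import tvm
from tvm import autotvm

@autotvm.template
def matmul_template(N, L, M, dtype):
    A = tvm.placeholder((N, L), name='A', dtype=dtype)
    B = tvm.placeholder((L, M), name='B', dtype=dtype)
    k = tvm.reduce_axis((0, L), name='k')
    C = tvm.compute((N, M), lambda i, j: tvm.sum(A[i, k] * B[k, j], axis=k), name='C')

    s = tvm.create_schedule(C.op)
    cfg = autotvm.get_config()   # one point in the tuning space
    y, x = s[C].op.axis

    # The knobs below define the space; applying them with a concrete config
    # turns (template, config) into one specific schedule.
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)
    s[C].reorder(yo, xo, yi, xi)
    return s, [A, B, C]
```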
topi/python/topi/x86/conv2d.py
Outdated
```python
def conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype):
def _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype):
```
After this change, we don't need this function anymore
topi/python/topi/x86/conv2d.py
Outdated
```python
return _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype)

@conv2d_x86.register(["direct"])
```
We can use @autotvm.register_topi_compute(conv2d, 'cpu', 'direct') and delete the function conv2d_x86.
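Roughly like this (a sketch only; `_create_tuning_space` and `_declaration_conv_impl` are placeholder helpers, and the decorated function overrides the generic topi.nn.conv2d for the x86 target):

```python
import topi
from tvm import autotvm

# Sketch: register the x86 compute with autotvm directly; the decorator
# creates the tuning task and attaches the workload to the output tensor,
# so a separate conv2d_x86 wrapper is no longer needed.
@autotvm.register_topi_compute(topi.nn.conv2d, 'cpu', 'direct')
def _declaration_conv(cfg, data, kernel, strides, padding, layout, out_dtype):
    _create_tuning_space(cfg, data, kernel, strides, padding, layout)      # placeholder helper
    return _declaration_conv_impl(cfg, data, kernel, strides,
                                  padding, layout, out_dtype)              # placeholder helper
```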
topi/python/topi/x86/conv2d.py
Outdated
```python
conv_arg_to_workload(data, kernel, strides,
                     padding, layout,
                     out_dtype)})
_conv_arg_to_workload(data, kernel, strides,
```
If we use @autotvm.register_topi_compute, we don't need this. The decorator will attach the workload for us.
topi/python/topi/x86/conv2d.py
Outdated
```python
padding, layout, out_layout, out_dtype)

@conv2d_NCHWc_cpu.register(['direct'])
```
use autotvm.register_topi_compute
topi/python/topi/x86/conv2d.py
Outdated
```python
workload = conv_NCHWc_arg_to_workload(data, kernel, kernel_size,
                                      strides, padding, layout,
                                      out_layout, out_dtype),
workload = _conv_NCHWc_arg_to_workload(data, kernel,
```
After using autotvm.register_topi_compute, we don't need to attach the workload manually.
@merrymercy Thanks, please review again.
tutorials/autotvm/tune_nnvm_x86.py
Outdated
```python
args_conv_NCHWc = [data_plc, kernel_plc, num_filter,
                   kernel_size, strides, padding, layout, dtype]
args_conv_NCHWc = autotvm.task.nnvm_integration.serialize_args(args_conv_NCHWc)
task = autotvm.task.create("topi_x86_conv2d_NCHWc", args=args_conv_NCHWc, target=target)
```
Add argument template='direct' to autotvm.task.create
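That is, something along these lines (a sketch; whether the keyword is spelled `template` or `template_key` depends on the autotvm version in use, so treat the exact argument name as an assumption):

```python
# Sketch of the suggested change to the tutorial snippet above.
task = autotvm.task.create("topi_x86_conv2d_NCHWc",
                           args=args_conv_NCHWc,
                           target=target,
                           template_key='direct')
```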
```python
reg_n = n
break

sch = _get_default_schedule(wkl, simd_width)
```
delete function _get_default_schedule and AVXConvCommonFwd
```python
return ConfigEntity.from_json_dict(cfg_dict)

raise ValueError("cannot decide default schedule for workload: {}".format(wkl))
sch = _get_default_schedule(wkl, simd_width)
```
delete function _get_default_schedule and AVXConv1x1Fwd
topi/python/topi/x86/conv2d.py
Outdated
```python
# get config here
cfg = get_config()
_create_schedule_template(cfg, data, kernel, strides, padding, layout)
cfg.template_key = "direct"
```
Delete this. Users will specify a template key in tune_nnvm_x86.py.
topi/python/topi/x86/conv2d.py
Outdated
```python
current_cfg = _query_dispatcher(workload)
assert current_cfg is not None
if current_cfg.is_fallback:
    wkl = Workload(data.dtype, conv_out.dtype, h, w, ic, num_filter,
```
This case should never happen. We can handle fallback only in topi_compute (see explanation below).
topi/python/topi/x86/conv2d.py
Outdated
```python
if cfg.is_fallback:
    wkl = Workload(data.dtype, out_dtype, in_height, in_width, in_channel, num_filter,
                   kernel_height, kernel_width, HPAD, WPAD, HSTR, WSTR)
    cfg = _get_default_config(wkl)
```
Instead of returning a new config, we can directly modify the values in the fallback schedule, e.g. cfg['tile_ic'] = 12. The same modified config will also be passed to topi_schedule, so we only need to handle fallback in topi_compute.
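A minimal sketch of that approach, assuming x86-style tile_ic/tile_oc knobs and SplitEntity from autotvm's config space (the function name, knob names, and block sizes are illustrative, not the PR's code):

```python
from tvm.autotvm.task.space import SplitEntity

def _fallback_config_sketch(cfg, in_channel, num_filter):
    """Illustrative only: fill a fallback config in place inside topi_compute.

    The same cfg object is later passed to topi_schedule, so fallback only
    has to be handled once, here, instead of returning a new config.
    """
    if cfg.is_fallback:
        ic_bn = 16 if in_channel % 16 == 0 else 8
        oc_bn = 16 if num_filter % 16 == 0 else 8
        cfg["tile_ic"] = SplitEntity([in_channel // ic_bn, ic_bn])
        cfg["tile_oc"] = SplitEntity([num_filter // oc_bn, oc_bn])
```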
I added __setitem__ to FallbackConfigEntity; please check if it looks good.
Force-pushed from 0f0422f to dc43eda.
Nice, we did a good cleanup of the x86 backend. One final comment.
topi/python/topi/x86/conv2d.py
Outdated
```python
kh, kw = kernel_size if isinstance(kernel_size, (tuple, list)) else \
    (kernel_size, kernel_size)
kh, kw = kernel_size if isinstance(kernel_size, (tuple, list)) \
    else (kernel_size, kernel_size)
```
We don't need kernel_size in the argument list. We can get it from kernel in L407
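For illustration, something like the following (a sketch assuming an OIHW kernel placeholder; the NCHWc kernel layout would put the spatial dims in different positions):

```python
import tvm
from topi.util import get_const_tuple

# Sketch: derive num_filter and the kernel spatial size from the kernel
# tensor itself instead of passing kernel_size through the argument list.
kernel = tvm.placeholder((64, 3, 3, 3), name='kernel')   # example OIHW shape
num_filter, _, kh, kw = get_const_tuple(kernel.shape)
```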
```python
def update(self, target, workload, cfg):
    """
    Update context with a specific config.
```
This is quite a critical design point, and I think it would be best if we could provide some motivation (with an example) for why we need to expose this API.
We can add a Note block to this function
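For example, the docstring's Note could read along these lines (the wording and parameter descriptions are a suggestion, not the text that was eventually merged):

```python
def update(self, target, workload, cfg):
    """Update the context with a specific config for a workload.

    Parameters
    ----------
    target : Target
        The compilation target.
    workload : tuple
        The workload to attach the config to.
    cfg : ConfigSpace
        The config to return for future queries of this workload.

    Note
    ----
    This is used by alter_op_layout: the tuning log only holds configs for
    the original workloads, but alter_op_layout creates new (e.g. NCHWc)
    workloads. By calling update() with the config queried for the original
    workload, later queries for the altered workload can be answered without
    mapping it back to the original one.
    """
```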
…e; port x86 conv2d_int8 to autotvm
Force-pushed from dc43eda to 717b331.
- unify int8 conv compute (1x1 and common); remove kernel_size in conv schedule
- remove kernel_size & num_filter args in conv NCHWc compute
- fix lint
Force-pushed from 717b331 to 6ac03c8.
@merrymercy @tqchen Please review again. It also largely improves int8 1x1 conv by removing 6ac03c8#diff-10fe98ea61ea47beb21a5bf01336db60L195. @anijain2305
It seems we don't have test cases for NCHWc fp32.
@merrymercy Please check the test cases.
LGTM. We can merge this @tqchen
Thanks, @merrymercy @yzhliu @kevinthesun! This is now merged.
- Add `update` to dispatchers so that `alter_op_layout` can provide config for updated operators.
- Register `compute`, `schedule` and `args_to_workload` to autotvm.

The whole process becomes: tune and log configs for the original workloads; during compilation, query the config for the altered (NCHWc) workload directly from the dispatcher, since `AlterLayout` updated that.

Please review @kevinthesun @merrymercy