
Conversation

@yzhliu (Member) commented Oct 25, 2018

  • Add update to dispatchers so that alter_op_layout can provide config for updated operators.
  • Register NCHWc's compute, schedule and args_to_workload to autotvm

The whole process becomes:

  • Tuning: tune with the original workload, but users can choose to use the NCHWc compute & schedule.
  • AlterLayout: extract the config using the original workload, replace the conv op with the NCHWc implementation, and update the dispatcher with the NCHWc workload and config.
  • Compute & schedule: proceed the normal way as in autotvm; the NCHWc workloads now exist because AlterLayout updated the dispatcher.

Please review @kevinthesun @merrymercy.

The specific configuration.
"""
key = (str(target), workload)
if self._counter < len(self._records) and (key not in self._global_cfg_dict):
Contributor commented:

Here we might just want "if self._counter < len(self._records)". Although it's not likely to happen, if somehow the key is already in self._global_cfg_dict, we still want to use the cfg from the records and update self._global_cfg_dict.
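
For reference, a hedged sketch of the suggested simplification (the record layout and field names below are assumptions, not the PR's actual dispatcher code):

key = (str(target), workload)
if self._counter < len(self._records):
    # Always refresh the cached config while records are still being replayed,
    # even if the key already exists in the global dict.
    cfg = self._records[self._counter][0].config   # hypothetical (inp, res) record pair
    self._global_cfg_dict[key] = cfg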

new_attrs['out_layout'] = 'NCHW%dc' % oc_bn

# Store global schedule dictionary for ApplyGraphBest dispatcher
workload = _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype)
Contributor commented:

This change means that we also need to update existing pre-searched x86 conv2d schedules.

Member Author replied:

Got it. I updated it in the ApplyGraphBest query instead.

@merrymercy (Member) commented Oct 26, 2018

I like this design.
Autotvm uses the arguments directly as the workload, which can automate a lot of things.

Previously, for an altered operator, we would convert the altered workload back to the original workload and query the log. This results in strange logic in _conv_arg_to_workload and contradicts the goal of automating config dispatching.
After this PR, we can make sure all records in the ApplyHistoryBest context exactly match the arguments of the function calls, so we can delete everything like _conv_arg_to_workload and _conv2d_NCHWc_arg_to_workload.
Then we can use autotvm.register_topi_compute and autotvm.register_topi_schedule. These two decorators handle everything related to workload/config dispatching for you. I have already used them for depthwise_conv2d, since depthwise_conv2d doesn't need alter_op_layout.

Using the update mechanism in this PR, we can make the code simpler. I can follow up with updates for the other backends (cuda, arm cpu).
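
For illustration, a minimal sketch of the decorator-based registration described above. The argument list and function bodies are illustrative, not the actual x86 code in this PR; the decorator names follow the autotvm API of that era.

import tvm
import topi
from tvm import autotvm

@autotvm.register_topi_compute(topi.nn.conv2d, 'cpu', 'direct')
def _conv2d_x86(cfg, data, kernel, strides, padding, layout, out_dtype):
    # cfg is injected by the decorator; the workload is derived from the call
    # arguments and attached to the output tensor, so no manual
    # _conv_arg_to_workload call is needed.
    return topi.nn.conv2d_nchw(data, kernel, strides, padding, out_dtype)

@autotvm.register_topi_schedule(topi.generic.schedule_conv2d_nchw, 'cpu', ['direct'])
def _schedule_conv2d_x86(cfg, outs):
    # The same cfg, looked up via the attached workload, is passed here.
    return tvm.create_schedule([x.op for x in outs])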

# get config here
cfg = get_config()
_create_schedule_template(cfg, data, kernel, strides, padding, layout)
_create_schedule_template(cfg, data, kernel, strides, padding, origin_layout)
@merrymercy (Member) commented Oct 26, 2018

Rename to _create_tuning_space or _create_template_space? To clarify some terminology:

template = the code in topi
schedule = template + a specific config

So a template is a function(config) -> schedule.



def conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype):
def _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype):
Member commented:

After this change, we don't need this function anymore.

return _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype)


@conv2d_x86.register(["direct"])
Member commented:

We can use @autotvm.register_topi_compute(conv2d, 'cpu', 'direct') and delete the function conv2d_x86.

conv_arg_to_workload(data, kernel, strides,
padding, layout,
out_dtype)})
_conv_arg_to_workload(data, kernel, strides,
Member commented:

If we use @autotvm.register_topi_compute, we don't need this. That decorator will attach the workload for us.

padding, layout, out_layout, out_dtype)


@conv2d_NCHWc_cpu.register(['direct'])
Member commented:

Use autotvm.register_topi_compute here as well.

workload = conv_NCHWc_arg_to_workload(data, kernel, kernel_size,
strides, padding, layout,
out_layout, out_dtype),
workload = _conv_NCHWc_arg_to_workload(data, kernel,
@merrymercy (Member) commented Oct 26, 2018

After using autotvm.register_topi_compute, we don't need to attach the workload manually.

@yzhliu (Member Author) commented Oct 27, 2018

@merrymercy Thanks, please review again.
Note that I added the function as an argument to args_to_workload so that alter_op_layout and topi_integration can invoke one function and have unified behavior - let me know if there's a better way: 45559ce#diff-73c074e7bdae52b76a83efdc54a2fdb6R185

args_conv_NCHWc = [data_plc, kernel_plc, num_filter,
                   kernel_size, strides, padding, layout, dtype]
args_conv_NCHWc = autotvm.task.nnvm_integration.serialize_args(args_conv_NCHWc)
task = autotvm.task.create("topi_x86_conv2d_NCHWc", args=args_conv_NCHWc, target=target)
Member commented:

Add the argument template='direct' to autotvm.task.create.
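
A sketch of the suggested call (assuming the keyword is template_key, as in the autotvm.task.create signature of that era):

task = autotvm.task.create("topi_x86_conv2d_NCHWc",
                           args=args_conv_NCHWc,
                           target=target,
                           template_key='direct')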

reg_n = n
break

sch = _get_default_schedule(wkl, simd_width)
@merrymercy (Member) commented Oct 27, 2018

Delete the function _get_default_schedule and AVXConvCommonFwd.

return ConfigEntity.from_json_dict(cfg_dict)

raise ValueError("cannot decide default schedule for workload: {}".format(wkl))
sch = _get_default_schedule(wkl, simd_width)
Member commented:

Delete the function _get_default_schedule and AVXConv1x1Fwd.

# get config here
cfg = get_config()
_create_schedule_template(cfg, data, kernel, strides, padding, layout)
cfg.template_key = "direct"
@merrymercy (Member) commented Oct 27, 2018

Delete this. Users will specify a template key in tune_nnvm_x86.py.

current_cfg = _query_dispatcher(workload)
assert current_cfg is not None
if current_cfg.is_fallback:
    wkl = Workload(data.dtype, conv_out.dtype, h, w, ic, num_filter,
@merrymercy (Member) commented Oct 27, 2018

This case should never happen. We can handle fallback only in topi_compute (see explanation below).

if cfg.is_fallback:
    wkl = Workload(data.dtype, out_dtype, in_height, in_width, in_channel, num_filter,
                   kernel_height, kernel_width, HPAD, WPAD, HSTR, WSTR)
    cfg = _get_default_config(wkl)
@merrymercy (Member) commented Oct 27, 2018

Instead of returning a new config, we can directly modify the values in the fallback config, e.g. cfg['tile_ic'] = 12.
The same modified config will also be passed to topi_schedule, so we only need to handle fallback in topi_compute.

Member Author replied:

I added __setitem__ to FallbackConfigEntity; please check whether it looks good.
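
A hedged sketch of how the fallback handling could then look inside the topi compute (key names and values are illustrative, not the actual x86 defaults):

def _maybe_fill_fallback(cfg):
    # Override entries of the FallbackConfigEntity in place; because the same
    # cfg object is later passed to the topi schedule, fallback only needs to
    # be handled once, in the compute.
    if cfg.is_fallback:
        cfg['tile_ic'] = 12   # hypothetical default, following the reviewer's example
        cfg['tile_oc'] = 16
    return cfg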

@merrymercy (Member) left a comment:

Nice, we did a good cleanup of the x86 backend. One final comment.

kh, kw = kernel_size if isinstance(kernel_size, (tuple, list)) else \
    (kernel_size, kernel_size)
kh, kw = kernel_size if isinstance(kernel_size, (tuple, list)) \
    else (kernel_size, kernel_size)
Member commented:

We don't need kernel_size in the argument list. We can get it from kernel in L407.
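
A sketch of deriving the kernel size from the kernel tensor itself (assuming the packed OIHW{x}i{y}o kernel layout used by conv2d_NCHWc, where the spatial dimensions sit at positions 2 and 3):

from topi.util import get_const_tuple

# kernel.shape = (oc_chunk, ic_chunk, kh, kw, ic_bn, oc_bn) under this assumption
kh, kw = get_const_tuple(kernel.shape)[2:4]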


def update(self, target, workload, cfg):
    """
    Update context with a specific config.
Member commented:

This is a quite critical design decision, and I think it would be best if we could provide some motivation (with an example) for why we need to expose this API.

Member commented:

We can add a Note block to this function.
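
For the Note block, a hedged usage sketch of why update() is exposed (the target and workload variables below are placeholders): during alter_op_layout, the config found for the original NCHW workload is re-registered under the new NCHWc workload, so the later conv2d_NCHWc compute/schedule query still finds a matching config.

from tvm import autotvm

dispatch_ctx = autotvm.task.DispatchContext.current
cfg = dispatch_ctx.query(target, nchw_workload)    # config tuned on the original op
dispatch_ctx.update(target, nchwc_workload, cfg)   # expose it under the altered NCHWc workload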

unify int8 conv compute (1x1 and common); remove kernel_size in conv schedule

remove kernel_size & num_filter args in conv NCHWc compute

fix lint
@yzhliu (Member Author) commented Oct 29, 2018

  • add Note for DispatcherContext::update
  • unify int8 conv compute (1x1 and common)
  • remove kernel_size in conv schedule
  • remove kernel_size & num_filter args in conv NCHWc compute

@merrymercy @tqchen Please review again.

It also largely improves int8 1x1 conv by removing the code at 6ac03c8#diff-10fe98ea61ea47beb21a5bf01336db60L195 @anijain2305

@merrymercy (Member) commented Oct 29, 2018

It seems we don't have test cases for NCHWc fp32.

@yzhliu (Member Author) commented Oct 29, 2018

@merrymercy Please check the test cases.

@merrymercy (Member) commented Oct 29, 2018

LGTM. We can merge this, @tqchen.

@tqchen tqchen merged commit b9e8826 into apache:master Oct 29, 2018
@tqchen (Member) commented Oct 29, 2018

Thanks, @merrymercy @yzhliu @kevinthesun! This is now merged.

eqy pushed a commit to eqy/tvm that referenced this pull request Oct 29, 2018
FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Dec 27, 2018
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019