Refine porting x86 NCHWc conv to AutoTVM #1993
```python
        The specific configuration.
        """
        key = (str(target), workload)
        if self._counter < len(self._records) and (key not in self._global_cfg_dict):
```
Here we might just want "if self._counter < len(self._records)". Although it's not likely to happen, if somehow the key is already in self._global_cfg_dict, we still want to use the cfg from the records and update self._global_cfg_dict.
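A minimal sketch of the suggested change, assuming the method is the dispatcher's _query_inside and that each record stores its config as record[0].config (both assumptions, not taken from the diff above):

```python
def _query_inside(self, target, workload):
    """Sketch only: always trust the record at the current counter position,
    and refresh _global_cfg_dict even if the key is already present."""
    key = (str(target), workload)
    if self._counter < len(self._records):
        cfg = self._records[self._counter][0].config  # assumed record layout
        self._counter += 1
        self._global_cfg_dict[key] = cfg
        return cfg
    return self._global_cfg_dict[key]
```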
topi/python/topi/x86/conv2d.py
Outdated
```python
new_attrs['out_layout'] = 'NCHW%dc' % oc_bn

# Store global schedule dictionary for ApplyGraphBest dispatcher
workload = _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype)
```
This change means that we also need to update existing pre-searched x86 conv2d schedules.
Got it, I updated it in the ApplyGraphBest query instead.
I like this design. Previously, for an altered operator, we would convert the altered workload back to the original workload and query the log, which resulted in strange logic. Using the update mechanism in this PR, we can make the code simpler. I can follow up with updates for other backends (cuda, arm cpu).
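For readers following along, a hedged sketch of that update mechanism (the function name and workload arguments are illustrative, not the exact PR code; DispatchContext.query and the update method added in this PR are the dispatcher calls being discussed):

```python
from tvm import autotvm

def _alter_conv2d_layout_sketch(target, workload, new_workload):
    """Illustrative only: how alter_op_layout can feed the dispatcher.

    `workload` is the original NCHW conv2d workload; `new_workload` is the
    NCHWc workload produced by the layout transform.
    """
    dispatch_ctx = autotvm.task.DispatchContext.current
    # Config tuned (and logged) for the original workload.
    cfg = dispatch_ctx.query(target, workload)
    # Register the same config under the transformed workload, so later
    # queries for the altered op hit directly instead of being mapped back
    # to the original workload.
    dispatch_ctx.update(target, new_workload, cfg)
    return cfg
```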
topi/python/topi/x86/conv2d.py
Outdated
```python
# get config here
cfg = get_config()
_create_schedule_template(cfg, data, kernel, strides, padding, layout)
_create_schedule_template(cfg, data, kernel, strides, padding, origin_layout)
```
rename to _create_tuning_space or _create_template_space?
To clarify some terminology:
template = the code in topi
schedule = template + a specific config
so a template is a function(config) -> schedule
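To make that concrete, here is a small generic template in the style of the AutoTVM tutorials (a matmul, not the conv2d template from this PR): the decorated function only defines the tuning space, and applying a specific config to it yields one concrete schedule.

```python
import tvm
from tvm import autotvm

@autotvm.template
def matmul_template(N, L, M, dtype):
    A = tvm.placeholder((N, L), name='A', dtype=dtype)
    B = tvm.placeholder((L, M), name='B', dtype=dtype)
    k = tvm.reduce_axis((0, L), name='k')
    C = tvm.compute((N, M), lambda i, j: tvm.sum(A[i, k] * B[k, j], axis=k), name='C')

    s = tvm.create_schedule(C.op)
    cfg = autotvm.get_config()   # one point in the tuning space
    y, x = s[C].op.axis

    # The knobs below define the space; applying them with a concrete config
    # turns (template, config) into one specific schedule.
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)
    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)
    s[C].reorder(yo, xo, yi, xi)
    return s, [A, B, C]
```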
topi/python/topi/x86/conv2d.py
Outdated
```python
def conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype):
def _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype):
```
After this change, we don't need this function anymore
topi/python/topi/x86/conv2d.py
Outdated
```python
return _conv_arg_to_workload(data, kernel, strides, padding, layout, out_dtype)

@conv2d_x86.register(["direct"])
```
We can use @autotvm.register_topi_compute(conv2d, 'cpu', 'direct') and delete the function conv2d_x86.
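Roughly like this (a sketch only; `_create_tuning_space` and `_declaration_conv_impl` are placeholder helpers, and the decorated function overrides the generic topi.nn.conv2d for the x86 target):

```python
import topi
from tvm import autotvm

# Sketch: register the x86 compute with autotvm directly; the decorator
# creates the tuning task and attaches the workload to the output tensor,
# so a separate conv2d_x86 wrapper is no longer needed.
@autotvm.register_topi_compute(topi.nn.conv2d, 'cpu', 'direct')
def _declaration_conv(cfg, data, kernel, strides, padding, layout, out_dtype):
    _create_tuning_space(cfg, data, kernel, strides, padding, layout)      # placeholder helper
    return _declaration_conv_impl(cfg, data, kernel, strides,
                                  padding, layout, out_dtype)              # placeholder helper
```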
topi/python/topi/x86/conv2d.py
Outdated
```python
conv_arg_to_workload(data, kernel, strides,
                     padding, layout,
                     out_dtype)})
_conv_arg_to_workload(data, kernel, strides,
```
If we use @autotvm.register_topi_compute, we don't need this. The decorator will attach the workload for us.
topi/python/topi/x86/conv2d.py
Outdated
```python
padding, layout, out_layout, out_dtype)

@conv2d_NCHWc_cpu.register(['direct'])
```
use autotvm.register_topi_compute
topi/python/topi/x86/conv2d.py
Outdated
```python
workload = conv_NCHWc_arg_to_workload(data, kernel, kernel_size,
                                      strides, padding, layout,
                                      out_layout, out_dtype),
workload = _conv_NCHWc_arg_to_workload(data, kernel,
```
After using autotvm.register_topi_compute, we don't need to attach the workload manually.
@merrymercy Thanks, please review again.
tutorials/autotvm/tune_nnvm_x86.py
Outdated
```python
args_conv_NCHWc = [data_plc, kernel_plc, num_filter,
                   kernel_size, strides, padding, layout, dtype]
args_conv_NCHWc = autotvm.task.nnvm_integration.serialize_args(args_conv_NCHWc)
task = autotvm.task.create("topi_x86_conv2d_NCHWc", args=args_conv_NCHWc, target=target)
```
Add argument template='direct' to autotvm.task.create
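That is, something along these lines (a sketch; whether the keyword is spelled `template` or `template_key` depends on the autotvm version in use, so treat the exact argument name as an assumption):

```python
# Sketch of the suggested change to the tutorial snippet above.
task = autotvm.task.create("topi_x86_conv2d_NCHWc",
                           args=args_conv_NCHWc,
                           target=target,
                           template_key='direct')
```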
```python
reg_n = n
break

sch = _get_default_schedule(wkl, simd_width)
```
delete function _get_default_schedule and AVXConvCommonFwd
```python
return ConfigEntity.from_json_dict(cfg_dict)

raise ValueError("cannot decide default schedule for workload: {}".format(wkl))
sch = _get_default_schedule(wkl, simd_width)
```
delete function _get_default_schedule and AVXConv1x1Fwd
topi/python/topi/x86/conv2d.py
Outdated
```python
# get config here
cfg = get_config()
_create_schedule_template(cfg, data, kernel, strides, padding, layout)
cfg.template_key = "direct"
```
Delete this. Users will specify a template key in tune_nnvm_x86.py.
topi/python/topi/x86/conv2d.py
Outdated
```python
current_cfg = _query_dispatcher(workload)
assert current_cfg is not None
if current_cfg.is_fallback:
    wkl = Workload(data.dtype, conv_out.dtype, h, w, ic, num_filter,
```
This case should never happen. We can handle fallback only in topi_compute (see explanation below).
topi/python/topi/x86/conv2d.py
Outdated
```python
if cfg.is_fallback:
    wkl = Workload(data.dtype, out_dtype, in_height, in_width, in_channel, num_filter,
                   kernel_height, kernel_width, HPAD, WPAD, HSTR, WSTR)
    cfg = _get_default_config(wkl)
```
Instead of returning a new config, we can directly modify the values in the fallback schedule, e.g. cfg['tile_ic'] = 12. The same modified config will also be passed to topi_schedule, so we only need to handle fallback in topi_compute.
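A minimal sketch of that approach, assuming x86-style tile_ic/tile_oc knobs and SplitEntity from autotvm's config space (the function name, knob names, and block sizes are illustrative, not the PR's code):

```python
from tvm.autotvm.task.space import SplitEntity

def _fallback_config_sketch(cfg, in_channel, num_filter):
    """Illustrative only: fill a fallback config in place inside topi_compute.

    The same cfg object is later passed to topi_schedule, so fallback only
    has to be handled once, here, instead of returning a new config.
    """
    if cfg.is_fallback:
        ic_bn = 16 if in_channel % 16 == 0 else 8
        oc_bn = 16 if num_filter % 16 == 0 else 8
        cfg["tile_ic"] = SplitEntity([in_channel // ic_bn, ic_bn])
        cfg["tile_oc"] = SplitEntity([num_filter // oc_bn, oc_bn])
```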
I added __setitem__ to FallbackConfigEntity; please check if it looks good.
Force-pushed from 0f0422f to dc43eda.
Nice, we did a good cleanup of the x86 backend. One final comment.
topi/python/topi/x86/conv2d.py
Outdated
```python
kh, kw = kernel_size if isinstance(kernel_size, (tuple, list)) else \
    (kernel_size, kernel_size)
kh, kw = kernel_size if isinstance(kernel_size, (tuple, list)) \
    else (kernel_size, kernel_size)
```
We don't need kernel_size in the argument list. We can get it from kernel in L407
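For illustration, something like the following (a sketch assuming an OIHW kernel placeholder; the NCHWc kernel layout would put the spatial dims in different positions):

```python
import tvm
from topi.util import get_const_tuple

# Sketch: derive num_filter and the kernel spatial size from the kernel
# tensor itself instead of passing kernel_size through the argument list.
kernel = tvm.placeholder((64, 3, 3, 3), name='kernel')   # example OIHW shape
num_filter, _, kh, kw = get_const_tuple(kernel.shape)
```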
```python
def update(self, target, workload, cfg):
    """
    Update context with a specific config.
```
This is quite a critical design point, and I think it would be best if we could provide some motivation (with an example) for why we need to expose this API.
We can add a Note block to this function
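For example, the docstring's Note could read along these lines (the wording and parameter descriptions are a suggestion, not the text that was eventually merged):

```python
def update(self, target, workload, cfg):
    """Update the context with a specific config for a workload.

    Parameters
    ----------
    target : Target
        The compilation target.
    workload : tuple
        The workload to attach the config to.
    cfg : ConfigSpace
        The config to return for future queries of this workload.

    Note
    ----
    This is used by alter_op_layout: the tuning log only holds configs for
    the original workloads, but alter_op_layout creates new (e.g. NCHWc)
    workloads. By calling update() with the config queried for the original
    workload, later queries for the altered workload can be answered without
    mapping it back to the original one.
    """
```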
…e; port x86 conv2d_int8 to autotvm
Force-pushed from dc43eda to 717b331.
- unify int8 conv compute (1x1 and common); remove kernel_size in conv schedule
- remove kernel_size & num_filter args in conv NCHWc compute
- fix lint
Force-pushed from 717b331 to 6ac03c8.
@merrymercy @tqchen Please review again. It also largely improves int8 1x1 conv by removing 6ac03c8#diff-10fe98ea61ea47beb21a5bf01336db60L195. @anijain2305
It seems we don't have test cases for NCHWc fp32.
@merrymercy Please check the test cases.
LGTM. We can merge this @tqchen
Thanks, @merrymercy @yzhliu @kevinthesun! This is now merged.
- Add `update` to dispatchers so that `alter_op_layout` can provide config for updated operators.
- Register `compute`, `schedule` and `args_to_workload` to autotvm.

The whole process becomes: tune and log configs for the original workloads; during compilation, query the config for the altered (NCHWc) workload directly from the dispatcher, since `AlterLayout` updated that.

Please review @kevinthesun @merrymercy