
Implementation of TVM codegen library #888

Merged
ke1337 merged 2 commits into master from gt/Nuphar_non_EP on Jul 3, 2019
Conversation

ke1337 (Contributor) commented Apr 23, 2019:

This is a subset of changes in #881. Splitting it to make review easier per reviewers' request.

@ke1337 ke1337 requested a review from a team as a code owner April 23, 2019 21:40
@ke1337 ke1337 force-pushed the gt/Nuphar_non_EP branch from 451355c to 306d137 Compare June 13, 2019 19:29
if (it->second == ScheduleType::ScheduleClosure)
return false;
else if (it->second == ScheduleType::ScheduleRoot) {
return false;
weixingzhang (Contributor) commented May 29, 2019:

Can the if and else if clauses be merged? #Resolved

ke1337 (author) replied:

Will do



snnn (Contributor) commented Jun 17, 2019:

Usually, if you have

if (...) {
  ...
  return xxxx;
} else {
  ...
  return yyyy;
}

the 'else' keyword can be removed.

See: https://clang.llvm.org/extra/clang-tidy/checks/readability-else-after-return.html #Resolved


bool TryGetConstantInput(int input_index, const Tensor** constant_input_value) const;

bool TryGetConstantInput(const std::string& name, const Tensor** constant_input_value) const;
souptc (Member) commented Jun 17, 2019:

const std::string& name,

Why do you want to look up by name? We should already have the name-to-index mapping in session_state; all the APIs in OpKernel are index based, and comparing strings will hurt perf. #ByDesign

ke1337 (author) replied Jun 17, 2019:

This is only called in the kernel constructor, so the perf impact is small. The reason to use the name is that there are initializers in a subgraph that are not listed as the subgraph's inputs, and looking up by name makes sure they can still be found.

// including 1) IR builder passes
// 2) Weight layout transformer passes
// 3) Scheduler passes, etc.

souptc (Member) commented Jun 17, 2019:

@KeZhang, many components under this "codegen/common" folder, like registry/profile/dispatcher, seem like they should be task-transparent components shared with the runtime. It feels a little weird to have a "registry" type in the runtime codebase that is not used anywhere else. #WontFix

ke1337 (author) replied:

We had some discussion about this before. To minimize changes to the framework, these utils are kept in codegen/common.

public:
RegistryBase() = default;

bool Contains(const std::string& name) const {
souptc (Member) commented Jun 17, 2019:

const std::string& name

If we hardcode the key type to string, how do you handle the other cases, like the op's domain/version? I understand that for the initial implementation we don't need to consider version/domain, but after you go to production you will hit this issue sooner or later. #ByDesign

ke1337 (author) replied:

Good suggestion. For this generic class, the string key could encode anything; we can expand the key to include all necessary info.

ke1337 (author) replied:

Currently opset and domain are handled inside the op; see codegen/common/target/op_ir_creator/slice.cc as an example. If it's not critical, let's revisit the design in later PRs.

int64_t TotalSize(const std::vector<int64_t>& shape);

template <typename T1, typename T2>
void DumpArrayRecursive(const T1* data, int64_t& data_offset, const std::vector<T2>& shape, int idx) {
souptc (Member) commented Jun 17, 2019:

DumpArrayRecursive

It looks more suitable to put this in a debug folder. #Resolved

ke1337 (author) replied Jun 26, 2019:

Putting it in dump_array.h.

if (tvm::runtime::TypeMatch(dtype, kDLFloat, 32)) {
float* data = reinterpret_cast<float*>(static_cast<char*>(X->data) + X->byte_offset);
DumpArray("float tensor:", data, shape);
} else if (tvm::runtime::TypeMatch(dtype, kDLInt, 8)) {
souptc (Member) commented Jun 17, 2019:

This print function only works on CPU, right? I would prefer to put it under nuphar, to avoid the case where people think it is target independent and abuse it. #ByDesign

liwchang (Contributor) commented Jun 17, 2019:

It is for stdio printf support only. FPGA probably won't need it; CUDA and ARM CPU might still have it. #Resolved

ke1337 (author) replied:

This is for any CPU device, like ARM/X64/PPC. It is target independent among CPUs, though not applicable to all targets.

Member replied:

Please document that this debug tool is for CPU only.

// Gemm Convolution
const int64_t* batch_size = tvm::as_const_int(input->shape[0]);
if (batch_size != nullptr && *batch_size == 1)
return Conv2D_gemm(input, filter, output_shape, stride, padding);
souptc (Member) commented Jun 17, 2019:

I don't think we should put the conv2d_gemm here. It is efficient for some platforms, but for others, like BrainSlice, it can't be scheduled this way. You can keep this customized lowering in nuphar and register it during nuphar's graph iteration, but don't put it in as a generic conversion. #Resolved

ke1337 (author) replied:

I think there's value in sharing the GEMM-based conv2d implementation between ARM/X86?

Member replied:

But if I want to use this mti lib to convert Conv2D to tvm for any non-CPU target, I don't expect to get this GEMM-based conv expression.
Yes, it could be shared between several CPU architectures, but I feel we need some clarity on this 'target-independent' mti lib: does 'target-independent' mean it works with different devices, or just different CPUs?

ke1337 (author) replied:

Fixed by removing Conv2D_gemm from Conv2D



Status Evaluate(
const tvm::Tensor& tensor,
const Node* node,
CodeGenContext& ctx,
souptc (Member) commented Jun 17, 2019:

CodeGenContext& ctx

As we talked about before, if we use the generic IR building in mti today, this CodeGenContext in the signature is useless; I had to write another iteration to build the context. I think it would be better to take a customized BuildContextFunc here. #ByDesign

liwchang (Contributor) commented Jun 17, 2019:

What would hold the BuildContextFunc, the Builder or the OpIRCreator? We tried to make the builder and OpIRCreator stateless; CodeGenContext is passed in as the state. #Resolved

ke1337 (author) replied:

It is useful to handle domain/opset. See codegen/target/generic/op_ir_creator/tensor/slice.cc.



Member replied:

Yes, CodeGenContext is the state, but in the current implementation different users have different states and CodeGenContext is just a handle. If I use the generic IR builder in mti, it doesn't give me any chance to deal with that handle.

@@ -0,0 +1,186 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
souptc (Member) commented Jun 17, 2019:

What's the meaning of "target" in this folder structure? It seems to contain a lot of TVM IR building stuff; I don't see the relationship with "target". #Closed

ke1337 (author) replied:

Target means target hardware ISAs, like ARM/x86/CUDA/etc. It is in line with the concept of a target in TVM.

Member replied:

Right, but I don't see any subfolder here for x86/arm/cuda/etc, and the contents of this folder look like IR builders / schedules for particular targets; it is very confusing.

liwchang (Contributor) commented Jun 27, 2019:

It is just the first PR. :) We will add more later. #Closed

}

outputs.push_back(Y);
return Status::OK();
souptc (Member) commented Jun 17, 2019:

So these are the implementations that iterate the ONNX graph and convert it to TVM IR, right? Do we plan to make this the standard interface for ONNX graph to TVM, or is it just nuphar's current implementation? If it is only for nuphar, it is better to move it to the nuphar folder. #ByDesign

liwchang (Contributor) commented Jun 17, 2019:

Didn't get the question. Is this related to cast, or the int8-to-bool thing? #ByDesign

ke1337 (author) replied:

tvm bool and ONNX bool are different types (1-bit vs. 8-bit), so this is general for ONNX to tvm.

Member replied:

Hmm... it looks like codeflow has a bug; it put my question in the wrong place..... :(
I just want a standard way to iterate the ONNX graph with a visitor pattern. Anyway, we can discuss it later.

Member replied:

Forget about it, a codeflow bug, I guess.

Contributor replied:

We have a way to iterate the ONNX graph, but we just don't know whether it is representative enough to be a "standard". That is why we moved it to nuphar. If it turns out to be common enough, we can move it back.


virtual Scheduler* Find(const tvm::Tensor&,
const Node*,
tvm_codegen::CodeGenContext&) = 0;
souptc (Member) commented Jun 17, 2019:

So the key to find a schedule pass is "Tensor + Node + CodeGenContext"? It looks like that is only suitable for per-op scheduling.

liwchang (Contributor) commented Jun 17, 2019:

It is not really per-op scheduling. The context can hold a graph and the corresponding analysis, and the tensor can be used as a key to fetch a previously analyzed result and apply it to the schedule. #Resolved

ke1337 (author) replied:

As Li-wen mentioned, the context can drive more complicated schedules if needed. Please check the Nuphar implementations as an example.

Member replied:

Like I mentioned in another comment, if that's the case, why list tvm::Tensor here? The context could cover everything. It makes the interface more confusing.

snnn (Contributor) commented Jun 17, 2019:

Could you please also remove this line:
https://github.com/microsoft/onnxruntime/blob/master/cmake/CMakeLists.txt#L426

#Resolved

// If a tvm::Tensor satisfies more than one TVM scheduler passes,
// the first dispatched pass will be applied.

class TVMScheduleBuilder {
souptc (Member) commented Jun 17, 2019:

class TVMScheduleBuilder

So what's the usage of this ScheduleBuilder? Iterate each op in the TVM graph and apply a schedule pass? I feel that in many cases the schedule passes we write are not this kind of per-op schedule. #Pending

liwchang (Contributor) commented Jun 17, 2019:

It is not really per-op scheduling. The context can hold a graph and the corresponding analysis, and the tensor can be used as a key to fetch a previously analyzed result and apply it to the schedule.

ke1337 (author) replied:

I guess your implementation won't use the default scheduler at all. This is more for sharing the logic between ARM/X86/etc.

Member replied:

Then should we move the IRBuilder and schedules that only work for ARM/x86/etc to a separate package?

Member replied:

If that's the case, why put tvm::Tensor as the key in the signature? The way you describe feels like a hacky workaround to use this interface for cross-op scheduling, and it makes the interface more confusing.

Contributor replied:

The design separates analysis from transformation: you apply the analysis, record where and what you want to transform, and then apply the transformation. tvm::Tensor is the finest unit in the low-level IR, and Ort::Node is the finest unit in the high-level IR; that is why we use these two as the default keys. You can use any other key in a customized CodeGenContext. After the tvm IR is built, it is just a graph of tvm::Tensor, and op boundaries are not that important.

Contributor replied:

> then should we move the IRBuilder and schedules that only work for ARM/x86/etc to a separate package?

I had the same concern and discussed it with Ke a few weeks ago.


tvm::Expr halideir_log(const tvm::Expr& x_full);

tvm::Expr fast_log(const tvm::Expr& x);
souptc (Member) commented Jun 17, 2019:

I believe those are also CPU-only, right? #Resolved

liwchang (Contributor) commented Jun 17, 2019:

They are for non-hw-accelerated targets only; it is a polynomial special-function approximation. If there are hw-accelerated instructions (an FPGA, for example, usually builds its own lookup table for special functions), then it is not applicable. Movable. #Resolved

ke1337 (author) replied:

Moved out



ke1337 (author) commented Jun 26, 2019:

Done.



@ke1337 ke1337 force-pushed the gt/Nuphar_non_EP branch from 306d137 to c50ab1f Compare June 26, 2019 02:37
ke1337 (author) commented Jun 26, 2019:

Please bear with the force push after the rebase; it resolves conflicts between this branch and master. #Resolved

@ke1337 ke1337 force-pushed the gt/Nuphar_non_EP branch from c50ab1f to 026ac75 Compare June 26, 2019 03:58
souptc (Member) left a review comment:

:shipit:

souptc (Member) commented Jul 3, 2019:

Thanks Ke, as we synced offline, for other issues we could improve it later.

ke1337 (author) commented Jul 3, 2019:

> Thanks Ke, as we synced offline, for other issues we could improve it later.

Thanks for the review!

@ke1337 ke1337 merged commit 0d204f3 into master Jul 3, 2019