Conversation
```cpp
if (it->second == ScheduleType::ScheduleClosure)
  return false;
else if (it->second == ScheduleType::ScheduleRoot) {
  return false;
```
Can the `if` and `else if` clauses be merged? #Resolved
Usually, if you have

```cpp
if (...) {
  ...
  return xxxx;
} else {
  ...
  return yyyy;
}
```

the `else` keyword can be removed.
see: https://clang.llvm.org/extra/clang-tidy/checks/readability-else-after-return.html #Resolved
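As a hedged illustration of the suggestion: both branches of the quoted snippet return false, so the conditions can be merged and the `else` dropped (early-return style). `ScheduleType`'s extra value and `IsSchedulable` are hypothetical stand-ins, not the code under review:

```cpp
#include <cassert>

enum class ScheduleType { ScheduleClosure, ScheduleRoot, ScheduleOther };

// Both original branches return false, so they merge into one condition
// and the else disappears (early-return style).
bool IsSchedulable(ScheduleType t) {
  if (t == ScheduleType::ScheduleClosure || t == ScheduleType::ScheduleRoot)
    return false;
  return true;
}
```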
```cpp
bool TryGetConstantInput(int input_index, const Tensor** constant_input_value) const;

bool TryGetConstantInput(const std::string& name, const Tensor** constant_input_value) const;
```
`const std::string& name,`
Why do you want to compare by name? We should already have the name-to-index mapping in session_state; all the APIs in OpKernel are index based, and comparing strings will hurt perf. #ByDesign
This is only called in the kernel constructor, so the perf impact is small. The reason to use the name is that there are initializers in the subgraph that are not listed as the subgraph's inputs, and using the name makes sure they can be found.
In reply to: 294412651
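A minimal sketch of why the name-based overload matters, assuming (as the reply above says) that some subgraph initializers are not listed as inputs. `SubgraphState`, its maps, and this `Tensor` struct are illustrative stand-ins, not the actual session_state API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins: initializers holds every constant tensor by name,
// while input_index_to_name covers only the tensors listed as subgraph inputs.
struct Tensor { float value; };

struct SubgraphState {
  std::map<std::string, Tensor> initializers;
  std::map<int, std::string> input_index_to_name;

  // Index-based lookup only reaches tensors listed as inputs.
  bool TryGetConstantInput(int input_index, const Tensor** out) const {
    auto it = input_index_to_name.find(input_index);
    if (it == input_index_to_name.end()) return false;
    return TryGetConstantInput(it->second, out);
  }

  // Name-based lookup also reaches initializers that are not listed inputs.
  bool TryGetConstantInput(const std::string& name, const Tensor** out) const {
    auto it = initializers.find(name);
    if (it == initializers.end()) return false;
    *out = &it->second;
    return true;
  }
};
```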
```cpp
// including 1) IR builder passes
//           2) Weight layout transformer passes
//           3) Scheduler passes, etc.
```
@KeZhang, many components under this "codegen/common" folder, like registry/profile/dispatcher, seem like task-transparent components that could be shared with the runtime. I feel it is a little weird to have a "registry" type in the runtime codebase that is not used anywhere else. #WontFix
We had some discussion about this before. To minimize changes to the framework, these utils are kept in codegen/common.
In reply to: 294414368
```cpp
public:
  RegistryBase() = default;

  bool Contains(const std::string& name) const {
```
`const std::string& name`
If we hardcode the key type to string, how could you handle the other cases, like an op's domain/version? I understand that for the initial implementation we don't need to consider version/domain, but once you go to production you will hit this issue sooner or later. #ByDesign
Good suggestion. For this generic class, the string key could encode anything; we can expand the key to include all the necessary info.
In reply to: 294431551
Currently opset and domain are handled inside the op; see codegen/common/target/op_ir_creator/slice.cc as an example. If not critical, let's revisit the design in later PRs.
In reply to: 294436742
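A minimal sketch of the key expansion suggested above, assuming a composite key of op name, domain, and since-version. `OpKey` and this `Registry` are illustrative, not the actual `RegistryBase` API:

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical composite key: op name plus domain and since-version, so the
// registry can disambiguate ops once domain/version matter in production.
struct OpKey {
  std::string name;
  std::string domain;
  int since_version;
  bool operator<(const OpKey& o) const {
    return std::tie(name, domain, since_version) <
           std::tie(o.name, o.domain, o.since_version);
  }
};

// Illustrative registry keyed on OpKey instead of a bare string.
template <typename T>
class Registry {
 public:
  void Register(const OpKey& key, T value) { content_.emplace(key, std::move(value)); }
  bool Contains(const OpKey& key) const { return content_.count(key) > 0; }

 private:
  std::map<OpKey, T> content_;
};
```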
```cpp
int64_t TotalSize(const std::vector<int64_t>& shape);

template <typename T1, typename T2>
void DumpArrayRecursive(const T1* data, int64_t& data_offset, const std::vector<T2>& shape, int idx) {
```
`DumpArrayRecursive`
It looks more suitable to put this in a debug folder. #Resolved
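For context, a minimal sketch of what a `DumpArrayRecursive` with the declared signature might do: recurse over the shape one dimension at a time, print nested braces, and advance `data_offset` at the innermost level. This behavior is inferred from the signature, not taken from the actual implementation:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Assumed behavior, inferred from the declared signature: recurse over the
// shape dimension by dimension, print nested braces, and advance
// data_offset each time a scalar at the innermost level is printed.
template <typename T1, typename T2>
void DumpArrayRecursive(const T1* data, int64_t& data_offset,
                        const std::vector<T2>& shape, int idx) {
  if (idx == static_cast<int>(shape.size())) {
    printf("%g ", static_cast<double>(data[data_offset++]));
    return;
  }
  printf("{ ");
  for (T2 i = 0; i < shape[idx]; ++i)
    DumpArrayRecursive(data, data_offset, shape, idx + 1);
  printf("} ");
}
```

For a 2x3 float array this prints the elements in nested braces and leaves `data_offset` equal to the total element count.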
```cpp
if (tvm::runtime::TypeMatch(dtype, kDLFloat, 32)) {
  float* data = reinterpret_cast<float*>(static_cast<char*>(X->data) + X->byte_offset);
  DumpArray("float tensor:", data, shape);
} else if (tvm::runtime::TypeMatch(dtype, kDLInt, 8)) {
```
This print function only works on CPU, right? I would prefer to put it under nuphar, to avoid people thinking it is target-independent and abusing it. #ByDesign
It is for stdio printf support only. FPGA probably won't need it; CUDA and ARM CPU might still have it. #Resolved
This is for any CPU device, like ARM/X64/PPC. It is target-independent across CPUs, though not for all targets.
In reply to: 294438000
Please document that this debug tool is for CPU only.
In reply to: 296499584
```cpp
// Gemm Convolution
const int64_t* batch_size = tvm::as_const_int(input->shape[0]);
if (batch_size != nullptr && *batch_size == 1)
  return Conv2D_gemm(input, filter, output_shape, stride, padding);
```
I don't think we should put Conv2D_gemm here. It is efficient for some platforms, but others, like BrainSlice, cannot be scheduled this way. You can keep this customized lowering in nuphar and register it during nuphar's graph iteration, but don't put it in as a generic conversion. #Resolved
I think there's value in sharing a GEMM-based conv2d implementation for ARM/X86?
In reply to: 294443494
But if I want to use this mti lib to convert Conv2D to tvm for a non-CPU target, I don't expect to get this GEMM-based conv expression. Yes, it could be shared between several CPU architectures, but I feel we need some clarity on this 'target-independent' mti lib: does 'target-independent' mean it works with different devices, or just with different CPUs?
In reply to: 297448544
Fixed by removing Conv2D_gemm from Conv2D.
In reply to: 298400188
```cpp
Status Evaluate(
    const tvm::Tensor& tensor,
    const Node* node,
    CodeGenContext& ctx,
```
`CodeGenContext& ctx`
As we talked about before, if we use the generic IR building in mti today, the CodeGenContext in this signature is useless; I have written another iteration to build the context. I think it would be better if we took a customized BuildContextFunc here. #ByDesign
What holds BuildContextFunc, the Builder or the OpIRCreator? We tried to make the Builder and OpIRCreator stateless; CodeGenContext is passed as the state. #Resolved
It is useful for handling domain/opset. See codegen/target/generic/op_ir_creator/tensor/slice.cc.
In reply to: 294445998
Yes, CodeGenContext is the state, but in the current implementation different users have different states; CodeGenContext is just a handle. And if I use the generic IR builder in mti, it doesn't give me any chance to deal with that handle.
In reply to: 294552479
```diff
@@ -0,0 +1,186 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
```
What's the meaning of "target" in this folder structure? It seems to contain a lot of tvm IR building stuff; I don't see the relationship with "target". #Closed
"target" means target hardware ISAs, like ARM/x86/CUDA/etc. It is in line with the concept of a target in TVM.
In reply to: 294446965
Right, but I don't see any subfolders here for x86/arm/cuda/etc., and the contents of this folder look like IR builders / schedules for a particular target; it is very confusing.
In reply to: 297449201
It is just the first PR. :) Later, we will add more. #Closed
```cpp
}

outputs.push_back(Y);
return Status::OK();
```
So these are the impls that iterate the onnx graph and convert it to tvm IR, right? Do we plan to make this the standard interface for onnx graph --> tvm, or is it just nuphar's current implementation? If it is only for nuphar, it is better to move it to the nuphar folder. #ByDesign
I didn't get the question. Is this related to cast? Or the int8-to-bool thing? #ByDesign
tvm bool and ONNX bool are different types (1-bit vs. 8-bit), so this is general for ONNX-to-tvm conversion.
In reply to: 294448930
Hmm... it looks like CodeFlow has a bug; it put my question in the wrong place..... :(
I just want a standard way to iterate the onnx graph with a visitor pattern. Anyway, we can discuss it later.
In reply to: 296499535
We have a way to iterate the onnx graph, but we just don't know whether it is representative enough to be a "standard". That is why we moved it to nuphar. If it turns out to be common enough, we can move it back.
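A minimal sketch of the visitor-pattern iteration discussed above: visit nodes in topological order, producers before consumers, and hand each node to a callback (an op IR creator would be such a callback). `Node` and `VisitTopological` here are hypothetical stand-ins, not the onnx graph API:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an onnx graph node: each node lists the indices
// of the nodes that produce its inputs.
struct Node {
  std::string name;
  std::vector<int> inputs;
};

// Visit every node in topological order and hand each one to the visitor.
void VisitTopological(const std::vector<Node>& graph,
                      const std::function<void(const Node&)>& visit) {
  std::vector<bool> done(graph.size(), false);
  std::function<void(int)> dfs = [&](int i) {
    if (done[i]) return;
    done[i] = true;
    for (int in : graph[i].inputs) dfs(in);  // producers first
    visit(graph[i]);
  };
  for (size_t i = 0; i < graph.size(); ++i) dfs(static_cast<int>(i));
}
```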
```cpp
virtual Scheduler* Find(const tvm::Tensor&,
                        const Node*,
                        tvm_codegen::CodeGenContext&) = 0;
```
So the key to find a schedule pass is "Tensor + Node + CodeGenContext"? It looks like it is only suitable for per-op scheduling.
It is not really per-op scheduling. The context can hold a graph and corresponding analysis, and the Tensor can be used as a key to retrieve previously analyzed results and apply them to the schedule. #Resolved
As Li-wen mentioned, the context can support a more complicated schedule if needed. Please check the Nuphar implementations as an example.
In reply to: 294452737
As mentioned in other comments, if that's the case, why do you list tvm::Tensor here? The context could cover everything. It makes the interface more confusing.
In reply to: 297449610
Could you please also remove this line? #Resolved
```cpp
// If a tvm::Tensor satisfies more than one TVM scheduler passes,
// the first dispatched pass will be applied.

class TVMScheduleBuilder {
```
`class TVMScheduleBuilder`
So what's the usage of this ScheduleBuilder? Iterate each op in the tvm graph and apply a schedule pass? I feel that in many cases, the schedule passes we write are not this kind of per-op schedule. #Pending
It is not really per-op scheduling. The context can hold a graph and corresponding analysis, and the Tensor can be used as a key to retrieve previously analyzed results and apply them to the schedule.
I guess your implementation won't use the default scheduler at all. This is more for sharing the logic between ARM/X86/etc.
In reply to: 294454687
Then should we move the IRBuilder and schedules that only work for ARM/x86/etc. to a separate package?
In reply to: 297448953
If that's the case, why do you put tvm::Tensor as the key in the signature? I feel the approach you mentioned is more like a hacky workaround to use this interface for cross-op scheduling, and it makes the interface more confusing.
In reply to: 294551886
The design separates analysis from transformation. You apply the analysis, record where and what you want to transform, and then apply the transformation. tvm::Tensor is the finest unit in the low-level IR, and Ort::Node is the finest unit in the high-level IR; that is why we use these two as the default keys. You can use any other key in a customized CodeGenContext. After the tvm IR is built, it is just a graph of tvm::Tensor, and op boundaries are not that important.
> Then should we move the IRBuilder and schedules that only work for ARM/x86/etc. to a separate package?

I had the same concern and discussed it with Ke a few weeks ago.
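A minimal sketch of the analysis-then-transformation flow described above: an analysis pass records, per tensor key, which schedule to apply, and the schedule lookup later retrieves that decision. `TensorKey`, the pass names, and this `CodeGenContext` are illustrative stand-ins; the real keys are `tvm::Tensor` and `Node`:

```cpp
#include <map>
#include <string>

// Illustrative types: the real keys are tvm::Tensor / Node, and the plan
// would live inside a customized CodeGenContext.
using TensorKey = std::string;

struct CodeGenContext {
  std::map<TensorKey, std::string> schedule_plan;  // filled by analysis
};

// Analysis phase: decide and record what to transform, without transforming.
void AnalyzeReduction(const TensorKey& tensor, CodeGenContext& ctx) {
  ctx.schedule_plan[tensor] = "parallel_reduce";
}

// Transformation phase: look the recorded decision up by the tensor key.
std::string FindSchedule(const TensorKey& tensor, const CodeGenContext& ctx) {
  auto it = ctx.schedule_plan.find(tensor);
  return it == ctx.schedule_plan.end() ? "default" : it->second;
}
```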
```cpp
tvm::Expr halideir_log(const tvm::Expr& x_full);

tvm::Expr fast_log(const tvm::Expr& x);
```
I believe those are also for CPU only, right? #Resolved
They are for non-hw-accelerated targets only; this is a polynomial special-function computation. If there are hw-accelerated instructions (an FPGA, for example, usually builds its own lookup table for special functions), then it is not applicable. Movable. #Resolved
Done.
In reply to: 502802718
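For intuition, a scalar sketch of the kind of polynomial approximation a `fast_log` might use: range reduction `x = m * 2^e`, then a truncated series for `log(1 + t)` with `t = m - 1`. This is an assumption about the approach, not the actual `fast_log`, which emits `tvm::Expr` rather than running on floats:

```cpp
#include <cassert>
#include <cmath>

// Assumed approach (not the actual fast_log): split x = m * 2^e with
// m in [0.5, 1), evaluate a truncated series for log(1 + t) where
// t = m - 1, then add back e * ln(2).
float FastLog(float x) {
  int e = 0;
  const float m = std::frexp(x, &e);  // x = m * 2^e, m in [0.5, 1)
  const float t = m - 1.0f;
  // Four-term series for log(1 + t); accuracy is modest but evaluation is cheap.
  const float p = t - 0.5f * t * t + (1.0f / 3.0f) * t * t * t
                  - 0.25f * t * t * t * t;
  return p + static_cast<float>(e) * 0.69314718f;  // + e * ln(2)
}
```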
Please bear with the force push after the rebase; it resolves conflicts between this branch and master. #Resolved
Thanks Ke. As we synced offline, we can improve the other issues later.
Thanks for the review!
This is a subset of changes in #881. Splitting it to make review easier per reviewers' request.