Conversation

@denis0x0D
Contributor

This patch is related to the issue:
https://discuss.tvm.ai/t/idea-trace-expression/945

It allows tracing Tensor data at runtime.

Use case example:
1. Write the model:
...
tvm::Array<tvm::Expr> shape = {n, n, n};
tvm::Tensor C = tvm::compute(shape, [&](tvm::Var i, tvm::Var j, tvm::Var k) {
  return tvm::trace("A", {i, j, k}, A[i][j][k]) + B[i][j][k];
});
...

2. Run the model with specific data. The output is:
...
A[1][4][5] = $actual_tensor_value
...

How it works:
tvm::trace adds an ir::Call expr with type External, which the LLVM codegen part lowers to an external call, so the runtime support is added. The runtime processes the added function, expanding its arguments via "var_arg", and writes the data to a file. By default this is stdout, but a log file can be specified via the TVM_TRACE environment variable; the tracing process can also be enabled or disabled.
Example:
export TVM_TRACE=trace_log_file=path/to/file:process_tracing=1
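The TVM_TRACE option string above could be parsed with logic along these lines. This is an illustrative sketch only, not the patch's actual implementation; the function name parse_trace_options is hypothetical, and the option keys are taken from the example.

```python
import os

def parse_trace_options(value):
    """Parse a colon-separated key=value option string such as
    "trace_log_file=path/to/file:process_tracing=1"."""
    # Defaults: trace to stdout, tracing disabled.
    options = {"trace_log_file": None, "process_tracing": "0"}
    if not value:
        return options
    for item in value.split(":"):
        if "=" in item:
            key, _, val = item.partition("=")
            options[key] = val
    return options

# Read the option string from the environment, as the patch describes.
opts = parse_trace_options(os.environ.get("TVM_TRACE", ""))
```

Note this simple split-on-":" scheme would not handle paths containing a colon; it only mirrors the example format shown above.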

@denis0x0D denis0x0D force-pushed the sandbox/trace branch 3 times, most recently from eef6670 to 5bedd3e Compare October 24, 2018 13:09
@tqchen tqchen self-assigned this Oct 24, 2018
@tqchen
Member

tqchen commented Oct 24, 2018

Thanks for the contribution. We can likely simplify this further by using TVM's runtime system for callbacks. TVM supports generated code calling back into any PackedFunc (including ones defined in Python).

Here is a rough example; it is only meant to demonstrate how the lowered callback works. Some future wrapping around this mechanism might make the trace API easier to use.

import tvm
import numpy as np


@tvm.register_func
def my_debug(x):
    print("array=", x.asnumpy())
    return 0


x = tvm.placeholder((4,), name="x", dtype="int32")
xbuffer = tvm.decl_buffer(x.shape, dtype=x.dtype)

y = tvm.compute(x.shape, lambda i: tvm.call_packed("my_debug", xbuffer))
s = tvm.create_schedule(y.op)

print(tvm.lower(s, [x, y], binds={x:xbuffer}, simple_mode=True))

f = tvm.build(s, [xbuffer, y], binds={x:xbuffer})
xnd = tvm.nd.array(np.ones((4,), dtype=x.dtype))
ynd = tvm.nd.array(np.zeros((4,), dtype=y.dtype))

f(xnd, ynd)

In the above case, call_packed calls my_debug in every iteration, passing the array handle (xbuffer). Some further wrapping could make it look nicer. The good part is that we can use any piece of Python code to monitor the content of x.
See more at https://docs.tvm.ai/dev/runtime.html

@denis0x0D
Contributor Author

Hi @tqchen, thanks for the answer.
I've noticed some differences between the Python and C++ API parts. For example, I didn't find any option in the C++ API to dump the IR after each pass, the way gcc does with "-fdump-tree-all" for GIMPLE IR. As far as I understood, this feature is only available through the Python API. Could you please share your thoughts about the C++ API: would it be good to add the same functionality that Python has, or maybe keep everything in the shared library and call the needed function from each language's API, to keep things consistent?
Thanks.

@tqchen
Member

tqchen commented Oct 25, 2018

This is a good question :)

The current principle is: if some API is stabilized and needed across languages, we bring it to C++. If something is in flux and not yet critical to development, we keep it in the language bindings (e.g. Python), let it play out, and wait until it stabilizes.

Admittedly there are tradeoffs: having everything in C++ makes the API available to all languages, but it is also not necessarily flexible. For example, in the case of monitoring, printing things out to a log may not be as flexible as defining a customized callback and letting the user do what they want.

Having some of the things in Python (for example, most of our test cases are there) speeds up the development process and allows us to write more test cases easily.

In both cases, we should document it clearly, so it would be really nice to have a tutorial teaching users all the possible ways to do tracing.

@tqchen
Member

tqchen commented Oct 25, 2018

As a side note, in this particular case the my_debug function can be registered in C++ as well, because the runtime system allows interop between languages.

@tqchen tqchen added the status: need update need update based on feedbacks label Oct 26, 2018
@denis0x0D
Contributor Author

@tqchen thanks for the answer,
but tracing still requires writing one's own runtime support. What do you think about a generic solution written as a shared library, with some options that could be set via environment variables? That looks like it would be an easy way for users to use it.

For example "sanitizers" from llvm/compiler-rt have two parts:

  1. Instrumentation pass.
  2. Runtime support, which can enable/disable some features via an environment variable.

The library could be built as a single DSO, and the model would then need to be linked with it manually.

@tqchen
Member

tqchen commented Oct 29, 2018

The current features could likely be implemented using packed function calls: define a set of tracing functions like you do, but declare them as packed functions. Then we could optionally register these runtime functions as packed functions, or register different variants depending on whether tracing is on or off.

So to summarize, I agree that tracing needs runtime support, but these runtime features could likely be built around the TVM packed function system.
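The idea of registering different variants depending on whether tracing is on or off can be sketched with a plain-Python stand-in for the packed-function registry. The names below (register_func, install_tracing, and so on) are hypothetical and purely illustrative; this is not TVM's API.

```python
# A minimal stand-in for a packed-function registry keyed by name.
_registry = {}

def register_func(name, func):
    _registry[name] = func

def get_func(name):
    return _registry[name]

def default_trace(name, value):
    # Tracing-enabled variant: report the value, then pass it through.
    print("trace %s = %r" % (name, value))
    return value

def noop_trace(name, value):
    # Tracing-disabled variant: just pass the value through.
    return value

def install_tracing(enabled):
    # Register whichever variant matches the current tracing mode;
    # generated code would always call "tvm.trace_callback" by name.
    register_func("tvm.trace_callback", default_trace if enabled else noop_trace)

install_tracing(enabled=False)
assert get_func("tvm.trace_callback")("A", 42) == 42
```

The point of the sketch is that the call site never changes; only which function sits behind the registered name does.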

@denis0x0D
Contributor Author

@tqchen got it now, thanks,
I'll update the patch.

@xqdan
Contributor

xqdan commented Oct 30, 2018

@denis0x0D I'm considering moving the IR dump to the C++ side.

@denis0x0D
Contributor Author

@tqchen I've hit a strange thing with packed functions when I increase the shape of the buffer.
For example:

import tvm
import numpy as np

@tvm.register_func
def my_debug(x):
    return 0

x = tvm.placeholder((1024, 1024), name="x", dtype="int32")
xbuffer = tvm.decl_buffer(x.shape, dtype=x.dtype)
y = tvm.compute(x.shape, lambda i, j: tvm.call_packed("my_debug", xbuffer))
s = tvm.create_schedule(y.op)

f = tvm.build(s, [xbuffer, y], binds={x:xbuffer})
xnd = tvm.nd.array(np.ones((1024, 1024), dtype=x.dtype))
ynd = tvm.nd.array(np.zeros((1024, 1024), dtype=y.dtype))
f(xnd, ynd)

I got a segmentation fault on 3 different platforms and spent some time debugging; the segfault happens in the hash table that registers the packed function. Maybe I'm missing something, or maybe it's a bug; it would be good to get confirmation from someone else. I also have a backtrace of the error and a C++ test case.

@denis0x0D
Contributor Author

@tqchen the patch is updated, tests are added, and a Python API is added alongside the C++ API.

@tqchen
Member

tqchen commented Nov 11, 2018

Thanks for the updates; the new set of changes makes use of the packed_func runtime and reduces the runtime code greatly. I am thinking about how to simplify this even further to cover most users' use cases. Most of my reasoning comes from the fact that this is still a new feature and we have not settled on what API users might prefer, so it may be a good idea to start with quick mocks on the Python side before we commit to heavy engineering that could become technical debt.

Here are some specific comments:

  • I do feel the current flag parsing and the pre-defined prints are a bit redundant.
    • If in most cases users can use the Python runtime to debug, would it be sufficient to provide a simple Python callback function?
    • Alternatively, a C++ PackedFunc version of a log_to_file callback would be cleaner than the flag parsing plus global print logic.
  • For example, would it be sufficient to allow a user to register tvm_trace_callback as a PackedFunc, and only trigger the callback when necessary?

Here is a mock-up version of the code.

from tvm.contrib import trace

tvm.register_func("tvm.trace_callback", trace.log_to_file("mytrace.log"))

# run the trace code

@denis0x0D
Contributor Author

@tqchen thanks for review, I'll update it.

@denis0x0D denis0x0D force-pushed the sandbox/trace branch 3 times, most recently from a96246b to f23b0e3 Compare December 1, 2018 23:36
@denis0x0D
Contributor Author

@tqchen updated per the latest comments.
Now tvm.trace is a thin wrapper around the call_packed function, with a runtime part that calls the user callback when one is defined.


@tvm.register_func("tvm.trace_callback")
def trace_buffer(x):
    # trace code
    return 0

n = 4
x = tvm.placeholder((n,), name="X", dtype="float32")
xbuffer = tvm.decl_buffer(x.shape, dtype=x.dtype)
y = tvm.compute(x.shape, lambda i: tvm.trace("tvm.trace_callback", xbuffer))
s = tvm.create_schedule(y.op)
f = tvm.build(s, [xbuffer, y], binds={x: xbuffer})
xnd = tvm.nd.array(np.ones((n,), dtype=x.dtype))
ynd = tvm.nd.array(np.zeros((n,), dtype=y.dtype))
f(xnd, ynd)

Thanks.

@merrymercy merrymercy added status: need review and removed status: need update need update based on feedbacks labels Dec 11, 2018
@tqchen
Member

tqchen commented Dec 13, 2018

Sorry for the delayed review; I am just recovering from NeurIPS and the TVM conference.

  • Is the additional level of redirection in the tvm_trace_runtime necessary? The feature being implemented currently seems fine with just calling call_packed.
  • One property of trace is that we might want to return the same specific value x[i] when we trace the content, which might need some additional work in the compiler when it sees trace; this property does not seem to be reflected here.

@grwlf please also comment on this.

@tqchen tqchen added the status: need update need update based on feedbacks label Dec 13, 2018
@tqchen
Member

tqchen commented Dec 21, 2018

Let us double-check the implementation of call_packed_lowered to make sure that the return value of the trace function is lowered correctly.

@denis0x0D
Contributor Author

@tqchen should I add more tests?

@tqchen
Member

tqchen commented Dec 21, 2018

@denis0x0D never mind, I checked the current code and it should support the return value correctly.

@tqchen
Member

tqchen commented Dec 21, 2018

Thanks @denis0x0D for the comment; the current logic appears to be correct. However, it is nevertheless not fast, because every time we have to look up the callback function by name. Here is a possibly better strategy, which is not that different from the current one:

  • Generate an intrinsic function tvm_trace(*args) in the trace frontend.
  • In src/codegen/llvm/codegen_cpu.cc, translate the tvm_trace intrinsic into call_packed logic (using CallPacked, without any return value) and return the corresponding argument correctly.

The advantage of this approach is that we reuse the packed function caching mechanism, so there is no repetitive lookup of packed functions. It does add special logic to the code generator to support tracing (mainly for returning the correct value).

Member

@tqchen tqchen left a comment


see my latest comment

// Check the global registry for the user defined callback.
auto pf = runtime::Registry::Get(str);
if (pf) {
if (size == 2) {
Member


No switch is necessary; you can use pf->CallPacked.

Member


this comment may no longer apply given my latest comment

@denis0x0D
Contributor Author

@tqchen thanks for the review. I agree, your strategy is faster, without the additional lookup in the global registry.
I'll update the patch.

@denis0x0D denis0x0D force-pushed the sandbox/trace branch 2 times, most recently from 00800b2 to a3d6a34 Compare December 24, 2018 14:19
@denis0x0D
Contributor Author

@tqchen updated per the latest comment; the traced value is now explicitly set as the return value at the codegen stage for the trace intrinsic.
cc @yzhliu @grwlf

@tqchen
Member

tqchen commented Dec 24, 2018

@yzhliu @grwlf Please take another look.

@sergei-mironov
Contributor

Looks good to me. My use-cases all work.

@tqchen
Member

tqchen commented Dec 26, 2018

@denis0x0D thanks for the change. The logic can be improved further; here are the parts that could be better:

  • The condition handling is done inside CallPacked and is a bit obscure, because the behavior depends on the op's name.
  • The traced value is returned via a load from the stored argument buffer, while we could do this directly in the intrinsic handling.

Here is one possible way to improve the current logic.

  • When lowering trace, keep the traced value as the last argument.
  • Pass an optional return value as an argument of CallPacked; if it is not null, make sure CallPacked returns the corresponding value.

@denis0x0D
Contributor Author

@tqchen thanks for the review, and sorry for the late response, I was on holiday.
I've updated the patch per your comments:

  1. Separated the trace machinery from tvm_packed.
  2. The traced value is passed as the last argument when lowering.
  3. Added an additional conditional branch for the return value in codegen. If the TVMRetValue is zero we return the traced value, otherwise we return the value specified by the user. This makes it easy for the user to redefine the traced value from Python code:
@tvm.register_func("call_back")
def call_back(x):
    print(x)
    return user_defined_value
Tests were also added for that functionality.
I would appreciate any feedback.
Thanks.
cc @grwlf @yzhliu
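The return-value behavior described in point 3 can be modeled in plain Python: if the callback produces nothing (modeled here as None, analogous to a null TVMRetValue), the original traced value flows through; otherwise the callback's result replaces it. The helper names run_trace, observe, and clamp are hypothetical, for illustration only.

```python
def run_trace(callback, traced_value):
    # Mirrors the codegen branch: a null return keeps the traced
    # value, a non-null return overrides it.
    result = callback(traced_value)
    return traced_value if result is None else result

def observe(x):
    # A callback that only observes: the traced value passes through.
    return None

def clamp(x):
    # A callback that redefines the traced value.
    return min(x, 10)

assert run_trace(observe, 42) == 42  # traced value preserved
assert run_trace(clamp, 42) == 10    # user-defined value returned
```

This is what lets the trace call sit inline in an expression such as trace(...) + B[i][j][k] without changing the computation unless the user wants it to.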

Member

@tqchen tqchen left a comment


Good improvements, some final comments

llvm::Value *CodeGenCPU::CreateCallTracePacked(const Call *op) {
using llvm::BasicBlock;
CHECK_EQ(op->args.size(), 6U);
std::string func_name = op->args[0].as<StringImm>()->value;
Member


Please merge this part of the logic with CallPacked.

Member


In particular, we can likely do things like MakeCallPacked(args, tcode, begin, end)

Contributor Author

@denis0x0D denis0x0D Jan 7, 2019


Thanks, done. I also added some extra arguments (rvalue, r_type), and it returns the basic block, so it's easy to create a phi node in this case.

llvm::Value *cmp;
llvm::Type *rvalue_type = rvalue->getType();
// The cmp instruction depends on processing type.
if (rvalue_type->isFloatTy() || rvalue_type->isDoubleTy() ||
Member


Do not check the return value; instead check whether the return tcode == kNull (Python code not returning anything).

Contributor Author


Thanks for mentioning tcode. I spent a couple of hours trying to figure out how to check the return value, and didn't find a better solution than returning 0 from the callback when we want to keep the original value, and otherwise returning the user-defined value; but the right way to do it is by checking the tcode.

Add a trace call expr to allow tracing Tensor data
at runtime. By default the default handler is
enabled, which prints tracing data to stdout;
otherwise the user should specify a callback as a
global function (aka @tvm.register_func).

The issue is related to:
https://discuss.tvm.ai/t/idea-trace-expression/945
@denis0x0D
Contributor Author

@tqchen thanks for the review, I've updated the patch per the latest comments.

@tqchen tqchen merged commit a12c556 into apache:master Jan 8, 2019
@tqchen tqchen added status: accepted and removed status: need update need update based on feedbacks labels Jan 8, 2019
@tqchen
Member

tqchen commented Jan 8, 2019

Thanks @denis0x0D @grwlf @yzhliu! This is now merged!

FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Jan 10, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019