[RUNTIME][TRACE] Support trace primitive. #1973
Conversation
(force-pushed from eef6670 to 5bedd3e)
|
Thanks for the contribution. We can likely simplify this further by using TVM's runtime system for callbacks. TVM supports generated code calling back into any PackedFunc (including ones defined from Python). Here is a rough example; it is only meant to demonstrate how the lowered callback works. Some future wrapping around this mechanism might make the trace API easier to use.

```python
import tvm
import numpy as np

@tvm.register_func
def my_debug(x, i):
    print("array=", x.asnumpy())
    return 0

x = tvm.placeholder((4,), name="x", dtype="int32")
xbuffer = tvm.decl_buffer(x.shape, dtype=x.dtype)
y = tvm.compute(x.shape, lambda i: tvm.call_packed("my_debug", xbuffer, i))
s = tvm.create_schedule(y.op)
print(tvm.lower(s, [x, y], binds={x: xbuffer}, simple_mode=True))
f = tvm.build(s, [xbuffer, y], binds={x: xbuffer})

xnd = tvm.nd.array(np.ones((4,), dtype=x.dtype))
ynd = tvm.nd.array(np.zeros((4,), dtype=y.dtype))
f(xnd, ynd)
```

In the above case, call_packed invokes my_debug in every iteration, passing the array handle (xbuffer). Some further wrapping could make it look nicer. The good part is that we can use any piece of Python code to monitor the content of x. |
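The callback mechanism above can also be modeled without TVM at all. The sketch below is a pure-Python analogue; the names `REGISTRY`, `register_func`, and `call_packed` here are hypothetical stand-ins for TVM's global function registry and lowered packed call, not the actual TVM implementation.

```python
import numpy as np

# Hypothetical stand-in for TVM's global PackedFunc registry.
REGISTRY = {}

def register_func(func):
    # Analogue of @tvm.register_func: store the callback under its name.
    REGISTRY[func.__name__] = func
    return func

def call_packed(name, *args):
    # Analogue of tvm.call_packed: look the callback up by name and invoke it.
    return REGISTRY[name](*args)

@register_func
def my_debug(x, i):
    # Any Python code can inspect the buffer content here.
    print("array=", x, "i=", i)
    return 0

x = np.ones((4,), dtype="int32")
# Generated code would emit one such callback per loop iteration.
for i in range(x.shape[0]):
    call_packed("my_debug", x, i)
```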
|
Hi @tqchen thanks for the answer. |
|
This is a good question :) The current principle is: if some API is stabilized and needed across languages, we bring it to C++. If something is in flux and not yet critical to development, we keep it in the language bindings (e.g. Python), let it play out, and wait until it stabilizes. Admittedly there are tradeoffs: having everything in C++ makes the API available to all languages, but it is also not necessarily flexible. For example, in the case of monitoring, printing things out to a log may not be as flexible as defining a customized callback and letting the user do what they want. Having some of these things in Python (for example, most of our test cases are) speeds up the development process and allows us to write more test cases easily. In both cases, we should document it clearly, so it would be really nice to have a tutorial teaching users all the possible ways to do tracing. |
|
As a side note, in this particular case, the my_debug function can be registered in C++ as well, because the runtime system allows interop between languages. |
|
@tqchen thanks for the answer. For example, the "sanitizers" from llvm/compiler-rt have two parts:
The library can be built as a separate DSO and would require manually linking the model against it. |
|
The current features could likely be implemented using packed function calls: define a set of tracing functions as you do, but declare them as packed functions. Then we could optionally register these runtime functions as packed functions, or register different variants depending on whether tracing is on/off. So to summarize, I agree that tracing needs runtime support, but these runtime features could likely be built around the TVM packed function system. |
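That registration-time switch can be sketched in plain Python. All names below (`set_tracing`, `trace_on`, `trace_off`, and the registry dict) are hypothetical illustrations, not TVM APIs: registering either a real tracing variant or a no-op variant under the same well-known name lets the generated code stay unchanged whether tracing is on or off.

```python
# Hypothetical stand-in for TVM's global function registry.
REGISTRY = {}

def trace_on(name, index, value):
    # Tracing variant: log and pass the value through.
    print(f"{name}{list(index)} = {value}")
    return value

def trace_off(name, index, value):
    # No-op variant: tracing disabled, just pass the value through.
    return value

def set_tracing(enabled):
    # Register one variant or the other under the same well-known name,
    # so generated code always calls "tvm.trace_callback" unchanged.
    REGISTRY["tvm.trace_callback"] = trace_on if enabled else trace_off

set_tracing(True)
REGISTRY["tvm.trace_callback"]("A", (1, 2), 7)
```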
|
@tqchen got it now, thanks. |
|
@denis0x0D I'm considering moving the IR dump to the C++ side. |
(force-pushed from 5bedd3e to 6abdbcb)
|
@tqchen the strange thing I've faced with packed functions: when I increase the shape of the buffer, I get a segmentation fault on 3 different platforms, and I spent some time debugging it. The segfault happens in the hash table that registers the packed functions. Maybe I'm missing something, or maybe it is a bug; it would be good to get confirmation from someone else. I also have a backtrace of the error and test cases in C++. |
(force-pushed from 804592d to 89e6a89)
|
@tqchen the patch is updated, tests are added, and the Python API is added alongside the C++ API. |
(force-pushed from 89e6a89 to f965a1d)
|
Thanks for the updates; the new set of changes makes use of the packed_func runtime and reduces the runtime code greatly. I am thinking about how to simplify this even further, considering most users' use cases. Most of my reasoning comes from the fact that this is still a new feature, and we may not have settled on what API users might prefer, so maybe it is a good idea to start with quick mocks on the Python side before we commit to heavy engineering that could become technical debt. Here are some specific comments:
Here is a mock-up version of the code:

```python
from tvm.contrib import trace
tvm.register_func("tvm.trace_callback", trace.log_to_file("mytrace.log"))
# run the trace code
```
|
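One way trace.log_to_file could be implemented is as a closure factory: it returns a callback with the signature the trace hook expects and appends each traced value to the given file. This is only a sketch of the mocked-up API; the record format and the `(name, index, value)` signature are assumptions, not the actual tvm.contrib.trace code.

```python
def log_to_file(path):
    # Return a trace callback bound to `path` (sketch of the mocked-up API).
    def callback(name, index, value):
        # Append one human-readable record per traced element,
        # then pass the value through unchanged.
        with open(path, "a") as f:
            f.write(f"{name}{list(index)} = {value}\n")
        return value
    return callback

cb = log_to_file("mytrace.log")
cb("A", (1, 4, 5), 42)
```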
|
@tqchen thanks for review, I'll update it. |
(force-pushed from a96246b to f23b0e3)
|
@tqchen updated according to the latest comments. Thanks. |
|
Sorry for the delayed review, I am just recovering from NeurIPS and the TVM conference.
@grwlf please also comment on this. |
|
Let us double check the implementation of call_packed_lowered to make sure that the return value of the trace function is lowered correctly |
|
@tqchen should I add more tests? |
|
@denis0x0D never mind, I checked the current code and it should support return value correctly. |
|
Thanks @denis0x0D for the comment; the current logic appears to be correct. However, it is nevertheless not fast, because every time we have to look up the callback function by name. Here is a possibly better strategy, which is not that different from the current one:
The advantage of this approach is that we reuse the packed function caching mechanism, so there is no repetitive lookup of packed functions; it does add special logic to the code generator block to support tracing (mainly for returning the correct value)
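The caching idea can be illustrated with a small Python mock. The names `get_packed_func`, `REGISTRY`, and the lookup counter are hypothetical; in TVM the cache would be a per-module function handle filled in on first use. The point is that the expensive by-name registry lookup happens once, and every later trace call in the loop reuses the cached handle.

```python
# Hypothetical global registry holding the trace callback.
REGISTRY = {"tvm.trace_callback": lambda name, index, value: value}

_cache = {}             # per-module handle cache
lookups = {"count": 0}  # counts expensive registry queries

def get_packed_func(name):
    # Resolve a callback by name, paying the registry lookup only once.
    if name not in _cache:
        lookups["count"] += 1
        _cache[name] = REGISTRY[name]
    return _cache[name]

# A traced loop resolves the callback on the first iteration only.
for i in range(100):
    get_packed_func("tvm.trace_callback")("A", (i,), i)

print(lookups["count"])  # the registry was queried exactly once
```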
tqchen
left a comment
see my latest comment
src/runtime/trace.cc
Outdated
```cpp
// Check the global registry for the user defined callback.
auto pf = runtime::Registry::Get(str);
if (pf) {
  if (size == 2) {
```
no switch is necessary, you can use pf->CallPacked
this comment may no longer apply given my latest comment
|
@tqchen thanks for the review. I agree; your strategy is faster, avoiding the additional lookup in the global registry. |
(force-pushed from 00800b2 to a3d6a34)
|
@yzhliu @grwlf Please take another round of look |
|
Looks good to me. My use-cases all work. |
|
@denis0x0D thanks for the change. The logic can be improved further; here are some parts that can be improved |
Here is one possible way to improve the current logic.
|
(force-pushed from a3d6a34 to f1d69fc)
|
@tqchen thanks for the review; sorry for the late response, I was on holiday.
Tests have also been added for that functionality. |
tqchen
left a comment
Good improvements, some final comments
src/codegen/llvm/codegen_cpu.cc
Outdated
```cpp
llvm::Value *CodeGenCPU::CreateCallTracePacked(const Call *op) {
  using llvm::BasicBlock;
  CHECK_EQ(op->args.size(), 6U);
  std::string func_name = op->args[0].as<StringImm>()->value;
```
Please merge the shared part of this logic with the existing CallPacked codegen.
In particular, we can likely do things like MakeCallPacked(args, tcode, begin, end)
Thanks, done. I also added some additional arguments (rvalue, r_type) and made it return the basic block, so it's easy to create a phi node in this case.
src/codegen/llvm/codegen_cpu.cc
Outdated
```cpp
llvm::Value *cmp;
llvm::Type *rvalue_type = rvalue->getType();
// The cmp instruction depends on processing type.
if (rvalue_type->isFloatTy() || rvalue_type->isDoubleTy() ||
```
Do not check the return value type; instead check whether the returned tcode == kNull (Python code not returning anything).
Thanks for mentioning tcode. I spent a couple of hours trying to understand how to check the return value, and didn't find a better solution than returning 0 from the callback when we want to keep the original value, and returning the user-defined value otherwise; but the right way to do it is by checking the tcode.
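The tcode check can be modeled in plain Python, with None playing the role of kNull. The function `run_traced` and the two callbacks below are hypothetical illustrations, not the generated code: if the callback returns nothing, the traced expression keeps its original value; otherwise the callback's return value replaces it.

```python
def run_traced(callback, name, index, original_value):
    # Model of the lowered trace call's return handling: in the generated
    # code the branch is on the returned TVM type code (tcode == kNull
    # means "callback returned nothing"); here Python's None plays that role.
    ret = callback(name, index, original_value)
    return original_value if ret is None else ret

def printing_cb(name, index, value):
    # Pure observer: prints and implicitly returns None,
    # so the original value is kept.
    print(f"{name}{list(index)} = {value}")

def doubling_cb(name, index, value):
    # Returns a replacement value for the traced expression.
    return value * 2

print(run_traced(printing_cb, "A", (1,), 10))  # keeps 10
print(run_traced(doubling_cb, "A", (1,), 10))  # replaced by 20
```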
(force-pushed from f1d69fc to 01ac54e)
Add trace call expr to allow tracing Tensor data at runtime. By default the default handler is enabled, which prints the tracing data to stdout; otherwise the user should specify a callback as a global function (aka @tvm.register_func). The issue is related to: https://discuss.tvm.ai/t/idea-trace-expression/945
(force-pushed from 01ac54e to f813889)
|
@tqchen thanks for the review; I've updated the patch according to the latest comments. |
|
Thanks @denis0x0D @grwlf @yzhliu! This is now merged! |
The patch is related to issue:
https://discuss.tvm.ai/t/idea-trace-expression/945
This patch allows tracing Tensor data at runtime.
Use case example:
1. Write the model:

```cpp
...
tvm::Array<tvm::Expr> shape = {n, n, n};
tvm::Tensor C = tvm::compute(shape, [&](tvm::Var i, tvm::Var j, tvm::Var k) {
  return tvm::trace("A", {i, j, k}, A[i][j][k]) + B[i][j][k];
});
...
```

2. Run the model with specific data. The output is:

```
...
A[1][4][5] = $actual_tensor_value
...
```

How it works:
tvm::trace adds an ir::Call expr with type External, which is then lowered by the LLVM codegen to an external call, so runtime support is added. The runtime processes the added function, expanding its arguments via va_arg, and writes the specified data to a file; by default this is stdout, but a log file can be specified via the environment variable TVM_TRACE, and the tracing process can be enabled/disabled.
Example:
export TVM_TRACE=trace_log_file=path/to/file:process_tracing=1
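The TVM_TRACE value above is a colon-separated list of key=value options. A parser for that format might look like the following sketch (`parse_tvm_trace` is a hypothetical helper; the actual runtime parser may differ):

```python
import os

def parse_tvm_trace(value):
    # Split "key=value:key=value" into a dict of option strings.
    opts = {}
    for part in value.split(":"):
        key, _, val = part.partition("=")
        opts[key] = val
    return opts

os.environ["TVM_TRACE"] = "trace_log_file=path/to/file:process_tracing=1"
opts = parse_tvm_trace(os.environ["TVM_TRACE"])
print(opts)  # {'trace_log_file': 'path/to/file', 'process_tracing': '1'}
```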