Skip to content

Conversation

@argrento
Copy link
Contributor

@argrento argrento commented Apr 29, 2022

Profiling in TVM is enabled or disable in compile time by the USE_PROFILER switch. It means that if we enable profile in config.cmake, but do not use any profiling features in the app, OpenCL is forced to collect cl_events objects.

Build TVM with set(USE_PROFILER ON).
Consider simple app, where we create module from the .so file:

tvm::runtime::Module mod_factory = tvm::runtime::Module::LoadFromFile("model.so");
tvm::runtime::Module gmod = mod_factory.GetFunction("default")(ctx);
tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
tvm::runtime::PackedFunc get_input = gmod.GetFunction("get_input");
tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");
tvm::runtime::PackedFunc run = gmod.GetFunction("run");

// set inputs and outputs

size_t niterations = 5000;
for (size_t i = 0; i < niterations; i++) {
  run();
}

Then we collect memory usage info with Valgrind.

    MB
818.5^                                                                       #
     |                                                                    @@@#
     |                                                                @@@@@@@#
     |                                                            @@@@@@@@@@@#
     |                                                         @@@@@@@@@@@@@@#
     |                                                     ::::@@@@@@@@@@@@@@#
     |                                                  @:::: :@@@@@@@@@@@@@@#
     |                                             @@@:@@:::: :@@@@@@@@@@@@@@#
     |                                           @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                                      :@@@@@@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                                  :@@::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                               ::::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                           :@@@: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                       :::@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                   ::@@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                @@@: @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |           :::::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @@:::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
   0 +----------------------------------------------------------------------->Gi
     0                                                                   75.95

We do not use any profiling info, but it is collected implicitly because of the compile-time switches:

  1. clCreateCommandQueue(this->context, did, CL_QUEUE_PROFILING_ENABLE, &err_code));
  2. OPENCL_CALL(clEnqueueNDRangeKernel(queue, kernel, work_dim, nullptr, wl.work_size,

With the proposed modifications this behavior is changed: clCommandQueue by default is created in the normal mode and is recreated with profiling capabilities when user calls profiler explicitly. When a profiling session is finished, the queue is recreated again in normal mode, which allows to mix profile() calls and run() calls.

With the proposed changes valgrind shows no abnormal memory usage for the example above.

    MB
148.9^#                                                                       
     |#::::::::::::::::::::::::::@:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   83.07

@argrento argrento changed the title Change of OpenCL profiling logic [OpenCL] Change of OpenCL profiling logic Apr 29, 2022
@argrento argrento changed the title [OpenCL] Change of OpenCL profiling logic [WIP] [OpenCL] Change of OpenCL profiling logic Apr 30, 2022
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. Thanks

@argrento argrento changed the title [WIP] [OpenCL] Change of OpenCL profiling logic [OpenCL] Change of OpenCL profiling logic May 9, 2022
@argrento argrento force-pushed the opencl_profiling branch from b3dd41b to e3bd930 Compare May 9, 2022 11:19
@echuraev
Copy link
Contributor

@masahi could you please take a look at this PR?

@masahi masahi merged commit 0f6abea into apache:main May 10, 2022
@argrento argrento deleted the opencl_profiling branch May 12, 2022 10:08
@argrento argrento restored the opencl_profiling branch May 12, 2022 10:08
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request May 16, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
shingjan pushed a commit to shingjan/tvm that referenced this pull request May 17, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
SebastianBoblest pushed a commit to SebastianBoblest/tvm that referenced this pull request May 27, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
@srkreddy1238
Copy link
Contributor

srkreddy1238 commented Sep 1, 2022

@argrento

This PR causes CLML profiling failure. Reason explained below

In general the workspaces can be accessed and shared via “device_api.opencl”. CLML integration shares the workspace created by default OpenCL and it has a reference to the command queue. Changing the command queue in between makes them invalid.

Why do we enable profiler in compilation when we don't want to profile any thing ?
Or
Is there any case where we enable profiling some thing else but not OpenCL / dynamically enable/disable profiling? In such cases we could think of having different compilation flag for OpenCL or environment varible to control at runtime launch.

At any case dynamically recreating the queue will cause issues for other components.

@masahi & @valmat07 comment pls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants