[Hexagon] Add support for instrumentation based profiling for Hexagon #12971
This is done by instrumenting the code with profiling builtin calls using a TIR pass.
During codegen, these builtin calls are replaced with calls to a Hexagon-specific
handler which records the runtime information into a buffer. This buffer is written
into a JSON file ('lwp.json'), which is processed to construct function-level and
loop-level profiling information as a CSV file.
At a high level, this PR makes the following changes:
1) Add a TIR pass (src/tir/transforms/profile_instrumentation.cc) to instrument the
functions and loops with profiling builtins.
2) Change the Hexagon codegen to replace profiling builtin calls with calls to a
Hexagon-specific handler. This handler records the runtime data into a buffer. For all
other targets, these builtin calls are ignored.
3) Add an API to the RPC Launcher to get the profiling data as a JSON file.
4) Add a Python script to process the profiling data and construct a CSV file.
5) Add TVMScript-based unit tests to test and demonstrate various profiling config
flags: tests/python/unittest/test_tir_transform_profiling_instr.py
6) Add two tests in tests/python/contrib/test_hexagon/test_launcher.py to demonstrate
the changes needed to enable profiling and to collect and process runtime data.
For additional details, please refer to src/runtime/hexagon/profiler/README.md
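As a rough illustration of the idea described above (start and end markers around functions and loops, with a handler that appends to a buffer), here is a conceptual Python analogue. It is not the TIR pass or the Hexagon handler from this PR: `lwp_handler`, `profiled`, `kernel`, and the region ids are hypothetical names, and `time.perf_counter_ns()` merely stands in for a processor cycle counter.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory buffer; stands in for the buffer the handler fills on device.
lwp_buffer = []


def lwp_handler(region_id):
    """Record one (region id, counter value) pair per handler invocation."""
    lwp_buffer.append((region_id, time.perf_counter_ns()))


@contextmanager
def profiled(region_id):
    """Call the handler on entry to and exit from an instrumented region."""
    lwp_handler(region_id)      # plays the role of the "start" builtin
    try:
        yield
    finally:
        lwp_handler(region_id)  # plays the role of the "end" builtin


def kernel(n):
    """Toy workload with one function-level and one loop-level region."""
    total = 0
    with profiled(1):           # assumed id for the function
        with profiled(2):       # assumed id for the loop
            for i in range(n):
                total += i * i
    return total
```

After a call to `kernel`, the buffer holds four entries in the order function start, loop start, loop end, function end; pairing the start and end entries per id gives the time spent in each region.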
CC @tkonolige
apps/hexagon_launcher/README.md
Outdated
Here, `instrument_lwp` is used to enable the tir pass which instruments the code with the builtin calls.

During codegen, profiling builtin calls can be replaced with a target specific handler to record runtime
information into a buffer. This buffer is written into a JSON file which is proccessed to construct
proccessed -> processed
# Hexagon lightweight instrumentation based profiling (LWP)

For Hexagon, LWP can be used to get function and loop level processor cycle count.
This's done by instrumenting the code with profiling builtin calls using a TIR pass.
This's -> This is
@tkonolige - do you have some cycles to take a look at
@jverma-quic are you able to resolve the merge conflicts in
tkonolige
left a comment
@jverma-quic this is pretty awesome! I've wanted something like this for a while.
Thinking longer-term, I'd like to see this supported for more targets than Hexagon. Maybe we could discuss (not in this PR) how we could integrate this work into the existing profiling tools. I'd especially love to see support for different runtime metrics (hardware counters, etc.). Do you think it would be possible to integrate the [MetricCollector](https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L278) interface into this? (Once again, I'm just thinking about the future. It's not necessary for this PR.)
unsigned int lwp_counter[LWP_COUNTER_SIZE] = {0};
unsigned int lwp_buffer[LWP_BUFFER_SIZE];
unsigned int* __lwp_counter = lwp_counter;
unsigned int* __lwp_buffer_ptr = lwp_buffer;
Maybe use a fixed size type (uint32_t) for this?
This is done by instrumenting the code with profiling builtin calls using a TIR pass.
During codegen, these builtin calls are replaced with the calls to a hexagon specific
handler which records the runtime information into a buffer.
This buffer is written into a JSON file ('lwp.json') which is processed to construct
What is the reason to save the data to disk vs just sending it directly over the network? Seems like it would simplify the interface and the code to not have to write out the data and then re-parse it.
There is no easy way of getting the data from Hexagon. It needs to be saved into a file. Since this file contains the processor cycles for each handler invocation, it needs to be processed for ease of understanding.
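To make the post-processing step concrete, here is a minimal sketch of how per-invocation cycle records could be folded into a per-id summary and written as CSV. The record layout (an `entries` list whose items carry `id`, `event`, and `cycles` fields) is purely an assumption for illustration; the actual `lwp.json` format and the processing script in this PR may differ.

```python
import csv
import io
import json
from collections import defaultdict


def summarize(lwp_json_text):
    """Return sorted (id, invocations, total_cycles) rows from the assumed layout."""
    records = json.loads(lwp_json_text)["entries"]
    pending = defaultdict(list)          # id -> stack of unmatched start cycle counts
    stats = defaultdict(lambda: [0, 0])  # id -> [invocations, total cycles]
    for rec in records:
        rid, cycles = rec["id"], rec["cycles"]
        if rec["event"] == "start":
            pending[rid].append(cycles)
        else:
            # Match this end record with the most recent unmatched start of the same id.
            start = pending[rid].pop()
            stats[rid][0] += 1
            stats[rid][1] += cycles - start
    return [(rid, count, total) for rid, (count, total) in sorted(stats.items())]


def to_csv(rows):
    """Render the summary rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "invocations", "total_cycles"])
    writer.writerows(rows)
    return out.getvalue()
```

The stack per id lets nested or repeated regions be paired correctly as long as every start has a matching end; a real tool would also need to handle truncated buffers, which this sketch ignores.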
| profiler = HexagonProfiler() | ||
| ``` | ||
|
|
||
| 4) Run the model and get profile data (`lwp.json`) from the device (or the simulator): |
Can you include a fully runnable example here? I think it would be helpful for people trying to use the profiler.
Yes, there are two examples (test_lwp and test_lwp_multiple_conv2d) in tests/python/contrib/test_hexagon/test_launcher.py. You should be able to run them on the Hexagon simulator using pytest.
@@ -0,0 +1,152 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
I think this file belongs in the docs directory.
@tkonolige - Thanks for the review comments and for the suggestions for future enhancements! Let me look into MetricCollector to see if it can be integrated with what I have here. The only part that makes it Hexagon-specific is the LLVM intrinsic (which doesn't have to be Hexagon-specific) and its lowering in the LLVM backend to a Hexagon-specific handler function that collects runtime data.
Add type hint Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
Simplify the interface to the lightweight profiling.
Hi @tkonolige, sorry it took me a while to address all your review comments. I think I have been able to resolve most of them. Could you please take a look?
tkonolige
left a comment
Thanks for all the changes. Looking at apps/hexagon_launcher/README.md, it seems to be a big file. I think leaving it where it is for this PR is fine. It would be good to move it to docs/how_to/deploy in a separate PR.
| "${LAUNCHER_SRC}/launcher_core.cc" | ||
| "${LAUNCHER_SRC}/launcher_hexagon.cc" | ||
| ) | ||
| set(PROFILER_DIR "${TVM_SOURCE_DIR}/src/runtime/hexagon/profiler") |
- set(PROFILER_DIR "${TVM_SOURCE_DIR}/src/runtime/hexagon/profiler")
+ set(HEXAGON_PROFILER_DIR "${TVM_SOURCE_DIR}/src/runtime/hexagon/profiler")
@tkonolige - I have made a change in codegen_hexagon.cc to ignore profile builtins for LLVM versions < 15.0. This builtin is lowered into a Hexagon-specific LLVM intrinsic, which was added prior to LLVM 15.0 release, causing upstream CI builds to fail.
@tmoreau89: My PR contains a .S file, which is the handwritten assembly code for the profiling handler for Hexagon. The CI is failing because it doesn't expect a .S file to be checked in. Can I get an exception for this?
joshherr-quic
left a comment
Just one comment regarding the name used for the binary. It is hardcoded in the python profiler class.
def __init__(self, module: ExecutorFactoryModule, hexagon_server_process, enable_debug):
    """Configure HexagonProfiler"""
    # Save test .so to process profiling data
    dso_binary = "test_binary.so"
I think this needs to be abstracted to allow for other binary names
That's a good suggestion, @joshherr-quic. Thanks!
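One way the suggestion could look, sketched with a hypothetical stand-in class rather than the actual `HexagonProfiler` from this PR: the binary name becomes a constructor argument with the previous hardcoded value as its default. The class name, the `.so` suffix check, and the attribute names are assumptions made for illustration only.

```python
class ProfilerSketch:
    """Hypothetical stand-in showing a configurable binary name; not the PR's class."""

    # Previous hardcoded value, kept as the default so existing callers still work.
    DEFAULT_DSO = "test_binary.so"

    def __init__(self, dso_binary=DEFAULT_DSO, enable_debug=False):
        # Assumed sanity check: the profiler expects a shared object name.
        if not dso_binary.endswith(".so"):
            raise ValueError(f"expected a .so file name, got {dso_binary!r}")
        self.dso_binary = dso_binary
        self.enable_debug = enable_debug
```

A test would then pass its own name, for example `ProfilerSketch("my_model.so")`, while callers that omit the argument keep the old behavior.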
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
…apache#12971)
* [Hexagon] Add support for instrumentation based profiling for Hexagon
This is done by instrumenting the code with profiling builtin calls using a TIR pass. During codegen, these builtin calls are replaced with calls to a Hexagon-specific handler which records the runtime information into a buffer. This buffer is written into a JSON file ('lwp.json'), which is processed to construct function-level and loop-level profiling information as a CSV file.
At a high level, this PR makes the following changes:
1) Add a TIR pass (src/tir/transforms/profile_instrumentation.cc) to instrument the functions and loops with profiling builtins.
2) Change the Hexagon codegen to replace profiling builtin calls with calls to a Hexagon-specific handler. This handler records the runtime data into a buffer. For all other targets, these builtin calls are ignored.
3) Add an API to the RPC Launcher to get the profiling data as a JSON file.
4) Add a Python script to process the profiling data and construct a CSV file.
5) Add TVMScript-based unit tests to test and demonstrate various profiling config flags: tests/python/unittest/test_tir_transform_profiling_instr.py
6) Add two tests in tests/python/contrib/test_hexagon/test_launcher.py to demonstrate the changes needed to enable profiling and to collect and process runtime data.
For additional details, please refer to src/runtime/hexagon/profiler/README.md
* Fix typos
* Update python/tvm/contrib/hexagon/build.py
  Add type hint
  Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
* Address review comments
  Simplify the interface to the lightweight profiling.
* Ignore profile builtins if llvm version < 15.0
* Add src/runtime/hexagon/profiler/lwp_handler.S to allowed list
* Address reformatting issues
* Fix pylint errors
* Address remaining linter failures
* clang-format issue
* Fix builtin names
* Resolve test failure for the simulator run
* Allow for the tests to provide .so name
Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
For additional details, please refer to src/runtime/hexagon/profiler/README.md and apps/hexagon_launcher/README.md.