@jverma-quic jverma-quic commented Oct 3, 2022

This PR adds support for instrumentation-based profiling for Hexagon. This is done by instrumenting the code with profiling builtin calls using a TIR pass. During codegen, these builtin calls are replaced with calls to a Hexagon-specific handler, which records the runtime information into a buffer. This buffer is written into a JSON file ('lwp.json'), which is processed to construct function- and loop-level profiling information as a CSV file.

At a high level, this PR makes the following changes:

  1. Add a TIR pass (src/tir/transforms/profile_instrumentation.cc) to instrument the functions and loops with profiling builtins.
  2. Change Hexagon codegen to replace profiling builtin calls with calls to a Hexagon-specific handler. This handler records the runtime data into a buffer. For all other targets, these builtin calls are ignored.
  3. Add an API to the RPC Launcher to get the profiling data as a JSON file.
  4. Add a Python script (python/tvm/contrib/hexagon/profiling/process_lwp_data.py) to process the profiling data and construct a CSV file.
  5. Add TVMScript-based unit tests to test and demonstrate various profiling config flags: tests/python/unittest/test_tir_transform_profiling_instr.py
  6. Add two tests in tests/python/contrib/test_hexagon/test_launcher.py to demonstrate the changes necessary to enable profiling and to collect and process runtime data.

For additional details, please refer to src/runtime/hexagon/profiler/README.md and apps/hexagon_launcher/README.md.
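
The instrumentation flow described above can be illustrated in plain Python. This is only a conceptual sketch, not TVM code: `record`, `instrumented`, and the in-memory `buffer` are hypothetical stand-ins for the profiling builtins, the Hexagon handler, and its buffer, and `time.perf_counter_ns` stands in for the processor cycle counter.

```python
import functools
import json
import time

# Hypothetical stand-in for the handler's buffer: a flat list of (id, timestamp) records.
buffer = []


def record(region_id):
    """Stand-in for the profiling handler: append one (id, timestamp) entry."""
    buffer.append((region_id, time.perf_counter_ns()))


def instrumented(region_id):
    """Wrap a function with start/end records, as an instrumentation pass would."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record(region_id)      # call inserted before the region
            try:
                return func(*args, **kwargs)
            finally:
                record(region_id)  # call inserted after the region
        return wrapper
    return decorator


@instrumented(region_id=1)
def workload(n):
    return sum(i * i for i in range(n))


result = workload(1000)
# Analogous to writing the buffer out as a JSON file for later processing.
dumped = json.dumps({"entries": buffer})
```

Each instrumented region contributes two raw records per invocation; turning those into per-function or per-loop totals is left to a separate processing step.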

@tmoreau89
Contributor

CC @tkonolige

Here, `instrument_lwp` is used to enable the tir pass which instruments the code with the builtin calls.

During codegen, profiling builtin calls can be replaced with a target specific handler to record runtime
information into a buffer. This buffer is written into a JSON file which is proccessed to construct
Contributor

proccessed -> processed

# Hexagon lightweight instrumentation based profiling (LWP)

For Hexagon, LWP can be used to get function and loop level processor cycle count.
This's done by instrumenting the code with profiling builtin calls using a TIR pass.
Contributor

This's -> This is

@tmoreau89
Contributor

@tkonolige - do you have some cycles to take a look at profile_instrumentation.cc specifically? thanks

@tmoreau89
Contributor

@jverma-quic are you able to resolve the merge conflicts in tests/python/contrib/test_hexagon/test_launcher.py?

Contributor

@tkonolige tkonolige left a comment

@jverma-quic this is pretty awesome! I've wanted something like this for a while.

Thinking longer-term, I'd like to see this supported for more targets than Hexagon. Maybe we could discuss (not in this PR) how we could integrate this work into the existing profiling tools. I'd especially love to see support for different runtime metrics (hardware counters, etc.). Do you think it would be possible to integrate the [MetricCollector](https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L278) interface into this? (Once again, I'm just thinking about the future. It's not necessary for this PR.)

Comment on lines 33 to 36
unsigned int lwp_counter[LWP_COUNTER_SIZE] = {0};
unsigned int lwp_buffer[LWP_BUFFER_SIZE];
unsigned int* __lwp_counter = lwp_counter;
unsigned int* __lwp_buffer_ptr = lwp_buffer;
Contributor

Maybe use a fixed size type (uint32_t) for this?
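
One practical reason to prefer a fixed-width type is that any tool reading the buffer back has to know the exact width of each entry. As a sketch, assuming (hypothetically) that entries are 32-bit little-endian words and that a counter sample may wrap around, a host-side reader could look like the following. The `decode_words` and `elapsed` helpers are illustrative names, not part of this PR.

```python
import struct


def decode_words(raw):
    """Decode bytes as little-endian uint32 words ('<I' is always exactly 4 bytes)."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw[:count * 4]))


def elapsed(start, end):
    """Difference of two uint32 counter samples, valid across a single wraparound."""
    return (end - start) & 0xFFFFFFFF


# Three sample words: an id, then a start and an end sample that wraps past 2**32.
raw = struct.pack("<3I", 7, 0xFFFFFFF0, 0x00000010)
words = decode_words(raw)
delta = elapsed(words[1], words[2])
```

With a plain `unsigned int`, the reader would instead have to assume a width that the C standard does not guarantee.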

This is done by instrumenting the code with profiling builtin calls using a TIR pass.
During codegen, these builtin calls are replaced with the calls to a hexagon specific
handler which records the runtime information into a buffer.
This buffer is written into a JSON file ('lwp.json') which is processed to construct
Contributor

What is the reason to save the data to disk vs just sending it directly over the network? Seems like it would simplify the interface and the code to not have to write out the data and then re-parse it.

Contributor Author

There is no easy way of getting the data from Hexagon. It needs to be saved into a file. Since this file contains the processor cycles for each handler invocation, it needs to be processed for ease of understanding.
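
To make the processing step concrete, here is a minimal sketch of turning raw per-invocation cycle records into a per-region summary. The record layout (a list of `{"id", "cycles"}` entries in which start and end records alternate for each id) is an assumption made purely for illustration; the actual `lwp.json` schema and the logic in `process_lwp_data.py` are not shown in this thread.

```python
import csv
import io
import json
from collections import defaultdict


def summarize(lwp_json_text):
    """Pair start/end records per id and return rows of (id, calls, total_cycles)."""
    entries = json.loads(lwp_json_text)["entries"]
    open_start = {}                       # id -> cycle count of the pending start record
    totals = defaultdict(lambda: [0, 0])  # id -> [call count, total cycles]
    for entry in entries:
        rid, cycles = entry["id"], entry["cycles"]
        if rid in open_start:
            start = open_start.pop(rid)
            totals[rid][0] += 1
            totals[rid][1] += cycles - start
        else:
            open_start[rid] = cycles
    return [(rid, calls, total) for rid, (calls, total) in sorted(totals.items())]


def to_csv(rows):
    """Render the summary rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "calls", "total_cycles"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample: region 2 (e.g. a loop) runs twice inside region 1 (a function).
sample = json.dumps({"entries": [
    {"id": 1, "cycles": 100},
    {"id": 2, "cycles": 120},
    {"id": 2, "cycles": 180},
    {"id": 2, "cycles": 200},
    {"id": 2, "cycles": 250},
    {"id": 1, "cycles": 400},
]})
rows = summarize(sample)
report = to_csv(rows)
```

In this sketch the total for an outer region includes the cycles spent in regions nested inside it, and recursive re-entry of the same id is not handled.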

```
profiler = HexagonProfiler()
```

4) Run the model and get profile data (`lwp.json`) from the device (or the simulator):
Contributor

Can you include a fully runnable example here? I think it would be helpful for people trying to use the profiler.

Contributor Author

Yes, there are two examples (test_lwp and test_lwp_multiple_conv2d) in tests/python/contrib/test_hexagon/test_launcher.py. You should be able to run them on the Hexagon simulator using pytest.

@@ -0,0 +1,152 @@
<!--- Licensed to the Apache Software Foundation (ASF) under one -->
Contributor

I think this file belongs in the docs directory.

@jverma-quic
Contributor Author

> @jverma-quic this is pretty awesome! I've wanted something like this for a while.
>
> Thinking longer-term, I'd like to see this supported for more targets than Hexagon. Maybe we could discuss (not in this PR) how we could integrate this work into the existing profiling tools. I'd especially love to see support for different runtime metrics (hardware counters, etc.). Do you think it would be possible to integrate the [MetricCollector](https://github.com/apache/tvm/blob/main/include/tvm/runtime/profiling.h#L278) interface into this? (Once again, I'm just thinking about the future. It's not necessary for this PR.)

@tkonolige - Thanks for the review comments and for the suggestions for future enhancements! Let me look into MetricCollector to see if it can be integrated with what I have here. The only part that makes it Hexagon-specific is the LLVM intrinsic (which doesn't have to be Hexagon-specific) and its lowering in the LLVM backend to a Hexagon-specific handler function that collects runtime data.

jverma-quic and others added 3 commits October 5, 2022 15:59
Add type hint

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
Simplify the interface to the lightweight profiling.
@jverma-quic jverma-quic requested review from tkonolige and tmoreau89 and removed request for tkonolige and tmoreau89 October 13, 2022 21:15
@jverma-quic
Contributor Author

Hi @tkonolige, Sorry it took me a while to address all your review comments. I think I have been able to resolve most of them. Could you please take a look?
One thing that remains unresolved is your comment regarding README.md belonging to the docs directory. Do I need to move it under ./docs?

Contributor

@tkonolige tkonolige left a comment

Thanks for all the changes. Looking at apps/hexagon_launcher/README.md, it seems to be a big file. I think leaving it where it is for this PR is fine. It would be good to move it to docs/how_to/deploy in a separate PR.

"${LAUNCHER_SRC}/launcher_core.cc"
"${LAUNCHER_SRC}/launcher_hexagon.cc"
)
set(PROFILER_DIR "${TVM_SOURCE_DIR}/src/runtime/hexagon/profiler")
Contributor

Suggested change
- set(PROFILER_DIR "${TVM_SOURCE_DIR}/src/runtime/hexagon/profiler")
+ set(HEXAGON_PROFILER_DIR "${TVM_SOURCE_DIR}/src/runtime/hexagon/profiler")

@jverma-quic
Contributor Author

@tkonolige - I have made a change in codegen_hexagon.cc to ignore profile builtins for LLVM versions < 15.0. This builtin is lowered into a Hexagon-specific LLVM intrinsic that was added prior to the LLVM 15.0 release, so builds against older LLVM versions (as in upstream CI) were failing.

@jverma-quic
Contributor Author

@tmoreau89: My PR contains a .S file, which is the hand-written assembly code for the profiling handler for Hexagon. The CI is failing as it doesn't expect a .S file to be checked in. Can I get an exception for this?

@areusch areusch added needs-triage and removed needs-triage labels Oct 19, 2022
Contributor

@joshherr-quic joshherr-quic left a comment

Just one comment regarding the name used for the binary. It is hardcoded in the python profiler class.

def __init__(self, module: ExecutorFactoryModule, hexagon_server_process, enable_debug):
    """Configure HexagonProfiler"""
    # Save test .so to process profiling data
    dso_binary = "test_binary.so"
Contributor

I think this needs to be abstracted to allow for other binary names

Contributor Author

That's a good suggestion, @joshherr-quic. Thanks!
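
One way to address this, sketched here with hypothetical names (`ProfilerConfig` and its fields are illustrative and are not the PR's `HexagonProfiler` class), is to accept the binary name as a parameter and keep the previously hardcoded value as the default:

```python
import os
from dataclasses import dataclass


@dataclass
class ProfilerConfig:
    """Hypothetical holder for profiler settings; not the PR's actual class."""

    work_dir: str
    dso_binary: str = "test_binary.so"  # default keeps the previously hardcoded name

    @property
    def dso_path(self):
        """Full path of the shared object used when processing profiling data."""
        return os.path.join(self.work_dir, self.dso_binary)


default_cfg = ProfilerConfig(work_dir="/tmp/lwp")
custom_cfg = ProfilerConfig(work_dir="/tmp/lwp", dso_binary="conv2d_model.so")
```

Existing callers keep working unchanged, while a test that builds a differently named binary can pass its own name.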

@kparzysz-quic kparzysz-quic merged commit 23c2909 into apache:main Oct 25, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 10, 2022
…apache#12971)

* [Hexagon] Add support for instrumentation based profiling for Hexagon

This is done by instrumenting the code with profiling builtin calls using a TIR pass.
During codegen, these builtin calls are replaced with calls to a Hexagon-specific
handler which records the runtime information into a buffer. This buffer is written
into a JSON file ('lwp.json') which is processed to construct function and loop-level
profiling information as a CSV file.

At a high level, this PR makes the following changes:

1) Add a TIR pass (src/tir/transforms/profile_instrumentation.cc) to instrument the
functions and loops with profiling builtins.
2) Hexagon codegen changes to replace profiling builtin calls with calls to a
Hexagon-specific handler. This handler records the runtime data into a buffer. For all
other targets, these builtin calls are ignored.
3) Add API to RPC Launcher to get the profiling data as a JSON file
4) A Python script to process the profiling data and construct a CSV file
5) Add TVM script based unit tests to test and demonstrate various profiling config
flags: tests/python/unittest/test_tir_transform_profiling_instr.py
6) Add two tests in tests/python/contrib/test_hexagon/test_launcher.py to demonstrate
necessary changes to enable profiling and to collect and process runtime data.

For additional details, please refer to src/runtime/hexagon/profiler/README.md

* Fix typos

* Update python/tvm/contrib/hexagon/build.py

Add type hint

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* Address review comments

Simplify the interface to the lightweight profiling.

* Ignore profile builtins if llvm version < 15.0

* Add src/runtime/hexagon/profiler/lwp_handler.S to allowed list

* Address reformatting issues

* Fix pylint errors

* Address remaining linter failures

* clang-format issue

* Fix builtin names

* Resolve test failure for the simulator run

* Allow for the tests to provide .so name

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>