[PASS] add a pass for the specific hardware accelerator when it is not bound #1999
Thanks for the contribution. Could you add a unit test for this?
bounds = tvm.schedule.InferBound(s)
stmt = tvm.schedule.ScheduleOps(s, bounds)
stmt = tvm.ir_pass.DeviceMark(stmt)
print stmt
Try to use assert instead of print; you can find some examples in other tests.
Added the unit test case. @yzhliu
include/tvm/ir_pass.h
Outdated
 * \param stmt The stmt to be transformed
 * \return Transformed stmt.
 */
Stmt DeviceMark(Stmt stmt);
As per our current convention, this should be MarkDevice.
src/pass/split_host_device.cc
Outdated
-      op->attr_key == attr::pipeline_exec_scope) {
+      op->attr_key == attr::pipeline_exec_scope ||
+      (op->attr_key == attr::pragma_scope_prefix &&
+       op->value.as<StringImm>()->value == "device")) {
In our current convention, we use attr=pragma_xyz to indicate a special pragma, so if we want to do this, we should use pragma_device as the key and ignore the value (it can just be the constant 0).
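The key-only convention suggested here can be sketched with a toy model. This is plain Python, not TVM code: the `AttrStmt` dataclass, the `is_device_boundary` helper, and the `some_pragma` key are hypothetical names made up for illustration. Only the key names `pipeline_exec_scope` and `pragma_device` and the constant value 0 come from the thread.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AttrStmt:
    """Toy stand-in for an IR attribute node (hypothetical, not the TVM class)."""
    attr_key: str
    value: Any
    body: Any


# Keys treated as device boundaries in this sketch.
# "pipeline_exec_scope" appears in the quoted diff;
# "pragma_device" is the key proposed in the review comment.
DEVICE_BOUNDARY_KEYS = {"pipeline_exec_scope", "pragma_device"}


def is_device_boundary(node: AttrStmt) -> bool:
    # Decide on the key alone; the value is deliberately ignored.
    return node.attr_key in DEVICE_BOUNDARY_KEYS


# Key-based marking: the value is just a placeholder constant.
key_marked = AttrStmt("pragma_device", 0, body="compute")
# Value-based marking under a made-up key: not recognized by the key-only check.
value_marked = AttrStmt("some_pragma", "device", body="compute")

assert is_device_boundary(key_marked)
assert not is_device_boundary(value_marked)
```

Matching on the key alone means the check never has to cast or inspect the value, which is the practical difference from the string comparison in the quoted diff.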
Actually, this is a feature for hardware accelerators that cannot be bound. The pass is not essential for GPU or CPU. Using a pass is flexible: users can add it to their own custom pass list, just like coproc_sync. And there will be no side effect on the current solution. @tqchen
src/pass/detect_device.cc
Outdated
namespace ir {

class DetectDevice : public IRMutator {
 public:
IRMutator is not necessary; if we want to support this, we can just inline the logic into MarkDevice.
src/pass/detect_device.cc
Outdated
  }
};

Stmt MarkDevice(Stmt stmt) {
Rename MarkDevice to DecorateDeviceScope, which reflects what we really want.
include/tvm/ir_pass.h
Outdated
Stmt LowerStorageAccessInfo(Stmt stmt);

/*!
 * \brief insert the mark of device for the hardware accelerator when
Decorate the stmt with a device scope; this is helpful for hardware accelerators without thread blocks.
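As a rough illustration of what "decorate the stmt with a device scope" means, the sketch below wraps a whole statement in a single attribute node and checks the result with assertions rather than printing it. It is a toy model in plain Python, not the TVM implementation: `AttrStmt` and `decorate_device_scope` are hypothetical stand-ins, and the key string `"device_scope"` and the value 0 are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AttrStmt:
    """Toy stand-in for an IR attribute node (hypothetical, not the TVM class)."""
    attr_key: str
    value: Any
    body: Any


def decorate_device_scope(stmt: Any) -> AttrStmt:
    """Wrap the whole statement in one device-scope attribute node."""
    # Assumption: the key string and the placeholder value are illustrative only.
    return AttrStmt("device_scope", 0, stmt)


# A statement with no thread binding, modeled here as a plain list of ops.
stmt = ["load A", "add 3", "store A2"]
decorated = decorate_device_scope(stmt)

# Assert on the structure instead of printing it.
assert isinstance(decorated, AttrStmt)
assert decorated.attr_key == "device_scope"
assert decorated.body is stmt
```

Because the wrapper is applied to the root statement, a later splitting step would only need to look for this one key to find the device region, even when no thread blocks are present.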
@tqchen Changed per the request.
Thanks @kun-zh @ZihengJiang, this is merged.
Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers.
For accelerators, host/device splitting is different from the GPU case, as discussed here:
https://discuss.tvm.ai/t/split-we-want-to-split-the-host-and-device-code-flexibly/932
This is the solution we are using for the DaVinci core, and it can be used for other accelerators.
@tqchen @yzhliu @tmoreau89 @ZihengJiang please review, thanks!