@kun-zh
Contributor

@kun-zh kun-zh commented Oct 26, 2018

For accelerators, host/device splitting differs from the GPU case, as discussed here:
https://discuss.tvm.ai/t/split-we-want-to-split-the-host-and-device-code-flexibly/932

This is the solution we are using for the DaVinci core, and it can be used for other accelerators as well.

@tqchen @yzhliu @tmoreau89 @ZihengJiang please review, thanks!

@yzhliu
Member

yzhliu commented Oct 26, 2018

Thanks for the contribution. Could you add a unit test for this?

bounds = tvm.schedule.InferBound(s)
stmt = tvm.schedule.ScheduleOps(s, bounds)
stmt = tvm.ir_pass.DeviceMark(stmt)
print stmt
Contributor

Try to use assert instead of print; you can find some examples in the other tests.
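
A minimal sketch of what an assert-based check could look like. The `Evaluate` and `AttrStmt` classes, the `device_mark` function, and the `"device_scope"` key are toy stand-ins invented for illustration; they are not the TVM API, and the real test would call the actual pass instead.

```python
from dataclasses import dataclass


# Toy stand-ins for IR nodes; these are NOT TVM classes.
@dataclass(frozen=True)
class Evaluate:
    value: int


@dataclass(frozen=True)
class AttrStmt:
    attr_key: str
    value: int
    body: object


def device_mark(stmt):
    """Hypothetical pass: wrap the statement in a device marker."""
    return AttrStmt("device_scope", 0, stmt)


def test_device_mark():
    original = Evaluate(0)
    marked = device_mark(original)
    # An assertion fails loudly if the pass regresses; a print never does.
    assert isinstance(marked, AttrStmt)
    assert marked.attr_key == "device_scope"
    assert marked.body == original


test_device_mark()
```

Checking the node type, the attribute key, and the untouched body covers the whole contract of such a pass, which is something printed output cannot enforce.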

@kun-zh
Contributor Author

kun-zh commented Oct 28, 2018

Added the unit test case, @yzhliu.

* \param stmt The stmt to be transformed
* \return Transformed stmt.
*/
Stmt DeviceMark(Stmt stmt);
Member

As per our current convention, this should be MarkDevice.

op->attr_key == attr::pipeline_exec_scope) {
op->attr_key == attr::pipeline_exec_scope ||
(op->attr_key == attr::pragma_scope_prefix &&
op->value.as<StringImm>()->value == "device")) {
Member

In our current convention, we use attr=pragma_xyz to indicate a special pragma, so if we want to do this, we should use pragma_device as the key and ignore the value (which can just be the constant 0).
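
The difference between the two matching styles can be sketched as follows. This is a toy model only: the `Attr` class and the key strings `"pragma_scope"` and `"pragma_device"` are assumptions made for illustration, not TVM definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Toy attribute node: a key plus an arbitrary value (not a TVM class)."""
    key: str
    value: object


def is_device_by_value(attr: Attr) -> bool:
    # Style under review: a generic key, with the meaning carried by a
    # string value that the matcher has to inspect.
    return attr.key == "pragma_scope" and attr.value == "device"


def is_device_by_key(attr: Attr) -> bool:
    # Suggested style: the key itself names the pragma, so the value is
    # ignored and may simply be the constant 0.
    return attr.key == "pragma_device"


assert is_device_by_value(Attr("pragma_scope", "device"))
assert not is_device_by_value(Attr("pragma_scope", 0))
assert is_device_by_key(Attr("pragma_device", 0))
assert is_device_by_key(Attr("pragma_device", "ignored"))
assert not is_device_by_key(Attr("pragma_unroll", 0))
```

Matching on the key alone means the check never needs to know what type the value has.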

@kun-zh
Contributor Author

kun-zh commented Oct 28, 2018

Actually, this is a feature for hardware accelerators that don't support thread binding. The pass is not essential for GPU or CPU. Keeping it as a pass is flexible: users can add it to their own custom pass list, just like coproc_sync, and it has no side effects on the current solution. @tqchen
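
The opt-in argument can be sketched with a toy lowering pipeline. Everything here (`simplify`, `decorate_device`, `DEFAULT_PASSES`, `lower`, and the tuple used as a marker) is hypothetical and only models the idea of a user-supplied pass list; it is not how TVM's lowering is implemented.

```python
def simplify(stmt):
    """Placeholder for a default pass; returns the statement unchanged."""
    return stmt


def decorate_device(stmt):
    """Hypothetical accelerator-only pass: tag the statement as device code."""
    return ("device_scope", stmt)


# Passes that every target runs in this toy pipeline.
DEFAULT_PASSES = [simplify]


def lower(stmt, extra_passes=()):
    """Run the default passes, then any passes the user opted into."""
    for run_pass in [*DEFAULT_PASSES, *extra_passes]:
        stmt = run_pass(stmt)
    return stmt


# CPU/GPU users do not add the pass, so their result is unchanged.
cpu_result = lower("body")
# Accelerator users add it to their own custom pass list.
accel_result = lower("body", extra_passes=[decorate_device])

assert cpu_result == "body"
assert accel_result == ("device_scope", "body")
```

Because the extra pass only runs when explicitly requested, existing targets keep their current behavior.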

namespace ir {

class DetectDevice : public IRMutator {
public:
Member

An IRMutator is not necessary; if we want to support this, we can just inline the logic into MarkDevice.

}
};

Stmt MarkDevice(Stmt stmt) {
Member

Rename MarkDevice to DecorateDeviceScope, which reflects what we really want.

Stmt LowerStorageAccessInfo(Stmt stmt);

/*!
* \brief insert the mark of device for the hardware accelerator when
Member

Decorate the stmt with a device scope; this is helpful for hardware accelerators without thread blocks.
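
To illustrate why a device scope helps when there are no thread blocks, here is a rough sketch of a host/device splitter over a toy IR. The node classes, the key names in `DEVICE_KEYS`, and both functions are assumptions made up for this example; they only mimic the idea and are not the TVM implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Evaluate:
    name: str


@dataclass(frozen=True)
class AttrStmt:
    attr_key: str
    body: "Stmt"


@dataclass(frozen=True)
class Seq:
    stmts: tuple


Stmt = Union[Evaluate, AttrStmt, Seq]

# Assumed key names: attributes that mark the start of device code.
DEVICE_KEYS = {"thread_extent", "device_scope"}


def decorate_device_scope(stmt):
    """Wrap the whole statement once; no recursive mutator is needed."""
    return AttrStmt("device_scope", stmt)


def split_host_device(stmt, kernels):
    """Move device-marked subtrees into `kernels`, leaving a launch stub."""
    if isinstance(stmt, AttrStmt) and stmt.attr_key in DEVICE_KEYS:
        kernels.append(stmt.body)
        return Evaluate(f"launch_kernel{len(kernels) - 1}")
    if isinstance(stmt, Seq):
        return Seq(tuple(split_host_device(s, kernels) for s in stmt.stmts))
    if isinstance(stmt, AttrStmt):
        return AttrStmt(stmt.attr_key, split_host_device(stmt.body, kernels))
    return stmt


body = Seq((Evaluate("load"), Evaluate("compute")))

# Without thread attributes or a device scope, nothing is split out.
kernels = []
host = split_host_device(body, kernels)
assert kernels == [] and host == body

# With the decoration, the whole body becomes one device kernel.
kernels = []
host = split_host_device(decorate_device_scope(body), kernels)
assert kernels == [body]
assert host == Evaluate("launch_kernel0")
```

In this model the decoration is a single wrap at the root, and the splitter needs only one extra key to recognize it.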

@kun-zh
Contributor Author

kun-zh commented Oct 29, 2018

@tqchen Changed per the request

@tqchen tqchen merged commit 92f82c8 into apache:master Oct 29, 2018
@tqchen
Member

tqchen commented Oct 29, 2018

Thanks @kun-zh @ZihengJiang, this is merged.

eqy pushed a commit to eqy/tvm that referenced this pull request Oct 29, 2018
FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Dec 27, 2018
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019