Skip to content

Conversation

@masahi
Copy link
Member

@masahi masahi commented Jun 9, 2022

This pass looks for global to shared memory copy enclosed in the new tir::attr::async_scope scope, and replace that with PTX cp.async intrinsics added in #11368.

This pass is disabled by default, since cp.async is only supported by NV gpus with sm >= 80. For now, the attr tir::attr::async_scope and the proper synchronization need to be inserted manually in the input TIR. But I have a working branch https://github.com/apache/tvm/compare/main...masahi:inject-async-copy?expand=1 that automatically inserts such async regions and synchronizations as part of the software pipeline transform.

@vinx13 @junrushao1994 @tqchen @csullivan

@junrushao
Copy link
Member

Amazing work!!


Stmt VisitStmt_(const BufferStoreNode* store) {
if (in_async && (store->buffer.scope() == "shared" || store->buffer.scope() == "shared.dyn")) {
if (auto* load = store->value.as<BufferLoadNode>()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how we should handle the case that the value is not BufferLoad? For padding case maybe this can rely on the intrin provide predicated read, not sure about more complicated case. But this PR is good, no action needed for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Indeed, currently it only supports the fixed pattern shared[...] = global[...]. But I think we can add more patterns as they come up, as long as we can extract the src pointer and the offset.

@vinx13 vinx13 merged commit 53d163c into apache:main Jun 10, 2022
Kathryn-cat pushed a commit to Kathryn-cat/tvm that referenced this pull request Jun 10, 2022
…async (apache#11658)

* [TIR, CUDA] Add pass to replace global to shared memory copy with cp.async

* add missing doc

* black

* missing src

* clang format

* clang format

* check against nested async scope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants