Add cuda::ptx::cp_reduce_async_bulk #1445

Merged
miscco merged 3 commits into NVIDIA:main from ahendriksen:add-ptx-cp-reduce-async-bulk on Mar 1, 2024

@ahendriksen
Contributor

Description

closes #1444
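This PR wraps the PTX `cp.reduce.async.bulk` instruction (PTX ISA 8.0, sm_90) in `cuda::ptx`. To illustrate only the element-wise semantics it exposes, here is a host-side C++ reference model. It is not the libcu++ API: `reference_cp_reduce_bulk`, `reduce_add`, `reduce_min` and `reduce_max` are hypothetical names, and the model ignores the asynchronous completion mechanism and the address-space requirements of the real instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reduction functors standing in for the add/min/max operations.
struct reduce_add {
  template <class T> T operator()(T a, T b) const { return a + b; }
};
struct reduce_min {
  template <class T> T operator()(T a, T b) const { return b < a ? b : a; }
};
struct reduce_max {
  template <class T> T operator()(T a, T b) const { return a < b ? b : a; }
};

// Host-side reference model (hypothetical, not the libcu++ API): reduce
// `size` bytes of src element-wise into dst, i.e. dst[i] = op(dst[i], src[i]).
// The real instruction runs asynchronously on the GPU; this model is synchronous.
template <class T, class Op>
void reference_cp_reduce_bulk(T* dst, const T* src, std::uint32_t size, Op op)
{
  // Assumed to mirror the PTX rule that the bulk size is a multiple of 16 bytes.
  assert(size % 16 == 0 && "size must be a multiple of 16 bytes");
  const std::size_t count = size / sizeof(T);
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = op(dst[i], src[i]);
  }
}
```

Calling it with `reduce_add` on two 16-byte `std::int32_t` arrays adds `src` into `dst` element by element. The commit notes below mention that min and max are also provided for f16 and bf16; this host model does not cover those types.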

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@ahendriksen ahendriksen requested review from a team as code owners February 27, 2024 17:14
@ahendriksen ahendriksen requested review from griwes and miscco February 27, 2024 17:14
Comment thread libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx.h
1. Add the ifdef
2. Add min, max support for f16 and bf16 (I overlooked this initially)
@ahendriksen ahendriksen force-pushed the add-ptx-cp-reduce-async-bulk branch from 44aa1af to 312c5b6 Compare February 28, 2024 16:05
}
#endif // __cccl_ptx_isa >= 800

#ifdef _LIBCUDACXX_HAS_NVF16
Contributor

Note: the PR that introduces this macro has not been merged yet, so this block will always be disabled until we merge #1140.

Contributor Author

Okay. This issue probably caused the tests to fail, so I have guarded the tests on this macro as well.

I am okay with the __half and bfloat16 variants not being available immediately. I have tested the generated PTX offline, so I know it works.
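The half-precision tests are guarded on `_LIBCUDACXX_HAS_NVF16`, as described above. A minimal sketch of that guard pattern in plain C++ follows; `half_precision_tests_enabled` and `run_reduce_tests` are hypothetical names, not the actual test files in this PR. When the macro is undefined (as it is until #1140 is merged, per the review comment), the guarded cases compile away and the remaining cases still run.

```cpp
// Reports whether the half-precision feature macro is visible in this build.
bool half_precision_tests_enabled()
{
#ifdef _LIBCUDACXX_HAS_NVF16
  return true;
#else
  return false;
#endif
}

// Hypothetical test driver: counts how many groups of test cases were compiled in.
int run_reduce_tests()
{
  int cases = 0;
  ++cases; // non-half variants: always compiled and run
#ifdef _LIBCUDACXX_HAS_NVF16
  ++cases; // __half / bfloat16 variants: only when the macro is defined
#endif
  return cases;
}
```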

@miscco miscco enabled auto-merge (squash) February 29, 2024 07:15
auto-merge was automatically disabled February 29, 2024 07:25

Pull Request is not mergeable

@miscco miscco merged commit 4495154 into NVIDIA:main Mar 1, 2024

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FEA]: Add cuda::ptx::cp_reduce_async_bulk

2 participants