[CK_BUILDER] Convolution forward transfer concepts. #3535
base: develop
Pull request overview
This PR introduces robust compile-time validation for convolution forward transfer operations by replacing individual static assertions with composite concepts that validate block transfers comprehensively. The changes improve code maintainability through consistent naming conventions and add helper utilities for compile-time checks.
Key changes:
- Introduced composite validation concepts (`ValidABlockTransfer`, `ValidBBlockTransfer`, `ValidCBlockTransfer`) that replace multiple individual `static_assert` statements with comprehensive checks for vector size, access order, cluster size, and tile coverage
- Renamed `block_transfer_access_order` to `thread_cluster_arrange_order` throughout the codebase for improved clarity
- Added stricter layout validation checks using the `IsValidLayout` concept in factory classes
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `conv_algorithm_limits.hpp` | Added helper utilities in the `detail` namespace for computing cluster sizes, coverage, and vector size constraints; introduced composite validation concepts |
| `conv_fwd_xdl_factory.hpp` | Replaced individual assertions with composite concepts; added comprehensive layout validation |
| `conv_fwd_wmma_factory.hpp` | Replaced individual assertions with composite concepts; added layout validation |
| `conv_fwd_v3_factory.hpp` | Replaced individual assertions with composite concepts; added layout validation |
| `conv_fwd_large_tensor_factory.hpp` | Replaced individual assertions with composite concepts; added layout validation |
| `conv_algorithm_concepts.hpp` | Updated concept definitions to use the new `thread_cluster_arrange_order` naming |
| `conv_block_transfer.hpp` | Updated helper function to use the new `thread_cluster_arrange_order` naming |
| `conv_algorithm_types.hpp` | Renamed struct member from `block_transfer_access_order` to `thread_cluster_arrange_order` |
| `test_conv_description.cpp` | Updated test configurations to use the new naming convention |
| `ckb_conv_test_configs.hpp` | Updated test configurations to use the new naming convention; adjusted vector size values |
| `conv_algorithm_type_utils.hpp` | Updated string conversion to use the new naming convention |
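The coverage helpers in `conv_algorithm_limits.hpp` are not shown in this page. As a minimal sketch of one plausible coverage rule (an assumption, not the PR's definition), a thread cluster "covers" a tile when each tile dimension is an exact multiple of the matching cluster length; `cluster_covers_tile` and `repeats_per_dim` are hypothetical names.

```cpp
#include <array>
#include <cstddef>

namespace detail {
// Hypothetical coverage rule: every tile dimension must be a non-zero, exact
// multiple of the matching cluster length, so the cluster sweeps the tile in
// whole steps with no leftover elements.
template <std::size_t N>
constexpr bool cluster_covers_tile(const std::array<std::size_t, N>& tile,
                                   const std::array<std::size_t, N>& cluster)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (cluster[i] == 0 || tile[i] == 0 || tile[i] % cluster[i] != 0) return false;
    }
    return true;
}

// How many times the cluster repeats along each dimension.
// Precondition: cluster_covers_tile(tile, cluster) holds.
template <std::size_t N>
constexpr std::array<std::size_t, N> repeats_per_dim(const std::array<std::size_t, N>& tile,
                                                     const std::array<std::size_t, N>& cluster)
{
    std::array<std::size_t, N> reps{};
    for (std::size_t i = 0; i < N; ++i) reps[i] = tile[i] / cluster[i];
    return reps;
}
} // namespace detail

// Usage: a {K0, M, K1} = {4, 128, 8} tile swept by a {4, 64, 1} cluster.
inline constexpr std::array<std::size_t, 3> kTile{4, 128, 8};
inline constexpr std::array<std::size_t, 3> kCluster{4, 64, 1};
static_assert(detail::cluster_covers_tile(kTile, kCluster));
static_assert(detail::repeats_per_dim(kTile, kCluster)[1] == 2);
```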
```cpp
{
    return std::tuple_size_v<Range>;
}
}
```
Copilot (AI), Jan 8, 2026
Missing return statement in `get_range_size()`. When neither the `HasStaticSize` nor the `HasTupleSize` condition is met, the function returns no value, leading to undefined behavior.
Maybe add an else branch with static_assert to produce a clear compile-time error?
```cpp
.block_transfer = {.k0 = 4, .m_n = 64, .k1 = 1},
.lds_transfer = {.src_vector_dim = 2,
                 .src_scalar_per_vector = 2,
                 .lds_dst_scalar_per_vector = 4,
```
Copilot (AI), Jan 8, 2026
Changing `lds_dst_scalar_per_vector` from 8 to 4 alters the transfer behavior, but there is no corresponding test coverage verifying that this configuration works correctly with the new validation concepts.
```cpp
                 .src_scalar_per_vector = 4,
                 .lds_dst_scalar_per_vector = 4,
```
Copilot (AI), Jan 8, 2026
The vector size changes (from 8 to 4) in the B transfer configuration lack test coverage to ensure that these values satisfy the new composite validation concepts.
vpietila-amd left a comment
Very nice, looks good to me. Only a few minor comments on possible improvements.
```cpp
concept SpecifiesThreadClusterAccessOrder = requires(T t) {
    { T::transfer.a.block_transfer_access_order } -> AccessOrderDescriptor;
    { T::transfer.b.block_transfer_access_order } -> AccessOrderDescriptor;
    { T::transfer.a.thread_cluster_arrange_order } -> AccessOrderDescriptor;
```
Is it "access order" or "arrange order"? The concept is referring to "access order" but the expected member is "arrange order".
The template parameters refer to this sequence as "arrange order". Should we also change the concept name to `SpecifiesThreadClusterArrangeOrder`?
I changed this name because it actually refers to the arrange order, which is semantically different from the access order. However, the concept works for both ;) I didn't want to introduce a logically identical concept with just a different name... But yes, I could improve it by renaming the concept.
```cpp
template <size_t DataTypeSize>
constexpr auto get_data_max_vec_size()
{
    // this is arch specific - but all current gfx9 has same value
```
There's a pending PR #3509 that will add the WMMA factories for gfx11 and gfx12 architectures. Can we also take these architectures into account here?
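A minimal sketch of how `get_data_max_vec_size` could take the target architecture into account. `GpuArchFamily` and `max_vector_bytes` are hypothetical, and the 16-byte width for every family is an assumption to verify against the ISA documentation, not a value taken from the PR.

```cpp
#include <cstddef>

// Hypothetical architecture-family tag; CK's real target enumeration may differ.
enum class GpuArchFamily { Gfx9, Gfx11, Gfx12 };

// ASSUMPTION: each family supports 16-byte (128-bit) vector memory accesses.
// One case per family lets a future target return a different width.
constexpr std::size_t max_vector_bytes(GpuArchFamily arch)
{
    switch (arch) {
    case GpuArchFamily::Gfx9:  return 16;
    case GpuArchFamily::Gfx11: return 16;
    case GpuArchFamily::Gfx12: return 16;
    }
    return 0; // unknown family: no vectorization
}

// Maximum number of elements of the given byte size per vector access.
template <GpuArchFamily Arch, std::size_t DataTypeSize>
constexpr std::size_t get_data_max_vec_size()
{
    static_assert(DataTypeSize > 0, "DataTypeSize must be positive");
    return max_vector_bytes(Arch) / DataTypeSize;
}

// Usage: 2-byte elements on a gfx11 target.
static_assert(get_data_max_vec_size<GpuArchFamily::Gfx11, 2>() == 8);
```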
```cpp
template <size_t ScalarPerVec, size_t DataTypeSize>
concept IsVectorSizeValid = requires {
    requires((ScalarPerVec & (ScalarPerVec - 1)) == 0) &&
```
It would improve readability if the power-of-two check were a separate concept with a descriptive name, or at least had a short comment noting that `ScalarPerVec` must be a power of two.
The main changes include the introduction of composite concepts for validating block transfers, the replacement of legacy naming for access order, and the addition of helper utilities for compile-time checks.
In this PR, transfer concepts are validated only in the XDL, V3, WMMA, and large-tensor factories. More will be added in follow-up PRs.
Key changes:
Validation and Concepts
- Introduced composite concepts (`ValidABlockTransfer`, `ValidBBlockTransfer`, `ValidCBlockTransfer`) that encapsulate multiple checks for block transfer validity, including vector size, access order, cluster size, and tile coverage. These replace several individual `static_assert`s in factory classes, enabling more robust compile-time validation. [1] [2] [3] [4] [5]
- Added helper utilities in the `detail` namespace to compute cluster sizes, coverage, and vector size constraints at compile time, supporting the new validation concepts.

Access Order Naming Consistency
- Renamed `block_transfer_access_order` to `thread_cluster_arrange_order` throughout the codebase for clarity and consistency, including in concepts, struct members, and helper functions. [1] [2] [3] [4] [5]

Layout Validation
- Added stricter layout validation checks in factory classes using the `IsValidLayout` concept, ensuring only supported tensor layouts are accepted and that vectorization occurs on the correct dimension. [1] [2] [3] [4]