Collaborator

@aosewski aosewski commented Jan 8, 2026

The main changes include the introduction of composite concepts for validating block transfers, the replacement of legacy naming for access order, and the addition of helper utilities for compile-time checks.

In this PR, transfer concepts are validated only in the xdl, v3, wmma, and large tensor factories. More will be added in follow-up PRs.

Key changes:

Validation and Concepts

  • Introduced composite concepts (ValidABlockTransfer, ValidBBlockTransfer, ValidCBlockTransfer) that encapsulate multiple checks for block transfer validity, including vector size, access order, cluster size, and tile coverage. These replace several individual static_asserts in factory classes, enabling more robust compile-time validation. [1] [2] [3] [4] [5]

  • Added helper utilities in the detail namespace to compute cluster sizes, coverage, and vector size constraints at compile time, supporting the new validation concepts.

Access Order Naming Consistency

  • Renamed block_transfer_access_order to thread_cluster_arrange_order throughout the codebase for clarity and consistency, including in concepts, struct members, and helper functions. [1] [2] [3] [4] [5]

Layout Validation

  • Added stricter layout validation checks in all convolution factory classes using the new IsValidLayout concept, ensuring only supported tensor layouts are accepted and that vectorization occurs on the correct dimension. [1] [2] [3] [4]

@aosewski aosewski self-assigned this Jan 8, 2026
@aosewski aosewski changed the title from "Aosewski/trasfer concept" to "[BUILDER] Convolution forward transfer concepts." Jan 8, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces robust compile-time validation for convolution forward transfer operations by replacing individual static assertions with composite concepts that validate block transfers comprehensively. The changes improve code maintainability through consistent naming conventions and add helper utilities for compile-time checks.

Key changes:

  • Introduced composite validation concepts (ValidABlockTransfer, ValidBBlockTransfer, ValidCBlockTransfer) that replace multiple individual static_assert statements with comprehensive checks for vector size, access order, cluster size, and tile coverage
  • Renamed block_transfer_access_order to thread_cluster_arrange_order throughout the codebase for improved clarity
  • Added stricter layout validation checks using the IsValidLayout concept in factory classes

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| conv_algorithm_limits.hpp | Added helper utilities in detail namespace for computing cluster sizes, coverage, and vector size constraints; introduced composite validation concepts |
| conv_fwd_xdl_factory.hpp | Replaced individual assertions with composite concepts; added comprehensive layout validation |
| conv_fwd_wmma_factory.hpp | Replaced individual assertions with composite concepts; added layout validation |
| conv_fwd_v3_factory.hpp | Replaced individual assertions with composite concepts; added layout validation |
| conv_fwd_large_tensor_factory.hpp | Replaced individual assertions with composite concepts; added layout validation |
| conv_algorithm_concepts.hpp | Updated concept definitions to use new thread_cluster_arrange_order naming |
| conv_block_transfer.hpp | Updated helper function to use new thread_cluster_arrange_order naming |
| conv_algorithm_types.hpp | Renamed struct member from block_transfer_access_order to thread_cluster_arrange_order |
| test_conv_description.cpp | Updated test configurations to use new naming convention |
| ckb_conv_test_configs.hpp | Updated test configurations to use new naming convention; adjusted vector size values |
| conv_algorithm_type_utils.hpp | Updated string conversion to use new naming convention |

{
return std::tuple_size_v<Range>;
}
}

Copilot AI Jan 8, 2026

Missing return statement in get_range_size() function. When neither HasStaticSize nor HasTupleSize conditions are met, the function has no return value, leading to undefined behavior.

Contributor

Maybe add an else branch with static_assert to produce a clear compile-time error?

.block_transfer = {.k0 = 4, .m_n = 64, .k1 = 1},
.lds_transfer = {.src_vector_dim = 2,
.src_scalar_per_vector = 2,
.lds_dst_scalar_per_vector = 4,

Copilot AI Jan 8, 2026

The change from lds_dst_scalar_per_vector = 8 to 4 alters transfer behavior but lacks corresponding test coverage to verify this configuration works correctly with the new validation concepts.

Comment on lines +58 to +59
.src_scalar_per_vector = 4,
.lds_dst_scalar_per_vector = 4,

Copilot AI Jan 8, 2026

The changes to vector size values (from 8 to 4) for B transfer configuration lack test coverage to ensure these values satisfy the new composite validation concepts.

Contributor

@vpietila-amd vpietila-amd left a comment

Very nice, looks good to me. Only some minor comments on possible improvements.

concept SpecifiesThreadClusterAccessOrder = requires(T t) {
{ T::transfer.a.block_transfer_access_order } -> AccessOrderDescriptor;
{ T::transfer.b.block_transfer_access_order } -> AccessOrderDescriptor;
{ T::transfer.a.thread_cluster_arrange_order } -> AccessOrderDescriptor;
Contributor

Is it "access order" or "arrange order"? The concept is referring to "access order" but the expected member is "arrange order".

Contributor

The template parameters refer to this sequence as "arrange order". Should we also change the concept name to SpecifiesThreadClusterArrangeOrder?

Collaborator Author

I've changed this name because it actually refers to the arrange order, which is semantically different from the access order. However, the descriptor concept works for both ;) I didn't want to introduce a logically identical concept under just a different name. But yes, I could improve this, maybe by renaming the concept.

template <size_t DataTypeSize>
constexpr auto get_data_max_vec_size()
{
// this is arch specific - but all current gfx9 has same value
Contributor

There's a pending PR #3509 that will add the WMMA factories for the gfx11 and gfx12 architectures. Can we also take these architectures into account here?


template <size_t ScalarPerVec, size_t DataTypeSize>
concept IsVectorSizeValid = requires {
requires((ScalarPerVec & (ScalarPerVec - 1)) == 0) &&
Contributor

It would improve readability if the power-of-two check were a separate concept with a descriptive name, or if there were at least a short comment stating that ScalarPerVec must be a power of two.

@aosewski aosewski changed the title from "[BUILDER] Convolution forward transfer concepts." to "[CK_BUILDER] Convolution forward transfer concepts." Jan 9, 2026