Add support for padded input/output buffers in transpose and halo communication routines #60

romerojosh · 2025-03-11T22:44:47Z

Continuing with changes to cuDecomp to increase flexibility, this PR introduces the ability to pass padded input/output buffers to the cudecompTranspose* and cudecompUpdateHalo* functions. This comes up naturally in use cases involving real-to-complex/complex-to-real FFTs, where users may store real space data in buffers with a padded dimension (e.g. using a buffer of dimension [2*(N/2 + 1), N, N] real values) to facilitate in-place FFT operations. Previously, cuDecomp could not transpose only the relevant [N, N, N] real values of such an array without an intermediate copy to an unpadded buffer, nor could it perform halo updates that skip the padded elements. With this new feature, users can specify the number of padded elements (per axis) for cuDecomp to ignore when performing communication. Beyond this specific example, handling padded buffers broadens the applicability of cuDecomp to more user scenarios.
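As a rough illustration of the real-to-complex case described above, the sketch below computes the padded X extent and the resulting per-axis padding for an [N, N, N] real field stored in a [2*(N/2 + 1), N, N] buffer. The helper names `r2c_padded_extent` and `r2c_padding` are hypothetical and not part of cuDecomp; the result is the kind of 3-integer array one would presumably pass as a padding argument.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not a cuDecomp function): allocated X extent, in real
// values, for an in-place real-to-complex FFT of logical length n. The buffer
// must hold n/2 + 1 complex values, i.e. 2 * (n/2 + 1) reals.
inline std::int32_t r2c_padded_extent(std::int32_t n) { return 2 * (n / 2 + 1); }

// Hypothetical helper: per-axis padding in global (X, Y, Z) order for an
// [n, n, n] real field stored in a [2 * (n/2 + 1), n, n] buffer.
// Only the X axis carries extra elements.
inline std::array<std::int32_t, 3> r2c_padding(std::int32_t n) {
  return {r2c_padded_extent(n) - n, 0, 0};
}
```

For N = 8 this gives a padded X extent of 10 and a padding of [2, 0, 0]; for odd N = 7 the extent is 8 and the padding is [1, 0, 0].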

To enable this, new arguments have been added to several APIs:

input_padding/output_padding in the cudecompTranspose* routines
padding in the cudecompUpdateHalo* routines
padding in the cudecompGetPencilInfo routine

In other words, this is a breaking API change.

In all cases, the padding arguments are vectors of 3 integers specifying the number of padded elements per axis (in global order). For example, a padding argument of [1, 0, 2] specifies a padding of 1 element in the X-direction, no padding in the Y-direction, and a padding of 2 elements in the Z-direction for the buffer.
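To make the [1, 0, 2] example concrete, here is a minimal sketch of how such a padding vector changes the allocated shape and the addressing of a buffer. It assumes that padding is appended at the upper end of each axis, that X is the fastest-varying axis in memory, and that there are no halo regions; the actual pencil memory order in cuDecomp may differ, so treat this as an illustration of the idea only. The names `Vec3`, `padded_shape`, and `padded_offset` are hypothetical.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<std::size_t, 3>;

// Allocated extent per axis = logical extent + padding
// (assumption: padding sits at the upper end of each axis).
inline Vec3 padded_shape(const Vec3& logical, const Vec3& padding) {
  return {logical[0] + padding[0], logical[1] + padding[1], logical[2] + padding[2]};
}

// Linear offset of logical element (i, j, k) inside the padded buffer
// (assumption: X is the fastest-varying axis, Z the slowest, no halos).
inline std::size_t padded_offset(const Vec3& alloc, std::size_t i, std::size_t j,
                                 std::size_t k) {
  return i + alloc[0] * (j + alloc[1] * k);
}
```

With a logical shape of [4, 3, 2] and a padding of [1, 0, 2], the allocated shape becomes [5, 3, 4]; consecutive rows along Y are then 5 elements apart rather than 4, and the padded elements are never addressed by valid (i, j, k) indices.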

A summary of changes required in existing code after this PR lands:

C/C++:

cudecompTranspose* routines require additional nullptr entries for the new input_padding/output_padding arguments.

// Old:
cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, stream);
// New:
cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, nullptr, nullptr, stream);

cudecompUpdateHalo* routines require an additional nullptr entry for the new padding argument.

// Old:
cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, stream);
// New:
cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, nullptr, stream);

The cudecompGetPencilInfo routine requires an additional nullptr entry for the new padding argument.

// Old:
cudecompGetPencilInfo(handle, grid_desc, pencil_info, axis, halo_extents);
// New:
cudecompGetPencilInfo(handle, grid_desc, pencil_info, axis, halo_extents, nullptr);

Fortran:

cudecompTranspose* calls need to be modified to account for the new optional input_padding/output_padding arguments if the stream argument is passed positionally (as an unnamed argument):

! Old:
call cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, stream)
! New:
call cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, [0, 0, 0], [0, 0, 0], stream)
! OR
call cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, stream=stream)

cudecompUpdateHalo* calls may need to be modified to account for the new optional padding argument if the stream argument is passed positionally (as an unnamed argument):

! Old:
call cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, stream)
! New:
call cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, [0, 0, 0], stream)
! OR
call cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, stream=stream)

This PR also includes some test reorganization to handle the new padded argument cases and some additional optimizations to the test programs to improve throughput.

romerojosh · 2025-03-11T22:50:37Z

Since this PR breaks the existing cuDecomp API, pinging a few folks directly for awareness/comment:
@ASKabalan / @EiffL for JaxDecomp
@p-costa for CaNS


romerojosh added 8 commits March 18, 2025 09:57

Begin work on adding padding argument support. Transposes updated.

29903e6

Add padding support to halo implementation and API.

ab1add2

Fixes after rebase.

1d3b655

Adding padding test cases with some additional test reorganization/optimization.

0261a29

Fix in test_runner.py

534f2cf

Use workspace caching in test implementation to improve efficiency.

9ad04d0

Update documentation.

3a7ad3c

Remove unused CHECK_CUDA macros from Fortran tests.

a3805be

romerojosh force-pushed the padding_support branch from b664f12 to a3805be Compare March 18, 2025 16:58

romerojosh merged commit eedbcad into main Mar 18, 2025

romerojosh deleted the padding_support branch March 18, 2025 22:02
