@romerojosh
Collaborator

Continuing with changes to cuDecomp to increase flexibility, this PR introduces the ability to pass padded input/output buffers to the cudecompTranspose* and cudecompUpdateHalo* functions. This comes up naturally in use cases involving real-to-complex/complex-to-real FFTs, where users may store real-space data in buffers with a padded dimension (e.g. a buffer of [2*(N/2 + 1), N, N] real values) to facilitate in-place FFT operations. Previously, cuDecomp had no way to transpose only the relevant [N, N, N] real values of such an array without an intermediate copy to an unpadded buffer, or to perform halo updates that skip the padded elements. With this new feature, users can specify the number of padded elements (per axis) for cuDecomp to ignore when performing communication. Beyond this specific example, handling padded buffers broadens the applicability of cuDecomp to more user scenarios.
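
To make the FFT example concrete, here is a minimal sketch in plain C++ (no cuDecomp calls) of the padded layout and of the intermediate copy that was previously needed. The helper names (`padded_x_extent`, `x_padding`, `pack_unpadded`) are hypothetical, and the sketch assumes X is the fastest-varying dimension with the padding at the end of each X line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Padded X extent (in real values) for an in-place real-to-complex FFT of
// length n: the n/2 + 1 complex outputs occupy 2*(n/2 + 1) reals.
inline int padded_x_extent(int n) { return 2 * (n / 2 + 1); }

// Number of trailing padded real elements on each X line.
inline int x_padding(int n) { return padded_x_extent(n) - n; }

// Copy the n*n*n valid reals out of a padded [2*(n/2 + 1), n, n] buffer into a
// dense [n, n, n] buffer. Assumes X is fastest-varying and the padding sits at
// the end of each X line. This is the kind of intermediate copy that per-axis
// padding arguments are meant to make unnecessary.
std::vector<double> pack_unpadded(const std::vector<double>& padded, int n) {
  const int nxp = padded_x_extent(n);
  assert(padded.size() == static_cast<std::size_t>(nxp) * n * n);
  std::vector<double> dense;
  dense.reserve(static_cast<std::size_t>(n) * n * n);
  for (int k = 0; k < n; ++k) {
    for (int j = 0; j < n; ++j) {
      const std::size_t row = (static_cast<std::size_t>(k) * n + j) * nxp;
      for (int i = 0; i < n; ++i) {
        dense.push_back(padded[row + i]);  // skip the trailing padded elements
      }
    }
  }
  return dense;
}
```

Under these assumptions, an even N gives 2 padded elements per X line and an odd N gives 1, which is the value one would expect to pass as the X entry of the padding argument.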

To enable this, new arguments have been added to several APIs:

  1. input_padding/output_padding in the cudecompTranspose* routines
  2. padding in the cudecompUpdateHalo* routines
  3. padding in the cudecompGetPencilInfo routine

In other words, this is a breaking API change.

In all cases, the padding arguments are vectors of 3 integers, specifying the number of padded elements per axis (in global order). For example, a padding argument of [1, 0, 2] specifies a padding of 1 element in the X-direction, no padding in the Y-direction, and a padding of 2 elements in the Z-direction for the buffer.
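
As a rough illustration of what such a padding vector means for buffer sizing and indexing, the following plain C++ sketch computes the allocated extents for a logical block and a per-axis padding. It is not cuDecomp code: the names (`Extents`, `padded_extents`, `element_count`, `linear_index`) are hypothetical, halos are ignored, and it assumes X is fastest-varying with padding appended at the end of each axis.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Extents = std::array<int, 3>;  // {X, Y, Z}, i.e. global axis order

// Allocated extents of a buffer: logical extents plus per-axis padding.
// Assumption: padding is appended at the end of each axis; halos are ignored.
inline Extents padded_extents(const Extents& logical, const Extents& padding) {
  Extents out{};
  for (int d = 0; d < 3; ++d) {
    assert(padding[d] >= 0);
    out[d] = logical[d] + padding[d];
  }
  return out;
}

// Total number of elements to allocate for the given extents.
inline std::size_t element_count(const Extents& e) {
  return static_cast<std::size_t>(e[0]) * e[1] * e[2];
}

// Linear offset of logical element (i, j, k) in the padded buffer,
// assuming X is the fastest-varying dimension.
inline std::size_t linear_index(const Extents& alloc, int i, int j, int k) {
  return (static_cast<std::size_t>(k) * alloc[1] + j) * alloc[0] + i;
}
```

With a logical [4, 4, 4] block and the padding [1, 0, 2] from the example above, this gives allocated extents of [5, 4, 6], so the buffer holds 120 elements of which only 64 carry data.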

A summary of changes required in existing code after this PR lands:

C/C++:

  1. cudecompTranspose* routines require additional nullptr entries for the new input_padding/output_padding arguments:
     // Old:
     cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, stream);
     // New:
     cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, nullptr, nullptr, stream);
  2. cudecompUpdateHalo* routines require an additional nullptr entry for the new padding argument:
     // Old:
     cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, stream);
     // New:
     cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, nullptr, stream);
  3. The cudecompGetPencilInfo routine requires an additional nullptr entry for the new padding argument:
     // Old:
     cudecompGetPencilInfo(handle, grid_desc, pencil_info, axis, halo_extents);
     // New:
     cudecompGetPencilInfo(handle, grid_desc, pencil_info, axis, halo_extents, nullptr);

Fortran:

  1. cudecompTranspose* calls need to be modified to handle the new optional input_padding/output_padding arguments if the stream argument is passed as an unnamed (positional) argument:
     ! Old:
     call cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, stream)
     ! New:
     call cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, [0, 0, 0], [0, 0, 0], stream)
     ! OR
     call cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype, input_halo_extents, output_halo_extents, stream=stream)
  2. cudecompUpdateHalo* calls need to be modified to handle the new optional padding argument if the stream argument is passed as an unnamed (positional) argument:
     ! Old:
     call cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, stream)
     ! New:
     call cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, [0, 0, 0], stream)
     ! OR
     call cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim, stream=stream)

This PR also includes some test reorganization to cover the new padding argument cases, and some additional optimizations to the test programs to improve throughput.

@romerojosh
Collaborator Author

romerojosh commented Mar 11, 2025

Since this PR breaks the existing cuDecomp API, pinging a few folks directly for awareness/comment:
@ASKabalan / @EiffL for JaxDecomp
@p-costa for CaNS

@romerojosh romerojosh merged commit eedbcad into main Mar 18, 2025
@romerojosh romerojosh deleted the padding_support branch March 18, 2025 22:02