Add support for padded input/output buffers in transpose and halo communication routines #60
Continuing with changes to cuDecomp to increase flexibility, this PR introduces the ability to pass padded input/output buffers to the `cudecompTranspose*` and `cudecompUpdateHalo*` functions. This comes up naturally in use cases involving real-to-complex/complex-to-real FFTs, where users may store real-space data in buffers with a padded dimension (e.g. a buffer of `[2*(N/2 + 1), N, N]` real values) to facilitate in-place FFT operations. Previously, cuDecomp had no way to transpose only the relevant `[N, N, N]` real values of such an array without an intermediate copy to an unpadded buffer, or to perform halo updates that skip the padded elements. With this new feature, users can specify the padded elements (per axis) for cuDecomp to ignore when performing communication. Beyond this specific example, handling padded buffers broadens the applicability of cuDecomp to more user scenarios.

To enable this, new arguments have been added to several APIs:

- `input_padding`/`output_padding` in the `cudecompTranspose*` routines
- `padding` in the `cudecompUpdateHalo*` routines
- `padding` in the `cudecompGetPencilInfo` routine

In other words, this is a breaking API change.
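
To make the in-place FFT case concrete, here is a minimal sketch of where the X padding comes from and of the intermediate copy that was needed before this PR. It does not call cuDecomp. The helper names (`padded_extent_r2c`, `r2c_padding`, `strip_x_padding`) are hypothetical, and the sketch assumes a single host buffer of `double` values with X as the fastest-varying axis.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Padded X extent for an in-place real-to-complex FFT of length n:
// n/2 + 1 complex outputs occupy 2*(n/2 + 1) real slots.
inline std::int32_t padded_extent_r2c(std::int32_t n) { return 2 * (n / 2 + 1); }

// Per-axis padding in global (X, Y, Z) order; only X is padded in this layout.
// Hypothetical helper: this is the kind of 3-integer vector a padding argument takes.
inline std::array<std::int32_t, 3> r2c_padding(std::int32_t n) {
  return {padded_extent_r2c(n) - n, 0, 0};
}

// The intermediate copy required without padding support: gather the
// n*n*n meaningful reals out of the padded [2*(n/2+1), n, n] buffer.
// Assumes X is the fastest-varying axis and padding sits at the end of each X row.
inline std::vector<double> strip_x_padding(const std::vector<double>& padded, std::int32_t n) {
  const std::size_t nx_pad = static_cast<std::size_t>(padded_extent_r2c(n));
  const std::size_t un = static_cast<std::size_t>(n);
  std::vector<double> out(un * un * un);
  for (std::size_t k = 0; k < un; ++k)
    for (std::size_t j = 0; j < un; ++j)
      for (std::size_t i = 0; i < un; ++i)
        out[i + un * (j + un * k)] = padded[i + nx_pad * (j + un * k)];
  return out;
}
```

With padding support, a copy like `strip_x_padding` should no longer be necessary, since the padded elements can be described to the communication routines instead.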
In all cases, the padding arguments are vectors of 3 integers specifying the number of padded elements per axis (in global order). For example, a padding argument of `[1, 0, 2]` specifies a padding of 1 element in the `X`-direction, no padding in the `Y`-direction, and a padding of 2 elements in the `Z`-direction for the buffer.

A summary of changes required in existing code after this PR lands:
C/C++:
- `cudecompTranspose*` routines require additional `nullptr` entries for the new `input_padding`/`output_padding` arguments.
- `cudecompUpdateHalo*` routines require an additional `nullptr` entry for the new `padding` argument.
- `cudecompGetPencilInfo` routine requires an additional `nullptr` entry for the new `padding` argument.

Fortran:

- `cudecompTranspose*` calls need to be modified to handle the new optional `input_padding`/`output_padding` arguments if the `stream` argument is specified as an unnamed argument.
- `cudecompUpdateHalo*` calls potentially need to be modified to handle the new `padding` argument if the `stream` argument is specified as an unnamed argument.

This PR also includes some test reorganization to handle the new padded argument cases and some additional optimizations to the test programs to improve throughput.
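
For readers adapting buffer allocations, the per-axis padding convention (e.g. `[1, 0, 2]`) can be illustrated with a small indexing sketch. It is not cuDecomp code. The type alias and helpers (`Extents`, `padded_shape`, `element_count`, `padded_offset`) are hypothetical, and the sketch assumes X is the fastest-varying axis with padding appended at the high end of each axis. The actual memory order in cuDecomp depends on the pencil configuration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Extents = std::array<std::int32_t, 3>;  // (X, Y, Z) in global order

// Allocated shape = logical shape + padding, axis by axis.
inline Extents padded_shape(const Extents& shape, const Extents& padding) {
  return {shape[0] + padding[0], shape[1] + padding[1], shape[2] + padding[2]};
}

// Total number of elements in a buffer with the given extents.
inline std::size_t element_count(const Extents& e) {
  return static_cast<std::size_t>(e[0]) * static_cast<std::size_t>(e[1]) *
         static_cast<std::size_t>(e[2]);
}

// Linear offset of logical element (i, j, k) inside the padded buffer.
// Assumption: X fastest-varying, padded elements follow the valid ones on each axis.
inline std::size_t padded_offset(const Extents& shape, const Extents& padding,
                                 std::size_t i, std::size_t j, std::size_t k) {
  const Extents full = padded_shape(shape, padding);
  return i + static_cast<std::size_t>(full[0]) * (j + static_cast<std::size_t>(full[1]) * k);
}
```

For a logical shape of `[4, 3, 2]` and padding `[1, 0, 2]`, the allocation under these assumptions holds `5 * 3 * 4 = 60` elements, of which 24 carry data.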