Add new transpose_mem_order configuration argument to enable more flexible pencil memory layouts. #49
In cuDecomp, users currently have only two options for the memory layout order of their pencil data buffers:

- `[X, Y, Z]`, corresponding to `transpose_axis_contiguous[axis] = false`
- an axis-contiguous order, corresponding to `transpose_axis_contiguous[axis] = true` (e.g., `[Y, Z, X]` for the Y-axis pencil)

This makes cuDecomp harder to apply to codes that have established their own memory orderings that do not match what is currently available.
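The two existing layouts can be sketched as a small helper. This is an illustration only: `default_mem_order` is a hypothetical function, not part of cuDecomp, and it assumes that the axis-contiguous order is a cyclic permutation starting at the pencil axis (consistent with the `[Y, Z, X]` example above) and that axes are encoded as 0 = X, 1 = Y, 2 = Z.

```cpp
#include <array>

// Hypothetical helper (not a cuDecomp API): returns the memory order,
// listed from the first (contiguous) dimension onward, that a given
// transpose_axis_contiguous setting implies for one pencil axis.
// Assumed encoding: 0 = X, 1 = Y, 2 = Z.
inline std::array<int, 3> default_mem_order(int axis, bool axis_contiguous) {
  if (!axis_contiguous) {
    // Layout is always [X, Y, Z], regardless of the pencil axis.
    return {0, 1, 2};
  }
  // Assumed cyclic permutation starting at the pencil axis,
  // e.g. [Y, Z, X] for the Y-axis pencil.
  return {axis, (axis + 1) % 3, (axis + 2) % 3};
}
```

Under these assumptions, `default_mem_order(1, true)` yields `{1, 2, 0}`, i.e. `[Y, Z, X]`; any other ordering is not expressible with the boolean flag alone.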
This PR enables users to more flexibly set their own desired pencil buffer memory layouts by pencil axis via a new `transpose_mem_order` entry in the `cudecompGridDescConfig` structure. This new entry overrides the setting from `transpose_axis_contiguous`.

From the updated documentation:
Advanced users who require more flexibility in the memory layout of the pencil buffers can override the layouts available via `transpose_axis_contiguous` by setting the `transpose_mem_order` array in the configuration structure. This array enables users to set arbitrary memory layout orders for the pencil buffers by axis. For example, a user can set this structure as follows to have pencil memory in `[X, Y, Z]` order for the X-pencil and `[Z, Y, X]` order for the Y- and Z-pencils:

In C++:
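A minimal sketch of such a configuration, under stated assumptions: `GridDescConfigSketch` is a stand-in struct defined here so the example compiles without the cuDecomp headers, the `transpose_mem_order` field is assumed to be a 3x3 integer array indexed as `[pencil axis][position]`, and axes are assumed to be encoded as 0 = X, 1 = Y, 2 = Z. Check the cuDecomp documentation for the actual field type and encoding.

```cpp
#include <cstdint>

// Stand-in for the relevant fields of the cuDecomp grid descriptor
// configuration structure; the real definition lives in the cuDecomp headers.
struct GridDescConfigSketch {
  bool transpose_axis_contiguous[3];
  std::int32_t transpose_mem_order[3][3];  // assumed: [pencil axis][position]
};

// Builds the example layout described above (assumed encoding: 0 = X, 1 = Y, 2 = Z).
inline GridDescConfigSketch make_example_config() {
  GridDescConfigSketch config{};

  // X-pencil: [X, Y, Z]
  config.transpose_mem_order[0][0] = 0;
  config.transpose_mem_order[0][1] = 1;
  config.transpose_mem_order[0][2] = 2;

  // Y-pencil: [Z, Y, X]
  config.transpose_mem_order[1][0] = 2;
  config.transpose_mem_order[1][1] = 1;
  config.transpose_mem_order[1][2] = 0;

  // Z-pencil: [Z, Y, X]
  config.transpose_mem_order[2][0] = 2;
  config.transpose_mem_order[2][1] = 1;
  config.transpose_mem_order[2][2] = 0;

  return config;
}
```

Because `transpose_mem_order` overrides `transpose_axis_contiguous`, the boolean flags are left at their zero-initialized values in this sketch.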
In Fortran: