Branch/Tag/Commit
main
Docker Image Version
N/A
GPU name
A100
CUDA Driver
N/A
Reproduced Steps
This line is not safe because it writes to a Tensor struct's data field, which is a const void* (modifying a const value is undefined behavior). When I run a standalone script to test the custom all-reduce, I print the tensor's data attribute before and after the call to swapInternalBuffer and see that it does not change. My script is attached below:
repro_issue671_fastertransformer.zip
Instructions:
- python3 make_npy_tensors.py
- Compile and run main.cu
- python3 validate_npy_tensors.py
Output from my machine for main.cu:
ar_out_buffer.data (before): 0x7f65d5002400.
ar_out_buffer.data (after): 0x7f65d5002400.
DONE
The Tensor's data pointer is the same before and after the call. This prevents us from using custom all-reduce, because there is no way to write to the all-reduce input buffers.
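
For comparison, here is a minimal sketch of the same before/after check in a case where the swap does take effect. `MiniTensor`, `swap_to_internal`, and `swap_changes_pointer` are hypothetical stand-ins written for illustration; they are not FasterTransformer's `Tensor` or `swapInternalBuffer`, and the sketch does not try to explain why the real call leaves the pointer untouched.

```cpp
#include <cassert>
#include <iostream>
#include <vector>

// Hypothetical stand-in for a tensor descriptor; NOT FasterTransformer's Tensor.
struct MiniTensor {
    const void* data;  // pointer-to-const: in this stand-in the pointer itself can be reassigned
};

// Hypothetical swap: point the first tensor at an internal buffer and
// return the pointer it held before.
const void* swap_to_internal(std::vector<MiniTensor>* tensors, const void* internal_buffer) {
    assert(tensors != nullptr && !tensors->empty());
    const void* previous = tensors->at(0).data;
    tensors->at(0).data = internal_buffer;
    return previous;
}

// Print the data pointer before and after the swap and report whether it changed.
bool swap_changes_pointer(std::vector<MiniTensor>* tensors, const void* internal_buffer) {
    const void* before = tensors->at(0).data;
    std::cout << "data (before): " << before << "\n";
    swap_to_internal(tensors, internal_buffer);
    const void* after = tensors->at(0).data;
    std::cout << "data (after): " << after << "\n";
    return before != after;
}
```

With a working swap, the two printed addresses differ and the check returns true. In the output from main.cu above, both addresses are 0x7f65d5002400, which is the case this check would report as false.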