// nWarpPerWork = nWarps/nWorks
int nWarpPerWork = __popc(__ballot_sync(~0u, nWorks*(lane+1) <= nWarps));
int nRecvWarpPerWork = nWarpPerWork<=4 ? nWarpPerWork/2 : (nWarpPerWork-1)/2;
int nSendWarpPerWork = nWarpPerWork<=4 ? nRecvWarpPerWork : nRecvWarpPerWork+1;
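I found this code in nccl/src/device/sendrecv.h. As far as I can tell, it means recv always gets roughly half of the warps assigned to each work item (exactly half when nWarpPerWork <= 4, otherwise (nWarpPerWork-1)/2, with send getting one more). Does that mean the nccl-tests -R param cannot be used with ALL2ALL?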
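To make the split easier to reason about, here is a rough host-side sketch of the same arithmetic. `WarpSplit` and `splitWarps` are hypothetical names of my own, not NCCL APIs, and I am assuming all 32 lanes are active in the ballot, so the popcount equals min(32, nWarps/nWorks).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type for illustration only; not part of NCCL.
struct WarpSplit {
  int perWork;  // corresponds to nWarpPerWork
  int recv;     // corresponds to nRecvWarpPerWork
  int send;     // corresponds to nSendWarpPerWork
};

// Host-side mirror of the three quoted lines.
// Assumption: every lane 0..31 takes part in __ballot_sync, so the number of
// lanes with nWorks*(lane+1) <= nWarps is min(32, nWarps/nWorks).
inline WarpSplit splitWarps(int nWarps, int nWorks) {
  assert(nWarps >= 0 && nWorks > 0);
  int perWork = std::min(32, nWarps / nWorks);
  // Up to 4 warps per work item: split evenly.
  // Above 4: one warp is held back before halving, and send gets one extra.
  int recv = perWork <= 4 ? perWork / 2 : (perWork - 1) / 2;
  int send = perWork <= 4 ? recv : recv + 1;
  return {perWork, recv, send};
}
```

For example, `splitWarps(16, 2)` gives 8 warps per work item, 3 for recv and 4 for send, so one warp is not assigned by these three lines. With `splitWarps(2, 2)` both counts come out as 0. Whatever handles those cases lies outside the excerpt I quoted.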