Clarify build strategy for heterogeneous applications (and clean all build options) #318

@valassi

Description

I am opening an issue that is a bit of a catch-all container in the area of build options: c++ vs cuda, host vs device.

This started off from the work I want to do to integrate the bridge, with a simple test emulating fortran random/sampling to connect to a cuda ME.

Take a component like rambo for instance. This can do the work on the host or on the device, even if in both cases the ME is computed on the device. The point is that I need a build of the c++/host version of rambo that links to the cuda/device version of the ME. So far, for things like rambo we only had EITHER a gcc build of the c++/host version, OR a nvcc build of the cuda/device version. Now I'd also like to add a c++/host version that I can use with the cuda ME. (All these issues will become the norm for truly heterogeneous workloads as in #85.)

The easiest would be essentially to build rambo c++/host with nvcc. After all, nvcc is a c++ compiler too. In a way, it would be nice for instance to test SIMD C++ vectorization in an nvcc build. The problem is that simply setting CXX=nvcc runs into various other issues. Some may be fixed using the options to forward unknown flags to the host compiler/linker, but not all. There are also some -ccbin and -Xcompiler options to clean up. Also, is CXXFLAGS really needed on all link instructions in the Makefile? There is quite some cleanup to do.
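As a rough sketch of the cleanup direction (all variable names here are illustrative, not the actual Makefile's), nvcc can drive the build while delegating host compilation to the chosen C++ compiler via -ccbin, with host-only flags forwarded through -Xcompiler rather than passed to nvcc directly:

```make
# Hypothetical Makefile fragment: nvcc drives the build, but the host
# C++ compiler is selected with -ccbin, and host-only flags are
# forwarded with -Xcompiler (nvcc rejects flags it does not know).
CXX      ?= g++
NVCC     ?= nvcc
CXXFLAGS ?= -O3 -march=native

NVCCFLAGS = -ccbin $(CXX) $(addprefix -Xcompiler ,$(CXXFLAGS))

%.o: %.cu
	$(NVCC) $(NVCCFLAGS) -c $< -o $@
```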

On the code side, there are (my fault) many different namespaces for cuda and c++. I am converging on the idea of having just two, say mg5amcOnCpu and mg5amcOnGpu: the latter for a __CUDACC__ (ie nvcc) build, the former for a gcc/clang build.

Also on the code side, setting things like rambo as both device and host should ensure that a single nvcc build makes it usable both on the CPU and on the GPU.

So, in principle, one could aim for

  • CPU-only application: mg5amcOnCpu namespace, build all using your favorite gcc/clang/icpx compiler
  • CPU+GPU application (which for instance requires cudaMallocHost instead of malloc on the host): mg5amcOnGpu namespace, build all using nvcc, making sure that it delegates the c++ stuff correctly to your favorite gcc/clang/icpx compiler (so in principle you should get the same performance, even from the ME vectorization in c++)

This is not urgent, but it's better to think about some of these issues earlier rather than later.
