Merged
21 changes: 5 additions & 16 deletions docs/docs/GPU-Support-Guide.md
@@ -20,23 +20,12 @@ NVIDIA virtual machine and instruction set architecture that is generated in the
You can learn more about PTX [here](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html). A fatbin may
have one or the other type of code, or both, for one or a set of different architectures.

By default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags
OpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags
to accomplish this are described in this
[table](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation).
OpenMPF uses the `-gencode` flag, with the `-arch=compute_30` and `-code=compute_30` flags. This generates PTX code
for the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the
architecture the code is running on.

## Customizing the GPU Compile Flags

OpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and
found an insignificant difference in the run time for different architectures using this approach, and so we have opted
to provide maximum runtime portability. For any new components that may be developed, this may not be the case, and
similar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags
are configured by setting the `CUDA_NVCC_FLAGS` CMake variable in the individual component's CMakeLists.txt file, e.g.:
```
set(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)
```
[table](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation).
If you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,
depending on the version of CMake that is being used. See for example [CMAKE_CUDA_FLAGS](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html),
or [CMAKE_CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html).
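As a minimal sketch of the `CMAKE_CUDA_ARCHITECTURES` approach (available in CMake 3.18 and later), a component's CMakeLists.txt might contain something like the following; the project and file names here are hypothetical:
```
cmake_minimum_required(VERSION 3.18)
project(example_gpu_component LANGUAGES CXX CUDA)

# A "-virtual" suffix embeds only PTX for that compute capability; the NVIDIA
# driver JIT-compiles it at runtime, preserving the maximum-portability approach.
set(CMAKE_CUDA_ARCHITECTURES 52-virtual)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

add_library(example_gpu_component SHARED component.cu)
```
Listing real architectures instead (for example, `set(CMAKE_CUDA_ARCHITECTURES 70 80)`) generates ahead-of-time ELF code for those specific GPUs rather than JIT-compiled PTX.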

# OpenCV GPU Support

24 changes: 5 additions & 19 deletions docs/site/GPU-Support-Guide/index.html
@@ -172,12 +172,6 @@

<li class="toctree-l3"><a href="#building-a-component">Building a Component</a></li>

<ul>

<li><a class="toctree-l4" href="#customizing-the-gpu-compile-flags">Customizing the GPU Compile Flags</a></li>

</ul>


<li class="toctree-l3"><a href="#opencv-gpu-support">OpenCV GPU Support</a></li>

@@ -266,20 +260,12 @@ <h1 id="building-a-component">Building a Component</h1>
NVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation.
You can learn more about PTX <a href="https://docs.nvidia.com/cuda/parallel-thread-execution/index.html">here</a>. A fatbin may
have one or the other type of code, or both, for one or a set of different architectures. </p>
<p>By default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags
<p>OpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags
to accomplish this are described in this
<a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation">table</a>.
OpenMPF uses the <code>-gencode</code> flag, with the <code>-arch=compute_30</code> and <code>-code=compute_30</code> flags. This generates PTX code
for the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the
architecture the code is running on.</p>
<h2 id="customizing-the-gpu-compile-flags">Customizing the GPU Compile Flags</h2>
<p>OpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and
found an insignificant difference in the run time for different architectures using this approach, and so we have opted
to provide maximum runtime portability. For any new components that may be developed, this may not be the case, and
similar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags
are configured by setting the <code>CUDA_NVCC_FLAGS</code> CMake variable in the individual component's CMakeLists.txt file, e.g.:</p>
<pre><code>set(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)
</code></pre>
<a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation">table</a>.
If you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,
depending on the version of CMake that is being used. See for example <a href="https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html">CMAKE_CUDA_FLAGS</a>,
or <a href="https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html">CMAKE_CUDA_ARCHITECTURES</a>.</p>
<h1 id="opencv-gpu-support">OpenCV GPU Support</h1>
<p>In OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components
that use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and
2 changes: 1 addition & 1 deletion docs/site/index.html
@@ -400,5 +400,5 @@ <h1 id="overview">Overview</h1>

<!--
MkDocs version : 0.17.5
Build Date UTC : 2024-09-04 20:38:56
Build Date UTC : 2024-12-06 18:49:13
-->
9 changes: 2 additions & 7 deletions docs/site/search/search_index.json
@@ -1277,7 +1277,7 @@
},
{
"location": "/GPU-Support-Guide/index.html",
"text": "NOTICE:\n This software (or technical data) was produced for the U.S. Government under contract, and is subject to the\nRights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2024 The MITRE Corporation. All Rights Reserved.\n\n\nIntroduction\n\n\nA subset of OpenMPF components are capable of running on NVIDIA GPUs. GPU support is through the NVIDIA CUDA libraries \nand runtime. This guide provides information needed for new component developers that would like to use NVIDIA GPUs \nto accelerate their component processing, and for users of the existing components that provide GPU support.\n\n\nBuilding a Component\n\n\nOpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound \nhere\n. The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output.\n\n\nThe nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX \nhere\n. A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. \n\n\nBy default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this \n\ntable\n. \nOpenMPF uses the \n-gencode\n flag, with the \n-arch=compute_30\n and \n-code=compute_30\n flags. This generates PTX code \nfor the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the \narchitecture the code is running on.\n\n\nCustomizing the GPU Compile Flags\n\n\nOpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and\nfound an insignificant difference in the run time for different architectures using this approach, and so we have opted\nto provide maximum runtime portability. For any new components that may be developed, this may not be the case, and\nsimilar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags\nare configured by setting the \nCUDA_NVCC_FLAGS\n CMake variable in the individual component's CMakeLists.txt file, e.g.:\n\n\nset(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)\n\n\n\nOpenCV GPU Support\n\n\nIn OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components\nthat use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and\nthe above-mentioned GPU compile flags will have already been set when OpenCV was built.\n\n\n\n\nNOTE:\n Most OpenMPF GPU components are written so that they can run on the CPU only, as well as using GPU hardware. \nIf the component is built on a system that does not have the NVIDIA CUDA Toolkit installed, then the build will \ndefault to compiling for the CPU. It is recommended that developers of new GPU components make every attempt to \nfollow this model, so that other users are not burdened with installing the NVIDIA CUDA Toolkit when they have no \nplans to run on GPU hardware.",
"text": "NOTICE:\n This software (or technical data) was produced for the U.S. Government under contract, and is subject to the\nRights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2024 The MITRE Corporation. All Rights Reserved.\n\n\nIntroduction\n\n\nA subset of OpenMPF components are capable of running on NVIDIA GPUs. GPU support is through the NVIDIA CUDA libraries \nand runtime. This guide provides information needed for new component developers that would like to use NVIDIA GPUs \nto accelerate their component processing, and for users of the existing components that provide GPU support.\n\n\nBuilding a Component\n\n\nOpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound \nhere\n. The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output.\n\n\nThe nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX \nhere\n. A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. \n\n\nOpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this \n\ntable\n.\nIf you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,\ndepending on the version of CMake that is being used. See for example \nCMAKE_CUDA_FLAGS\n,\nor \nCMAKE_CUDA_ARCHITECTURES\n.\n\n\nOpenCV GPU Support\n\n\nIn OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components\nthat use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and\nthe above-mentioned GPU compile flags will have already been set when OpenCV was built.\n\n\n\n\nNOTE:\n Most OpenMPF GPU components are written so that they can run on the CPU only, as well as using GPU hardware. \nIf the component is built on a system that does not have the NVIDIA CUDA Toolkit installed, then the build will \ndefault to compiling for the CPU. It is recommended that developers of new GPU components make every attempt to \nfollow this model, so that other users are not burdened with installing the NVIDIA CUDA Toolkit when they have no \nplans to run on GPU hardware.",
"title": "GPU Support Guide"
},
{
@@ -1287,14 +1287,9 @@
},
{
"location": "/GPU-Support-Guide/index.html#building-a-component",
"text": "OpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound here . The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output. The nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX here . A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. By default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this table . \nOpenMPF uses the -gencode flag, with the -arch=compute_30 and -code=compute_30 flags. This generates PTX code \nfor the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the \narchitecture the code is running on.",
"text": "OpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound here . The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output. The nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX here . A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. OpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this table .\nIf you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,\ndepending on the version of CMake that is being used. See for example CMAKE_CUDA_FLAGS ,\nor CMAKE_CUDA_ARCHITECTURES .",
"title": "Building a Component"
},
{
"location": "/GPU-Support-Guide/index.html#customizing-the-gpu-compile-flags",
"text": "OpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and\nfound an insignificant difference in the run time for different architectures using this approach, and so we have opted\nto provide maximum runtime portability. For any new components that may be developed, this may not be the case, and\nsimilar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags\nare configured by setting the CUDA_NVCC_FLAGS CMake variable in the individual component's CMakeLists.txt file, e.g.: set(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)",
"title": "Customizing the GPU Compile Flags"
},
{
"location": "/GPU-Support-Guide/index.html#opencv-gpu-support",
"text": "In OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components\nthat use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and\nthe above-mentioned GPU compile flags will have already been set when OpenCV was built. NOTE: Most OpenMPF GPU components are written so that they can run on the CPU only, as well as using GPU hardware. \nIf the component is built on a system that does not have the NVIDIA CUDA Toolkit installed, then the build will \ndefault to compiling for the CPU. It is recommended that developers of new GPU components make every attempt to \nfollow this model, so that other users are not burdened with installing the NVIDIA CUDA Toolkit when they have no \nplans to run on GPU hardware.",