Merged
21 changes: 5 additions & 16 deletions docs/docs/GPU-Support-Guide.md
@@ -20,23 +20,12 @@ NVIDIA virtual machine and instruction set architecture that is generated in the
You can learn more about PTX [here](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html). A fatbin may
have one or the other type of code, or both, for one or a set of different architectures.

By default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags
OpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags
to accomplish this are described in this
[table](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation).
OpenMPF uses the `-gencode` flag, with the `-arch=compute_30` and `-code=compute_30` flags. This generates PTX code
for the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the
architecture the code is running on.

## Customizing the GPU Compile Flags

OpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and
found an insignificant difference in the run time for different architectures using this approach, and so we have opted
to provide maximum runtime portability. For any new components that may be developed, this may not be the case, and
similar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags
are configured by setting the `CUDA_NVCC_FLAGS` CMake variable in the individual component's CMakeLists.txt file, e.g.:
```
set(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)
```
[table](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation).
If you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,
depending on the version of CMake that is being used. See for example [CMAKE_CUDA_FLAGS](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html),
or [CMAKE_CUDA_ARCHITECTURES](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html).
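As a minimal sketch of the `CMAKE_CUDA_ARCHITECTURES` approach (available in CMake 3.18 and later), a component's CMakeLists.txt might contain something like the following; the project and file names here are hypothetical:
```
cmake_minimum_required(VERSION 3.18)
project(example_gpu_component LANGUAGES CXX CUDA)

# A "-virtual" suffix embeds only PTX for that compute capability; the NVIDIA
# driver JIT-compiles it at runtime, preserving the maximum-portability approach.
set(CMAKE_CUDA_ARCHITECTURES 52-virtual)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

add_library(example_gpu_component SHARED component.cu)
```
Listing real architectures instead (for example, `set(CMAKE_CUDA_ARCHITECTURES 70 80)`) generates ahead-of-time ELF code for those specific GPUs rather than JIT-compiled PTX.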

# OpenCV GPU Support

24 changes: 5 additions & 19 deletions docs/site/GPU-Support-Guide/index.html
@@ -172,12 +172,6 @@

<li class="toctree-l3"><a href="#building-a-component">Building a Component</a></li>

<ul>

<li><a class="toctree-l4" href="#customizing-the-gpu-compile-flags">Customizing the GPU Compile Flags</a></li>

</ul>


<li class="toctree-l3"><a href="#opencv-gpu-support">OpenCV GPU Support</a></li>

@@ -266,20 +260,12 @@ <h1 id="building-a-component">Building a Component</h1>
NVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation.
You can learn more about PTX <a href="https://docs.nvidia.com/cuda/parallel-thread-execution/index.html">here</a>. A fatbin may
have one or the other type of code, or both, for one or a set of different architectures. </p>
<p>By default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags
<p>OpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags
to accomplish this are described in this
<a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation">table</a>.
OpenMPF uses the <code>-gencode</code> flag, with the <code>-arch=compute_30</code> and <code>-code=compute_30</code> flags. This generates PTX code
for the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the
architecture the code is running on.</p>
<h2 id="customizing-the-gpu-compile-flags">Customizing the GPU Compile Flags</h2>
<p>OpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and
found an insignificant difference in the run time for different architectures using this approach, and so we have opted
to provide maximum runtime portability. For any new components that may be developed, this may not be the case, and
similar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags
are configured by setting the <code>CUDA_NVCC_FLAGS</code> CMake variable in the individual component's CMakeLists.txt file, e.g.:</p>
<pre><code>set(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)
</code></pre>
<a href="https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation">table</a>.
If you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,
depending on the version of CMake that is being used. See for example <a href="https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html">CMAKE_CUDA_FLAGS</a>,
or <a href="https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html">CMAKE_CUDA_ARCHITECTURES</a>.</p>
<h1 id="opencv-gpu-support">OpenCV GPU Support</h1>
<p>In OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components
that use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and
2 changes: 1 addition & 1 deletion docs/site/index.html
@@ -400,5 +400,5 @@ <h1 id="overview">Overview</h1>

<!--
MkDocs version : 0.17.5
Build Date UTC : 2024-09-04 20:38:56
Build Date UTC : 2024-12-06 18:49:13
-->
9 changes: 2 additions & 7 deletions docs/site/search/search_index.json
@@ -1277,7 +1277,7 @@
},
{
"location": "/GPU-Support-Guide/index.html",
"text": "NOTICE:\n This software (or technical data) was produced for the U.S. Government under contract, and is subject to the\nRights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2024 The MITRE Corporation. All Rights Reserved.\n\n\nIntroduction\n\n\nA subset of OpenMPF components are capable of running on NVIDIA GPUs. GPU support is through the NVIDIA CUDA libraries \nand runtime. This guide provides information needed for new component developers that would like to use NVIDIA GPUs \nto accelerate their component processing, and for users of the existing components that provide GPU support.\n\n\nBuilding a Component\n\n\nOpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound \nhere\n. The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output.\n\n\nThe nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX \nhere\n. A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. \n\n\nBy default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this \n\ntable\n. \nOpenMPF uses the \n-gencode\n flag, with the \n-arch=compute_30\n and \n-code=compute_30\n flags. This generates PTX code \nfor the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the \narchitecture the code is running on.\n\n\nCustomizing the GPU Compile Flags\n\n\nOpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and\nfound an insignificant difference in the run time for different architectures using this approach, and so we have opted\nto provide maximum runtime portability. For any new components that may be developed, this may not be the case, and\nsimilar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags\nare configured by setting the \nCUDA_NVCC_FLAGS\n CMake variable in the individual component's CMakeLists.txt file, e.g.:\n\n\nset(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)\n\n\n\nOpenCV GPU Support\n\n\nIn OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components\nthat use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and\nthe above-mentioned GPU compile flags will have already been set when OpenCV was built.\n\n\n\n\nNOTE:\n Most OpenMPF GPU components are written so that they can run on the CPU only, as well as using GPU hardware. \nIf the component is built on a system that does not have the NVIDIA CUDA Toolkit installed, then the build will \ndefault to compiling for the CPU. It is recommended that developers of new GPU components make every attempt to \nfollow this model, so that other users are not burdened with installing the NVIDIA CUDA Toolkit when they have no \nplans to run on GPU hardware.",
"text": "NOTICE:\n This software (or technical data) was produced for the U.S. Government under contract, and is subject to the\nRights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2024 The MITRE Corporation. All Rights Reserved.\n\n\nIntroduction\n\n\nA subset of OpenMPF components are capable of running on NVIDIA GPUs. GPU support is through the NVIDIA CUDA libraries \nand runtime. This guide provides information needed for new component developers that would like to use NVIDIA GPUs \nto accelerate their component processing, and for users of the existing components that provide GPU support.\n\n\nBuilding a Component\n\n\nOpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound \nhere\n. The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output.\n\n\nThe nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX \nhere\n. A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. \n\n\nOpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this \n\ntable\n.\nIf you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,\ndepending on the version of CMake that is being used. See for example \nCMAKE_CUDA_FLAGS\n,\nor \nCMAKE_CUDA_ARCHITECTURES\n.\n\n\nOpenCV GPU Support\n\n\nIn OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components\nthat use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and\nthe above-mentioned GPU compile flags will have already been set when OpenCV was built.\n\n\n\n\nNOTE:\n Most OpenMPF GPU components are written so that they can run on the CPU only, as well as using GPU hardware. \nIf the component is built on a system that does not have the NVIDIA CUDA Toolkit installed, then the build will \ndefault to compiling for the CPU. It is recommended that developers of new GPU components make every attempt to \nfollow this model, so that other users are not burdened with installing the NVIDIA CUDA Toolkit when they have no \nplans to run on GPU hardware.",
"title": "GPU Support Guide"
},
{
@@ -1287,14 +1287,9 @@
},
{
"location": "/GPU-Support-Guide/index.html#building-a-component",
"text": "OpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound here . The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output. The nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX here . A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. By default, the OpenMPF components are built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this table . \nOpenMPF uses the -gencode flag, with the -arch=compute_30 and -code=compute_30 flags. This generates PTX code \nfor the minimum compute capability; at runtime, the NVIDIA driver will just-in-time compile the PTX code for the \narchitecture the code is running on.",
"text": "OpenMPF components that use GPUs are built with the NVIDIA nvcc compiler. Information about the nvcc compiler can be \nfound here . The compiler accepts a number of \nflags to optimize the code generated, and the output of the compiler is called a \"fatbin\", since it may contain \nversions of the CUDA code compiled for multiple GPU architectures. This section discusses the nvcc compiler flags that \nare used within OpenMPF to tell the nvcc compiler what to include in the compiled output. The nvcc compiler can generate two types of code: ELF code for a specific GPU architecture, and PTX code, which is the \nNVIDIA virtual machine and instruction set architecture that is generated in the first phase of nvcc compilation. \nYou can learn more about PTX here . A fatbin may \nhave one or the other type of code, or both, for one or a set of different architectures. OpenMPF components should be built for maximum portability across NVIDIA GPU architectures. The nvcc flags \nto accomplish this are described in this table .\nIf you are using CMake to build the component, the compute capabilities can be specified in a couple of different ways,\ndepending on the version of CMake that is being used. See for example CMAKE_CUDA_FLAGS ,\nor CMAKE_CUDA_ARCHITECTURES .",
"title": "Building a Component"
},
{
"location": "/GPU-Support-Guide/index.html#customizing-the-gpu-compile-flags",
"text": "OpenMPF has several GPU components. Initially, we tested a GPU component on a variety of NVIDIA GPU architectures and\nfound an insignificant difference in the run time for different architectures using this approach, and so we have opted\nto provide maximum runtime portability. For any new components that may be developed, this may not be the case, and\nsimilar testing should be undertaken to determine the correct set of flags for that component. The nvcc compiler flags\nare configured by setting the CUDA_NVCC_FLAGS CMake variable in the individual component's CMakeLists.txt file, e.g.: set(CUDA_NVCC_FLAGS --compiler-options -fPIC -gencode arch=compute_30,code=compute_30)",
"title": "Customizing the GPU Compile Flags"
},
{
"location": "/GPU-Support-Guide/index.html#opencv-gpu-support",
"text": "In OpenMPF, OpenCV is built with CUDA support, including the CUDA Deep Neural Network library, cuDNN. C++ components\nthat use OpenCV CUDA support will have built-in access to it through the base C++ builder and executor Docker images, and\nthe above-mentioned GPU compile flags will have already been set when OpenCV was built. NOTE: Most OpenMPF GPU components are written so that they can run on the CPU only, as well as using GPU hardware. \nIf the component is built on a system that does not have the NVIDIA CUDA Toolkit installed, then the build will \ndefault to compiling for the CPU. It is recommended that developers of new GPU components make every attempt to \nfollow this model, so that other users are not burdened with installing the NVIDIA CUDA Toolkit when they have no \nplans to run on GPU hardware.",