[GLUTEN-9243][VL] Fix cuda docker image by zhouyuan · Pull Request #9333 · apache/gluten

zhouyuan · 2025-04-15T10:34:11Z

What changes were proposed in this pull request?

The image `ghcr.io/facebookincubator/velox-dev:adapters` is not available on ARM.
Follow-up to #9229.

Signed-off-by: Yuan Zhou <yuan.zhou@ibm.com>

How was this patch tested?

Passes GHA (GitHub Actions) CI.

github-actions · 2025-04-15T10:34:30Z

#9243

philo-he

Looks good!

jinchengchenghh · 2025-04-17T09:08:31Z

  if [ $ENABLE_GPU == "ON" ]; then
-    COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON"
+    # the cuda default options are for Centos9 image from Meta
+    COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc"


I use the ENV here, but without CUDA_COMPILER: https://github.com/apache/incubator-gluten/pull/9333/files#diff-395a263b6f69bd7ef4991face3666e100acdd60e47f4b28e0737adbbf5fb945aR12. Is the CUDA compiler at this path in every environment? Do we need to make it an environment variable in the Docker image?

zhouyuan · 2025-04-18

In my tests, the build reports that a CUDA identifier is undefined in device code if these two variables are not set. Note that the GHA runner does not have an NVIDIA GPU installed.

`CUDA_ARCHITECTURES` can be initialized from `CMAKE_CUDA_ARCHITECTURES`:
https://cmake.org/cmake/help/v3.28/prop_tgt/CUDA_ARCHITECTURES.html
However, it seems there is no similar variable for the CUDA compiler.
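One way to address the question about the hard-coded compiler path is to keep the values from the diff as defaults while letting the environment override them. The following is a minimal sketch, not the change that was merged: `gpu_compile_options`, `CUDA_ARCH`, and `CUDA_COMPILER_PATH` are hypothetical names introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: CUDA_ARCH and CUDA_COMPILER_PATH are hypothetical variable
# names chosen for illustration; they are not defined by Gluten or Velox.
gpu_compile_options() {
  local enable_gpu="${1:-OFF}"
  # Defaults mirror the values hard-coded in the diff under review.
  local arch="${CUDA_ARCH:-70}"
  local nvcc="${CUDA_COMPILER_PATH:-/usr/local/cuda-12.8/bin/nvcc}"
  if [ "$enable_gpu" = "ON" ]; then
    # echo joins its two arguments with a single space.
    echo "-DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON" \
         "-DCMAKE_CUDA_ARCHITECTURES=${arch} -DCMAKE_CUDA_COMPILER=${nvcc}"
  fi
}

# Append the GPU flags (empty when GPU support is off) to the existing options.
COMPILE_OPTION="${COMPILE_OPTION:-} $(gpu_compile_options "${ENABLE_GPU:-OFF}")"
```

With this shape, a Docker image could export the two variables once and the build script would pick them up. CMake's documentation also describes a `CUDACXX` environment variable that is consulted on the first configuration to select the CUDA compiler, which may be an alternative worth checking.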

jinchengchenghh · 2025-04-17T09:10:55Z

        with:
          images: ${{ env.DOCKERHUB_REPO }}
-          tags: centos-8-jdk8
+          tags: vcpkg-centos-8


It looks like the tag has changed, so some Docker images on Docker Hub may be duplicated. I don't suggest changing the tag, since users may face compatibility issues.

zhouyuan · 2025-04-17

It's not actually changed; this is a diff rendering issue on GitHub. Here's the full file:
https://github.com/apache/incubator-gluten/blob/245ea1e22cf81e158fa101aa6b6d970ac17381f9/.github/workflows/docker_image.yml#L63

jinchengchenghh · 2025-04-17T09:11:18Z

Thanks for your fix.

zhouyuan · 2025-04-20T10:54:51Z

The fix does not work on GHA because:

- meta/velox is missing one commit that allows skipping the test build when GPU is enabled
- the GHA runner does not have enough disk space to build with Meta's CentOS 9 adapters image

Will try to make a fix again.
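A disk-space failure like the one described above can be surfaced early with a pre-build check. This is a rough sketch under stated assumptions: `has_free_space_gb` is a hypothetical helper, not part of the Gluten workflow, and any threshold passed to it is a guess that would need tuning against the real build.

```shell
#!/usr/bin/env bash
# Sketch only: has_free_space_gb is a hypothetical helper, not part of the
# Gluten workflow. It succeeds if <path> has at least <min_gb> GiB available.
has_free_space_gb() {
  local path="$1" min_gb="$2"
  local avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 of the
  # second line is the available space on the filesystem holding <path>.
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

A workflow step could call `has_free_space_gb / 30 || exit 1` before starting the image build, where 30 is an arbitrary example value rather than a measured requirement.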

zhouyuan added 3 commits April 15, 2025 11:18

[VL] fix cuda docker image

90f259e

The image `ghcr.io/facebookincubator/velox-dev:adapters` is not avaiable on ARM Signed-off-by: Yuan Zhou <yuan.zhou@ibm.com>

move to seperate job

56776d6

Signed-off-by: Yuan Zhou <yuan.zhou@ibm.com>

test build

3966966

Signed-off-by: Yuan Zhou <yuan.zhou@ibm.com>

github-actions bot added the INFRA label Apr 15, 2025

Revert "test build"

e009b5b

This reverts commit 3966966.

zhouyuan marked this pull request as ready for review April 15, 2025 14:29

fix cmake link

fd50f94

Signed-off-by: Yuan Zhou <yuan.zhou@ibm.com>

github-actions bot added the BUILD label Apr 15, 2025

philo-he approved these changes Apr 15, 2025

philo-he changed the title ~~[GLUTEN-9243][VL] fix cuda image~~ [GLUTEN-9243][VL] Fix cuda docker image Apr 15, 2025

fix default cuda options

245ea1e

Signed-off-by: Yuan Zhou <yuan.zhou@ibm.com>

github-actions bot added the VELOX label Apr 16, 2025

jinchengchenghh reviewed Apr 17, 2025

jinchengchenghh approved these changes Apr 18, 2025

philo-he merged commit 6dd3899 into apache:main Apr 18, 2025
45 checks passed

philo-he merged 6 commits into apache:main from zhouyuan:wip_fix_cuda_image