Update ROCm docker container to 7.2 #19418

Merged
CISC merged 1 commit into ggml-org:master from superm1:docker-rocm
Feb 21, 2026

Conversation

@superm1
Contributor

@superm1 superm1 commented Feb 7, 2026

Update all the CI artifacts and jobs to use ROCm 7.2.

When testing offline, I found a problem with rocWMMA on the docker container with GFX908, so it's disabled for that.

@superm1 superm1 requested a review from ngxson as a code owner February 7, 2026 14:35
@github-actions github-actions Bot added the devops improvements to build systems and github actions label Feb 7, 2026
@superm1 superm1 requested a review from CISC as a code owner February 7, 2026 15:17
@superm1 superm1 force-pushed the docker-rocm branch 6 times, most recently from 21b0a0f to 373a1c3 Compare February 7, 2026 17:51
@CISC
Member

CISC commented Feb 7, 2026

There have been several issues submitted about newer versions of ROCm, including rocWMMA; not sure we are ready to go there. @IMbackK?

@superm1
Contributor Author

superm1 commented Feb 7, 2026

How important is gfx908 to CI and to release artifacts?

I believe there is something wrong with rocWMMA specifically with gfx908 and some intrinsic types. Right now my PR disables rocWMMA entirely in that container while building. Another option is to disable gfx908.

@IMbackK
Collaborator

IMbackK commented Feb 7, 2026

What problem specifically? There is no problem on my side with gfx908 on rocwmma 2.2.0 (which ships with 7.2).
rocwmma 2.0.0 (rocm 7.0 - 7.1) was broken at compile time for fp16 output datatypes on all gfx9 devices rocwmma supports, i.e. gfx908-gfx942, not gfx908 specifically.

@superm1
Contributor Author

superm1 commented Feb 7, 2026

Here's a snippet of what I was seeing that led to what I put in this PR. I guess it's an issue with template instantiation.

[ 56%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/template-instances/mmq-instance-q8_0.cu.o
In file included from /app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:18:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma.hpp:29:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/accessors.hpp:29:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/accessors_impl.hpp:35:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_config.hpp:32:
In file included from /opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:32:
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 16>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 16>' requested here
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 16, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 8>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 8>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 16>' requested here
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 8>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 8, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 4>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 4>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 8>' requested here
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:53:34: note: (skipping 1 context in backtrace; use -ftemplate-backtrace-limit=0 to see all)
   53 |     struct MfmaSelector : public MmaSelector<Mfma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest>{};
      |                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 4>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 4, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2549 |             using ARegsT = typename Impl::ARegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2572:73: note: in instantiation of template class 'rocwmma::MmaTraits_impl::mfma_traits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
 2572 |         struct MmaTraits<MmaOp, enable_if_t<is_mfma_v<MmaOp>>> : public mfma_traits<MmaOp>
      |                                                                         ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_traits.hpp:34:31: note: in instantiation of template class 'rocwmma::MmaTraits_impl::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
   34 |     struct MmaTraits : public MmaTraits_impl::MmaTraits<MmaOp>
      |                               ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:56:42: note: in instantiation of template class 'rocwmma::MmaTraits<rocwmma::detail::amdgcn_mfma<__half, __half, __half, 16, 16, 2>>' requested here
   56 |         using SelectedOp = conditional_t<CandidateTraits::is_supported,
      |                                          ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 2>' requested here
   58 |                                         typename MmaSelector<Mma_impl, InputTA, InputTB, ComputeT, BlockM, BlockN, BlockKTest / 2u>::SelectedOp>;
      |                                                  ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: in instantiation of template class 'rocwmma::MmaSelector<rocwmma::Mfma_impl, __half, __half, __half, 16, 16, 4>' requested here
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mma_selector.hpp:58:50: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma.hpp:71:27: note: in instantiation of template class 'rocwmma::MfmaSelector<__half, __half, __half, 16, 16, 16>' requested here
   71 |                  typename MfmaSelector<InputTA, InputTB, ComputeT, BlockM, BlockN, BlockK>::SelectedOp,
      |                           ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:277:44: note: in instantiation of template class 'rocwmma::Mfma<16, 16, 16, __half, __half, __half, 16, 16>' requested here
  277 |         d.mAccess = XD::exec(PackD::unpack(Mma::exec(PackA::pack(XA::exec(a.mAccess)),
      |                                            ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:334:27: note: in instantiation of function template specialization 'rocwmma::mma_sync<rocwmma::fragment<rocwmma::matrix_a, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::matrix_b, 16, 16, 16, __half, rocwmma::col_major>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>, rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, __half>>' requested here
  334 |                     wmma::mma_sync(VKQ_c[i_VKQ_0/VKQ_stride][j], v_a, KQ_b[k0/(VKQ_ratio*16)][j], VKQ_c[i_VKQ_0/VKQ_stride][j]);
      |                           ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, float, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:549:21: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, float>' requested here
  549 |                     ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, float>(ctx, dst);
      |                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2550:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2550 |             using BRegsT = typename Impl::BRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2551:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2551 |             using CRegsT = typename Impl::CRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:2552:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 2>'
 2552 |             using DRegsT = typename Impl::DRegsT;
      |                                     ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:131:16: note: partial specialization matches [with InputTA = __half, InputTB = __half, ComputeT = __half, BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  131 |         struct amdgcn_mfma<InputTA,
      |                ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/internal/mfma_impl.hpp:421:16: note: partial specialization matches [with BlockM = 16, BlockN = 16, BlockK = 2, GfxTargetId = 2312]
  421 |         struct amdgcn_mfma<hfloat16_t,
      |                ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:531:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<64, 16, 4, 64, __half, true>' requested here
  531 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:627:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<64, 16, __half>' requested here
  627 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 64, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:527:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<80, 16, 4, 16, __half, false>' requested here
  527 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:630:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<80, 16, __half>' requested here
  630 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 80, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:186:21: error: no matching function for call to 'mma_sync'
  186 |                     wmma::mma_sync(KQ_c[j], K_a, Q_b[k_KQ_0/16][j], KQ_c[j]);
      |                     ^~~~~~~~~~~~~~
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:531:24: note: in instantiation of function template specialization 'flash_attn_ext_f16<80, 16, 4, 16, __half, true>' requested here
  531 |         fattn_kernel = flash_attn_ext_f16<
      |                        ^
/app/ggml/src/ggml-cuda/fattn-wmma-f16.cu:630:17: note: in instantiation of function template specialization 'ggml_cuda_flash_attn_ext_wmma_f16_case<80, 16, __half>' requested here
  630 |                 ggml_cuda_flash_attn_ext_wmma_f16_case< 80, cols_per_block, half>(ctx, dst);
      |                 ^
/opt/rocm-7.2.0/lib/llvm/bin/../../../include/rocwmma/rocwmma_impl.hpp:251:25: note: candidate template ignored: substitution failure [with FragA = frag_a_K, FragB = frag_b, FragAccumIn = frag_c_KQ, FragAccumOut = frag_c_KQ]
  251 |     ROCWMMA_DEVICE void mma_sync(FragAccumOut& d, FragA const& a, FragB const& b, FragAccumIn& c)
      |                         ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for gfx908.
gmake[2]: *** [ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/build.make:335: ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/fattn-wmma-f16.cu.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....

@IMbackK
Collaborator

IMbackK commented Feb 7, 2026

Right, this is yet another bug in the fp16 downcast code in rocWMMA.
It affects gfx908-gfx942 and is an upstream issue.
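
For readers unfamiliar with the "ambiguous partial specializations" error in the log above, here is a minimal sketch of how it can arise in general. The names `mfma_like` and `half_t` are hypothetical stand-ins and are not rocWMMA's actual code; the sketch only assumes that one specialization is narrower in its type arguments while another is narrower in its block-size arguments, so neither is more specialized than the other.

```cpp
// Hypothetical stand-in for a 16-bit float type (not rocWMMA's hfloat16_t).
struct half_t {};

// Primary template: used when no partial specialization matches.
template <typename InA, typename InB, typename ComputeT, int BlockM, int BlockN>
struct mfma_like { static constexpr int which = 0; };

// Specialization A: both inputs share one type, block shape fixed to 16x16.
template <typename T, typename ComputeT>
struct mfma_like<T, T, ComputeT, 16, 16> { static constexpr int which = 1; };

// Specialization B: all types fixed to half_t, block shape left generic.
template <int BlockM, int BlockN>
struct mfma_like<half_t, half_t, half_t, BlockM, BlockN> { static constexpr int which = 2; };

// Only A matches: the types are not half_t.
static_assert(mfma_like<float, float, float, 16, 16>::which == 1, "A expected");
// Only B matches: the block shape is not 16x16.
static_assert(mfma_like<half_t, half_t, half_t, 32, 32>::which == 2, "B expected");
// Neither matches: falls back to the primary template.
static_assert(mfma_like<float, half_t, float, 32, 32>::which == 0, "primary expected");

// Both A and B match here, and neither is more specialized than the other,
// so uncommenting the next line is rejected as ambiguous by the compiler:
// static_assert(mfma_like<half_t, half_t, half_t, 16, 16>::which != 0, "");
```

The snippet compiles as a standalone translation unit (for example with `-fsyntax-only`); the checks are all compile-time `static_assert`s, so there is nothing to run.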

@IMbackK
Collaborator

IMbackK commented Feb 7, 2026

We can't disable this for all of CDNA, so this PR is non-viable until it's fixed upstream or we find a workaround.

@superm1
Contributor Author

superm1 commented Feb 7, 2026

Would you be open to some macros to force the data types?

@IMbackK
Collaborator

IMbackK commented Feb 7, 2026

sure ofc

@IMbackK
Collaborator

IMbackK commented Feb 8, 2026

This PR is blocked by #19269 (comment).

@superm1
Contributor Author

superm1 commented Feb 8, 2026

Thanks for that and pointing out the upstream bug. I think that we can split this into two parts if you agree.

  1. Updating the Docker container to 7.2.
  2. Adding release artifacts based on 7.2 for unaffected architectures.

I split off the second part to #19433.

@superm1
Contributor Author

superm1 commented Feb 9, 2026

> Would you be open to some macros to force the data types?
>
> sure ofc

I came up with a workaround for 7.2 that explicitly declares the types, and it seems to work for me. I split it off to #19461.

@IMbackK
Collaborator

IMbackK commented Feb 9, 2026

Please remove cc5a595, otherwise this is fine to merge after #19461 lands.

@superm1
Contributor Author

superm1 commented Feb 9, 2026

Thanks; dropped that commit.

Collaborator

@IMbackK IMbackK left a comment

Otherwise LGTM; this should be ready now.

Comment thread .github/workflows/build.yml Outdated
Comment thread .devops/rocm.Dockerfile Outdated
@CISC
Member

CISC commented Feb 13, 2026

@superm1 Please provide a successful build on your own fork to verify.

@superm1
Contributor Author

superm1 commented Feb 13, 2026

Hi,

I did a successful build of this locally.

@superm1
Contributor Author

superm1 commented Feb 13, 2026

I had to hack up the build targets to get it to run (a fork has no workflow dispatch or cron trigger for this job), but here is a successful run on my fork.

https://github.com/superm1/llama.cpp/actions/runs/21989365518/job/63531927483

@superm1
Contributor Author

superm1 commented Feb 19, 2026

Can this merge now?

@CISC
Member

CISC commented Feb 19, 2026

> Can this merge now?

I'd like a final approval by @IMbackK first.

@CISC
Member

CISC commented Feb 19, 2026

@superm1 Also, why are you building more arches here than in the release?

@superm1
Contributor Author

superm1 commented Feb 19, 2026

> @superm1 Also, why are you building more arches here than in the release?

That's a good point. Let me pare it down.

@IMbackK
Collaborator

IMbackK commented Feb 19, 2026

You removed the <gfx908 targets, as I mentioned in #19418 (comment); that makes sense, since ROCm in these Docker images is not built for those. But now you have also removed gfx1030, for which the Docker image is certainly built, and gfx1010 and gfx1032, which I need to check (pulling the image right now).

@IMbackK
Collaborator

IMbackK commented Feb 19, 2026

Looks like they don't build for gfx1010 and gfx1032 anymore, so just gfx1030 is missing here and in the other PR.

@CISC
Member

CISC commented Feb 20, 2026

@IMbackK gentle ping, waiting for your approval. :)

Collaborator

@IMbackK IMbackK left a comment

Looks good. The nit can be addressed or not; it's not terribly important.

Comment thread .devops/rocm.Dockerfile Outdated
@CISC CISC merged commit 3571565 into ggml-org:master Feb 21, 2026
2 checks passed