Update ROCm docker container to 7.2#19418
Conversation
21b0a0f to
373a1c3
Compare
|
There have been several issues submitted about newer versions of ROCm, including rocWMMA, not sure we are ready to there. @IMbackK ? |
|
How important is gfx908 to CI and to release artifacts? I believe there is something wrong with rocWMMA specifically with gfx908 and some intrinsic types. Right now what my PR does is disables rocWMMA entirely in that container while building. Another option is to disable gfx908. |
|
What problem specifically? There is no problem on my side with gfx908 on rocwmma 2.2.0 (which ships with 7.2). |
|
Here's a snippet of what I was seeing that lead to what I put in this PR. I guess it's some issues with template instantiation. |
|
Right this is yet another bug in the fp16 downcast code in rocwmma. |
|
we cant disable this for all of cdna so this pr is non-viable until its fixed upstream or we find a workaround. |
|
Would you be open to some macros to force the data types? |
|
sure ofc |
|
this pr is blocked by #19269 (comment) |
|
Thanks for that and pointing out the upstream bug. I think that we can split this into two parts if you agree.
I split off the second part to #19433. |
I came up with a workaround that seems to work for me with 7.2 to explicitly declare the types. I split it off to #19461. |
|
Thanks; dropped that commit. |
IMbackK
left a comment
There was a problem hiding this comment.
Otherwise lgtm and should be ready now.
|
@superm1 Please provide a successful build on your own fork to verify. |
|
Hi, I did a successful build of this locally. |
|
I had to hack up the build targets to get it to run (I don't have a workflow dispatch or cron target in a fork), but here is a successful run on my fork. https://github.com/superm1/llama.cpp/actions/runs/21989365518/job/63531927483 |
|
Can this merge now? |
I'd like a final approval by @IMbackK first. |
|
@superm1 Also, why are you building more arches here than in the release? |
That's a good point. Let me pare it down. |
|
you removed <gfx908 as i mentioned in #19418 (comment) that makes sense as rocm in these docker images is not built for those. But now you also removed gfx1030 for which the docker image is certainly built and gfx1010 and gfx1032 which i need to check (pulling the image right now) |
|
Looks like they dont build for gfx1010 and 32 anymore so just gfx1030 is missing here and in the other pr. |
|
@IMbackK gentle ping, waiting for your approval. :) |
IMbackK
left a comment
There was a problem hiding this comment.
Looks good, nit can be addressed or not - its not terribly important.
Also update architectures
Update ROCm docker container to 7.2 release (ggml-org#19418)
Also update architectures
Also update architectures
Also update architectures
Also update architectures
Update all the CI artifacts and jobs to use ROCm 7.2.
When testing offline, I found a problem with rocWMMA on the docker container with GFX908, so it's disabled for that.