RVV1.0 Supported tests for RISC-V #16682

Merged
CISC merged 19 commits into ggml-org:master from alitariq4589:rvv1.0ci
Dec 2, 2025

@alitariq4589
Contributor

This PR adds the CI tests that are supported on RISC-V.

Tests added for execution on RISC-V:

debian-cpu-cmake-rv64-native
debian-trixie-cmake-sanitizer-riscv64-native * 3
debian-trixie-llguidance-riscv64-native
debian-trixie-cmake-rpc-riscv64-native
ggml-ci-riscv64-native-cpu-low-perf

Dependencies which are not yet supported for RISC-V:

rocblas-dev
hipblas-dev
vulkan-sdk
mthreads/musa:rc4.3.0-devel-ubuntu22.04-amd64
intel-oneapi-compiler-dpcpp-cpp
intel-oneapi-mkl-devel
cuda
macOS*
windows*
torch

Tests which are not added for RISC-V due to above unmet dependencies:

macOS-latest-cmake-arm64
macOS-latest-cmake-x64
macOS-latest-cmake-arm64-webgpu
ubuntu-24-cmake-vulkan
ubuntu-22-cmake-webgpu
ubuntu-22-cmake-hip
ubuntu-22-cmake-musa
ubuntu-22-cmake-sycl
ubuntu-22-cmake-sycl-fp16
macOS-latest-cmake-ios
macOS-latest-cmake-tvos
macOS-latest-cmake-visionos
macOS-latest-swift
windows-msys2
windows-latest-cmake
ubuntu-latest-cmake-cuda
windows-2022-cmake-cuda
windows-latest-cmake-sycl
windows-latest-cmake-hip
ios-xcode-build
android-build
openEuler-latest-cmake-cann
ggml-ci-x64-nvidia-cuda
ggml-ci-x64-nvidia-vulkan-cm
ggml-ci-x64-nvidia-vulkan-cm2
ggml-ci-x64-cpu-amx
ggml-ci-mac-metal
ggml-ci-mac-vulkan
ggml-ci-arm64-cpu-high-perf-sve
ggml-ci-riscv64-native-cpu-high-perf

Additional Notes

Note 1

The build hits warnings (treated as errors) related to the RISC-V SIMD mappings:

In file included from ../../../ggml/src/ggml-cpu/arch/riscv/quants.c:6:
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp16_to_fp32':
../../../ggml/src/ggml-cpu/simd-mappings.h:101:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  101 |         _Float16 hf;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp32_to_fp16':
../../../ggml/src/ggml-cpu/simd-mappings.h:108:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h:108:24: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |                        ^~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [ggml/src/CMakeFiles/ggml-cpu.dir/build.make:261: ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/riscv/quants.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from ../../../ggml/src/ggml-cpu/quants.c:5:
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp16_to_fp32':
../../../ggml/src/ggml-cpu/simd-mappings.h:101:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  101 |         _Float16 hf;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp32_to_fp16':
../../../ggml/src/ggml-cpu/simd-mappings.h:108:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h:108:24: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |                        ^~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [ggml/src/CMakeFiles/ggml-cpu.dir/build.make:135: ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/quants.c.o] Error 1
In file included from ../../../ggml/src/ggml-cpu/common.h:7,
                 from ../../../ggml/src/ggml-cpu/unary-ops.h:3,
                 from ../../../ggml/src/ggml-cpu/ggml-cpu.c:12:
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp16_to_fp32':
../../../ggml/src/ggml-cpu/simd-mappings.h:101:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  101 |         _Float16 hf;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp32_to_fp16':
../../../ggml/src/ggml-cpu/simd-mappings.h:108:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h:108:24: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |                        ^~~~~~~~
../../../ggml/src/ggml-cpu/ggml-cpu.c: In function 'ggml_cpu_fp32_to_fp16':
../../../ggml/src/ggml-cpu/ggml-cpu.c:3239:32: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
 3239 |         __riscv_vse16_v_f16m1((_Float16 *)&y[i], vy, vl);
      |                                ^~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [ggml/src/CMakeFiles/ggml-cpu.dir/build.make:79: ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o] Error 1
../../../ggml/src/ggml-cpu/vec.cpp: In function 'void ggml_vec_dot_f32(int, float*, size_t, const float*, size_t, const float*, size_t, int)':
../../../ggml/src/ggml-cpu/vec.cpp:93:41: error: 'vsum' may be used uninitialized [-Werror=maybe-uninitialized]
   93 |         vsum = __riscv_vfmv_v_f_f32m8_tu(vsum, 0.0f, vl);
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
../../../ggml/src/ggml-cpu/vec.cpp:90:22: note: 'vsum' was declared here
   90 |         vfloat32m8_t vsum;
      |                      ^~~~
cc1plus: all warnings being treated as errors
make[2]: *** [ggml/src/CMakeFiles/ggml-cpu.dir/build.make:219: ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/vec.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2199: ggml/src/CMakeFiles/ggml-cpu.dir/all] Error 2

Because of these errors, -DLLAMA_FATAL_WARNINGS=ON has to be turned off for all of the tests; otherwise CI fails.

(This can be tracked in a separate issue. I can find the contributor and ping them if you want.)
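
The opt-out described in Note 1 could be sketched as a small shell helper. This is only an illustration: the function name `cmake_extra_flags` is hypothetical, and it assumes the CI script reads an optional `LLAMA_FATAL_WARNINGS` environment variable and keeps warnings fatal when that variable is unset.

```shell
#!/usr/bin/env bash
# Sketch: build the extra CMake flags for a CI run.
# Assumption: LLAMA_FATAL_WARNINGS may be set by the caller; it defaults to ON.
cmake_extra_flags() {
    echo "-DLLAMA_FATAL_WARNINGS=${LLAMA_FATAL_WARNINGS:-ON} -DLLAMA_CURL=ON"
}

# Default behaviour: warnings stay fatal.
cmake_extra_flags

# A RISC-V job can opt out without editing the script (subshell keeps the override local).
( LLAMA_FATAL_WARNINGS=OFF; cmake_extra_flags )
```

With this shape, other platforms keep the strict default and only the jobs that hit the `_Float16` diagnostics need to pass the override.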

Note 2

One RISC-V board may not be enough to run all these tests, so we expect more boards to arrive by the end of RISC-V Summit North America (around mid-November). Until then, this PR can be treated as a draft for review.

@github-actions github-actions Bot added the devops improvements to build systems and github actions label Oct 20, 2025
@CISC
Member

CISC commented Nov 27, 2025

@alitariq4589 Any progress on this?

Also, it looks like the runner has gone offline...

@alitariq4589
Contributor Author

@CISC There was a minor internet outage. It is fixed now. Can you check and let me know if the board still shows offline? If that is the case, I will restart the container.

Any progress on this?

I have ordered and integrated 9 RISC-V boards with RVV1.0 so that all tests can run smoothly. I am currently checking whether ccache works by running the tests multiple times. Here is the build. As soon as I am done testing, I will open the PR for review. Additionally, I will need a single runner token for adding all those boards to the llama.cpp repository once I open the PR for review.

It would be great to have some faster means of communication than issues and emails. Do you use another messaging platform (Discord, Mastodon, etc.)? If it is okay, you can also join this Discord server.

@CISC
Member

CISC commented Nov 27, 2025

@CISC There was a minor internet outage. It is fixed now. Can you check and let me know if the board still shows offline? If that is the case, I will restart the container.

I cancelled all the old jobs, but there are currently 2 new ones queued and not picked up yet.

Any progress on this?

I had ordered and integrated 9 RISC-V boards with RVV1.0 so all tests can run smoothly, I am currently testing the ccache by running multiple tests if it works. Here is the build. As soon as I am done testing, I will open the PR for review. Additionally I will also need a single runner token for adding all those boards in llama.cpp repository once I open the PR for review.

Great, ping Georgi when you do.

It would be great if we have some faster means of communication other than issues and emails. Do you use some other messaging platform (discord, mastodon etc.?). If it is okay, you can also join this discord server.

Sorry, email only.

@alitariq4589
Contributor Author

@CISC I have restarted the runner and it is picking up the jobs again
image

Corrections included:
1. Changed the test names from debian to ubuntu as it is more stable than Debian Trixie
2. Added explicit compiler in cmake command as GCC compiler below version 14 have been recorded
to throw errors with rvv1.0 and some other extensions
3. Added dependencies which are not installed by default in the RISC-V Ubuntu 24.04
4. Separate ccache directory for all jobs as all the ccache results are not the same and may cause ccache to not work
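
Item 4 can be illustrated with a short shell sketch. `CCACHE_DIR` is the environment variable ccache uses to locate its cache; the helper name `ccache_dir_for_job`, the `CCACHE_BASE` variable, and the job name below are hypothetical and only show one possible way to give each job its own directory.

```shell
#!/usr/bin/env bash
# Sketch: derive a per-job ccache directory so jobs do not share cached objects.
# Assumption: CCACHE_BASE is a hypothetical variable naming the parent directory.
ccache_dir_for_job() {
    job_name="$1"
    base="${CCACHE_BASE:-$HOME/.ccache}"
    # Replace anything that is not safe in a directory name with '_'.
    safe_name=$(printf '%s' "$job_name" | tr -c 'A-Za-z0-9._-' '_')
    printf '%s/%s\n' "$base" "$safe_name"
}

# Hypothetical job name, used for illustration only.
CCACHE_DIR="$(ccache_dir_for_job "example-riscv64-job")"
export CCACHE_DIR
echo "ccache directory for this job: $CCACHE_DIR"
```

Keeping the directories separate means a job built with different flags cannot evict or mismatch another job's cached results.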
@alitariq4589
Contributor Author

@CISC This PR is ready for review. I have excluded ggml-ci-riscv64-native-cpu-low-perf as it was causing boards to crash due to high resource utilization. There are 10x Banana Pi F3 boards arriving. When they do, I will use them for testing.

The results of the added builds can be seen here in my fork. There are multiple attempts, so you can check each of them. Because more boards are integrated than there are tests ported to RISC-V (the extra boards help offload builds that may be waiting in the queue), the ccache effect is not immediately visible across the 4 attempts. However, I have tested the jobs individually and, according to the stats so far, ccache seems to be working (see this and this job build too, in which I executed each job multiple times to check the ccache results).

The following RISC-V boards will be integrated once I get the token:

  • 8x Milk-V Jupiter boards
  • 1x Banana Pi F3 (BPI F3). This is already integrated with the runner name 87a313462bbe, so I will not add it again, to avoid duplication.

To monitor resource utilization, I have set up Grafana. It tracks the usage of the host machines, not of the containers in which the builds will run. Use the following links to view resource usage.

NOTE: Since our network engineer is out of the office for the next couple of days, jupiter-16G-2, jupiter-16G-4, jupiter-16G-5 are not added in usage tracking, so you can ignore their shown data in the tracking dashboards (those are placeholders right now). I will add this within the next two weeks and will comment here.

@ggerganov, please share a github runner token when you can, and I will use that to register all these boards for builds. The token will be valid for one hour after generation, so let me know as soon as you generate it. You can send it to me at my email.

Let me know if anyone has any questions 🙂

@alitariq4589 alitariq4589 marked this pull request as ready for review December 2, 2025 07:07
Member

@ggerganov ggerganov left a comment

@alitariq4589 Sending you the token in a min

Comment thread ci/run.sh Outdated
Comment on lines +48 to +52
<<<<<<< HEAD
CMAKE_EXTRA="-DLLAMA_FATAL_WARNINGS=${LLAMA_FATAL_WARNINGS:-ON} -DLLAMA_CURL=ON"
=======
CMAKE_EXTRA="-DLLAMA_FATAL_WARNINGS=ON -DLLAMA_CURL=ON -DGGML_SCHED_NO_REALLOC=ON"
>>>>>>> master
Member

conflict

Contributor Author

Let me check.

BTW, the background of this change is that the warning shown in the very first comment was causing the tests to fail, so I had to turn off "warnings as errors" for CI to pass. I think we also need to create an issue and ping the contributor about this. Maybe the RVV1.0 intrinsics changed somewhere, which is causing this error.

Member

@xctan Could you take a look at this compile warning?

In file included from ../../../ggml/src/ggml-cpu/arch/riscv/quants.c:6:
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp16_to_fp32':
../../../ggml/src/ggml-cpu/simd-mappings.h:101:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  101 |         _Float16 hf;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h: In function 'riscv_compute_fp32_to_fp16':
../../../ggml/src/ggml-cpu/simd-mappings.h:108:9: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |         ^~~~~~~~
../../../ggml/src/ggml-cpu/simd-mappings.h:108:24: error: ISO C does not support the '_Float16' type before C23 [-Werror=pedantic]
  108 |         _Float16 hf = (_Float16)f;
      |                        ^~~~~~~~
cc1: all warnings being treated as errors

Contributor Author

@ggerganov Strangely, I don't see conflicts after merging. I don't see conflicts here in the GitHub UI either. Are you testing this on an older commit? I just merged the master branch into my branch and it seems okay.

Member

You have committed the merge conflict text, it just needs cleaning up.

Collaborator

The _Float16 type seems to be the only way to get the compiler's built-in code generation to work for the Zfh and Zvfh extensions. The catch is that this type is only standardized starting with ISO C23. Otherwise, we'd have to resort to inline assembly, which isn't ideal for register usage. Plus, the vector intrinsic functions also need this type, and float16_t is only defined in C++ headers (since C++23). So, given all that, maybe we can just disable this diagnostic when _Float16 types are being used?

Contributor Author

@CISC My bad. I have cleaned it up now.

Member

So, given all that, maybe we can just disable this diagnostic when _Float16 types are being used?

Sounds good

Comment thread ci/run.sh Outdated
@alitariq4589
Contributor Author

@CISC the .github/workflows/build-riscv-native.yml CI is failing. But should I remove this since most of the things are already covered in the added tests, and it seems redundant?

@ggerganov Thanks for sharing the token. Can you please confirm the following added runners in the repository settings as online?

jupiter-16G-1
jupiter-16G-2
jupiter-16G-4
jupiter-16G-5
jupiter-16G-6
jupiter-16G-7
jupiter-16G-8
jupiter-16G-9
(and one previously added runner with name 87a313462bbe)

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@CISC
Member

CISC commented Dec 2, 2025

@CISC the .github/workflows/build-riscv-native.yml CI is failing. But should I remove this since most of the things are already covered in the added tests, and it seems redundant?

You tell me. :) If it is of no further value to you it can be removed.

@alitariq4589
Contributor Author

@CISC I am sorry for the inconvenience. I will now upgrade the runner to the latest version, and (hopefully) all these issues will be resolved.

According to what I assessed in the past couple of days, these are happening because of the following discrepancies.

  1. The Ubuntu image used in the Dockerfile that runs the runner is riscv64/ubuntu, which is not the official image. There is now an official image from Ubuntu, but updating to it will cause a little downtime.
  2. The GitHub runner version that is running is now older than the upstream version, so it tries to install the latest version. Since that is not available for the RISC-V architecture, the attempt interrupts the running job and cancels it. That is why the job cancellations are now more frequent. I have not yet implemented a patch for this. The best way is to keep the GitHub runner up to date.
  3. The older versions of .NET silently crash internally with segfault errors because of bugs. The newer version (10.0.102) will hopefully work fine.

I will start the upgrade process. Hopefully, there won't be any interruptions, as I will upgrade the runners one by one. I should be done in 5-6 hours if no problems come up.

I will add a comment once I am done with this.

@alitariq4589
Contributor Author

I would also like to mention that PyTorch is now released for RISC-V. I have set up a workflow which fetches the source code from the official upstream, then builds and releases it without any changes. That means the RISC-V workflows which require PyTorch can now be added.

Pytorch release link: https://github.com/alitariq4589/pytorch-riscv/releases

@CISC
Member

CISC commented Feb 13, 2026

I would also like to mention that pytorch is now released for riscv. I have set up a workflow which fetches the source code from official upstream, builds and releases it without adding any kind of change. That means the RISC-V workflows which require pytorch for RISC-V can now be added.

Nice, thanks for following up. :)

@alitariq4589
Contributor Author

@CISC, I noticed that there are several jobs in the queue. I was waiting for the runner to complete all the jobs so I could patch it, but looking at the logs, I don't think that is going to happen 😅. So if I try to patch the runner in this state, the running job will be cancelled.

Is it okay if I cancel the job running on the boards to patch them? You can also tell me a specific time to upgrade the packages when you think it will not affect the CI and PRs considerably.

@CISC
Member

CISC commented Feb 13, 2026

@CISC, I noticed that there are several jobs in the queue. I was waiting for the runner to complete all the jobs so I can patch it, but looking at the logs, i dont think that is going to happen 😅 . So if I try to patch the runner in this state, the running job will be cancelled.

Yeah, quite unlikely at this point. :D

Is it okay if I cancel the job running on the boards to patch them? You can also tell me a specific time to upgrade the packages when you think it will not affect the CI and PRs considerably.

Just go ahead, if it's crucial for a PR we can rerun them.

@alitariq4589
Contributor Author

@CISC All the runners are now upgraded to version 2.331.0.

Hopefully, this will resolve all the disconnection/cancellation issues. The only remaining concern is job cancellation when a new version of the GitHub runner is released. I will keep a watch, but let me know if you observe any cancellations.

Additionally, I will be taking a backup of the images tomorrow, so in case of corruption, it may be possible to restore the image. Since backing up with ccache consumes a lot of space, I will clear ccache and take a backup of the images. So the runners will take some CI builds to fill the ccache again.

Let me know if you see any anomalies.

@CISC
Member

CISC commented Feb 16, 2026

Hopefully, this will resolve all the disconnection/cancellation issues. The only concern now remains of the job cancellation when a new version of GitHub Runner launches. I will keep a watch, but let me know if you observe any cancellations.

@alitariq4589 Got a few in a row now:
https://github.com/ggml-org/llama.cpp/actions/runs/22069063323/job/63768515140
https://github.com/ggml-org/llama.cpp/actions/runs/22069348103/job/63769503838

@alitariq4589
Contributor Author

alitariq4589 commented Feb 16, 2026

@CISC Thanks for informing me and keeping a check on these errors. I am sorry again for the inconvenience.

I have found the cause of the issue. The runner checks for a newer version of github actions and cannot find one because of the unknown (RISC-V) ISA.

image

I will add a patch for this and will let you know once this issue is solved.

Since it still tries to fetch/check for a new version even after the latest version is installed, I will disable the update check entirely within the github actions source code.

@alitariq4589
Contributor Author

@CISC Thanks a lot for informing me about cancellations. I have just updated all the runners with a new patch of github runner source code, which disables the auto updater (here is the workflow if you would like to have a look yourself).

I also added pending-restart functionality so that the runner containers restart only when no jobs are running, but because GitHub Actions picks up jobs quickly from the queue, you may have seen some cancellations.

I hope this will solve all the cancellation issues. But if they do appear, please let me know.

@CISC
Member

CISC commented Feb 18, 2026

I hope this will solve all the cancellation issues. But if they do appear, please let me know.

@alitariq4589 https://github.com/ggml-org/llama.cpp/actions/runs/22139370809/job/63999404175?pr=19660

@alitariq4589
Contributor Author

I hope this will solve all the cancellation issues. But if they do appear, please let me know.

@alitariq4589 https://github.com/ggml-org/llama.cpp/actions/runs/22139370809/job/63999404175?pr=19660

This is a strange kind of error that I didn't encounter before. I think this is something more related to github server side than the runner side.

image

@alitariq4589
Contributor Author

I also noticed that two runners were not properly upgraded. I have updated their files, and they will automatically restart once no jobs are running.

@CISC
Member

CISC commented Feb 18, 2026

This is a strange kind of error that I didn't encounter before. I think this is something more related to github server side than the runner side.

Yep, probably, we had some other strange failures as well, think GitHub had a little hiccup.

@alitariq4589
Contributor Author

@CISC Did you notice any kind of issues after that? (disconnections, failures, cancellations, etc. due to GitHub Runner package)

@CISC
Member

CISC commented Feb 24, 2026

@CISC Did you notice any kind of issues after that? (disconnections, failures, cancellations, etc. due to GitHub Runner package)

Just a few infrequent weird failures, hard to tell why, otherwise all good:
https://github.com/ggml-org/llama.cpp/actions/runs/22261765665/job/64401095792
https://github.com/ggml-org/llama.cpp/actions/runs/22339773029/job/64640297902

@CISC
Member

CISC commented Feb 26, 2026

@CISC Did you notice any kind of issues after that? (disconnections, failures, cancellations, etc. due to GitHub Runner package)

Just a few infrequent weird failures, hard to tell why, otherwise all good: https://github.com/ggml-org/llama.cpp/actions/runs/22261765665/job/64401095792 https://github.com/ggml-org/llama.cpp/actions/runs/22339773029/job/64640297902

@alitariq4589 Ok, it's been happening a lot today:
https://github.com/ggml-org/llama.cpp/actions/runs/22460277028/job/65052394470

@alitariq4589
Contributor Author

@CISC Did you notice any kind of issues after that? (disconnections, failures, cancellations, etc. due to GitHub Runner package)

Just a few infrequent weird failures, hard to tell why, otherwise all good: https://github.com/ggml-org/llama.cpp/actions/runs/22261765665/job/64401095792 https://github.com/ggml-org/llama.cpp/actions/runs/22339773029/job/64640297902

@alitariq4589 Ok, it's been happening a lot today: https://github.com/ggml-org/llama.cpp/actions/runs/22460277028/job/65052394470

Thank you for pointing that out. I am checking the logs. Can you please also provide me with other jobs that had this kind of failure? I am trying to see the similarities in these failures in the debug logs.

@CISC
Member

CISC commented Feb 27, 2026

@alitariq4589 Ok, it's been happening a lot today: https://github.com/ggml-org/llama.cpp/actions/runs/22460277028/job/65052394470

Thank you for pointing that out. I am checking the logs. Can you please also provide me with other jobs that had this kind of failure? I am trying to see the similarities in these failures in the debug logs.

Sure, here's another:
https://github.com/ggml-org/llama.cpp/actions/runs/22463173413/job/65062512840?pr=19796

Also this one at Post Clone:
https://github.com/ggml-org/llama.cpp/actions/runs/22474963466/job/65099868008

@alitariq4589
Contributor Author

@alitariq4589 Ok, it's been happening a lot today: https://github.com/ggml-org/llama.cpp/actions/runs/22460277028/job/65052394470

Thank you for pointing that out. I am checking the logs. Can you please also provide me with other jobs that had this kind of failure? I am trying to see the similarities in these failures in the debug logs.

Sure, here's another: https://github.com/ggml-org/llama.cpp/actions/runs/22463173413/job/65062512840?pr=19796

Also this one at Post Clone: https://github.com/ggml-org/llama.cpp/actions/runs/22474963466/job/65099868008

This is a segmentation fault (exit code 139), as I have seen in the diag logs. As far as I understand this behavior, it is again coming from .NET. I have filed an issue against the .NET release for RISC-V. Let's see what they say about this.
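
For readers unfamiliar with the convention: shells report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11, which is SIGSEGV on Linux. A minimal sketch of decoding such a status follows; the function name `describe_exit_status` is hypothetical and is not part of the runner.

```shell
#!/usr/bin/env bash
# Sketch: turn a shell exit status into a human-readable description.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l translates a signal number into its name.
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_exit_status 139
describe_exit_status 1
```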

One other thing is that the Ubuntu image which I am using for this is not official from Canonical (because when I created the image for github runner, Canonical did not release any image at that time for riscv). I will change the image inside every container to the official Ubuntu LTS image, but it is going to take some time (around a week) because it is a manual effort for every container. I will get back to you as soon as I finish it.

@CISC
Member

CISC commented Mar 26, 2026

@alitariq4589
Contributor Author

@CISC I have created some space on the device.

Also, @luhenry and I have set up RISC-V CI infrastructure as a GitHub App with ephemeral runners under the RISE CI enablement project. So instead of using the manually added runners, you will be able to install the app, and it will allocate a runner. To resolve all of these issues, the runners added to llama.cpp will soon be moved to the RISE pool of GitHub runners.

I will be able to inform you about the exact downtime once we have figured out the best approach to migrate the CI runners to the RISE github app.

@luhenry
Contributor

luhenry commented Mar 26, 2026

Also, @luhenry and I have set up RISC-V CI infrastructure in a github app with ephemeral runners under RISE CI enablement project. So instead of the manually added runners, you will be able to install the apps, and that will allocate a runner. For the resolution of all the issues, these added runners in Llama.cpp will soon be moved to the RISE pool of GitHub runners.

@CISC You can find all the information about these RISE RISC-V Runners at https://riseproject-dev.github.io/riscv-runner/. The announcement is here.

@alitariq4589
Contributor Author

@CISC We will start the migration process to move the RISC-V machines to RISE runners as discussed above. This will be a rolling migration (meaning boards will be taken down and added to the github app one after the other). This will not cause considerable downtime, but the running jobs on the board under migration will be cancelled.

Before starting the process, can you please install the RISE RISC-V Runners app so that the added boards automatically pick up new jobs and prevent downtime?

Also, this app is configured to be installed at the organization level, not at the individual repository level.

Let me know once you have installed the github app.

@CISC
Member

CISC commented Mar 30, 2026

Before starting the process, can you please install the RISE RISC-V Runners app so that the added boards automatically pick up new jobs and prevent downtime?

Also, this app is configured to be installed at the organization level, not at the individual repository level.

Let me know once you have installed the github app.

cc/ @ggerganov

@ggerganov
Member

The app has been added to the ggml-org with access to the llama.cpp, ggml and whisper.cpp repos.

@alitariq4589
Contributor Author

@CISC We (@luhenry and I) are migrating the boards from conventional runners to the GitHub App, which is now installed. You may see some cancellations, because waiting for the boards to become idle would take a long time. You can then rerun the jobs, and they will automatically be scheduled on the newly added runners.

I will ping here once the process is complete.

@luhenry
Contributor

luhenry commented Apr 5, 2026

@CISC the migration is mostly complete. All the Jupiter boards have been migrated. We only have a Banana-Pi left but it doesn’t seem critical.

There is still the ccache issue; I hope to make progress on it this week.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Added RISC-V supported tests

* Added default value for LLAMA_FATAL_WARNINGS and option to specify by user

* Added RISC-V supported tests

* Added default value for LLAMA_FATAL_WARNINGS and option to specify by user

* Removed apt prompt

* Added RISC-V specific tests with corrections

Corrections included:
1. Changed the test names from debian to ubuntu as it is more stable than Debian Trixie
2. Added explicit compiler in cmake command as GCC compiler below version 14 have been recorded
to throw errors with rvv1.0 and some other extensions
3. Added dependencies which are not installed by default in the RISC-V Ubuntu 24.04
4. Separate ccache directory for all jobs as all the ccache results are not the same and may cause ccache to not work

* Resolved the merge conflict and cleaned up run.sh

* Update ci/run.sh

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Removed previously added build ci for RISC-V

* Removed trailing whitespaces

* corrected build name

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* cleanup

* Enabled build tests (1)

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Enabled build tests (2)

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* enable openssl

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Labels

devops improvements to build systems and github actions

5 participants