llama.android : Rewrite Android binding by hanyin-arm · Pull Request #17152 · ggml-org/llama.cpp

hanyin-arm · 2025-11-10T19:05:55Z

Set up dynamic native library loading in the :lib native submodule to:
   a. enable AArch64 acceleration up to SME2
   b. enable x86_64 acceleration up to AMX
   c. support both Android and ChromeOS
Rewrite the C++ layer with:
   a. automatic message role formatting
   b. system prompt injection
   c. context overflow handling
   d. batch decoding
Rewrite the JNI bridge to expose:
   a. engine states
   b. new API (such as system prompt)
Add utilities and helpers to:
   a. parse GGUF metadata
   b. detect Arm CPU features
Miscellaneous performance optimizations
Rewrite the basic sample app
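
The "parse GGUF metadata" helper listed above has to start by validating the file header. The following is a minimal C++ sketch of that first step, not code taken from this PR. It assumes the GGUF header layout of a 4-byte ASCII magic `GGUF` followed by a little-endian uint32 version; the function names `read_gguf_version` and `read_gguf_version_from_file` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not from this PR). Checks that `data` starts with a
// GGUF header and, if so, stores the format version in `*version`.
// Assumed layout: bytes 0..3 are the ASCII magic "GGUF", bytes 4..7 are the
// version as a little-endian uint32.
bool read_gguf_version(const unsigned char * data, size_t size, uint32_t * version) {
    if (data == nullptr || version == nullptr || size < 8) {
        return false;
    }
    if (std::memcmp(data, "GGUF", 4) != 0) {
        return false;
    }
    // Assemble the version byte by byte so the result does not depend on
    // the endianness of the host.
    *version = (uint32_t) data[4]
             | ((uint32_t) data[5] << 8)
             | ((uint32_t) data[6] << 16)
             | ((uint32_t) data[7] << 24);
    return true;
}

// File-based variant: reads only the first 8 bytes of the file at `path`.
bool read_gguf_version_from_file(const char * path, uint32_t * version) {
    std::FILE * f = std::fopen(path, "rb");
    if (f == nullptr) {
        return false;
    }
    unsigned char header[8];
    const size_t n = std::fread(header, 1, sizeof(header), f);
    std::fclose(f);
    return read_gguf_version(header, n, version);
}
```

Rejecting a file this early lets an app refuse a non-GGUF download before handing the path to the native loader. The key/value metadata that follows the header is not covered by this sketch.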

Performance comparison
Against Google AI Edge Gallery on the same Pixel 9:

AI.Playground.vs.Google.AI.Edge.Gallery.on.Gemma3N.mp4

Architectural diagram

Production app
The full-featured production app "Arm AI Chat" is published on Google Play. It is built with the same binding and works on both Android and ChromeOS.

…the basic sample app from https://github.com/hanyin-arm/Arm-AI-Chat-Sample

Note: the full Google Play version of the AI Chat app will be open-sourced in another repo soon, so I didn't go through the trouble of pruning the history with `git filter-repo` here.

hanyin-arm · 2025-11-10T19:06:37Z

 /examples/gen-docs/                     @ggerganov
 /examples/gguf/                         @ggerganov
-/examples/llama.android/                @ggerganov
+/examples/llama.android/                @ggerganov @hanyin-arm @naco-siren


adding my personal GitHub account @naco-siren given what's coming to my team at Arm's SJ office ;)

my LinkedIn btw in case anyone wants to connect https://www.linkedin.com/in/nacosiren/

hanyin-arm · 2025-11-11T07:22:45Z

Reopening this because I accidentally deleted my remote branch...

BmanClark · 2025-11-19T16:46:34Z

+        }
+    }
+
+    repositories {


This repositories section just needs removing

BmanClark

Have commented on small section that wants removing

ggerganov

Thanks for the interesting contribution. I don't have hardware currently to test it, but the demonstration and description look good.

I can't provide a meaningful review of the code as I am not really experienced with Android projects. Still, if you are interested in helping with maintenance in the future I think it's OK to merge it. Would need to resolve the CI failures.

ggerganov · 2025-11-20T11:19:38Z

+[submodule "include/cpu_features"]
+	path = include/cpu_features
+	url = https://github.com/google/cpu_features


Can we avoid adding this submodule?

slaren · 2025-11-20

I am trying to understand how this library is used. As far as I can tell, it is only used to print a "CPU tier" string in the UI. This doesn't make much sense to me. Was this written before changing the backend loading logic to GGML_CPU_ALL_VARIANTS?

As it is, this library does not seem necessary.

naco-siren · 2025-11-20

Before the migration to GGML_CPU_ALL_VARIANTS, this library served two purposes:

   a. detect the CPU tier and save it into DataStore
   b. pass this value to the inference submodule so that it loads the corresponding .so (you should be able to find some traces in my previous commits)

Though the second purpose is no longer valid, this CPU feature indicator has been very well received by our leadership (up to SVPs). They believe it is a reasonable add-on to contribute back: since all Android devices are now AArch64, it makes sense to showcase why a given phone runs LLMs slowly or quickly.

However, if this add-on feature is really unacceptable for upstream, I can quickly put up a new PR without it. That PR would be incompatible with the Arm fork https://github.com/arm/ai-chat, which is definitely keeping the Arm Feature Indicator, so whatever gets added there will be hard to PR back to this upstream repo because of the different facade APIs.

So it's your choice:

| | Keep it | Remove it |
|---|---|---|
| Arm leadership | Very happy | Pissed |
| Code | Compatible | Diverged |
| Future collaboration | Easy | Hard |

slaren · 2025-11-20

Generally, we try to avoid adding unnecessary dependencies. If you want to print a list of the features enabled in the CPU backend, you can do so with llama_print_system_info. If the way this function formats the output is not desirable, you can use the ggml_backend_get_features function directly and format it any way you want.
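
To make the "format it any way you want" suggestion concrete, here is a minimal C++ sketch of the formatting half only. It assumes the feature list has already been collected as name/value string pairs; the struct `feature_kv` and the function `format_enabled_features` are hypothetical stand-ins, and how the pairs are obtained from ggml_backend_get_features is deliberately left out (check ggml-backend.h for the actual types and lookup).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one backend feature entry: a name plus a string
// value such as "1" or "0". The real ggml type may differ.
struct feature_kv {
    std::string name;
    std::string value;
};

// Joins the names of all enabled features with ", ".
// A feature counts as enabled when its value is non-empty and not "0".
// Returns "none" when no feature is enabled.
std::string format_enabled_features(const std::vector<feature_kv> & features) {
    std::string out;
    for (const auto & f : features) {
        if (f.value.empty() || f.value == "0") {
            continue;
        }
        if (!out.empty()) {
            out += ", ";
        }
        out += f.name;
    }
    return out.empty() ? "none" : out;
}
```

A UI layer could show the returned string where the CPU tier label appears, without pulling in a separate CPU detection dependency. The feature names themselves would come from the backend at run time rather than being hard-coded.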

naco-siren · 2025-11-20

@slaren thanks for your timely response. As promised, I put up another PR without cpu_features: #17413

I also pulled your latest master branch, resolved the merge conflicts, and removed Maven's publish plugin and publishing blocks.

Let me know if anything else is needed. Thank you very much 🙏

naco-siren · 2025-11-20

@ggerganov I hope that alternative PR addresses all your concerns. Thanks again 🙏

naco-siren · 2025-11-20T17:29:46Z

> Thanks for the interesting contribution. I don't have hardware currently to test it, but the demonstration and description look good.
>
> I can't provide a meaningful review of the code as I am not really experienced with Android projects. Still, if you are interested in helping with maintenance in the future I think it's OK to merge it. Would need to resolve the CI failures.

Thanks for the response! You actually don't need a physical Android device to test this hardware acceleration: as long as you have an Apple Silicon Mac (even an M1 will do), Google's official emulator already supports features up to dotprod. You can easily verify this yourself, because this basic sample app already uses TierDetection to show the device's CPU tier in the top TextView.

btw I didn't buy new Android tablets to capture the tablet screenshots for publishing either; plz note the dotprod in the 7" & 10" tablet (emulator) screenshots:

btw this Arm feature indicator 👆 comes in handy, doesn't it? 😉

However, I'd still highly recommend getting the latest Android flagship device (with SME/2) reimbursed by either Arm or at least Qualcomm, so that you can see this project perform way, way better than Google / Msft / Meta.
(@BmanClark plz check with Rob on this 🙏)

BmanClark · 2025-12-17T17:36:50Z

Can close this as updated version was merged

hanyin-arm added 30 commits October 28, 2025 11:39

UI: introduce new dependencies, update versions & references

cbe7133

UI: define theme, color palette, typography and shape

697d778

data: define data models for LLM and system prompts

3787fbd

LLM: stub a local inference engine for faster iteration

3f913ce

UI: app navigation

32608fb

UI: implement basic UI components

4dd755e

util: implement performance monitor; wrap it with a viewmodel

46bd638

util: implement user preferences utility

5ad6591

UI: implement core flow's screens

7e5c80c

UI: add a new MainActivity; update manifest

ca2b777

[WIP] DI: implement simple local vm factory provider

a7ae8b7

UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark

648b978

UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark

65c09b2

UI: split a nested parent settings screen into separate child settings screens

a7ee3d3

UI: polish system prompt setup UI

5868eaa

Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject

4046cd1

DB: setup Room database

5596d52

data: introduce repo for System Prompt; flow data from Room to VM

4848bf9

bugfix: properly handle user's quitting conversation screen while tokens in generation

75c986a

UI: rename ModeSelection to ModelLoading for better clarity

5568184

UI: update app name to be more Arm

64ebdc6

UI: polish conversation screen

3b499ac

data: code polish

fddf060

UI: code polish

e8b84c6

bugfix: handle user quitting on model loading

6b341b0

UI: locks user in alert dialog when model is unloading

e47e3b7

vm: replace token metrics stubs with actual implementation

2a41c0e

UI: refactor top app bars

5e4972e

nit: combine temperatureMetrics and useFahrenheit

af0d68d

DI: introduce Hilt plugin + processor + lib dependencies

65741a7

hanyin-arm added 10 commits October 28, 2025 11:39

lib: add File version for GGUF Magic number verification

f10d1ab

lib: perform engine state check inclusively instead of exclusively

3644082

lib: change LlamaTier to ArmCpuTier

266fc31

lib: remove kleidi-llama related namings

cadaf80

[WIP] doc: update main and Android README docs; add self to code owners

f10a45f

lib: revert System.load back to System.loadLibrary

3fa3c15

jni: introduce a logging util to filter different logging levels on different build types

33987b5

lib: enable app optimization

e765543

doc: replace stub Google Play app URL with the actual link add screenshots; add my GitHub ID to maintainer list

0bbe3ba

hanyin-arm requested review from ggerganov and slaren as code owners November 10, 2025 19:05

DajanaV mentioned this pull request Nov 10, 2025

UPSTREAM PR #17152: llama.android : Rewrite Android binding auroralabs-loci/llama.cpp#162

Open

github-actions Bot added the documentation, android, examples, and ggml labels Nov 10, 2025

hanyin-arm closed this Nov 11, 2025

hanyin-arm deleted the ai-chat-binding branch November 11, 2025 07:21

hanyin-arm restored the ai-chat-binding branch November 11, 2025 07:21

hanyin-arm reopened this Nov 11, 2025

BmanClark reviewed Nov 19, 2025

ggerganov reviewed Nov 20, 2025

naco-siren mentioned this pull request Nov 24, 2025

llama.android : Rewrite Android binding (w/o cpu_features dep) #17413

Merged

ggerganov closed this Dec 17, 2025
