llama.android : Rewrite Android binding #17152
hanyin-arm wants to merge 270 commits into ggml-org:master from …
Conversation
…k navigation inside conversation and benchmark
… enable alert dialog on back navigation inside conversation and benchmark
…ens in generation
…the basic sample app from https://github.com/hanyin-arm/Arm-AI-Chat-Sample. Note: the full Google Play version of the AI Chat app will be open sourced in another repo soon, so I didn't go through the trouble of pruning the history with `git filter-repo` here.
…ifferent build types
…shots; add my GitHub ID to maintainer list
```diff
 /examples/gen-docs/ @ggerganov
 /examples/gguf/ @ggerganov
-/examples/llama.android/ @ggerganov
+/examples/llama.android/ @ggerganov @hanyin-arm @naco-siren
```
adding my personal GitHub account @naco-siren given what's coming to my team at Arm's SJ office ;)
my LinkedIn btw in case anyone wants to connect https://www.linkedin.com/in/nacosiren/
Reopening it since I accidentally deleted my remote branch...
```gradle
repositories {
```
This `repositories` section just needs removing.
ggerganov left a comment
Thanks for the interesting contribution. I don't have hardware currently to test it, but the demonstration and description look good.
I can't provide a meaningful review of the code as I am not really experienced with Android projects. Still, if you are interested in helping with maintenance in the future, I think it's OK to merge it. We would need to resolve the CI failures.
```ini
[submodule "include/cpu_features"]
	path = include/cpu_features
	url = https://github.com/google/cpu_features
```
Can we avoid adding this submodule?
I am trying to understand how this library is used. As far as I can tell, it is only used to print a "CPU tier" string in the UI. This doesn't make much sense to me - was this written before changing the backend loading logic to GGML_CPU_ALL_VARIANTS?
As it is, this library does not seem necessary.
This library was serving two purposes before migrating onto `GGML_CPU_ALL_VARIANTS`:
- detect the CPU tier and save it into DataStore
- pass this value to the inference submodule for loading the corresponding `.so` (you should be able to find some traces in my previous commits)
Though the second purpose is no longer valid, this CPU feature indicator has been very well received by our leadership (up to SVPs), and they believe it's a reasonable add-on to PR back: since all Android devices now are AArch64, it makes sense to showcase "why this phone runs LLMs slow / so fast".
However, if this add-on feature is really unacceptable to merge into upstream, I can quickly pull up a new PR without it, but it will become incompatible with the Arm fork https://github.com/arm/ai-chat, which is definitely keeping the Arm Feature Indicator, so whatever is added there will have a hard time being PR'd back to this upstream repo due to the different facade APIs.
So it's up to your choice:
| | Keep it | Remove it |
|---|---|---|
| Arm leadership | Very happy | Pissed |
| Code | Compatible | Diverged |
| Future collaboration | Easy | Hard |
Generally, we try to avoid adding unnecessary dependencies. If you want to print a list of features enabled in the CPU backend, you can do so by using `llama_print_system_info`. If the way this function formats the output is not desirable, you can use the `ggml_backend_get_features` function directly and format it in any way you want.
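As a quick illustration of the first suggestion: if the app keeps using `llama_print_system_info`, its string can be reparsed on the app side and reformatted freely. A minimal Java sketch, assuming the output is a list of `NAME = 0/1` pairs separated by `" | "` (treat the exact separators as an assumption about the format):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SystemInfoParser {
    // Splits e.g. "NEON = 1 | SVE = 0" into {NEON=true, SVE=false}.
    // The " | " / " = " separators are an assumption about
    // llama_print_system_info()'s output, not a documented contract.
    public static Map<String, Boolean> parse(String info) {
        Map<String, Boolean> features = new LinkedHashMap<>();
        for (String entry : info.split("\\|")) {
            String[] parts = entry.split("=");
            if (parts.length == 2) {
                features.put(parts[0].trim(), parts[1].trim().equals("1"));
            }
        }
        return features;
    }
}
```

From the resulting map the UI can render whatever subset or styling it wants, without pulling in an extra submodule.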
@ggerganov hope that alternative PR addresses all your concerns, thanks again🙏
Thanks for the response! You actually don't need a physical Android device to test this hardware acceleration: as long as you have an Apple Silicon Mac (even just an M1 will do), Google's official emulator already supports up to … Btw, I myself didn't buy new Android tablets to capture tablet-version screenshots for publishing either, plz note the … However, I'd still highly recommend reimbursing a latest Android flagship device (with …).
Can close this as the updated version was merged.

:libnative submodule to:
a. enable `Aarch64` acceleration up to `SME2`
b. enable `x86_64` acceleration up to `AMX`
c. support both Android and ChromeOS
a. automatic message role formatting
b. system prompt injection
c. context overflow handling
d. batch decoding
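For the chat features listed above, the automatic role formatting can be sketched roughly as follows. This is a hedged illustration, not the binding's actual API; the `<|role|>` tag strings are invented for the example rather than taken from any specific model's chat template:

```java
import java.util.List;

public class PromptFormatter {
    public enum Role { SYSTEM, USER, ASSISTANT }

    public static final class Message {
        final Role role;
        final String content;
        public Message(Role role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    // Joins messages into one prompt string, tagging each turn with its role.
    public static String format(List<Message> messages) {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            sb.append("<|").append(m.role.name().toLowerCase()).append("|>\n")
              .append(m.content).append("\n");
        }
        // Leave the prompt open for the assistant's next turn.
        return sb.append("<|assistant|>\n").toString();
    }
}
```

In the real binding the template would come from the model's GGUF metadata; the point here is only the shape of the role-tagging step.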
a. engine states
b. new API (such as system prompt)
a. parse GGUF metadata
b. detect Arm CPU features
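The Arm CPU feature detection above could, for example, read the `Features` line of `/proc/cpuinfo`, a common Linux/Android convention. A hedged Java sketch; the tier labels and the feature-flag-to-tier mapping are illustrative assumptions, not the app's real naming:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class CpuFeatures {
    // Extracts the flag list from the first "Features" line of a
    // /proc/cpuinfo dump, e.g. "Features : fp asimd asimddp".
    public static Set<String> parse(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            if (line.startsWith("Features")) {
                String list = line.substring(line.indexOf(':') + 1).trim();
                return new LinkedHashSet<>(Arrays.asList(list.split("\\s+")));
            }
        }
        return new LinkedHashSet<>();
    }

    // Hypothetical tier labels for the UI indicator (assumed, not real).
    public static String tier(Set<String> features) {
        if (features.contains("sme"))     return "SME";
        if (features.contains("i8mm"))    return "I8MM";
        if (features.contains("asimddp")) return "DOTPROD";
        return "NEON";
    }
}
```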
Performance comparison
Against Google AI Edge Gallery on the same Pixel 9:
AI.Playground.vs.Google.AI.Edge.Gallery.on.Gemma3N.mp4
Architectural diagram

Production app

The full-featured production app "Arm AI Chat" is published on Google Play; it's built with the same binding and works on both Android and ChromeOS: