Skip to content

llama.android : Rewrite Android binding #17152

Closed
hanyin-arm wants to merge 270 commits into ggml-org:master from arm:ai-chat-binding
hanyin-arm (Contributor) commented Nov 10, 2025

  1. Set up dynamic native library loading in the :lib native submodule to:
    a. enable AArch64 acceleration up to SME2
    b. enable x86_64 acceleration up to AMX
    c. support both Android and ChromeOS
  2. Rewrite the C++ layer with:
    a. automatic message role formatting
    b. system prompt injection
    c. context overflow handling
    d. batch decoding
  3. Rewrite the JNI bridge to expose:
    a. engine states
    b. new API (such as system prompt)
  4. Add utilities and helpers to:
    a. parse GGUF metadata
    b. detect Arm CPU features
  5. Miscellaneous performance optimizations
  6. Rewrite the basic sample app
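
The description does not say how the rewritten C++ layer handles context overflow. As a rough illustration only, one common approach is to keep the system-prompt tokens and drop the oldest part of the remaining history once the context window is full. The sketch below assumes that strategy; `trim_context`, `n_ctx`, `n_keep`, and the plain `int` token type are hypothetical names chosen for illustration, not the binding's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the PR: when `tokens` has filled a
// context window of `n_ctx` entries, keep the first `n_keep` tokens
// (e.g. the system prompt) and discard the oldest half of the rest.
// Returns the number of tokens that were discarded.
std::size_t trim_context(std::vector<int> & tokens, std::size_t n_ctx, std::size_t n_keep) {
    assert(n_keep <= n_ctx);
    if (tokens.size() < n_ctx) {
        return 0; // still room in the window, nothing to do
    }
    const std::size_t n_discard = (tokens.size() - n_keep) / 2;
    const auto first = tokens.begin() + static_cast<std::ptrdiff_t>(n_keep);
    tokens.erase(first, first + static_cast<std::ptrdiff_t>(n_discard));
    return n_discard;
}
```

A real engine would also have to drop or shift the matching KV-cache entries so that they stay aligned with the trimmed token list; that part is omitted here.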

Performance comparison
Against Google AI Edge Gallery on the same Pixel 9:

AI.Playground.vs.Google.AI.Edge.Gallery.on.Gemma3N.mp4

Architectural diagram
image

Production app
The full-featured production app "Arm AI Chat" is published on Google Play. It is built with the same binding and works on both Android and ChromeOS:
Acer Chromebook with i3-1315U (2023) up to AVX_VNNI

… enable alert dialog on back navigation inside conversation and benchmark
Comment thread CODEOWNERS
  /examples/gen-docs/ @ggerganov
  /examples/gguf/ @ggerganov
- /examples/llama.android/ @ggerganov
+ /examples/llama.android/ @ggerganov @hanyin-arm @naco-siren
hanyin-arm (Contributor, Author) commented Nov 10, 2025

adding my personal GitHub account @naco-siren given what's coming to my team at Arm's SJ office ;)

my LinkedIn btw in case anyone wants to connect https://www.linkedin.com/in/nacosiren/

github-actions (bot) added the documentation, android, examples, and ggml labels Nov 10, 2025
hanyin-arm closed this Nov 11, 2025
hanyin-arm deleted the ai-chat-binding branch November 11, 2025 07:21
hanyin-arm restored the ai-chat-binding branch November 11, 2025 07:21
hanyin-arm reopened this Nov 11, 2025
hanyin-arm (Contributor, Author) commented

reopening it because I accidentally deleted my remote branch...

}
}

repositories {

This repositories section just needs removing

BmanClark left a comment

Have commented on small section that wants removing

ggerganov (Member) left a comment

Thanks for the interesting contribution. I don't have hardware currently to test it, but the demonstration and description look good.

I can't provide a meaningful review of the code as I am not really experienced with Android projects. Still, if you are interested in helping with maintenance in the future I think it's OK to merge it. Would need to resolve the CI failures.

Comment thread .gitmodules
Comment on lines +1 to +3
[submodule "include/cpu_features"]
path = include/cpu_features
url = https://github.com/google/cpu_features
Member

Can we avoid adding this submodule?

Member

I am trying to understand how this library is used. As far as I can tell, it is only used to print a "CPU tier" string in the UI. This doesn't make much sense to me - was this written before changing the backend loading logic to GGML_CPU_ALL_VARIANTS?

As it is, this library does not seem necessary.

naco-siren (Contributor) commented Nov 20, 2025

This library was serving two purposes before migrating onto GGML_CPU_ALL_VARIANTS:

  1. detect the CPU tier and save it into DataStore
  2. pass this value to inference submodule for loading the corresponding .so (you should be able to find some traces in my previous commits)

Though the second purpose is no longer valid, this CPU feature indicator has been very well received by our leadership (up to SVPs), and they believe it is a reasonable add-on to contribute back. Since all Android devices are now AArch64, it makes sense to showcase "why this phone runs LLM slow / so fast".

However, if this add-on really cannot be merged upstream, I can quickly put up a new PR without it. That version will be incompatible with the Arm fork https://github.com/arm/ai-chat, which is definitely keeping the Arm Feature Indicator, so whatever is added there will be hard to PR back to this upstream repo because the facade APIs will differ.

So it's up to your choice:

| | Keep it | Remove it |
|---|---|---|
| Arm leadership | Very happy | Pissed |
| Code | Compatible | Diverged |
| Future collaboration | Easy | Hard |

Member

Generally, we try to avoid adding unnecessary dependencies. If you want to print a list of features enabled in the CPU backend, you can do so by using llama_print_system_info. If the way this function formats the output is not desirable, you can use the ggml_backend_get_features function directly and format it in any way you want.
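
As a rough sketch of the second suggestion: once a feature list has been obtained, formatting it for the UI is a few lines of string handling. The block below does not call ggml. It assumes the features arrive as name/value string pairs (an assumption about what `ggml_backend_get_features` reports), and `Feature`, `format_features`, and the sample feature names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a backend feature list;
// the real ggml type is not used here.
struct Feature {
    std::string name;
    std::string value;
};

// Join the enabled features (value == "1") into a short label such as
// "NEON, DOTPROD". Returns "none" when nothing is enabled.
std::string format_features(const std::vector<Feature> & features) {
    std::string out;
    for (const Feature & f : features) {
        if (f.value != "1") {
            continue; // skip features reported as disabled
        }
        if (!out.empty()) {
            out += ", ";
        }
        out += f.name;
    }
    return out.empty() ? "none" : out;
}
```

For example, a list of `{"NEON", "1"}`, `{"SVE", "0"}`, `{"DOTPROD", "1"}` would be rendered as `NEON, DOTPROD`, which could then be shown in the app without an extra dependency.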

naco-siren (Contributor) commented Nov 20, 2025

@slaren thanks for your timely response. As promised, I put up another PR without cpu_features: #17413

I also pulled your latest master branch and resolved merge conflicts, and removed Maven's publish plugin and publishing blocks.

let me know if there's anything else needed. thank you very much 🙏

Contributor

@ggerganov hope that alternative PR addresses all your concerns, thanks again🙏

naco-siren (Contributor) commented Nov 20, 2025

> Thanks for the interesting contribution. I don't have hardware currently to test it, but the demonstration and description look good.
>
> I can't provide a meaningful review of the code as I am not really experienced with Android projects. Still, if you are interested in helping with maintenance in the future I think it's OK to merge it. Would need to resolve the CI failures.

Thanks for the response! You don't actually need a physical Android device to test the hardware acceleration. On any Apple Silicon Mac (even an M1 will do), Google's official emulator already supports features up to dotprod. You can easily verify this yourself, because the basic sample app already uses TierDetection to show the device CPU tier in the top TextView.

btw I didn't buy new Android tablets to capture the tablet screenshots for publishing either; plz note the dotprod in the 7" & 10" tablet (emulator) screenshots:
image
btw this Arm feature indicator 👆 comes in handy, doesn't it? 😉

however I'd still highly recommend getting a latest Android flagship device (with SME/2) reimbursed by either Arm or at least Qualcomm, so that you can see this project perform way way way better than Google / Msft / Meta.
(@BmanClark plz check with Rob on this 🙏)

BmanClark commented

Can close this as updated version was merged

ggerganov closed this Dec 17, 2025