- Replace llama.cpp implementation with SmolVLM2-500M-Video-Instruct via ONNX Runtime 2.0.0-rc.10
- Implement pure Rust dylib approach with JNI bindings for Android integration
- Add SmolVLMAndroid.kt for native interface management
- Configure dynamic linking strategy to resolve static linking hang issues
- Update MainActivity.kt to use consistent library naming (smolvlm_snap)
- Add environment setup scripts for both static and dynamic linking configurations
- Remove CMake configuration in favor of cargo-ndk build system
- Successfully reduce library size from 888MB+ to 40MB total with shared libraries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
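For context on the dylib-plus-JNI approach this commit describes, here is a minimal sketch of what one exported function in the Rust cdylib might look like, using the `jni` crate (0.21 API). The package path, class name, and function name are hypothetical, not taken from this branch; the real interface lives in SmolVLMAndroid.kt, and the library would be built with cargo-ndk and loaded on the Kotlin side via `System.loadLibrary("smolvlm_snap")`.

```rust
// Cargo.toml: crate-type = ["cdylib"], plus the `jni` crate.
use jni::objects::{JClass, JString};
use jni::sys::jstring;
use jni::JNIEnv;

/// The exported symbol name must mirror the Kotlin package and class
/// (here a hypothetical `com.example.smolvlm.SmolVLMAndroid`).
#[no_mangle]
pub extern "system" fn Java_com_example_smolvlm_SmolVLMAndroid_describeImage(
    mut env: JNIEnv,
    _class: JClass,
    prompt: JString,
) -> jstring {
    // Copy the Java string into a Rust String.
    let prompt: String = env
        .get_string(&prompt)
        .map(Into::into)
        .unwrap_or_default();

    // ... run the ONNX pipeline here; echoed back as a placeholder ...
    let reply = format!("echo: {prompt}");

    env.new_string(reply)
        .expect("failed to allocate Java string")
        .into_raw()
}
```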
Fixed the generation loop to properly handle image embeddings by:

- Expanding the prompt with <image> tokens matching the SmolVLM2 structure (4x4 grid, 64 tokens per patch)
- Tokenizing the expanded prompt with proper image token placeholders
- Replacing image token embeddings with the actual vision features from the encoder
- Using a correct attention mask that matches the full sequence length
- Following the working reference implementation pattern

This resolves the repetitive output issue and allows the model to generate proper responses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
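As a sketch of the embedding-splice step (an illustration under assumed types, not the exact code in this commit), the loop below overwrites each row of the text embedding matrix at an <image> placeholder position with the corresponding vision-encoder feature, in order. The function name and ndarray-based signature are assumptions; with the 4x4 grid at 64 tokens per patch described above, the expanded prompt would carry 4 × 4 × 64 = 1024 placeholders.

```rust
use ndarray::{Array2, ArrayView2};

/// Hypothetical helper: replace the embedding rows at <image> token
/// positions with the vision features, in encounter order.
/// `text_embeds` is (seq_len, hidden); `vision_feats` is (n_img_tokens, hidden);
/// `input_ids` are the token ids of the expanded prompt.
fn splice_image_embeddings(
    mut text_embeds: Array2<f32>,
    vision_feats: ArrayView2<f32>,
    input_ids: &[i64],
    image_token_id: i64,
) -> Array2<f32> {
    let mut next = 0;
    for (pos, &id) in input_ids.iter().enumerate() {
        if id == image_token_id {
            // Overwrite the placeholder embedding with the vision feature.
            text_embeds.row_mut(pos).assign(&vision_feats.row(next));
            next += 1;
        }
    }
    // Every vision feature must have found a matching placeholder.
    assert_eq!(next, vision_feats.nrows(), "image token count mismatch");
    text_embeds
}
```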
Changes:

- Updated ModelManager to download uint8-quantized models instead of q4 for better XNNPack compatibility
- Fixed SplashActivity download logic to only load models after ALL files finish downloading
- Fixed the RGB channel order conversion from the Android ARGB_8888 bitmap format (RGBA in memory)
- Added XNNPack availability checking and configuration with 4 threads
- Rebuilt ONNX Runtime with XNNPack support enabled

The VLM now generates correct output with proper colors. Performance is still limited by CPU/XNNPack on quantized models, but the architecture is now ready for a future migration to ExecuTorch for better on-device acceleration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
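For reference, a session configured the way this commit describes might look like the following under the ort 2.0 rc API with the crate's `xnnpack` feature enabled. The exact builder calls are my assumption, not code from this branch.

```rust
use ort::execution_providers::XNNPACKExecutionProvider;
use ort::session::Session;

/// Hypothetical sketch: register XNNPACK ahead of the default CPU execution
/// provider and use the 4 intra-op threads mentioned in the commit message.
fn build_session(model_path: &str) -> ort::Result<Session> {
    Session::builder()?
        .with_execution_providers([XNNPACKExecutionProvider::default().build()])?
        .with_intra_threads(4)?
        .commit_from_file(model_path)
}
```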
This branch exists to show that you can run a VLM with ONNX Runtime on Android using Rust. I leveraged pyke-ort for my ONNX Runtime bindings and relied heavily on Claude Code for the generation loop. The primary bottleneck for this project is by far the image encoder: it takes up to a minute to encode an image into tokens, and this needs further optimization for practical use. The encoder is also the bottleneck when using llama.cpp, which is why I leveraged Vulkan on the Pixel 9 in that setup.
The Pixel 9 is a previous-generation phone, but it is by no means a "low-quality device"; if a model can't run well on the Pixel 9, it probably can't run well on Android at all.
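To put a number on that encoder cost, a small timing wrapper like the one below (a hypothetical helper, not code from this branch) is enough to make each pipeline stage's latency explicit:

```rust
use std::time::Instant;

/// Hypothetical helper: run a fallible pipeline stage and log its latency.
fn timed<T, E>(label: &str, stage: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
    let start = Instant::now();
    let result = stage();
    eprintln!("{label}: {:?}", start.elapsed());
    result
}

// Usage (`encode_image` is a stand-in for the real vision-encoder call):
// let features = timed("image encoder", || encode_image(&session, &pixels))?;
```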