Context
NNAPI hardware acceleration doesn't work reliably on Google Tensor chips (Pixel 8, 9, 10). The SDK currently falls back to CPU on these devices, which works but is slower than the hardware-accelerated inference we get on Qualcomm and Samsung SoCs.
Several Pixel users reported crashes in v0.0.3 (fixed in v0.0.5 by falling back to CPU), but performance would still benefit from proper hardware acceleration.
Proposal
Add LiteRT (formerly TensorFlow Lite) as an inference backend for Pixel devices, using the GPU delegate for Tensor chip acceleration.
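As a rough sketch, the GPU-delegate path could look something like the snippet below, using the LiteRT Android API (class names still live under org.tensorflow.lite in current LiteRT releases). The function name and thread count are illustrative, not existing SDK API:

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

// Build a LiteRT interpreter, preferring the GPU delegate when the
// device supports it and falling back to multi-threaded CPU otherwise.
fun createInterpreter(modelFile: File): Interpreter {
    val options = Interpreter.Options()
    val compatList = CompatibilityList()
    if (compatList.isDelegateSupportedOnThisDevice) {
        // bestOptionsForThisDevice tunes the delegate for the current GPU.
        options.addDelegate(GpuDelegate(compatList.bestOptionsForThisDevice))
    } else {
        options.setNumThreads(4) // thread count is illustrative
    }
    return Interpreter(modelFile, options)
}
```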
Scope
- Convert ONNX models to TFLite format (Parakeet TDT, Kokoro, Silero VAD, DeepFilterNet3)
- Add LiteRT GPU delegate as execution provider alongside ONNX Runtime
- Auto-detect Tensor SoCs and select the appropriate backend at runtime (see the detection sketch after this list)
- Benchmark against CPU-only and NNAPI on Pixel 8/9/10
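One possible shape for the runtime detection, assuming API 31+ (all Tensor G3 and later Pixels ship with Android 12 or newer). The enum and the matching strings are placeholders to verify on real hardware, since the SOC_MODEL value differs per generation:

```kotlin
import android.os.Build

// Hypothetical backend enum; the SDK's real type may differ.
enum class InferenceBackend { ONNX_RUNTIME, LITERT_GPU }

fun selectBackend(): InferenceBackend {
    // Build.SOC_MANUFACTURER / SOC_MODEL exist from API 31 (Android 12);
    // every affected Pixel ships with a newer release, so the version
    // guard only excludes devices that can't be Tensor-based anyway.
    val isTensor = Build.VERSION.SDK_INT >= Build.VERSION_CODES.S &&
        (Build.SOC_MANUFACTURER.equals("Google", ignoreCase = true) ||
            Build.SOC_MODEL.contains("Tensor", ignoreCase = true))
    return if (isTensor) InferenceBackend.LITERT_GPU else InferenceBackend.ONNX_RUNTIME
}
```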
Affected devices
- Pixel 8 / 8 Pro (Tensor G3)
- Pixel 9 / 9 Pro (Tensor G4)
- Pixel 10 / 10 Pro (Tensor G5)
- Future Pixel devices