OLLOD
On-device LLM runner for Android that lets users run open-source large language models entirely offline, keeping their chat conversations private.
The Challenge
Running large language models (LLMs) in mobile apps has traditionally meant sending every request to a cloud API. This exposes user data to a third party, incurs ongoing per-token and hosting fees, and rules out use in remote or air-gapped settings.
The Solution
We built a high-performance C++ inference engine on top of llama.cpp and exposed it to the Kotlin app layer via JNI. By using 4-bit quantized builds of open-source models (such as Llama 3 and Gemma) and Android's Neural Networks API (NNAPI) for hardware acceleration, we enabled fully offline execution with a small memory footprint.
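To make "4-bit quantized" concrete: each weight is stored as a small integer code, and a block of weights shares one scale factor. The sketch below is a simplified, symmetric block-wise 4-bit scheme written for illustration, assuming 32-weight blocks. The names (`Q4Block`, `quantize_block`, `dequantize`, `footprint_gb`) are hypothetical, not llama.cpp's API. llama.cpp's own Q4_0 format also groups 32 weights under a 16-bit scale, but its rounding and nibble layout differ, so this sketch is not byte-compatible with real model files.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Illustrative block-wise 4-bit quantization (hypothetical names; not llama.cpp's API).
constexpr std::size_t kBlockSize = 32;  // weights that share one scale

struct Q4Block {
    float scale;  // kept as float here for simplicity; a 16-bit scale gives 4.5 bits/weight
    std::array<std::uint8_t, kBlockSize / 2> packed;  // two 4-bit codes per byte
};

// Map one float to a 4-bit code: round x / scale into [-7, 7], then shift to [1, 15].
inline std::uint8_t encode(float x, float inv_scale) {
    int q = static_cast<int>(std::lround(x * inv_scale));
    q = std::clamp(q, -7, 7);
    return static_cast<std::uint8_t>(q + 8);
}

// Quantize kBlockSize consecutive weights starting at x.
Q4Block quantize_block(const float* x) {
    float amax = 0.0f;  // largest magnitude in the block
    for (std::size_t i = 0; i < kBlockSize; ++i) amax = std::max(amax, std::fabs(x[i]));

    Q4Block b{};
    b.scale = amax / 7.0f;  // the largest-magnitude weight maps to +/-7 before the shift
    const float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;
    for (std::size_t i = 0; i < kBlockSize; i += 2) {
        // Even index goes in the low nibble, odd index in the high nibble.
        b.packed[i / 2] = static_cast<std::uint8_t>(encode(x[i], inv) | (encode(x[i + 1], inv) << 4));
    }
    return b;
}

// Recover an approximation of weight i; the error is at most about scale / 2.
float dequantize(const Q4Block& b, std::size_t i) {
    assert(i < kBlockSize);
    const std::uint8_t byte = b.packed[i / 2];
    const int code = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    return static_cast<float>(code - 8) * b.scale;
}

// Storage per weight when each block carries a 16-bit scale: (16 + 32 * 4) / 32 = 4.5 bits.
constexpr double kBitsPerWeight = (16.0 + kBlockSize * 4.0) / kBlockSize;

// Approximate weight storage in decimal gigabytes (excludes KV cache and runtime buffers).
double footprint_gb(double n_params, double bits_per_weight) {
    return n_params * bits_per_weight / 8.0 / 1e9;
}
```

For a hypothetical 8-billion-parameter model, `footprint_gb(8e9, 16.0)` gives 16 GB of fp16 weights, while `footprint_gb(8e9, kBitsPerWeight)` gives 4.5 GB, about a 3.6× reduction. The KV cache and runtime buffers still come on top of the weights.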
The Results & Impact
OLLOD achieves local inference speeds of over 12 tokens per second on modern Android devices with no cloud API dependency. Because prompts and responses never leave the device, user data stays private and there are no ongoing backend hosting costs.
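A throughput figure like this is typically computed as the number of generated tokens divided by the wall-clock time spent generating them. The sketch below assumes that definition. `next_token` is a hypothetical stand-in for one decode step of the engine, not a llama.cpp function, and the helper names are illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Decode throughput: tokens generated divided by the wall-clock seconds spent generating them.
double tokens_per_second(std::size_t n_tokens, std::chrono::duration<double> elapsed) {
    return elapsed.count() > 0.0 ? static_cast<double>(n_tokens) / elapsed.count() : 0.0;
}

// Times a generation loop. `next_token` is a hypothetical stand-in for one decode step of the
// inference engine; it returns false when generation should stop (e.g. end-of-sequence).
double measure_decode_tps(const std::function<bool()>& next_token, std::size_t max_tokens) {
    const auto start = std::chrono::steady_clock::now();  // monotonic clock, unaffected by wall-time changes
    std::size_t produced = 0;
    while (produced < max_tokens && next_token()) ++produced;
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return tokens_per_second(produced, elapsed);
}

// Time to generate a full reply of n_tokens at a given throughput.
double seconds_for_reply(std::size_t n_tokens, double tps) {
    assert(tps > 0.0 && "throughput must be positive");
    return static_cast<double>(n_tokens) / tps;
}
```

At 12 tokens per second, `seconds_for_reply(300, 12.0)` gives 25 seconds to generate a 300-token reply in full. Prompt processing (prefill) is usually timed separately, since it runs at a different rate from token-by-token decoding.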