Android · Local AI

OLLOD

On-device LLM runner for Android that lets users run open-source large language models fully offline and chat privately.

Client: VX9Studio (Self-Developed)

Role: Lead Mobile AI Engineer

The Challenge

Large language models (LLMs) in mobile apps are usually served through cloud APIs, which require a network connection. This exposes user data to third-party servers, incurs ongoing per-token or hosting fees, and rules out use in remote or air-gapped settings.

The Solution

We built a high-performance C++ inference engine on llama.cpp and exposed it to Kotlin through JNI. By running 4-bit quantized open-source models (such as Llama 3 and Gemma) and leveraging Android's Neural Networks API (NNAPI) for hardware acceleration, we enabled fully offline execution with a small memory footprint.
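To illustrate why 4-bit quantization shrinks the memory footprint, here is a minimal sketch of block-wise 4-bit quantization in C++. It is a simplified illustration, not llama.cpp's actual code or file format: the names `Block4`, `quantize_block`, `dequantize_block`, and `quantized_bytes` are hypothetical, and the block size of 32 with one scale per block is an assumption chosen to resemble common 4-bit layouts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; a simplified sketch, not llama.cpp's code.
constexpr std::size_t kBlockSize = 32;  // weights per block (assumed)

struct Block4 {
    float scale;                                 // per-block scale (real formats often store fp16)
    std::array<std::uint8_t, kBlockSize / 2> q;  // two 4-bit codes packed per byte
};

// Quantize 32 floats to 4-bit codes in [0, 15], centered on code 8.
inline Block4 quantize_block(const std::array<float, kBlockSize>& w) {
    float amax = 0.0f;
    for (float v : w) amax = std::max(amax, std::fabs(v));

    Block4 b{};
    b.scale = amax / 7.0f;  // maps [-amax, amax] onto codes [1, 15]
    const float inv = b.scale > 0.0f ? 1.0f / b.scale : 0.0f;

    auto code = [inv](float v) {
        const int c = static_cast<int>(std::lround(v * inv)) + 8;
        return static_cast<std::uint8_t>(std::clamp(c, 0, 15));
    };
    for (std::size_t i = 0; i < kBlockSize; i += 2) {
        // Low nibble holds the even-indexed weight, high nibble the odd one.
        b.q[i / 2] = static_cast<std::uint8_t>(code(w[i]) | (code(w[i + 1]) << 4));
    }
    return b;
}

// Recover approximate floats from the packed 4-bit codes.
inline std::array<float, kBlockSize> dequantize_block(const Block4& b) {
    std::array<float, kBlockSize> out{};
    for (std::size_t i = 0; i < kBlockSize; i += 2) {
        out[i]     = static_cast<float>(static_cast<int>(b.q[i / 2] & 0x0F) - 8) * b.scale;
        out[i + 1] = static_cast<float>(static_cast<int>(b.q[i / 2] >> 4) - 8) * b.scale;
    }
    return out;
}

// Approximate weight storage in bytes, assuming a 2-byte scale per block.
inline double quantized_bytes(double n_params) {
    assert(n_params >= 0.0);
    const double bits_per_weight = (kBlockSize * 4.0 + 16.0) / kBlockSize;  // 4.5
    return n_params * bits_per_weight / 8.0;
}
```

Under these assumptions each weight costs about 4.5 bits, so a hypothetical 8-billion-parameter model needs roughly 4.5 GB for weights, compared with about 16 GB at 16-bit precision. The trade-off is a per-weight rounding error of at most half the block's scale.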

Technologies Leveraged

Kotlin · Android SDK · llama.cpp · Gemma / Llama 3 · NPU Acceleration · C++ / JNI

The Results & Impact

OLLOD achieves local inference speeds of over 12 tokens per second on modern Android devices with zero cloud API dependency, so prompts and responses never leave the device and there are no ongoing backend hosting costs.
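A tokens-per-second figure like the one above is typically the number of generated tokens divided by wall-clock decoding time. The sketch below shows one way to compute it in C++; `tokens_per_second` and `measure_tokens_per_second` are hypothetical helper names, and the generation callable is a stand-in for whatever the engine actually exposes.

```cpp
#include <chrono>
#include <cstddef>

// Decode throughput in tokens per second; returns 0 for a non-positive duration.
inline double tokens_per_second(std::size_t n_tokens,
                                std::chrono::duration<double> elapsed) {
    const double seconds = elapsed.count();
    return seconds > 0.0 ? static_cast<double>(n_tokens) / seconds : 0.0;
}

// Times a caller-supplied callable that runs generation and returns
// the number of tokens it produced (hypothetical interface).
template <typename Generate>
double measure_tokens_per_second(Generate&& generate) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t n_tokens = generate();
    const auto stop = std::chrono::steady_clock::now();
    return tokens_per_second(n_tokens, stop - start);
}
```

For example, 120 tokens generated in 10 seconds gives 12 tokens per second. A monotonic clock such as `std::chrono::steady_clock` is used so that system clock adjustments do not distort the measurement.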
