DataSapien Lab Report: Running Google Gemma 3n on iOS 


By Arda Dogantemur 

What if Google’s newest open foundation model could run natively on your iPhone, with no Python, no cloud, just edge intelligence? That question became an experiment, and now it’s a working proof of concept. In this Lab Report, I’m sharing how I managed to get Gemma 3n, Google’s lightweight language model, running on iOS, and what it might mean for the future of private AI on mobile. 

Over the past week, we’ve been experimenting with Google’s Gemma 3n model, one of the latest lightweight language models optimised for on-device performance. Our goal: run it natively on iOS, without cloud support, using a real-world SDK integration path. We got surprisingly far. 

Here’s what we achieved, what blocked us, and why this work matters for the future of private, orchestrated, edge-native AI. 

What We Did 

We used Google’s AI Edge MediaPipe framework to compile and execute Gemma 3n directly on an iPhone. No server, no shared data, no delay. 

Gemma 3n handled text prompts smoothly, and the on-device performance was stable enough for practical experimentation. This alone opens up huge potential for personal agents, chatbots, and privacy-first mobile assistants. 

In our tests on an iPhone 16 Pro Max, the Gemma 3n 1B model ran using just ~1.1 GB of RAM – comfortably within modern device limits and without noticeably impacting the rest of the system (topK: 64, topP: 0.95, temperature: 1.0). 
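
For reference, here is a minimal sketch of the text-only path described above, assuming the MediaPipe LLM Inference API from the MediaPipeTasksGenAI pod. The model file name is a placeholder, and initializer and option names may differ between SDK releases, so treat this as an outline rather than a drop-in implementation.

```swift
import Foundation
import MediaPipeTasksGenAI

// Minimal text-generation sketch (assumes a converted Gemma 3n .task bundle shipped in the app).
func runGemma(prompt: String) throws -> String {
    // Placeholder resource name – use whatever your converted model bundle is actually called.
    guard let modelPath = Bundle.main.path(forResource: "gemma-3n", ofType: "task") else {
        throw NSError(domain: "GemmaDemo", code: 1,
                      userInfo: [NSLocalizedDescriptionKey: "Model file not bundled"])
    }

    let options = LlmInference.Options(modelPath: modelPath)
    options.maxTokens = 1024   // overall token budget for prompt + response
    options.topk = 64          // sampling settings from our tests above
    options.temperature = 1.0
    // Note: topP (0.95 in our tests) may need to be set via session-level
    // options in newer SDK versions; check the version you ship against.

    let llm = try LlmInference(options: options)
    return try llm.generateResponse(inputText: prompt)
}
```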

Where Things Got Tricky 

Gemma 3n is also capable of image-to-text tasks – think captioning, OCR, or multimodal understanding. However, this mode expects images in an encoded text format, and here’s the challenge: there’s no public guidance yet on how to prepare or encode images for Gemma 3n on iOS. 

We tried multiple strategies: base64, tokenized binary wrappers, and structure-preserving transforms. None worked reliably with the current MediaPipe stack on iOS. Until Google publishes clear specs or compatible tooling, image inputs remain an open puzzle. 
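
To make the failure mode concrete, here is roughly what the simplest of those attempts looked like: base64-encoding the JPEG bytes and inlining them in the text prompt. The wrapper tags are a guessed convention, not a documented format, and as noted above this did not produce reliable results with the current MediaPipe stack on iOS.

```swift
import UIKit

// Illustrative only: one of the image-encoding strategies we tried (and that
// did NOT work reliably). The <image> tags are a guessed convention –
// Gemma 3n has no published image-prompt format for this runtime yet.
func promptWithInlineImage(_ image: UIImage, question: String) -> String? {
    guard let jpeg = image.jpegData(compressionQuality: 0.8) else { return nil }
    let encoded = jpeg.base64EncodedString()
    return "<image>\(encoded)</image>\n\(question)"
}
```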

Lessons from Other Runtimes 

We also explored Apple’s MLX, a blazing-fast, Swift-native machine learning framework. While performance was great, maintainability was not: 

  • Requires forking and manually building 10+ repositories 
  • Apple-specific and hard to integrate in cross-platform SDKs 
  • Packaging for distribution is a pain 

That’s a deal-breaker for our use case. 

For us, cross-platform reliability > marginal performance gains.

Our SDKs must support rapid rollout, frequent updates, and maintainable codebases across both iOS and Android. Our clients depend on consistent behavior, minimal integration friction, and quick turnarounds for new features. 

Why This Matters 

This isn’t just a tech demo. It’s a signal. As Personal AI becomes more intimate – living on your phone, understanding your data, and helping orchestrate your day – cloud-based inference becomes a privacy liability.

We’re building toward a future where zero data leaves the device, inference is native, and intelligence runs in sync with user intent. That demands: 

  • Cross-platform, lightweight inference runtimes 
  • Local orchestration tools (not just models) 
  • Efficient packaging and maintainable SDKs 

Gemma 3n shows that we’re getting close. Even without full support, we’ve proven that state-of-the-art language models can run natively on iOS today.

What’s Next 

We’re continuing this exploration with the following priorities: 

  • Investigate alternative image encoding formats 
  • Optimize context window sizes for memory efficiency 
  • Build script-based model orchestration for complex workflows 
  • Benchmark LiteRT vs MLX across a range of iOS and Android devices 
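
For the benchmarking item above, the plan is a simple on-device timing harness along these lines, reusing the MediaPipeTasksGenAI API from the earlier sketch. The helper below is hypothetical and approximates token counts by whitespace splitting, so its numbers are indicative rather than precise.

```swift
import Foundation
import MediaPipeTasksGenAI

// Rough micro-benchmark sketch: wall-clock latency for one prompt plus an
// approximate tokens-per-second figure. The SDK does not expose its tokenizer
// here, so tokens are approximated by whitespace splitting – run on a real
// device (not the simulator) and average over several prompts for usable numbers.
func benchmark(llm: LlmInference, prompt: String) throws -> (latency: TimeInterval,
                                                             approxTokensPerSecond: Double) {
    let start = Date()
    let response = try llm.generateResponse(inputText: prompt)
    let elapsed = Date().timeIntervalSince(start)
    let approxTokens = Double(response.split(separator: " ").count)
    return (elapsed, approxTokens / max(elapsed, 0.001))
}
```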

With cross-platform AI that will soon be multi-modal, every update, fix, or feature release needs to work seamlessly across both iOS and Android (and other edges), without introducing friction or delay. This will become critical as personal-agent AI workflows launch natively from the edge (the individual’s own smartphone).

Final Thoughts 

This is not yet a public SDK. It’s a working lab report. But the implications are clear: You don’t need cloud GPUs to run next-gen AI. You just need focus, a good runtime, and a willingness to test the edges. 

We’ll keep pushing on those edges, and share what we learn along the way. The future of personal AI is on-device, private, and closer than you think. 

Stay tuned.

Gemma 3n on iOS ~ Credit Alice.A
