On-Device ML: Tradeoffs and Decisions
Privacy-first AI isn't just a feature. It's an engineering philosophy.
Why On-Device?
When we decided to build Nix, we had a fundamental choice: cloud-based inference or on-device inference. Cloud would be easier. More powerful models. Simpler deployment. But we chose on-device, and here's why.
Privacy as Architecture
Privacy policies can change. Companies can be acquired. Data can be breached. The only truly private data is data that doesn't leave your device.
For a notification manager, this is critical. Your notifications contain some of the most sensitive information about your life: who you talk to, what you buy, where you go, what you're worried about.
We didn't want that data. We designed Nix so we couldn't have it even if we wanted to.
Latency Matters
Notification prioritization needs to happen in real time. When a notification arrives, users expect it immediately (if important) or not at all (if not). Round-trip latency to a cloud server would introduce unacceptable delays.
On-device inference gives us sub-50ms latency. The user never waits.
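In code, that latency requirement becomes a hard budget with a cheap fallback path. Here is a minimal sketch; the model call, the notification fields, and the fallback rule are illustrative placeholders, not Nix's actual implementation:

```python
import time

LATENCY_BUDGET_MS = 50  # hypothetical per-notification budget

def rule_based_priority(notification: dict) -> str:
    """Cheap fallback heuristic: flag messages from starred senders."""
    return "important" if notification.get("starred_sender") else "normal"

def model_priority(notification: dict) -> str:
    """Stand-in for the real on-device model call."""
    return "important" if "urgent" in notification.get("text", "").lower() else "normal"

def prioritize(notification: dict) -> str:
    start = time.monotonic()
    result = model_priority(notification)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        # The model overran the budget: serve the cheap rule's answer
        # so the user never waits on inference.
        return rule_based_priority(notification)
    return result
```

A real system would also record budget overruns so a consistently slow model path can be disabled rather than retried on every notification.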
Offline Capability
Phones go offline. Planes, subways, rural areas. A notification manager that only works with connectivity isn't a notification manager—it's a notification suggester.
The Tradeoffs
On-device ML isn't free. Here's what we gave up:
Model Size
Cloud models can be arbitrarily large. On-device models must fit in memory alongside everything else. Our model is 12MB—larger would impact app size and performance.
This means we can't use the largest, most capable models, so we compensate with better algorithms and better features.
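A quick back-of-envelope shows why the size budget dictates model choices: weight precision alone changes the footprint by 4x. The parameter count below is illustrative, not Nix's:

```python
def model_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Raw weight storage only, ignoring metadata and compression."""
    return num_params * bytes_per_param / (1024 * 1024)

params = 3_000_000                  # illustrative parameter count
fp32_mb = model_size_mb(params, 4)  # float32 weights: ~11.4 MB
int8_mb = model_size_mb(params, 1)  # int8-quantized:  ~2.9 MB
```

At float32, even a few million parameters consume most of a 12MB budget; quantizing to int8 buys back room for the rest of the app.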
Compute Budget
Mobile devices have limited compute, especially when you need to preserve battery. We budget 50ms and 2% battery impact per day for Nix's ML inference.
This constraint forced us to be creative.
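One concrete form that creativity can take is strict budget accounting: track cumulative inference cost and degrade to cheaper paths once the daily allowance is spent. A sketch with illustrative numbers, not Nix's actual accounting:

```python
class InferenceBudget:
    """Tracks cumulative on-device inference time against a daily allowance."""

    def __init__(self, daily_budget_ms: float):
        self.daily_budget_ms = daily_budget_ms
        self.spent_ms = 0.0

    def try_spend(self, cost_ms: float) -> bool:
        """Reserve cost_ms of inference time; False means use the cheap path."""
        if self.spent_ms + cost_ms > self.daily_budget_ms:
            return False
        self.spent_ms += cost_ms
        return True

# Illustrative: roughly 10 seconds of total inference time per day.
daily = InferenceBudget(daily_budget_ms=10_000)
```

The point of the accounting is that battery impact becomes a design parameter you can enforce, not a number you discover after shipping.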
Model Updates
Cloud models can be updated instantly. On-device models require app updates, so model improvements ship on the app's release cadence rather than continuously. We mitigate this with heavier pre-release testing: a model we can't roll back instantly has to be right the first time.
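One common mitigation pattern for slow update cycles, which may or may not match Nix's approach, is to ship a baseline model in the app bundle and allow newer weights to be fetched out of band, gated on a feature-schema version the app build understands. A hypothetical sketch, with made-up paths and manifest fields:

```python
import json
from pathlib import Path

BUNDLED_MODEL = Path("assets/model_v1.bin")  # hypothetical model shipped in the app binary
SUPPORTED_SCHEMA = 1                         # feature schema this app build understands

def latest_compatible_model(downloaded_dir: Path) -> Path:
    """Prefer the newest downloaded model whose schema this build supports;
    fall back to the bundled model otherwise."""
    best, best_version = BUNDLED_MODEL, 1
    for manifest in downloaded_dir.glob("*/manifest.json"):
        meta = json.loads(manifest.read_text())
        if meta["schema"] == SUPPORTED_SCHEMA and meta["version"] > best_version:
            best, best_version = manifest.parent / "model.bin", meta["version"]
    return best
```

The schema gate matters: a downloaded model that expects features the installed app doesn't compute is worse than an older model that matches.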
What We Learned
After shipping Nix's on-device ML, a few lessons stand out:
Constraints breed creativity
The limitations of on-device ML forced us to think harder about what features actually matter. We couldn't throw compute at the problem—we had to understand it deeply.
Privacy is a feature
Users notice when you don't ask for their data. "All processing happens on your device" isn't just a technical detail—it's a trust signal.
Performance is table stakes
If on-device inference were slow, users would blame the feature, not the architecture. We spent as much time on optimization as on modeling.
Conclusion
On-device ML isn't right for every problem. But for privacy-sensitive applications with real-time requirements, it's not just viable—it's superior.
The key is treating constraints as design requirements, not obstacles. When you can't have more compute, you need better algorithms. When you can't have more data, you need better features. When you can't update instantly, you need better testing.
These constraints made Nix better. They can make your products better too.