
On-Device ML: Tradeoffs and Decisions

Privacy-first AI isn't just a feature. It's an engineering philosophy.

Why On-Device?

When we decided to build Nix, we had a fundamental choice: cloud-based inference or on-device inference. Cloud would be easier. More powerful models. Simpler deployment. But we chose on-device, and here's why.

Privacy as Architecture

Privacy policies can change. Companies can be acquired. Data can be breached. The only truly private data is data that doesn't leave your device.

For a notification manager, this is critical. Your notifications contain some of the most sensitive information about your life: who you talk to, what you buy, where you go, what you're worried about.

We didn't want that data. We designed Nix so we couldn't have it even if we wanted to.

Latency Matters

Notification prioritization needs to happen in real-time. When a notification arrives, users expect it immediately (if important) or not at all (if not). Round-trip latency to a cloud server would introduce unacceptable delays.

On-device inference gives us sub-50ms latency. The user never waits.
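A latency budget like this is only meaningful if it is measured the way users experience it: at the tail, not the mean. The sketch below (illustrative only; `run_inference` is a stand-in, not Nix's actual model) shows one way to check a per-inference budget against p95 latency.

```python
import time

LATENCY_BUDGET_MS = 50  # the per-inference budget described above

def run_inference(features):
    """Stand-in for the real on-device model; does trivial work."""
    return sum(features) / len(features)

def measure_latencies(n_runs=200):
    """Time repeated inferences, returning per-run latency in milliseconds."""
    samples = []
    features = [0.1] * 64
    for _ in range(n_runs):
        start = time.perf_counter()
        run_inference(features)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def p95(samples):
    """95th-percentile latency via the simple nearest-rank method."""
    ordered = sorted(samples)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

latencies = measure_latencies()
print(f"p95 latency: {p95(latencies):.3f} ms (budget: {LATENCY_BUDGET_MS} ms)")
```

Gating on p95 rather than the average matters because a model that is fast on average but occasionally stalls still feels slow.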

Offline Capability

Phones go offline. Planes, subways, rural areas. A notification manager that only works with connectivity isn't a notification manager—it's a notification suggester.

The Tradeoffs

On-device ML isn't free. Here's what we gave up:

Model Size

Cloud models can be arbitrarily large. On-device models must fit in memory alongside everything else. Our model is 12MB; going larger would hurt app size and performance.

This means we can't use the largest, most capable models. We compensate with:

  • Careful feature engineering
  • Model distillation from larger teachers
  • Task-specific architecture optimization
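Of these, distillation carries the most weight: a small student model is trained to match the softened output distribution of a larger teacher. A minimal sketch of the standard distillation loss (temperature-scaled KL divergence, following Hinton et al.'s formulation; the logits here are made up for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# The loss is zero when the student exactly matches the teacher,
# and positive otherwise.
loss = distillation_loss([2.0, 1.0, 0.1], [0.5, 0.5, 0.5])
```

The soft targets carry more information than hard labels (how wrong each alternative is, not just which answer is right), which is what lets a 12MB student recover much of a far larger teacher's behavior.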
Compute Budget

Mobile devices have limited compute, especially when you need to preserve battery. We budget 50ms of latency and 2% of battery impact per day for Nix's ML inference.

This constraint forced us to be creative:

  • Quantized INT8 inference
  • Pruned attention mechanisms
  • Cached intermediate representations
  • Batched inference where possible
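To make the first of these concrete, here is a toy sketch of symmetric per-tensor INT8 quantization (the idea behind quantized inference, not Nix's actual pipeline): each float weight is mapped to an integer in [-127, 127] using a single scale factor, cutting storage to a quarter of FP32 at a small accuracy cost.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [qi * scale for qi in q]

weights = [0.02, -0.5, 0.31, 0.127, -0.255]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Each recovered weight is within half a quantization step of the original.
```

Real deployments typically quantize per-channel and calibrate activations too, but the core tradeoff is the same: one byte per weight instead of four, with error bounded by the quantization step.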
Model Updates

Cloud models can be updated instantly. On-device models require app updates. This means:

  • More careful testing before deployment
  • Longer iteration cycles
  • Version management complexity

We mitigate this with:

  • Modular model components that can be updated independently
  • A/B testing infrastructure for gradual rollouts
  • Careful monitoring of model performance in production
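The gradual-rollout piece hinges on deterministic bucketing: each user must land in or out of a rollout consistently, without any server deciding per-request. A minimal sketch of one common approach (the function and version names here are hypothetical, not Nix's internals):

```python
import hashlib

def in_rollout(user_id: str, model_version: str, percent: float) -> bool:
    """Deterministically bucket a user into a gradual rollout.

    Hashing (model_version, user_id) gives a stable bucket in [0, 100):
    the same user always gets the same answer for a given version, and
    buckets are independent across versions.
    """
    key = f"{model_version}:{user_id}".encode()
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000 / 100.0
    return bucket < percent

# Ramp a hypothetical model version to 5% of users:
enrolled = sum(in_rollout(f"user-{i}", "prioritizer-v2", 5.0)
               for i in range(10_000))
# enrolled lands near 5% of 10,000, and the same users stay enrolled
# as the percentage ramps up.
```

Because the bucket is a pure function of user and version, ramping from 5% to 20% only adds users; nobody flaps between old and new models mid-rollout.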

What We Learned

After shipping Nix's on-device ML, a few lessons stand out:

Constraints breed creativity

The limitations of on-device ML forced us to think harder about which features actually matter. We couldn't throw compute at the problem; we had to understand it deeply.

Privacy is a feature

Users notice when you don't ask for their data. "All processing happens on your device" isn't just a technical detail; it's a trust signal.

Performance is table stakes

If on-device inference were slow, users would blame the feature, not the architecture. We spent as much time on optimization as on modeling.

Conclusion

On-device ML isn't right for every problem. But for privacy-sensitive applications with real-time requirements, it's not just viable; it's superior.

The key is treating constraints as design requirements, not obstacles. When you can't have more compute, you need better algorithms. When you can't have more data, you need better features. When you can't update instantly, you need better testing.

These constraints made Nix better. They can make your products better too.
