
On-Device ML: Tradeoffs and Decisions

Privacy-first AI isn't just a feature. It's an engineering philosophy.

Why On-Device?

When we decided to build Nix, we had a fundamental choice: cloud-based inference or on-device inference. Cloud would be easier. More powerful models. Simpler deployment. But we chose on-device, and here's why.

Privacy as Architecture

Privacy policies can change. Companies can be acquired. Data can be breached. The only truly private data is data that doesn't leave your device.

For a notification manager, this is critical. Your notifications contain some of the most sensitive information about your life: who you talk to, what you buy, where you go, what you're worried about.

We didn't want that data. We designed Nix so we couldn't have it even if we wanted to.

Latency Matters

Notification prioritization needs to happen in real-time. When a notification arrives, users expect it immediately (if important) or not at all (if not). Round-trip latency to a cloud server would introduce unacceptable delays.

On-device inference gives us sub-50ms latency. The user never waits.
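A latency budget like this is only meaningful if it is measured the way users experience it: at the tail, not the mean. The sketch below (illustrative only; `run_inference` is a stand-in, not Nix's actual model) shows one way to check a per-inference budget against p95 latency.

```python
import time

LATENCY_BUDGET_MS = 50  # the per-inference budget described above

def run_inference(features):
    """Stand-in for the real on-device model; does trivial work."""
    return sum(features) / len(features)

def measure_latencies(n_runs=200):
    """Time repeated inferences, returning per-run latency in milliseconds."""
    samples = []
    features = [0.1] * 64
    for _ in range(n_runs):
        start = time.perf_counter()
        run_inference(features)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def p95(samples):
    """95th-percentile latency via the simple nearest-rank method."""
    ordered = sorted(samples)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

latencies = measure_latencies()
print(f"p95 latency: {p95(latencies):.3f} ms (budget: {LATENCY_BUDGET_MS} ms)")
```

Gating on p95 rather than the average matters because a model that is fast on average but occasionally stalls still feels slow.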

Offline Capability

Phones go offline. Planes, subways, rural areas. A notification manager that only works with connectivity isn't a notification manager—it's a notification suggester.

The Tradeoffs

On-device ML isn't free. Here's what we gave up:

Model Size

Cloud models can be arbitrarily large. On-device models must fit in memory alongside everything else. Our model is 12MB; going larger would hurt app size and performance.

This means we can't use the largest, most capable models. We compensate with:

  • Careful feature engineering
  • Model distillation from larger teachers
  • Task-specific architecture optimization
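Of these, distillation carries the most weight: a small student model is trained to match the softened output distribution of a larger teacher. A minimal sketch of the standard distillation loss (temperature-scaled KL divergence, following Hinton et al.'s formulation; the logits here are made up for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# The loss is zero when the student exactly matches the teacher,
# and positive otherwise.
loss = distillation_loss([2.0, 1.0, 0.1], [0.5, 0.5, 0.5])
```

The soft targets carry more information than hard labels (how wrong each alternative is, not just which answer is right), which is what lets a 12MB student recover much of a far larger teacher's behavior.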
Compute Budget

Mobile devices have limited compute, especially when you need to preserve battery. We budget 50ms of latency and 2% of battery impact per day for Nix's ML inference.

This constraint forced us to be creative:

  • Quantized INT8 inference
  • Pruned attention mechanisms
  • Cached intermediate representations
  • Batched inference where possible
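To make the first of these concrete, here is a toy sketch of symmetric per-tensor INT8 quantization (the idea behind quantized inference, not Nix's actual pipeline): each float weight is mapped to an integer in [-127, 127] using a single scale factor, cutting storage to a quarter of FP32 at a small accuracy cost.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [qi * scale for qi in q]

weights = [0.02, -0.5, 0.31, 0.127, -0.255]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Each recovered weight is within half a quantization step of the original.
```

Real deployments typically quantize per-channel and calibrate activations too, but the core tradeoff is the same: one byte per weight instead of four, with error bounded by the quantization step.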
Model Updates

Cloud models can be updated instantly. On-device models require app updates. This means:

  • More careful testing before deployment
  • Longer iteration cycles
  • Version management complexity

We mitigate this with:

  • Modular model components that can be updated independently
  • A/B testing infrastructure for gradual rollouts
  • Careful monitoring of model performance in production
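The gradual-rollout piece hinges on deterministic bucketing: each user must land in or out of a rollout consistently, without any server deciding per-request. A minimal sketch of one common approach (the function and version names here are hypothetical, not Nix's internals):

```python
import hashlib

def in_rollout(user_id: str, model_version: str, percent: float) -> bool:
    """Deterministically bucket a user into a gradual rollout.

    Hashing (model_version, user_id) gives a stable bucket in [0, 100):
    the same user always gets the same answer for a given version, and
    buckets are independent across versions.
    """
    key = f"{model_version}:{user_id}".encode()
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000 / 100.0
    return bucket < percent

# Ramp a hypothetical model version to 5% of users:
enrolled = sum(in_rollout(f"user-{i}", "prioritizer-v2", 5.0)
               for i in range(10_000))
# enrolled lands near 5% of 10,000, and the same users stay enrolled
# as the percentage ramps up.
```

Because the bucket is a pure function of user and version, ramping from 5% to 20% only adds users; nobody flaps between old and new models mid-rollout.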

What We Learned

After shipping Nix's on-device ML, a few lessons stand out:

Constraints breed creativity

The limitations of on-device ML forced us to think harder about which features actually matter. We couldn't throw compute at the problem; we had to understand it deeply.

Privacy is a feature

Users notice when you don't ask for their data. "All processing happens on your device" isn't just a technical detail; it's a trust signal.

Performance is table stakes

If on-device inference were slow, users would blame the feature, not the architecture. We spent as much time on optimization as on modeling.

Conclusion

On-device ML isn't right for every problem. But for privacy-sensitive applications with real-time requirements, it's not just viable; it's superior.

The key is treating constraints as design requirements, not obstacles. When you can't have more compute, you need better algorithms. When you can't have more data, you need better features. When you can't update instantly, you need better testing.

These constraints made Nix better. They can make your products better too.
