On-device ML in Flutter, with zero native setup

On-device inference used to mean wrestling with TensorFlow Lite, platform channels, and a different integration story on every OS. I got tired of it, so I built inference, a Flutter package that runs models through Rust engines (Candle, Linfa) with no native setup on your side.

Why keep the model on-device

Privacy: the data never leaves the phone.
Latency: no round trip, predictions in milliseconds.
Offline: it works on a train, on a plane, in a tunnel.

Install it

$ flutter pub add inference

That's the whole setup. The Rust engine ships precompiled inside the package, so there's no Cargo step and no Xcode fiddling.

Load a model and predict

lib/classify.dart

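// Load the model bundled with the app's assets.
final model = await Inference.load('assets/mnist.onnx');

// Run the model on the input image data, then take the index of the
// highest score as the predicted digit.
final output = await model.run(pixels);
final digit = output.argMax();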
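
If the last step looks opaque: an argmax picks the position of the largest score, and for a digit classifier that position is the predicted digit. Here is a minimal sketch in plain Dart, assuming the output can be read as a list of scores. `argMaxIndex` is a hypothetical helper written for illustration, not the package's own `argMax()`.

```dart
/// Returns the index of the largest value in [scores].
/// Hypothetical helper for illustration; ties resolve to the first maximum.
int argMaxIndex(List<double> scores) {
  if (scores.isEmpty) {
    throw ArgumentError('scores must not be empty');
  }
  var best = 0;
  for (var i = 1; i < scores.length; i++) {
    // Strict comparison keeps the earliest index when two scores are equal.
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}
```

With ten scores, one per digit, the returned index is the digit the model considers most likely.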

Note: Inference runs on a background isolate, so a heavy model won't drop frames. Show a small spinner during the first load, while the weights are read into memory.
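
The package handles the isolate for you, but the general pattern is easy to see in plain Dart. The sketch below uses `Isolate.run` from `dart:isolate` (available in Dart 2.19 and later). `heavyCompute` and `computeInBackground` are hypothetical stand-ins for an expensive model call; they are not part of inference and say nothing about how the package is implemented internally.

```dart
import 'dart:isolate';

/// Hypothetical stand-in for an expensive model call:
/// sums the squares of the inputs.
double heavyCompute(List<double> input) {
  var total = 0.0;
  for (final x in input) {
    total += x * x;
  }
  return total;
}

/// Runs [heavyCompute] on a short-lived background isolate, so the
/// calling isolate (the UI isolate in a Flutter app) stays free to
/// render frames while the work is in progress.
Future<double> computeInBackground(List<double> input) {
  return Isolate.run(() => heavyCompute(input));
}
```

Awaiting `computeInBackground(...)` from UI code suspends only that async function, not the frame loop, which is why a spinner can keep animating while the work runs.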

Where it shines, and where it doesn't

This is great for classifiers, embeddings, and small vision and audio models: the things that fit comfortably on a phone. It is not the move for a 7B-parameter LLM; that still wants a server. Pick on-device when the model is small and the data is sensitive.
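
A rough way to judge "fits comfortably" is to multiply the parameter count by the bytes stored per parameter. The sketch below does that arithmetic; it counts weights only and ignores activations and runtime overhead. The helper names are hypothetical, and the 100,000-parameter classifier in the usage note is an assumed figure for illustration.

```dart
/// Approximate size of a model's weights in bytes.
/// Counts weights only; activations and runtime overhead are ignored.
int weightBytes(int parameterCount, double bytesPerParameter) {
  return (parameterCount * bytesPerParameter).round();
}

/// Converts a byte count to gigabytes (1 GB = 10^9 bytes).
double toGigabytes(int bytes) => bytes / 1e9;
```

At 16-bit precision (2 bytes per parameter), `toGigabytes(weightBytes(7000000000, 2))` gives 14.0 GB for a 7B-parameter model, and even at 4 bits per parameter it is still 3.5 GB. A hypothetical 100,000-parameter classifier stored as 32-bit floats comes to 400,000 bytes, well under a megabyte.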

[ screenshot: a Flutter demo classifying a hand-drawn digit ]

A tiny MNIST classifier running entirely on the device, no network.

The best ML feature is the one your users never notice is ML: instant, private, and just there.

Wrapping up

If you've been putting off on-device ML because the setup looked grim, that's exactly the friction inference exists to remove. Try it on a small model and see how far it gets you.
