On-device ML in Flutter, with zero native setup
Most "ML in Flutter" guides end at a cloud API call. This one keeps the model on the device, so it stays private, works offline, and responds fast, with Rust inference engines behind a plain Dart API.
On-device inference used to mean wrestling with TensorFlow Lite, platform channels, and a different integration story on every OS. I got tired of it, so I built inference, a Flutter package that runs models through Rust engines (Candle, Linfa) with no native setup on your side.
Why keep the model on-device
- Privacy: the data never leaves the phone.
- Latency: there is no network round trip, so a small model can return predictions in milliseconds.
- Offline: it works on a train, on a plane, in a tunnel.
Install it
$ flutter pub add inference
That's the whole setup. The Rust engine ships precompiled inside the package, so there's no Cargo step and no Xcode fiddling.
Load a model and predict
lib/classify.dart
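// Load the model from the given path.
final model = await Inference.load('assets/mnist.onnx');
// Run inference; `pixels` is the input image data prepared by the caller.
final output = await model.run(pixels);
// Pick the highest-scoring class.
final digit = output.argMax();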
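The snippet above doesn't say how `pixels` is prepared or what `argMax()` does with the output. As a rough sketch only, assuming an MNIST-style model that takes grayscale values scaled to 0.0..1.0 and that `argMax()` returns the index of the highest score, the two steps could look like this in plain Dart. `normalizePixels` and `argMaxIndex` are hypothetical helpers written for illustration; they are not part of the package, and the input format the package actually expects may differ.

```dart
import 'dart:typed_data';

/// Hypothetical helper: scales 8-bit grayscale pixels (0..255)
/// to 32-bit floats in the range 0.0..1.0.
Float32List normalizePixels(Uint8List raw) {
  final out = Float32List(raw.length);
  for (var i = 0; i < raw.length; i++) {
    out[i] = raw[i] / 255.0;
  }
  return out;
}

/// Hypothetical helper: returns the index of the largest score,
/// or -1 when the list is empty.
int argMaxIndex(List<double> scores) {
  var best = -1;
  var bestScore = double.negativeInfinity;
  for (var i = 0; i < scores.length; i++) {
    if (scores[i] > bestScore) {
      bestScore = scores[i];
      best = i;
    }
  }
  return best;
}
```

For a ten-class digit model, the index returned by `argMaxIndex` would be the predicted digit, provided the model's outputs are ordered 0 through 9.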
Where it shines, and where it doesn't
This is great for classifiers, embeddings, and small vision and audio models: the things that fit comfortably on a phone. It is not the move for a 7B-parameter LLM; that still wants a server. Pick on-device when the model is small and the data is sensitive.
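A quick back-of-the-envelope calculation shows why the 7B-parameter case is out of reach. The sketch below estimates weight storage only, as parameter count times bytes per parameter; it ignores activations and runtime overhead. The 2-bytes-per-parameter figure assumes 16-bit weights, and the function names are hypothetical.

```dart
/// Approximate size of a model's weights in bytes.
/// Ignores activations, buffers, and runtime overhead.
int weightBytes(int parameterCount, int bytesPerParameter) =>
    parameterCount * bytesPerParameter;

/// Converts a byte count to gibibytes (1 GiB = 1024^3 bytes).
double toGiB(int bytes) => bytes / (1024 * 1024 * 1024);
```

With these, `weightBytes(7000000000, 2)` gives 14,000,000,000 bytes, which is about 13 GiB of weights alone. A hypothetical 100,000-parameter classifier stored as 32-bit floats comes to 400,000 bytes.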
The best ML feature is the one your users never notice is ML: instant, private, and just there.
Wrapping up
If you've been putting off on-device ML because the setup looked grim, that's exactly the friction inference exists to remove. Try it on a small model and see how far it gets you.
References
- inference — the package on pub.dev.
- Candle — the Rust ML framework powering it.
- Source on GitHub.