How to Use an ONNX Model in React Native (and Other Mobile App Frameworks)
ONNX (Open Neural Network Exchange) is one of the most practical formats for deploying machine learning models on mobile devices. It allows you to train models once (PyTorch, TensorFlow, etc.) and run them efficiently across Android, iOS, and cross-platform frameworks like React Native and Flutter.
This article explains how ONNX inference works on mobile, with a hands-on focus on React Native, followed by patterns for other mobile frameworks.
Why ONNX Is a Good Fit for Mobile Apps
ONNX is widely adopted for mobile deployment because:
- It is framework-independent (train anywhere, deploy anywhere)
- It supports CPU and mobile accelerators (NNAPI, CoreML)
- It avoids heavy runtime dependencies like Python
- It works well with offline / on-device inference
For mobile apps that need privacy, low latency, or offline AI, ONNX is often the best choice.
Mobile ONNX Runtime Options (Overview)
| Platform | Recommended Runtime |
|---|---|
| React Native | onnxruntime-react-native |
| Android (native) | ONNX Runtime Android (Kotlin / Java) |
| iOS (native) | ONNX Runtime iOS (Swift / Obj-C) |
| Flutter | Native ONNX Runtime via platform channels |
| Ionic / Capacitor | Native plugin wrapping ONNX Runtime |
Pure JavaScript ONNX inference is generally not recommended for production mobile apps due to performance and memory constraints.
Using ONNX in React Native (Recommended Path)
1. Install ONNX Runtime for React Native
For the most reliable setup, use React Native CLI (not Expo Go).
```bash
yarn add onnxruntime-react-native
cd ios && pod install && cd ..
```
Expo users: ONNX Runtime requires native modules. You must use Expo prebuild and a custom dev client.
2. Add the ONNX Model to Your App
Common approaches:
- Bundle the model inside the app (e.g. `assets/models/model.onnx`)
- Download the model on first launch and cache it locally
Bundling is simpler and recommended for the first version.
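If you later move to the download-and-cache approach, a minimal sketch could look like the following. It assumes the `react-native-fs` library is installed, and the model URL is a placeholder, not a real endpoint:

```ts
import RNFS from "react-native-fs";

// Hypothetical URL -- replace with wherever you actually host the model.
const MODEL_URL = "https://example.com/models/model.onnx";
const MODEL_PATH = `${RNFS.DocumentDirectoryPath}/model.onnx`;

// Returns a local file path, downloading the model only on first launch.
export async function ensureModelDownloaded(): Promise<string> {
  const exists = await RNFS.exists(MODEL_PATH);
  if (!exists) {
    await RNFS.downloadFile({ fromUrl: MODEL_URL, toFile: MODEL_PATH }).promise;
  }
  return MODEL_PATH;
}
```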
3. Create an ONNX Inference Session
```ts
import * as ort from "onnxruntime-react-native";

export async function loadModel(modelPath: string) {
  const session = await ort.InferenceSession.create(modelPath, {
    executionProviders: ["cpu"],
  });
  return session;
}
```
This loads the model and prepares it for inference.
4. Run Inference
```ts
export async function runInference(
  session: ort.InferenceSession,
  inputName: string,
  data: Float32Array,
  dims: number[]
) {
  const tensor = new ort.Tensor("float32", data, dims);
  const feeds: Record<string, ort.Tensor> = {};
  feeds[inputName] = tensor;
  const results = await session.run(feeds);
  return results;
}
```
You must ensure that:
- `inputName` matches the model's input name
- `dims` matches the expected input shape
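Putting the two helpers together, a usage sketch might look like this. The input name `"input"` and the shape `[1, 3, 224, 224]` are assumptions; use the names and shape of your own model:

```ts
// Hypothetical usage: a 224x224 RGB classification model whose input is named "input".
async function classify(modelPath: string, inputData: Float32Array) {
  const session = await loadModel(modelPath);
  const results = await runInference(session, "input", inputData, [1, 3, 224, 224]);
  // Grab the first (often the only) output tensor.
  const outputName = Object.keys(results)[0];
  return results[outputName];
}
```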
Image Preprocessing (Most Common Pitfall)
Most vision ONNX models expect inputs like:
- Shape: `[1, 3, H, W]` (NCHW) or `[1, H, W, 3]` (NHWC)
- Pixel range: `0–1`, or normalized with mean/std
- Color format: RGB (sometimes BGR)
Typical preprocessing steps
- Load image from camera or gallery
- Resize to model input size (e.g. 224×224)
- Convert pixels to Float32Array
- Reorder channels if needed
- Normalize values
In React Native, pixel extraction and resizing are often done using native helpers or image-processing libraries. If you control model training, exporting a model with simpler preprocessing can significantly reduce app complexity.
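As a reference point, converting raw RGBA pixel bytes (as produced by many image libraries) into a normalized NCHW `Float32Array` might look like the sketch below. The mean/std defaults are the common ImageNet constants and are an assumption, not something every model uses:

```ts
// Convert RGBA bytes (width * height * 4) into a normalized [1, 3, H, W] Float32Array.
export function rgbaToNchw(
  rgba: Uint8Array,
  width: number,
  height: number,
  mean = [0.485, 0.456, 0.406], // assumed ImageNet mean
  std = [0.229, 0.224, 0.225]   // assumed ImageNet std
): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    const r = rgba[i * 4] / 255;
    const g = rgba[i * 4 + 1] / 255;
    const b = rgba[i * 4 + 2] / 255;
    out[i] = (r - mean[0]) / std[0];             // R plane
    out[plane + i] = (g - mean[1]) / std[1];     // G plane
    out[2 * plane + i] = (b - mean[2]) / std[2]; // B plane
  }
  return out;
}
```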
Inspecting Model Inputs and Outputs
Before integrating the model, inspect it locally:
```python
import onnx

model = onnx.load("model.onnx")
print([i.name for i in model.graph.input])
print([o.name for o in model.graph.output])
```
Hardcode or store these names in a config file used by the app.
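One simple option is a small typed config module. The field names below are illustrative, not a fixed schema:

```ts
// src/ml/modelConfig.ts -- illustrative shape, adapt to your model.
export interface ModelConfig {
  inputName: string;
  outputName: string;
  inputDims: number[]; // e.g. [1, 3, 224, 224]
}

export const modelConfig: ModelConfig = {
  inputName: "input",   // assumed name; take it from the inspection script above
  outputName: "output", // assumed name
  inputDims: [1, 3, 224, 224],
};
```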
Postprocessing Patterns
Classification
- Output shape: `[1, num_classes]`
- Apply softmax
- Select the top-k classes (see the sketch below)
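A minimal softmax + top-k sketch in TypeScript, operating on the raw logits returned by `session.run`, could look like this:

```ts
// Softmax over raw logits, numerically stabilized by subtracting the max.
export function softmax(logits: Float32Array): Float32Array {
  const max = Math.max(...logits);
  const exps = Array.from(logits, (v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return Float32Array.from(exps, (v) => v / sum);
}

// Return the indices and probabilities of the k most likely classes.
export function topK(probs: Float32Array, k: number) {
  return Array.from(probs, (p, i) => ({ index: i, prob: p }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}
```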
Object Detection
- Parse bounding boxes, labels, confidence scores
- Apply non-maximum suppression (NMS)
- Map coordinates back to image space
Postprocessing is usually done in JavaScript, but heavy logic may be moved to native code if performance becomes an issue.
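For detection models that do not embed NMS in the graph, a plain-TypeScript greedy NMS sketch might look like the following. Boxes are assumed to be in `[x1, y1, x2, y2]` format; check your model's actual output layout:

```ts
interface Box { x1: number; y1: number; x2: number; y2: number; score: number; }

// Intersection-over-union of two axis-aligned boxes.
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const iy = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = ix * iy;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  return inter / (areaA + areaB - inter);
}

// Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat.
export function nms(boxes: Box[], iouThreshold = 0.5): Box[] {
  const sorted = [...boxes].sort((a, b) => b.score - a.score);
  const kept: Box[] = [];
  for (const box of sorted) {
    if (kept.every((k) => iou(k, box) < iouThreshold)) {
      kept.push(box);
    }
  }
  return kept;
}
```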
Performance Tips for Mobile ONNX
- Quantize models (INT8 or dynamic quantization)
- Reduce input resolution when possible
- Warm up the model on app launch
- Avoid heavy JS loops for pixel processing
- Enable accelerators where available:
  - Android: NNAPI
  - iOS: CoreML
Well-optimized ONNX models can run in real time on mid-range mobile devices.
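The warm-up tip above can be as simple as running one inference with dummy data right after the session is created. A hedged sketch, where the input name `"input"` and shape `[1, 3, 224, 224]` are assumptions:

```ts
import * as ort from "onnxruntime-react-native";

// Run a single dummy inference so the first real request does not pay
// one-time initialization costs (memory allocation, kernel setup, etc.).
export async function warmUp(session: ort.InferenceSession) {
  const dummy = new ort.Tensor(
    "float32",
    new Float32Array(1 * 3 * 224 * 224), // assumed input shape
    [1, 3, 224, 224]
  );
  await session.run({ input: dummy }); // "input" is an assumed input name
}
```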
Benefits of Running a Local LLM Directly on the Device
Using ONNX is not limited to vision or classical ML models. Increasingly, teams are deploying small, optimized large language models (LLMs) directly on mobile devices for tasks such as summarization, chat assistants, form autofill, OCR post-processing, and decision support.
Running a local LLM on-device provides several important benefits:
1. Privacy by Design
All inference happens entirely on the user’s device:
- No prompts or user data are sent to external servers
- Ideal for healthcare, government, finance, and enterprise apps
- Simplifies compliance with data protection laws (PDPA, GDPR, HIPAA)
This is one of the strongest reasons governments and regulated industries prefer on-device LLMs.
2. Offline and Low-Connectivity Operation
Local LLMs continue to work even when:
- The device has no internet connection
- Network latency is high or unreliable
- Users are in remote or restricted environments
This makes local LLMs suitable for:
- Field inspection apps
- Smart farming tools
- Emergency or disaster-response systems
3. Predictable Latency and UX
Because inference runs locally:
- Response time is consistent and predictable
- No network round-trip delays
- UI interactions feel more responsive
For mobile UX, predictable latency is often more important than raw model size or accuracy.
4. Cost Control and Scalability
With on-device LLMs:
- No per-request API fees
- No cloud GPU inference cost
- No need to scale backend infrastructure for AI traffic
Costs scale with number of devices, not number of prompts—an important advantage for consumer and enterprise mobile apps.
5. Better Integration with On-Device Context
Local LLMs can directly integrate with:
- Device sensors
- Camera results
- OCR output
- Local databases
- App state and user behavior
This enables more context-aware and reactive AI features without exposing sensitive device data externally.
6. Gradual Hybrid AI Architecture
A common production pattern is:
- Use local LLM for fast, private, everyday tasks
- Fall back to cloud LLM only for complex or rare cases
This hybrid approach balances:
- Privacy
- Cost
- Capability
ONNX-based deployment makes this strategy easier to evolve over time.
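A hedged sketch of the routing idea, with `runLocalLlm` and `callCloudLlm` as hypothetical placeholders for whatever local and cloud inference functions your app actually has:

```ts
// Hypothetical helpers -- stand-ins for your real local / cloud inference calls.
declare function runLocalLlm(prompt: string): Promise<string>;
declare function callCloudLlm(prompt: string): Promise<string>;

// Prefer the on-device model; escalate to the cloud only for long or
// explicitly complex requests (the threshold here is arbitrary).
export async function answer(prompt: string, complex = false): Promise<string> {
  const useCloud = complex || prompt.length > 2000;
  return useCloud ? callCloudLlm(prompt) : runLocalLlm(prompt);
}
```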
7. Practical Local LLM Use Cases on Mobile
Examples of what teams already deploy:
- Offline chatbot for internal enterprise apps
- Smart form filling and validation
- Voice transcription post-processing
- Knowledge base Q&A embedded in the app
- Rule explanation and decision reasoning
These workloads typically use small, quantized LLMs optimized for mobile CPUs.
Sample Local LLM Use Cases in Real Mobile Applications
Below are practical, already-deployed use cases where running a local LLM on-device (via ONNX Runtime or similar) provides clear advantages.
1. Government & GovTech Mobile Apps
Use cases:
- Explaining government form fields in plain language
- Assisting citizens to complete applications offline
- Summarizing regulations or policy documents stored locally
- Decision explanation for eligibility checks
Why local LLM:
- Sensitive citizen data never leaves the device
- Works in rural areas with poor connectivity
- Aligns with privacy-by-default government policies
2. Healthcare & Medical Field Apps
Use cases:
- Offline medical note summarization for doctors
- Patient instruction explanation in simple language
- OCR post-processing of prescriptions or lab reports
- Clinical checklist assistance
Why local LLM:
- No transmission of personal health information (PHI)
- Lower regulatory risk (HIPAA / PDPA / GDPR)
- Instant response during clinical workflows
3. Smart Farming & Field Inspection Apps
Use cases:
- Explaining plant disease detection results
- Generating treatment suggestions from local rule sets
- Summarizing sensor readings into actionable advice
- Voice-to-text notes for farm inspections
Why local LLM:
- Farms often lack stable internet
- Combines vision model output + LLM reasoning
- Reduces cloud dependency and operational cost
4. Enterprise Internal Tools
Use cases:
- Offline chatbot for SOPs and manuals
- Incident report summarization
- Smart form validation and autofill
- Explaining system alerts and logs
Why local LLM:
- Company data never leaves employee devices
- Predictable cost for large internal deployments
- Faster UX for routine operations
5. Manufacturing & Industrial Apps
Use cases:
- Explaining machine alarms and fault codes
- Maintenance checklist guidance
- Shift handover summarization
- Root-cause analysis suggestions (rule + LLM)
Why local LLM:
- Works on factory floors with restricted networks
- Reduces dependency on central IT systems
- Improves operator understanding and safety
6. Education & Training Apps
Use cases:
- Offline tutoring and Q&A
- Explaining textbook content in simpler language
- Interactive quizzes with natural-language feedback
- Language learning assistants
Why local LLM:
- No student data sent to cloud
- Works in low-connectivity regions
- Lower long-term cost for large user bases
7. Consumer Productivity Apps
Use cases:
- Email and note summarization
- Smart to-do list generation
- Voice memo structuring
- Personal knowledge assistant
Why local LLM:
- Personal data stays private
- Instant responses
- No subscription or API usage cost
Important Constraints to Understand
Local LLMs are powerful, but not free:
- Models must be small (often 1–4B parameters or less)
- Quantization is usually required (INT8 / INT4)
- Token throughput is lower than cloud GPUs
Designing the right task scope is critical for success.
Other Mobile Frameworks
Flutter
Recommended approach:
- Use platform channels
- Run ONNX Runtime natively on Android/iOS
- Send tensors and results back to Dart
This gives better long-term stability than pure Dart ML runtimes.
Native iOS / Android
Native integration provides:
- Best performance
- Easier camera and image buffer access
- Direct integration with system ML accelerators
Ionic / Capacitor
- Build a native plugin wrapping ONNX Runtime
- Expose inference APIs to JavaScript
- Use web UI + native ML engine architecture
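On the JavaScript side, the plugin surface might be declared roughly like this. The plugin name `OnnxPlugin` and its methods are assumptions; the native implementation wrapping ONNX Runtime is up to you:

```ts
import { registerPlugin } from "@capacitor/core";

// Hypothetical plugin surface -- the native side would wrap ONNX Runtime.
export interface OnnxPlugin {
  loadModel(options: { path: string }): Promise<void>;
  run(options: {
    inputName: string;
    data: number[]; // Float32 values serialized for the bridge
    dims: number[];
  }): Promise<{ output: number[]; dims: number[] }>;
}

export const Onnx = registerPlugin<OnnxPlugin>("OnnxPlugin");
```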
Suggested Project Structure
```
model/
├─ model.onnx
├─ labels.json
└─ model.config.json

src/
├─ ml/
│  ├─ session.ts
│  ├─ preprocess.ts
│  └─ postprocess.ts
└─ screens/
   └─ CameraInferenceScreen.tsx
```
This separation keeps ML logic clean and testable.
Common Errors Checklist
- Input shape mismatch (NCHW vs NHWC)
- Incorrect normalization
- RGB / BGR mismatch
- Wrong input or output tensor name
- Model file not accessible at runtime
Most mobile ONNX issues come from preprocessing mismatches, not the runtime itself.
Conclusion
ONNX enables reliable, offline-capable AI on mobile apps. For cross-platform teams, React Native + ONNX Runtime offers a strong balance between performance and developer productivity. With proper preprocessing and model optimization, ONNX models can run smoothly even on consumer-grade devices.
If you know your model type and input format, this setup can be adapted into a production-ready mobile AI pipeline with minimal changes.