How to Use an ONNX Model in React Native (and Other Mobile App Frameworks)

ONNX (Open Neural Network Exchange) is one of the most practical formats for deploying machine learning models on mobile devices. It allows you to train a model once (in PyTorch, TensorFlow, or another framework) and run it efficiently across Android, iOS, and cross-platform frameworks like React Native and Flutter.

This article explains how ONNX inference works on mobile, with a hands-on focus on React Native, followed by patterns for other mobile frameworks.


Why ONNX Is a Good Fit for Mobile Apps

ONNX is widely adopted for mobile deployment because:

  • It is framework-independent (train anywhere, deploy anywhere)
  • It supports CPU and mobile accelerators (NNAPI, CoreML)
  • It avoids heavy runtime dependencies like Python
  • It works well with offline / on-device inference

For mobile apps that need privacy, low latency, or offline AI, ONNX is often the best choice.


Mobile ONNX Runtime Options (Overview)

Recommended runtimes by platform:

  • React Native: onnxruntime-react-native
  • Android (native): ONNX Runtime Android (Kotlin / Java)
  • iOS (native): ONNX Runtime iOS (Swift / Obj-C)
  • Flutter: native ONNX Runtime via platform channels
  • Ionic / Capacitor: native plugin wrapping ONNX Runtime

Pure JavaScript ONNX inference is generally not recommended for production mobile apps due to performance and memory constraints.


Using ONNX in React Native (Recommended Path)

1. Install ONNX Runtime for React Native

For the most reliable setup, use React Native CLI (not Expo Go).

yarn add onnxruntime-react-native
cd ios && pod install && cd ..

Expo users: ONNX Runtime ships native modules, so it cannot run inside Expo Go. Use expo prebuild and a custom development client instead.


2. Add the ONNX Model to Your App

Common approaches:

  • Bundle the model inside the app (assets/models/model.onnx)
  • Download the model on first launch and cache it locally

Bundling is simpler and recommended for the first version.
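
If you go with the download-and-cache approach instead, a minimal sketch (assuming the react-native-fs library and a placeholder model URL) could look like this:

import RNFS from "react-native-fs";

// Hypothetical remote location of the model; replace with your own URL.
const MODEL_URL = "https://example.com/models/model.onnx";
const MODEL_PATH = `${RNFS.DocumentDirectoryPath}/model.onnx`;

// Returns a local file path, downloading the model only on first launch.
export async function ensureModel(): Promise<string> {
  if (!(await RNFS.exists(MODEL_PATH))) {
    await RNFS.downloadFile({ fromUrl: MODEL_URL, toFile: MODEL_PATH }).promise;
  }
  return MODEL_PATH;
}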


3. Create an ONNX Inference Session

import * as ort from "onnxruntime-react-native";

export async function loadModel(modelPath: string) {
  const session = await ort.InferenceSession.create(modelPath, {
    executionProviders: ["cpu"],
  });
  return session;
}

This loads the model and prepares it for inference.


4. Run Inference

export async function runInference(
  session: ort.InferenceSession,
  inputName: string,
  data: Float32Array,
  dims: number[]
) {
  const tensor = new ort.Tensor("float32", data, dims);

  const feeds: Record<string, ort.Tensor> = {};
  feeds[inputName] = tensor;

  const results = await session.run(feeds);
  return results;
}

You must ensure:

  • inputName matches the ONNX model input
  • dims matches the expected input shape
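
For example, a call for a 224×224 RGB classifier might look like this; the input name "input" and the shape are assumptions, so use the values from your own model:

import * as ort from "onnxruntime-react-native";

// Hypothetical usage for a 224x224 RGB classifier whose input tensor is named "input".
async function classifyExample(session: ort.InferenceSession, inputData: Float32Array) {
  const results = await runInference(session, "input", inputData, [1, 3, 224, 224]);

  // session.run returns a map of output name -> ort.Tensor.
  const outputName = Object.keys(results)[0];
  const output = results[outputName];
  console.log(output.dims, output.data.length);
}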

Image Preprocessing (Most Common Pitfall)

Most vision ONNX models expect inputs like:

  • Shape: [1, 3, H, W] (NCHW) or [1, H, W, 3] (NHWC)
  • Pixel range: 0–1 or normalized with mean/std
  • Color format: RGB (sometimes BGR)

Typical preprocessing steps

  1. Load image from camera or gallery
  2. Resize to model input size (e.g. 224×224)
  3. Convert pixels to Float32Array
  4. Reorder channels if needed
  5. Normalize values

In React Native, pixel extraction and resizing are often done using native helpers or image-processing libraries. If you control model training, exporting a model with simpler preprocessing can significantly reduce app complexity.
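
As an illustration, here is a minimal sketch of steps 3 to 5, assuming you already have raw RGBA pixel bytes for a resized image (for example from a native helper) and that the model expects NCHW input normalized with the common ImageNet mean/std values (adjust these to your model):

// Convert RGBA bytes (4 bytes per pixel) into a normalized NCHW Float32Array.
// Assumes the image has already been resized to `width` x `height`.
const MEAN = [0.485, 0.456, 0.406]; // common ImageNet values; model-specific
const STD = [0.229, 0.224, 0.225];

export function rgbaToNchwFloat32(
  rgba: Uint8Array,
  width: number,
  height: number
): Float32Array {
  const out = new Float32Array(3 * width * height);
  const planeSize = width * height;

  for (let i = 0; i < planeSize; i++) {
    const r = rgba[i * 4] / 255;
    const g = rgba[i * 4 + 1] / 255;
    const b = rgba[i * 4 + 2] / 255; // alpha at i * 4 + 3 is ignored

    out[i] = (r - MEAN[0]) / STD[0];                 // R plane
    out[planeSize + i] = (g - MEAN[1]) / STD[1];     // G plane
    out[2 * planeSize + i] = (b - MEAN[2]) / STD[2]; // B plane
  }
  return out;
}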


Inspecting Model Inputs and Outputs

Before integrating the model, inspect it locally:

import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    shape = [d.dim_value or d.dim_param for d in inp.type.tensor_type.shape.dim]
    print("input:", inp.name, shape)
print("outputs:", [o.name for o in model.graph.output])

Hardcode or store these names in a config file used by the app.
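
For example, a small model.config.json (hypothetical values) can carry everything the app needs to know about the model:

{
  "inputName": "input",
  "outputName": "logits",
  "inputShape": [1, 3, 224, 224],
  "mean": [0.485, 0.456, 0.406],
  "std": [0.229, 0.224, 0.225]
}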


Postprocessing Patterns

Classification

  • Output shape: [1, num_classes]
  • Apply softmax
  • Select the top-k classes
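
A minimal sketch of that classification postprocessing, assuming the output tensor's data is a Float32Array of raw logits:

// Softmax over raw logits, then return the top-k class indices with scores.
export function softmax(logits: Float32Array): Float32Array {
  // Subtract the max for numerical stability.
  const max = logits.reduce((m, v) => (v > m ? v : m), -Infinity);
  const exps = Array.from(logits, (v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return Float32Array.from(exps, (v) => v / sum);
}

export function topK(probs: Float32Array, k = 5): { index: number; score: number }[] {
  return Array.from(probs)
    .map((score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}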

Object Detection

  • Parse bounding boxes, labels, confidence scores
  • Apply non-maximum suppression (NMS)
  • Map coordinates back to image space

Postprocessing is usually done in JavaScript, but heavy logic may be moved to native code if performance becomes an issue.
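
For reference, a minimal greedy NMS sketch in TypeScript, assuming each box is [x1, y1, x2, y2] with a matching confidence score (detection heads vary, so adapt the decoding to your model):

type Box = [number, number, number, number]; // [x1, y1, x2, y2]

// Intersection-over-union of two axis-aligned boxes.
function iou(a: Box, b: Box): number {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[2], b[2]);
  const y2 = Math.min(a[3], b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

// Greedy non-maximum suppression: keep the highest-scoring boxes,
// dropping any box that overlaps a kept box by more than iouThreshold.
export function nms(boxes: Box[], scores: number[], iouThreshold = 0.5): number[] {
  const order = scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .map((item) => item.index);

  const keep: number[] = [];
  for (const i of order) {
    if (keep.every((j) => iou(boxes[i], boxes[j]) < iouThreshold)) {
      keep.push(i);
    }
  }
  return keep;
}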


Performance Tips for Mobile ONNX

  • Quantize models (INT8 or dynamic quantization)
  • Reduce input resolution when possible
  • Warm up the model on app launch
  • Avoid heavy JS loops for pixel processing
  • Enable accelerators if available:

    • Android: NNAPI
    • iOS: CoreML

Well-optimized ONNX models can run in real time on mid-range mobile devices.
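
As a sketch, accelerator selection plus warm-up could look like the following. The provider names "nnapi" and "coreml" are assumptions to verify against the onnxruntime-react-native version you use:

import { Platform } from "react-native";
import * as ort from "onnxruntime-react-native";

export async function loadOptimizedModel(modelPath: string) {
  // Assumed provider names; fall back to CPU if the accelerator is unavailable.
  const accelerated = Platform.OS === "android" ? "nnapi" : "coreml";

  let session: ort.InferenceSession;
  try {
    session = await ort.InferenceSession.create(modelPath, {
      executionProviders: [accelerated],
    });
  } catch {
    session = await ort.InferenceSession.create(modelPath, {
      executionProviders: ["cpu"],
    });
  }

  // Warm-up: run one dummy inference at launch so the first real
  // user-facing inference does not pay the graph initialization cost.
  // Assumes a 224x224 RGB input tensor named "input".
  const warmup = new ort.Tensor(
    "float32",
    new Float32Array(1 * 3 * 224 * 224),
    [1, 3, 224, 224]
  );
  await session.run({ input: warmup });

  return session;
}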


Benefits of Running a Local LLM Directly on the Device

Using ONNX is not limited to vision or classical ML models. Increasingly, teams are deploying small and optimized Large Language Models (LLMs) directly on mobile devices for tasks such as summarization, chat assistants, form autofill, OCR post-processing, and decision support.

Running a local LLM on-device provides several important benefits:

1. Privacy by Design

All inference happens entirely on the user’s device:

  • No prompts or user data are sent to external servers
  • Ideal for healthcare, government, finance, and enterprise apps
  • Simplifies compliance with data protection laws (PDPA, GDPR, HIPAA)

This is one of the strongest reasons governments and regulated industries prefer on-device LLMs.


2. Offline and Low-Connectivity Operation

Local LLMs continue to work even when:

  • The device has no internet connection
  • Network latency is high or unreliable
  • Users are in remote or restricted environments

This makes local LLMs suitable for:

  • Field inspection apps
  • Smart farming tools
  • Emergency or disaster-response systems

3. Predictable Latency and UX

Because inference runs locally:

  • Response time is consistent and predictable
  • No network round-trip delays
  • UI interactions feel more responsive

For mobile UX, predictable latency is often more important than raw model size or accuracy.


4. Cost Control and Scalability

With on-device LLMs:

  • No per-request API fees
  • No cloud GPU inference cost
  • No need to scale backend infrastructure for AI traffic

Costs scale with the number of devices, not the number of prompts, which is an important advantage for consumer and enterprise mobile apps.


5. Better Integration with On-Device Context

Local LLMs can directly integrate with:

  • Device sensors
  • Camera results
  • OCR output
  • Local databases
  • App state and user behavior

This enables more context-aware and reactive AI features without exposing sensitive device data externally.


6. Gradual Hybrid AI Architecture

A common production pattern is:

  • Use local LLM for fast, private, everyday tasks
  • Fall back to cloud LLM only for complex or rare cases

This hybrid approach balances:

  • Privacy
  • Cost
  • Capability

ONNX-based deployment makes this strategy easier to evolve over time.
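
A deliberately simplified sketch of that routing decision (every name here is hypothetical; the complexity heuristic and the cloud client are entirely up to your app):

// Hypothetical hybrid router: prefer the on-device model, escalate to the
// cloud only when the request looks too complex for the local LLM.
type LlmClient = { complete(prompt: string): Promise<string> };

export function createHybridLlm(
  local: LlmClient,
  cloud: LlmClient,
  maxLocalPromptChars = 2000
) {
  return {
    async complete(prompt: string): Promise<string> {
      const tooLong = prompt.length > maxLocalPromptChars; // crude complexity heuristic
      if (!tooLong) {
        try {
          return await local.complete(prompt);
        } catch {
          // Fall through to the cloud model if local inference fails.
        }
      }
      return cloud.complete(prompt);
    },
  };
}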


7. Practical Local LLM Use Cases on Mobile

Examples of what teams already deploy:

  • Offline chatbot for internal enterprise apps
  • Smart form filling and validation
  • Voice transcription post-processing
  • Knowledge base Q&A embedded in the app
  • Rule explanation and decision reasoning

These workloads typically use small, quantized LLMs optimized for mobile CPUs.


Sample Local LLM Use Cases in Real Mobile Applications

Below are practical, already-deployed use cases where running a local LLM on-device (via ONNX Runtime or similar) provides clear advantages.


1. Government & GovTech Mobile Apps

Use cases:

  • Explaining government form fields in plain language
  • Assisting citizens to complete applications offline
  • Summarizing regulations or policy documents stored locally
  • Decision explanation for eligibility checks

Why local LLM:

  • Sensitive citizen data never leaves the device
  • Works in rural areas with poor connectivity
  • Aligns with privacy-by-default government policies

2. Healthcare & Medical Field Apps

Use cases:

  • Offline medical note summarization for doctors
  • Patient instruction explanation in simple language
  • OCR post-processing of prescriptions or lab reports
  • Clinical checklist assistance

Why local LLM:

  • No transmission of personal health information (PHI)
  • Lower regulatory risk (HIPAA / PDPA / GDPR)
  • Instant response during clinical workflows

3. Smart Farming & Field Inspection Apps

Use cases:

  • Explaining plant disease detection results
  • Generating treatment suggestions from local rule sets
  • Summarizing sensor readings into actionable advice
  • Voice-to-text notes for farm inspections

Why local LLM:

  • Farms often lack stable internet
  • Combines vision model output + LLM reasoning
  • Reduces cloud dependency and operational cost

4. Enterprise Internal Tools

Use cases:

  • Offline chatbot for SOPs and manuals
  • Incident report summarization
  • Smart form validation and autofill
  • Explaining system alerts and logs

Why local LLM:

  • Company data never leaves employee devices
  • Predictable cost for large internal deployments
  • Faster UX for routine operations

5. Manufacturing & Industrial Apps

Use cases:

  • Explaining machine alarms and fault codes
  • Maintenance checklist guidance
  • Shift handover summarization
  • Root-cause analysis suggestions (rule + LLM)

Why local LLM:

  • Works on factory floors with restricted networks
  • Reduces dependency on central IT systems
  • Improves operator understanding and safety

6. Education & Training Apps

Use cases:

  • Offline tutoring and Q&A
  • Explaining textbook content in simpler language
  • Interactive quizzes with natural-language feedback
  • Language learning assistants

Why local LLM:

  • No student data sent to cloud
  • Works in low-connectivity regions
  • Lower long-term cost for large user bases

7. Consumer Productivity Apps

Use cases:

  • Email and note summarization
  • Smart to-do list generation
  • Voice memo structuring
  • Personal knowledge assistant

Why local LLM:

  • Personal data stays private
  • Instant responses
  • No subscription or API usage cost

Important Constraints to Understand

Local LLMs are powerful, but not free:

  • Models must be small (often 1–4B parameters or less)
  • Quantization is usually required (INT8 / INT4)
  • Token throughput is lower than cloud GPUs

Designing the right task scope is critical for success.


Other Mobile Frameworks

Flutter

Recommended approach:

  • Use platform channels
  • Run ONNX Runtime natively on Android/iOS
  • Send tensors and results back to Dart

This gives better long-term stability than pure Dart ML runtimes.

Native iOS / Android

Native integration provides:

  • Best performance
  • Easier camera and image buffer access
  • Direct integration with system ML accelerators

Ionic / Capacitor

  • Build a native plugin wrapping ONNX Runtime
  • Expose inference APIs to JavaScript
  • Use web UI + native ML engine architecture
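
On the JavaScript side, the plugin surface might look like this minimal sketch (the plugin name and method signatures are hypothetical; the native side wraps ONNX Runtime):

import { registerPlugin } from "@capacitor/core";

// Hypothetical plugin API implemented natively on Android (Kotlin) and iOS (Swift).
interface OnnxInferencePlugin {
  loadModel(options: { path: string }): Promise<void>;
  run(options: { input: number[]; dims: number[] }): Promise<{ output: number[] }>;
}

export const OnnxInference = registerPlugin<OnnxInferencePlugin>("OnnxInference");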

Suggested Project Structure

model/
 ├─ model.onnx
 ├─ labels.json
 └─ model.config.json
src/
 ├─ ml/
 │   ├─ session.ts
 │   ├─ preprocess.ts
 │   └─ postprocess.ts
 └─ screens/
     └─ CameraInferenceScreen.tsx

This separation keeps ML logic clean and testable.
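
As a rough sketch of how these modules can fit together in a screen (the function names match the earlier snippets in this article; "input", "logits", and the model path are placeholders):

// src/screens/CameraInferenceScreen.tsx (simplified; camera and UI wiring omitted)
import * as ort from "onnxruntime-react-native";
import { loadModel, runInference } from "../ml/session";
import { rgbaToNchwFloat32 } from "../ml/preprocess";
import { softmax, topK } from "../ml/postprocess";

let session: ort.InferenceSession | null = null;

export async function classifyFrame(rgba: Uint8Array, width: number, height: number) {
  if (!session) {
    // How the path is resolved depends on whether you bundle or download the model.
    session = await loadModel("model.onnx");
  }
  const input = rgbaToNchwFloat32(rgba, width, height);
  const results = await runInference(session, "input", input, [1, 3, height, width]);
  const logits = results["logits"].data as Float32Array; // output name is model-specific
  return topK(softmax(logits), 3);
}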


Common Errors Checklist

  • Input shape mismatch (NCHW vs NHWC)
  • Incorrect normalization
  • RGB / BGR mismatch
  • Wrong input or output tensor name
  • Model file not accessible at runtime

Most mobile ONNX issues come from preprocessing mismatches, not the runtime itself.


Conclusion

ONNX enables reliable, offline-capable AI in mobile apps. For cross-platform teams, React Native + ONNX Runtime offers a strong balance between performance and developer productivity. With proper preprocessing and model optimization, ONNX models can run smoothly even on consumer-grade devices.

If you know your model type and input format, this setup can be adapted into a production-ready mobile AI pipeline with minimal changes.

