How to Use an ONNX Model in React Native (and Other Mobile App Frameworks)

ONNX (Open Neural Network Exchange) is one of the most practical formats for deploying machine learning models on mobile devices. It allows you to train a model once (in PyTorch, TensorFlow, or another framework) and run it efficiently across Android, iOS, and cross-platform frameworks like React Native and Flutter.

This article explains how ONNX inference works on mobile, with a hands-on focus on React Native, followed by patterns for other mobile frameworks.


Why ONNX Is a Good Fit for Mobile Apps

ONNX is widely adopted for mobile deployment because:

  • It is framework-independent (train anywhere, deploy anywhere)
  • It supports CPU and mobile accelerators (NNAPI, CoreML)
  • It avoids heavy runtime dependencies like Python
  • It works well with offline / on-device inference

For mobile apps that need privacy, low latency, or offline AI, ONNX is often the best choice.


Mobile ONNX Runtime Options (Overview)

Recommended runtimes by platform:

  • React Native: onnxruntime-react-native
  • Android (native): ONNX Runtime Android (Kotlin / Java)
  • iOS (native): ONNX Runtime iOS (Swift / Obj-C)
  • Flutter: native ONNX Runtime via platform channels
  • Ionic / Capacitor: native plugin wrapping ONNX Runtime

Pure JavaScript ONNX inference is generally not recommended for production mobile apps due to performance and memory constraints.


Using ONNX in React Native (Recommended Path)

1. Install ONNX Runtime for React Native

For the most reliable setup, use React Native CLI (not Expo Go).

yarn add onnxruntime-react-native
cd ios && pod install && cd ..

Expo users: ONNX Runtime ships native modules, so it cannot run inside Expo Go. Use expo prebuild and a custom development client instead.


2. Add the ONNX Model to Your App

Common approaches:

  • Bundle the model inside the app (assets/models/model.onnx)
  • Download the model on first launch and cache it locally

Bundling is simpler and recommended for the first version.
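
If you go with the download-and-cache approach instead, a minimal sketch (assuming the react-native-fs library and a placeholder model URL) could look like this:

import RNFS from "react-native-fs";

// Hypothetical remote location of the model; replace with your own URL.
const MODEL_URL = "https://example.com/models/model.onnx";
const MODEL_PATH = `${RNFS.DocumentDirectoryPath}/model.onnx`;

// Returns a local file path, downloading the model only on first launch.
export async function ensureModel(): Promise<string> {
  if (!(await RNFS.exists(MODEL_PATH))) {
    await RNFS.downloadFile({ fromUrl: MODEL_URL, toFile: MODEL_PATH }).promise;
  }
  return MODEL_PATH;
}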


3. Create an ONNX Inference Session

import * as ort from "onnxruntime-react-native";

export async function loadModel(modelPath: string) {
  const session = await ort.InferenceSession.create(modelPath, {
    executionProviders: ["cpu"],
  });
  return session;
}

This loads the model and prepares it for inference.


4. Run Inference

export async function runInference(
  session: ort.InferenceSession,
  inputName: string,
  data: Float32Array,
  dims: number[]
) {
  const tensor = new ort.Tensor("float32", data, dims);

  const feeds: Record<string, ort.Tensor> = {};
  feeds[inputName] = tensor;

  const results = await session.run(feeds);
  return results;
}

You must ensure:

  • inputName matches the ONNX model input
  • dims matches the expected input shape
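
For example, a call for a 224×224 RGB classifier might look like this; the input name "input" and the shape are assumptions, so use the values from your own model:

import * as ort from "onnxruntime-react-native";

// Hypothetical usage for a 224x224 RGB classifier whose input tensor is named "input".
async function classifyExample(session: ort.InferenceSession, inputData: Float32Array) {
  const results = await runInference(session, "input", inputData, [1, 3, 224, 224]);

  // session.run returns a map of output name -> ort.Tensor.
  const outputName = Object.keys(results)[0];
  const output = results[outputName];
  console.log(output.dims, output.data.length);
}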

Image Preprocessing (Most Common Pitfall)

Most vision ONNX models expect inputs like:

  • Shape: [1, 3, H, W] (NCHW) or [1, H, W, 3] (NHWC)
  • Pixel range: 0–1 or normalized with mean/std
  • Color format: RGB (sometimes BGR)

Typical preprocessing steps

  1. Load image from camera or gallery
  2. Resize to model input size (e.g. 224×224)
  3. Convert pixels to Float32Array
  4. Reorder channels if needed
  5. Normalize values

In React Native, pixel extraction and resizing are often done using native helpers or image-processing libraries. If you control model training, exporting a model with simpler preprocessing can significantly reduce app complexity.
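
As an illustration, here is a minimal sketch of steps 3 to 5, assuming you already have raw RGBA pixel bytes for a resized image (for example from a native helper) and that the model expects NCHW input normalized with the common ImageNet mean/std values (adjust these to your model):

// Convert RGBA bytes (4 bytes per pixel) into a normalized NCHW Float32Array.
// Assumes the image has already been resized to `width` x `height`.
const MEAN = [0.485, 0.456, 0.406]; // common ImageNet values; model-specific
const STD = [0.229, 0.224, 0.225];

export function rgbaToNchwFloat32(
  rgba: Uint8Array,
  width: number,
  height: number
): Float32Array {
  const out = new Float32Array(3 * width * height);
  const planeSize = width * height;

  for (let i = 0; i < planeSize; i++) {
    const r = rgba[i * 4] / 255;
    const g = rgba[i * 4 + 1] / 255;
    const b = rgba[i * 4 + 2] / 255; // alpha at i * 4 + 3 is ignored

    out[i] = (r - MEAN[0]) / STD[0];                 // R plane
    out[planeSize + i] = (g - MEAN[1]) / STD[1];     // G plane
    out[2 * planeSize + i] = (b - MEAN[2]) / STD[2]; // B plane
  }
  return out;
}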


Inspecting Model Inputs and Outputs

Before integrating the model, inspect it locally:

import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    shape = [d.dim_value or d.dim_param for d in inp.type.tensor_type.shape.dim]
    print("input:", inp.name, shape)
print("outputs:", [o.name for o in model.graph.output])

Hardcode or store these names in a config file used by the app.
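
For example, a small model.config.json (hypothetical values) can carry everything the app needs to know about the model:

{
  "inputName": "input",
  "outputName": "logits",
  "inputShape": [1, 3, 224, 224],
  "mean": [0.485, 0.456, 0.406],
  "std": [0.229, 0.224, 0.225]
}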


Postprocessing Patterns

Classification

  • Output shape: [1, num_classes]
  • Apply softmax
  • Select the top-k classes
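
A minimal sketch of that classification postprocessing, assuming the output tensor's data is a Float32Array of raw logits:

// Softmax over raw logits, then return the top-k class indices with scores.
export function softmax(logits: Float32Array): Float32Array {
  // Subtract the max for numerical stability.
  const max = logits.reduce((m, v) => (v > m ? v : m), -Infinity);
  const exps = Array.from(logits, (v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return Float32Array.from(exps, (v) => v / sum);
}

export function topK(probs: Float32Array, k = 5): { index: number; score: number }[] {
  return Array.from(probs)
    .map((score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}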

Object Detection

  • Parse bounding boxes, labels, confidence scores
  • Apply non-maximum suppression (NMS)
  • Map coordinates back to image space

Postprocessing is usually done in JavaScript, but heavy logic may be moved to native code if performance becomes an issue.
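
For reference, a minimal greedy NMS sketch in TypeScript, assuming each box is [x1, y1, x2, y2] with a matching confidence score (detection heads vary, so adapt the decoding to your model):

type Box = [number, number, number, number]; // [x1, y1, x2, y2]

// Intersection-over-union of two axis-aligned boxes.
function iou(a: Box, b: Box): number {
  const x1 = Math.max(a[0], b[0]);
  const y1 = Math.max(a[1], b[1]);
  const x2 = Math.min(a[2], b[2]);
  const y2 = Math.min(a[3], b[3]);
  const inter = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

// Greedy non-maximum suppression: keep the highest-scoring boxes,
// dropping any box that overlaps a kept box by more than iouThreshold.
export function nms(boxes: Box[], scores: number[], iouThreshold = 0.5): number[] {
  const order = scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .map((item) => item.index);

  const keep: number[] = [];
  for (const i of order) {
    if (keep.every((j) => iou(boxes[i], boxes[j]) < iouThreshold)) {
      keep.push(i);
    }
  }
  return keep;
}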


Performance Tips for Mobile ONNX

  • Quantize models (INT8 or dynamic quantization)
  • Reduce input resolution when possible
  • Warm up the model on app launch
  • Avoid heavy JS loops for pixel processing
  • Enable accelerators if available:

    • Android: NNAPI
    • iOS: CoreML

Well-optimized ONNX models can run in real time on mid-range mobile devices.
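
As a sketch, accelerator selection plus warm-up could look like the following. The provider names "nnapi" and "coreml" are assumptions to verify against the onnxruntime-react-native version you use:

import { Platform } from "react-native";
import * as ort from "onnxruntime-react-native";

export async function loadOptimizedModel(modelPath: string) {
  // Assumed provider names; fall back to CPU if the accelerator is unavailable.
  const accelerated = Platform.OS === "android" ? "nnapi" : "coreml";

  let session: ort.InferenceSession;
  try {
    session = await ort.InferenceSession.create(modelPath, {
      executionProviders: [accelerated],
    });
  } catch {
    session = await ort.InferenceSession.create(modelPath, {
      executionProviders: ["cpu"],
    });
  }

  // Warm-up: run one dummy inference at launch so the first real
  // user-facing inference does not pay the graph initialization cost.
  // Assumes a 224x224 RGB input tensor named "input".
  const warmup = new ort.Tensor(
    "float32",
    new Float32Array(1 * 3 * 224 * 224),
    [1, 3, 224, 224]
  );
  await session.run({ input: warmup });

  return session;
}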


Benefits of Running a Local LLM Directly on the Device

Using ONNX is not limited to vision or classical ML models. Increasingly, teams are deploying small and optimized Large Language Models (LLMs) directly on mobile devices for tasks such as summarization, chat assistants, form autofill, OCR post-processing, and decision support.

Running a local LLM on-device provides several important benefits:

1. Privacy by Design

All inference happens entirely on the user’s device:

  • No prompts or user data are sent to external servers
  • Ideal for healthcare, government, finance, and enterprise apps
  • Simplifies compliance with data protection laws (PDPA, GDPR, HIPAA)

This is one of the strongest reasons governments and regulated industries prefer on-device LLMs.


2. Offline and Low-Connectivity Operation

Local LLMs continue to work even when:

  • The device has no internet connection
  • Network latency is high or unreliable
  • Users are in remote or restricted environments

This makes local LLMs suitable for:

  • Field inspection apps
  • Smart farming tools
  • Emergency or disaster-response systems

3. Predictable Latency and UX

Because inference runs locally:

  • Response time is consistent and predictable
  • No network round-trip delays
  • UI interactions feel more responsive

For mobile UX, predictable latency is often more important than raw model size or accuracy.


4. Cost Control and Scalability

With on-device LLMs:

  • No per-request API fees
  • No cloud GPU inference cost
  • No need to scale backend infrastructure for AI traffic

Costs scale with the number of devices, not the number of prompts, which is an important advantage for consumer and enterprise mobile apps.


5. Better Integration with On-Device Context

Local LLMs can directly integrate with:

  • Device sensors
  • Camera results
  • OCR output
  • Local databases
  • App state and user behavior

This enables more context-aware and reactive AI features without exposing sensitive device data externally.


6. Gradual Hybrid AI Architecture

A common production pattern is:

  • Use local LLM for fast, private, everyday tasks
  • Fall back to cloud LLM only for complex or rare cases

This hybrid approach balances:

  • Privacy
  • Cost
  • Capability

ONNX-based deployment makes this strategy easier to evolve over time.
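
A deliberately simplified sketch of that routing decision (every name here is hypothetical; the complexity heuristic and the cloud client are entirely up to your app):

// Hypothetical hybrid router: prefer the on-device model, escalate to the
// cloud only when the request looks too complex for the local LLM.
type LlmClient = { complete(prompt: string): Promise<string> };

export function createHybridLlm(
  local: LlmClient,
  cloud: LlmClient,
  maxLocalPromptChars = 2000
) {
  return {
    async complete(prompt: string): Promise<string> {
      const tooLong = prompt.length > maxLocalPromptChars; // crude complexity heuristic
      if (!tooLong) {
        try {
          return await local.complete(prompt);
        } catch {
          // Fall through to the cloud model if local inference fails.
        }
      }
      return cloud.complete(prompt);
    },
  };
}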


7. Practical Local LLM Use Cases on Mobile

Examples of what teams already deploy:

  • Offline chatbot for internal enterprise apps
  • Smart form filling and validation
  • Voice transcription post-processing
  • Knowledge base Q&A embedded in the app
  • Rule explanation and decision reasoning

These workloads typically use small, quantized LLMs optimized for mobile CPUs.


Sample Local LLM Use Cases in Real Mobile Applications

Below are practical, already-deployed use cases where running a local LLM on-device (via ONNX Runtime or similar) provides clear advantages.


1. Government & GovTech Mobile Apps

Use cases:

  • Explaining government form fields in plain language
  • Assisting citizens to complete applications offline
  • Summarizing regulations or policy documents stored locally
  • Decision explanation for eligibility checks

Why local LLM:

  • Sensitive citizen data never leaves the device
  • Works in rural areas with poor connectivity
  • Aligns with privacy-by-default government policies

2. Healthcare & Medical Field Apps

Use cases:

  • Offline medical note summarization for doctors
  • Patient instruction explanation in simple language
  • OCR post-processing of prescriptions or lab reports
  • Clinical checklist assistance

Why local LLM:

  • No transmission of personal health information (PHI)
  • Lower regulatory risk (HIPAA / PDPA / GDPR)
  • Instant response during clinical workflows

3. Smart Farming & Field Inspection Apps

Use cases:

  • Explaining plant disease detection results
  • Generating treatment suggestions from local rule sets
  • Summarizing sensor readings into actionable advice
  • Voice-to-text notes for farm inspections

Why local LLM:

  • Farms often lack stable internet
  • Combines vision model output + LLM reasoning
  • Reduces cloud dependency and operational cost

4. Enterprise Internal Tools

Use cases:

  • Offline chatbot for SOPs and manuals
  • Incident report summarization
  • Smart form validation and autofill
  • Explaining system alerts and logs

Why local LLM:

  • Company data never leaves employee devices
  • Predictable cost for large internal deployments
  • Faster UX for routine operations

5. Manufacturing & Industrial Apps

Use cases:

  • Explaining machine alarms and fault codes
  • Maintenance checklist guidance
  • Shift handover summarization
  • Root-cause analysis suggestions (rule + LLM)

Why local LLM:

  • Works on factory floors with restricted networks
  • Reduces dependency on central IT systems
  • Improves operator understanding and safety

6. Education & Training Apps

Use cases:

  • Offline tutoring and Q&A
  • Explaining textbook content in simpler language
  • Interactive quizzes with natural-language feedback
  • Language learning assistants

Why local LLM:

  • No student data sent to cloud
  • Works in low-connectivity regions
  • Lower long-term cost for large user bases

7. Consumer Productivity Apps

Use cases:

  • Email and note summarization
  • Smart to-do list generation
  • Voice memo structuring
  • Personal knowledge assistant

Why local LLM:

  • Personal data stays private
  • Instant responses
  • No subscription or API usage cost

Important Constraints to Understand

Local LLMs are powerful, but not free:

  • Models must be small (often 1–4B parameters or less)
  • Quantization is usually required (INT8 / INT4)
  • Token throughput is lower than cloud GPUs

Designing the right task scope is critical for success.


Other Mobile Frameworks

Flutter

Recommended approach:

  • Use platform channels
  • Run ONNX Runtime natively on Android/iOS
  • Send tensors and results back to Dart

This gives better long-term stability than pure Dart ML runtimes.

Native iOS / Android

Native integration provides:

  • Best performance
  • Easier camera and image buffer access
  • Direct integration with system ML accelerators

Ionic / Capacitor

  • Build a native plugin wrapping ONNX Runtime
  • Expose inference APIs to JavaScript
  • Use web UI + native ML engine architecture
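
On the JavaScript side, the plugin surface might look like this minimal sketch (the plugin name and method signatures are hypothetical; the native side wraps ONNX Runtime):

import { registerPlugin } from "@capacitor/core";

// Hypothetical plugin API implemented natively on Android (Kotlin) and iOS (Swift).
interface OnnxInferencePlugin {
  loadModel(options: { path: string }): Promise<void>;
  run(options: { input: number[]; dims: number[] }): Promise<{ output: number[] }>;
}

export const OnnxInference = registerPlugin<OnnxInferencePlugin>("OnnxInference");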

Suggested Project Structure

model/
 ├─ model.onnx
 ├─ labels.json
 └─ model.config.json
src/
 ├─ ml/
 │   ├─ session.ts
 │   ├─ preprocess.ts
 │   └─ postprocess.ts
 └─ screens/
     └─ CameraInferenceScreen.tsx

This separation keeps ML logic clean and testable.
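
As a rough sketch of how these modules can fit together in a screen (the function names match the earlier snippets in this article; "input", "logits", and the model path are placeholders):

// src/screens/CameraInferenceScreen.tsx (simplified; camera and UI wiring omitted)
import * as ort from "onnxruntime-react-native";
import { loadModel, runInference } from "../ml/session";
import { rgbaToNchwFloat32 } from "../ml/preprocess";
import { softmax, topK } from "../ml/postprocess";

let session: ort.InferenceSession | null = null;

export async function classifyFrame(rgba: Uint8Array, width: number, height: number) {
  if (!session) {
    // How the path is resolved depends on whether you bundle or download the model.
    session = await loadModel("model.onnx");
  }
  const input = rgbaToNchwFloat32(rgba, width, height);
  const results = await runInference(session, "input", input, [1, 3, height, width]);
  const logits = results["logits"].data as Float32Array; // output name is model-specific
  return topK(softmax(logits), 3);
}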


Common Errors Checklist

  • Input shape mismatch (NCHW vs NHWC)
  • Incorrect normalization
  • RGB / BGR mismatch
  • Wrong input or output tensor name
  • Model file not accessible at runtime

Most mobile ONNX issues come from preprocessing mismatches, not the runtime itself.


Conclusion

ONNX enables reliable, offline-capable AI in mobile apps. For cross-platform teams, React Native + ONNX Runtime offers a strong balance between performance and developer productivity. With proper preprocessing and model optimization, ONNX models can run smoothly even on consumer-grade devices.

If you know your model type and input format, this setup can be adapted into a production-ready mobile AI pipeline with minimal changes.

