Training YOLO with a Custom Dataset: A Step-by-Step Guide

Object detection is now a core technology in security, automation, and robotics. YOLO (You Only Look Once) is one of the most popular families of real-time object detection models because it balances speed and accuracy. In this blog post, we will walk you through training YOLO on your own dataset, from installing dependencies to exporting a model you can deploy.

Step 1: Install Dependencies

To begin, install the necessary dependencies. This guide covers two Ultralytics releases, YOLOv5 and YOLOv8; you only need to install the one you plan to train. YOLOv5 runs from a clone of its repository:

# Clone the YOLOv5 repository
git clone https://github.com/ultralytics/yolov5.git
cd yolov5

# Install required packages
pip install -r requirements.txt

For YOLOv8, you can install the Ultralytics package directly:

pip install ultralytics

Step 2: Prepare Your Dataset

YOLO expects one plain-text annotation file per image, named after the image (img1.jpg pairs with img1.txt), with one line per object in the following format:

<class_id> <x_center> <y_center> <width> <height>
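
As a worked example of this format, here is a minimal Python sketch that turns a pixel-space box into one annotation line. It assumes your source annotations give corner coordinates (xmin, ymin, xmax, ymax) in pixels, which is our assumption about the input and not something YOLO requires. The function name to_yolo_line is ours and is not part of any YOLO package.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to one YOLO annotation line."""
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("box must have positive width and height")

    # Centre and size, each divided by the matching image dimension.
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h

    values = (x_center, y_center, width, height)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("box lies outside the image")

    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# A 200x200 pixel box in a 640x480 image, class 0.
print(to_yolo_line(0, 100, 200, 300, 400, img_w=640, img_h=480))
# 0 0.312500 0.625000 0.312500 0.416667
```

Write one such line per object into the image's .txt file.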

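Here, class_id is a zero-based integer index into the class list, and the four box values are normalized to the range 0 to 1 by dividing by the image width (x_center, width) or the image height (y_center, height). Below is the expected dataset folder structure: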

/dataset
  ├── images
  │   ├── train
  │   │   ├── img1.jpg
  │   │   ├── img2.jpg
  │   ├── val
  │       ├── img3.jpg
  │       ├── img4.jpg
  ├── labels
  │   ├── train
  │   │   ├── img1.txt
  │   │   ├── img2.txt
  │   ├── val
  │       ├── img3.txt
  │       ├── img4.txt
  ├── data.yaml
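
Before training, it can help to confirm that the images and labels folders line up. The following is a small sketch, assuming the layout shown above and a handful of common image extensions (the extension list and the helper name find_unpaired are our own choices). Note that an image without a label file is not necessarily an error: as far as we know, YOLO treats it as a background image with no objects, so review the first list instead of deleting everything in it.

```python
from pathlib import Path

# Assumed set of image extensions; extend it to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unpaired(dataset_root, split):
    """Return (image stems with no label, label stems with no image) for one split."""
    root = Path(dataset_root)
    image_stems = {
        p.stem
        for p in (root / "images" / split).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Example usage (uncomment and point at your own dataset folder):
# for split in ("train", "val"):
#     no_label, no_image = find_unpaired("dataset", split)
#     print(split, "images without labels:", no_label)
#     print(split, "labels without images:", no_image)
```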

Creating the `data.yaml` File

This file defines the dataset structure and class names:

train: /path/to/dataset/images/train
val: /path/to/dataset/images/val

nc: 2  # Number of object classes
names: ['person', 'car']  # Object class names
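
A common source of training errors is a label file that disagrees with data.yaml, for example a class id of 2 when nc is 2. Here is a hedged sketch of a checker for a single detection label file. It assumes the five-field format described earlier (segmentation labels have more fields and would be flagged), and check_label_file is a hypothetical helper, not part of YOLOv5 or YOLOv8.

```python
from pathlib import Path


def check_label_file(path, num_classes):
    """Return a list of human-readable problems found in one YOLO label file."""
    problems = []
    for line_no, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 5:
            problems.append(f"line {line_no}: expected 5 fields, found {len(fields)}")
            continue
        try:
            class_id = int(fields[0])
            box = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        # Class ids must index into the names list from data.yaml.
        if not 0 <= class_id < num_classes:
            problems.append(
                f"line {line_no}: class id {class_id} is outside 0..{num_classes - 1}"
            )
        # Box values are fractions of the image size.
        if any(not 0.0 <= v <= 1.0 for v in box):
            problems.append(f"line {line_no}: box value outside [0, 1]")
    return problems


# Example usage with nc: 2 from the data.yaml above:
# print(check_label_file("dataset/labels/train/img1.txt", num_classes=2))
```

An empty list means the file passed these checks.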

Step 3: Train the Model

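To train YOLOv5, run the following command from inside the cloned yolov5 directory. It fine-tunes the pretrained yolov5s.pt checkpoint for 50 epochs at an image size of 640 pixels with a batch size of 16, and --cache keeps the images in memory to speed up later epochs: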

python train.py --img 640 --batch 16 --epochs 50 --data dataset/data.yaml --weights yolov5s.pt --cache

For YOLOv8, use:

yolo train model=yolov8n.pt data=dataset/data.yaml epochs=50 imgsz=640
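
If you prefer to stay in Python, the YOLOv8 run above can also be started through the ultralytics package. The sketch below is a minimal example, assuming the package is installed and that dataset/data.yaml exists; the wrapper function train_and_validate is our own name, while YOLO, train, and val come from the package. The values mirror the CLI command above.

```python
def train_and_validate(data_yaml="dataset/data.yaml", epochs=50, imgsz=640):
    """Fine-tune yolov8n.pt and return (mAP@0.5, mAP@0.5:0.95) on the val split."""
    # Imported here so the function can be defined even where the
    # ultralytics package is not installed.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # pretrained checkpoint, downloaded on first use
    model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)

    # Evaluate the trained weights on the val images listed in data.yaml.
    metrics = model.val(data=data_yaml)
    return metrics.box.map50, metrics.box.map


# Example usage (starts a real training run):
# map50, map50_95 = train_and_validate()
# print(f"mAP@0.5={map50:.3f}  mAP@0.5:0.95={map50_95:.3f}")
```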

Step 4: Monitor Training Progress

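YOLO logs losses, precision, recall, and mAP for every epoch during training. YOLOv5 stores the first run's results in runs/train/exp/ and numbers later runs exp2, exp3, and so on; YOLOv8 stores detection runs in runs/detect/train/, followed by train2, train3, and so on. For YOLOv5, you can visualize training performance using TensorBoard: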

tensorboard --logdir=runs/train
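
The per-epoch numbers can also be read programmatically. The sketch below assumes the run directory contains a results.csv file with an epoch column and a metric column named metrics/mAP_0.5. That column name is what we believe YOLOv5 writes (YOLOv8 appears to use metrics/mAP50(B) instead), so check the header of your own file before relying on it. best_epoch is a hypothetical helper, not part of either package.

```python
import csv


def best_epoch(results_csv, metric="metrics/mAP_0.5"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(results_csv, newline="") as f:
        reader = csv.reader(f)
        # Header cells may be padded with spaces, so strip them.
        header = [name.strip() for name in next(reader)]
        rows = [
            dict(zip(header, (cell.strip() for cell in row)))
            for row in reader
            if row
        ]
    if metric not in header:
        raise KeyError(f"{metric!r} not found; available columns: {header}")
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])


# Example usage after a YOLOv5 run:
# print(best_epoch("runs/train/exp/results.csv"))
```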

Step 5: Evaluate and Test the Model

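Once training is complete, run the model on images it has not seen. If you trained more than once, replace exp with the directory of the run you want to use: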

python detect.py --weights runs/train/exp/weights/best.pt --img 640 --source test_images/

For YOLOv8, which saves detection runs under runs/detect/train/ by default:

yolo detect predict model=runs/detect/train/weights/best.pt source=test_images/

Step 6: Export for Deployment

Trained weights can be exported to other formats for deployment. For YOLOv5, the following command produces ONNX and TorchScript files:

python export.py --weights runs/train/exp/weights/best.pt --include onnx torchscript

For YOLOv8:

yolo export model=runs/detect/train/weights/best.pt format=onnx

Final Thoughts

Training YOLO with a custom dataset enables real-world object detection for applications such as security, traffic monitoring, and automation. By following this step-by-step guide, you can prepare, train, and deploy your YOLO model effectively.

Would you like help automating the dataset preparation or optimizing training settings? Let us know in the comments!