Understanding Training, Validation, and Testing in Machine Learning
A Complete Guide to How Models Learn, Improve, and Get Evaluated
When learning machine learning or deep learning, one of the most important foundations is understanding the three phases of model development:
- Training
- Validation
- Testing
These three phases ensure that a model not only learns patterns, but also generalizes and performs well in the real world.
This article explains each phase clearly and shows you how they fit together in a complete, automated workflow.
🔥 Part 1 — Training: Where the Model Learns
The training phase is where the neural network actually learns from data.
During training:
- The model performs a forward pass to make predictions
- Loss is computed
- Backpropagation calculates gradients
- The optimizer updates weights
- These steps repeat for every batch, over many epochs
✔ Purpose of Training
- Learn patterns
- Adjust model parameters
- Reduce training loss
✔ PyTorch Example
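model.train()                       # enable training behavior (dropout, batch-norm updates)
for x, y in train_loader:
    optimizer.zero_grad()           # clear gradients left over from the previous batch
    preds = model(x)                # forward pass
    loss = criterion(preds, y)      # compute loss
    loss.backward()                 # backpropagation: compute gradients
    optimizer.step()                # update weights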
This is the “learning loop”: each batch nudges the weights toward a lower training loss.
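To make the four steps concrete, here is a minimal sketch of a single update done by hand, without any framework. The one-weight model `pred = w * x`, the squared-error loss, and all the numbers are illustrative assumptions, not part of the PyTorch example above.

```python
# One gradient-descent step on a one-weight model: pred = w * x.
# All values are illustrative assumptions, not taken from a real dataset.

def forward(w, x):
    """Forward pass: compute the prediction."""
    return w * x

def squared_error(pred, y):
    """Loss: squared difference between prediction and target."""
    return (pred - y) ** 2

def gradient(w, x, y):
    """d(loss)/dw for loss = (w*x - y)^2, which is what backpropagation computes here."""
    return 2 * (w * x - y) * x

w, x, y, lr = 0.0, 2.0, 4.0, 0.1

loss_before = squared_error(forward(w, x), y)   # forward pass + loss
grad = gradient(w, x, y)                        # "backward pass"
w = w - lr * grad                               # optimizer step (plain SGD)
loss_after = squared_error(forward(w, x), y)    # loss after the update

print(loss_before, grad, w, loss_after)
```

With these numbers the loss falls from 16 to roughly 0.64 after one update. A real training loop repeats the same idea across millions of weights and many batches.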
🔍 Part 2 — Validation: Where We Tune and Select the Best Model
The validation phase is NOT training.
The model only runs forward to estimate how well it generalizes.
During validation:
- No gradients
- No learning
- No weight updates
- Only evaluation
Validation tells us:
- Is the model overfitting?
- Should we adjust hyperparameters?
- Which epoch produced the best model?
✔ Validation in PyTorch
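model.eval()                        # evaluation behavior (dropout off, batch-norm uses running stats)
val_loss = 0.0
with torch.no_grad():               # no gradient tracking
    for x, y in val_loader:
        preds = model(x)
        val_loss += criterion(preds, y).item()
val_loss /= len(val_loader)         # mean loss per batch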
✔ Why validation matters
- Detects overfitting (validation loss rises while training loss keeps falling)
- Helps tune learning rate, architecture, dropout
- Allows us to save the best model checkpoint
⭐ Part 3 — “Validate Every Epoch → Save Best Model”
After each epoch, we validate the model.
If this epoch’s validation loss is the best so far, we save a checkpoint.
Example:
| Epoch | Train Loss | Val Loss | Action |
|---|---|---|---|
| 1 | 0.50 | 0.42 | save |
| 2 | 0.40 | 0.36 | save |
| 3 | 0.32 | 0.30 | save |
| 4 | 0.28 | 0.35 | no save |
| 5 | 0.26 | 0.31 | no save |
The best model is from epoch 3, not the last epoch.
Keeping that checkpoint means we never ship a later, overfit version: we select the epoch that generalized best on the validation data.
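The save/no-save decision in the table can be sketched in a few lines. This is a framework-agnostic illustration: the loss values are copied from the table, while the `state` dictionary is a hypothetical stand-in for real model weights.

```python
import copy

# (epoch, train_loss, val_loss) values copied from the table above.
history = [
    (1, 0.50, 0.42),
    (2, 0.40, 0.36),
    (3, 0.32, 0.30),
    (4, 0.28, 0.35),
    (5, 0.26, 0.31),
]

best_val_loss = float("inf")
best_epoch = None
best_state = None
actions = []

for epoch, train_loss, val_loss in history:
    # Hypothetical stand-in for the model's weights at this epoch.
    state = {"epoch": epoch}

    if val_loss < best_val_loss:
        # New best: keep a copy of the weights.
        # In PyTorch this would typically be torch.save(model.state_dict(), path).
        best_val_loss = val_loss
        best_epoch = epoch
        best_state = copy.deepcopy(state)
        actions.append("save")
    else:
        actions.append("no save")

print(actions, "-> best epoch:", best_epoch)
```

Note that the decision looks only at validation loss. Training loss keeps falling in epochs 4 and 5, but that is not a reason to save.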
⚙️ Part 4 — Early Stopping (Optional Optimization)
Early stopping automatically ends training when validation stops improving.
Example:
patience = 5
Meaning:
👉 If validation loss does not improve for 5 consecutive epochs → stop training.
This saves compute time and limits how far the model can overfit.
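One possible way to implement the patience rule is a small counter that resets whenever validation loss improves. The `EarlyStopping` class below is a hypothetical helper written for this article, not a PyTorch API. The example uses `patience=2` so the effect shows up on a short list of losses; the first five come from the table in Part 3, and the sixth is invented for illustration.

```python
class EarlyStopping:
    """Signals a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0      # improvement resets the counter
        else:
            self.bad_epochs += 1     # one more epoch without improvement
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.42, 0.36, 0.30, 0.35, 0.31, 0.29]

stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "with best val loss", stopper.best)
```

Here training stops at epoch 5, so the better loss at epoch 6 is never seen. That is the trade-off behind the patience value: a small patience saves time but can stop too early, while a larger one tolerates temporary plateaus.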
🧪 Part 5 — Testing: The Final Exam
The test set is used only once, after selecting the best model.
The test set gives an estimate of:
- Final accuracy
- Real-world performance
- Generalization ability
⚠️ Critical rule:
Do NOT tune the model based on test results.
Otherwise the test set leaks into model selection, and its score is no longer an unbiased estimate of real-world performance.
📊 Part 6 — Full Workflow (Mermaid.js Diagram)
Below is a complete, clean Mermaid.js diagram showing the whole process:
flowchart TD
A[Dataset] --> B[Split Train / Validation / Test]
B --> C[Training Loop<br>Forward + Backprop + Update]
C --> D[Validation Loop<br>No Gradient]
D --> E{Best Validation<br>Performance?}
E -->|Yes| F[Save Best Model Checkpoint]
E -->|No| G[Skip Saving]
F --> H[Continue Training]
G --> H[Continue Training]
H --> I{Training Finished<br>or Early Stopping?}
I -->|No| C
I -->|Yes| J[Load Best Checkpoint]
J --> K[Test Set Evaluation]
K --> L[Final Performance]
This diagram summarizes the whole process, from splitting the dataset to the final evaluation on the test set.
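The same flow can be sketched end to end in plain Python. This is a toy illustration under stated assumptions: the synthetic dataset (`y = 3x` plus noise), the one-weight model, the 70/15/15 split, and the hyperparameters are all invented for the example. In a real project the model, the data loaders, and the checkpoint file would take their place.

```python
import random

random.seed(0)

# Synthetic dataset (assumption): y = 3x + small Gaussian noise.
data = []
for _ in range(100):
    x = random.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + random.gauss(0.0, 0.1)))

# Split: train / validation / test (70 / 15 / 15).
random.shuffle(data)
train_set, val_set, test_set = data[:70], data[70:85], data[85:]

def mean_loss(w, dataset):
    """Forward-only evaluation: mean squared error, no weight updates."""
    return sum((w * x - y) ** 2 for x, y in dataset) / len(dataset)

w = 0.0                                   # the model's single weight
lr, max_epochs, patience = 0.05, 50, 5    # illustrative hyperparameters
best_val, best_w, best_epoch, bad_epochs = float("inf"), w, 0, 0

for epoch in range(1, max_epochs + 1):
    # Training loop: forward + gradient + update, one sample at a time.
    for x, y in train_set:
        grad = 2 * (w * x - y) * x
        w -= lr * grad

    # Validation loop: evaluation only.
    val_loss = mean_loss(w, val_set)

    if val_loss < best_val:
        # Best so far: "save a checkpoint" by remembering the weight.
        best_val, best_w, best_epoch, bad_epochs = val_loss, w, epoch, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:        # early stopping
            break

# Load the best checkpoint, then evaluate on the test set exactly once.
w = best_w
test_loss = mean_loss(w, test_set)
print(f"best epoch: {best_epoch}, val loss: {best_val:.4f}, test loss: {test_loss:.4f}")
```

The test set is touched only in the last two lines, after every training and selection decision has already been made.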
🧠 Part 7 — Summary
| Phase | Purpose | Learns? |
|---|---|---|
| Training | Learn weights & features | ✔ Yes |
| Validation | Tune hyperparameters, pick best model | ❌ No |
| Testing | Final unbiased evaluation | ❌ No |
The golden workflow:
Train → Validate → Save Best Model → Test Once
This ensures a model that learns well and generalizes well.