Understanding Training, Validation, and Testing in Machine Learning
A Complete Guide to How Models Learn, Improve, and Get Evaluated
When learning machine learning or deep learning, one of the most important foundations is understanding the three phases of model development:
- Training
- Validation
- Testing
These three phases help ensure that a model not only learns patterns from its training data, but also generalizes to data it has never seen.
This article explains each phase clearly and shows you how they fit together in a complete, automated workflow.
🔥 Part 1 — Training: Where the Model Learns
The training phase is where the neural network actually learns from data.
During training:
- The model performs a forward pass to make predictions
- Loss is computed
- Backpropagation calculates gradients
- The optimizer updates weights
- These steps repeat for every batch, over many epochs
✔ Purpose of Training
- Learn patterns
- Adjust model parameters
- Reduce training loss
✔ PyTorch Example
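model.train()  # enable training behavior (e.g. dropout is active)
for x, y in train_loader:
    optimizer.zero_grad()        # clear gradients from the previous step
    preds = model(x)             # forward pass
    loss = criterion(preds, y)   # compute the loss
    loss.backward()              # backpropagation: compute gradients
    optimizer.step()             # update the weights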
This is the “learning loop”: each batch nudges the weights in the direction that reduces the loss.
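The loop above covers a single pass over the data and assumes that `model`, `criterion`, `optimizer`, and `train_loader` already exist. As a minimal sketch, here is one way to define them and run several epochs. The synthetic regression data, the `nn.Linear` model, and the learning rate are all assumptions chosen for illustration, not part of the original example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical synthetic regression data: y = X @ true_w + small noise.
X = torch.randn(256, 4)
true_w = torch.tensor([[1.0], [-2.0], [0.5], [3.0]])
y = X @ true_w + 0.1 * torch.randn(256, 1)
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(4, 1)                                   # stand-in model
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(5):
    model.train()
    running = 0.0
    for xb, yb in train_loader:
        optimizer.zero_grad()             # clear old gradients
        loss = criterion(model(xb), yb)   # forward pass + loss
        loss.backward()                   # backpropagation
        optimizer.step()                  # weight update
        running += loss.item()
    # Mean loss per batch for this epoch.
    epoch_losses.append(running / len(train_loader))
    print(f"epoch {epoch + 1}: train loss {epoch_losses[-1]:.4f}")
```

On this toy problem the per-epoch training loss should fall steadily, which is the behavior the training phase is meant to produce.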
🔍 Part 2 — Validation: Where We Tune and Select the Best Model
The validation phase is NOT training.
The model only runs forward passes on held-out data to estimate how well it generalizes.
During validation:
- No gradients
- No learning
- No weight updates
- Only evaluation
Validation tells us:
- Is the model overfitting?
- Should we adjust hyperparameters?
- Which epoch produced the best model?
✔ Validation in PyTorch
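model.eval()  # evaluation behavior (e.g. dropout is disabled)
val_loss = 0.0
with torch.no_grad():  # do not track gradients
    for x, y in val_loader:
        preds = model(x)
        val_loss += criterion(preds, y).item()
val_loss /= len(val_loader)  # mean loss per batch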
✔ Why validation matters
- Detects overfitting (training loss keeps falling while validation loss rises)
- Helps tune learning rate, architecture, dropout
- Allows us to save the best model checkpoint
⭐ Part 3 — “Validate Every Epoch → Save Best Model”
After each epoch, we validate the model.
If this epoch’s validation loss is the best so far, we save a checkpoint.
Example:
| Epoch | Train Loss | Val Loss | Action |
|---|---|---|---|
| 1 | 0.50 | 0.42 | save |
| 2 | 0.40 | 0.36 | save |
| 3 | 0.32 | 0.30 | save |
| 4 | 0.28 | 0.35 | no save |
| 5 | 0.26 | 0.31 | no save |
The best model is from epoch 3, not the last epoch.
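After epoch 3 the training loss keeps falling while the validation loss rises, which is the signature of overfitting. Keeping the epoch 3 checkpoint means we select the version that generalizes best.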
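The save-if-best rule can be sketched in a few lines. This is an illustration under stated assumptions: the validation losses are copied from the table above rather than computed, `nn.Linear(4, 1)` is a hypothetical stand-in model, and the checkpoint is kept in memory (the commented `torch.save` line shows where a file would be written, with `best_model.pt` as a made-up file name).

```python
import copy

import torch
from torch import nn

model = nn.Linear(4, 1)  # hypothetical stand-in for the real model

# Validation losses for epochs 1-5, copied from the table above.
val_losses = [0.42, 0.36, 0.30, 0.35, 0.31]

best_val_loss = float("inf")
best_epoch = None
best_state = None

for epoch, val_loss in enumerate(val_losses, start=1):
    # ... one epoch of training and validation would run here ...
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_epoch = epoch
        # Deep-copy so later weight updates cannot change the snapshot.
        best_state = copy.deepcopy(model.state_dict())
        # To persist it: torch.save(best_state, "best_model.pt")
        action = "save"
    else:
        action = "no save"
    print(f"epoch {epoch}: val loss {val_loss:.2f} -> {action}")

# After training, restore the best weights before any final evaluation.
model.load_state_dict(best_state)
```

Copying the state dict matters here: `state_dict()` returns references to the live tensors, so without the copy the "best" snapshot would silently follow later updates.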
⚙️ Part 4 — Early Stopping (Optional Optimization)
Early stopping automatically ends training when validation stops improving.
Example:
patience = 5
Meaning:
👉 If validation loss does not improve for 5 consecutive epochs → stop training.
This saves compute time and limits how far the model can overfit.
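A patience counter like this takes only a few lines of plain Python. The sketch below is one possible implementation, not a standard API: the `EarlyStopping` class name, its `step` method, the optional `min_delta` threshold, and the list of validation losses are all hypothetical.

```python
class EarlyStopping:
    """Signals a stop after `patience` epochs without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # assumed extra: minimum drop that counts
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical validation losses, one per epoch.
val_losses = [0.42, 0.36, 0.30, 0.35, 0.31, 0.33, 0.32, 0.34, 0.36, 0.29]

stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best val loss {stopper.best_loss}")
```

With these numbers the best loss occurs at epoch 3, the next five epochs fail to beat it, and training stops at epoch 8. The lower loss at epoch 10 is never reached, which is the trade-off a patience value controls.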
🧪 Part 5 — Testing: The Final Exam
The test set is used only once, after selecting the best model.
The test set gives an estimate of:
- Final accuracy
- Expected real-world performance (as long as the test data resembles real-world data)
- Generalization ability
⚠️ Critical rule:
Do NOT tune the model based on test results.
Otherwise the test data becomes contaminated: it has influenced model selection and no longer gives an unbiased estimate.
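A final test pass looks like a validation pass, run once on the test data with the best checkpoint loaded. The sketch below assumes a classification task; the `evaluate` helper, the random test data, the `nn.Linear(8, 3)` model, and the `best_model.pt` file name in the comment are hypothetical placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (mean loss, accuracy) over a loader without updating weights."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            logits = model(x)
            # The criterion averages over the batch, so weight by batch size.
            total_loss += criterion(logits, y).item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            count += y.size(0)
    return total_loss / count, correct / count


torch.manual_seed(0)

# Hypothetical stand-ins for a real test set and a trained model.
test_set = TensorDataset(torch.randn(40, 8), torch.randint(0, 3, (40,)))
test_loader = DataLoader(test_set, batch_size=16)
model = nn.Linear(8, 3)
# In a real run, load the best checkpoint first, for example:
# model.load_state_dict(torch.load("best_model.pt"))

test_loss, test_acc = evaluate(model, test_loader, nn.CrossEntropyLoss())
print(f"test loss {test_loss:.3f}, test accuracy {test_acc:.2%}")
```

Because the model here is untrained and the labels are random, the printed numbers carry no meaning; the point is the shape of the procedure, which runs once and feeds nothing back into training.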
📊 Part 6 — Full Workflow (Mermaid.js Diagram)
The Mermaid.js diagram below shows the whole process:
flowchart TD
A[Dataset] --> B[Split Train / Validation / Test]
B --> C[Training Loop<br>Forward + Backprop + Update]
C --> D[Validation Loop<br>No Gradient]
D --> E{Best Validation<br>Performance?}
E -->|Yes| F[Save Best Model Checkpoint]
E -->|No| G[Skip Saving]
F --> H[Continue Training]
G --> H[Continue Training]
H --> I{Training Finished<br>or Early Stopping?}
I -->|No| C
I -->|Yes| J[Load Best Checkpoint]
J --> K[Test Set Evaluation]
K --> L[Final Performance]
This diagram summarizes the workflow from splitting the dataset to the final evaluation.
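The first step in the diagram, splitting the dataset, can be sketched with `torch.utils.data.random_split`. The 70/15/15 ratio, the seed, and the random tensors standing in for a real dataset are assumptions for illustration; the right ratio depends on how much data is available.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset: 1000 samples with 8 features and a binary label.
dataset = TensorDataset(torch.randn(1000, 8), torch.randint(0, 2, (1000,)))

# A fixed seed makes the split reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(
    dataset, [700, 150, 150], generator=generator
)

print(len(train_set), len(val_set), len(test_set))
```

The three subsets are disjoint, so no test sample can leak into training or validation. For time series or grouped data a random split may be inappropriate, and a split by time or by group is usually the safer choice.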
🧠 Part 7 — Summary
| Phase | Purpose | Learns? |
|---|---|---|
| Training | Learn weights & features | ✔ Yes |
| Validation | Tune hyperparameters, pick best model | ❌ No |
| Testing | Final unbiased evaluation | ❌ No |
The golden workflow:
Train → Validate → Save Best Model → Test Once
This workflow selects the model version that generalizes best and gives an unbiased estimate of its performance.