Understanding Neural Networks Deeply
Why Edges Come Before Shapes, Why We Use Conv2d, and Why ReLU Must Follow Convolution
When beginners first learn about neural networks — especially convolutional neural networks (CNNs) — they often ask:
- Why do early layers detect edges, not shapes?
- Why do we use Conv2d so much?
- What does convolution actually mean?
- Why must we apply ReLU after every Conv2d layer?
- How does stacking layers let networks learn complex patterns?
This guide explains the answers deeply and intuitively, with examples and reasoning from math, engineering, and even neuroscience.
If you’ve ever wanted to truly understand how deep learning works “under the hood,” this article is for you.
🔥 Part 1 — Why Deeper Layers Learn Higher-Level Features
Neural networks learn in a hierarchy:
| Layer | What It Learns | Why |
|---|---|---|
| 1 | Edges | simplest patterns, most information-rich |
| 2 | Shapes | edges combine into corners, curves |
| 3 | Textures | shapes combine into repeated patterns |
| 4+ | Object parts | eyes, wheels, leaves |
| Final | Objects | cat, dog, car, etc. |
❗ But here’s the key:
We never program the network to do this.
There is no code like:
layer1.learn_edges()
layer2.learn_shapes()
Instead, the network has one objective: minimize the loss.
Through backpropagation, each layer learns whatever features are most useful to reduce that loss.
Edges happen to be the simplest, strongest signals → so they appear first.
Shapes require edges → so they appear later.
This is called emergent hierarchical feature learning.
🔍 Part 2 — What Convolution Really Means
Convolution is the heart of image understanding.
✔ Simple explanation:
Convolution = sliding a small filter (like a 3×3 grid) over an image to detect patterns.
Example filter:
[ 1 0 -1 ]
[ 1 0 -1 ]
[ 1 0 -1 ]
This detects vertical edges.
What the convolution does:
- Multiply filter values with image pixels
- Sum the result
- Move one pixel over
- Repeat across the whole image
This process extracts:
- edges
- corners
- curves
- textures
- shapes
Deep layers stack these patterns into more complex concepts.
✔ Mathematical definition:
(I * K)(x,y) = \sum_{m,n} I(x+m, y+n)\cdot K(m,n)
But the intuition is enough: convolution is a pattern detector.
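The sliding-and-summing process above can be sketched in a few lines of PyTorch (a minimal illustration; the tiny 5×5 image and the use of `F.conv2d` are choices made just for this demo):

```python
import torch
import torch.nn.functional as F

# The 3x3 vertical-edge filter from above, shaped (out_ch, in_ch, H, W).
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]]).reshape(1, 1, 3, 3)

# A tiny 5x5 "image": dark left half, bright right half -> one vertical edge.
image = torch.tensor([[0., 0., 1., 1., 1.]] * 5).reshape(1, 1, 5, 5)

# Slide the filter across the image (stride 1, no padding).
response = F.conv2d(image, kernel)
print(response.squeeze())
# Each output row is [-3, -3, 0]: a strong (here negative, because the
# edge goes dark-to-bright) response at the edge, zero in the flat region.
```

The filter stays silent where the image is uniform and fires only where the pattern it encodes appears, which is exactly what "convolution is a pattern detector" means.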
🟦 Part 3 — Why We Almost Always Use Conv2d for Images
nn.Conv2d is the standard tool for image tasks because:
✔ 1. Images have spatial structure
Nearby pixels are related. Convolution respects locality.
✔ 2. Convolution shares weights
One small filter is reused across the whole image → fewer parameters → less overfitting.
✔ 3. Translation invariance
The filter detects the same feature anywhere in the image.
✔ 4. Efficiency
A fully connected layer on a 224×224×3 image would require 224 × 224 × 3 = 150,528 weights per neuron.
A 3×3 Conv2d filter on the same RGB input needs only 27 weights (3 × 3 × 3).
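That gap is easy to verify in PyTorch (a quick sketch; a single output neuron and a single filter are used purely for the comparison):

```python
import torch.nn as nn

fc = nn.Linear(224 * 224 * 3, 1)       # one fully connected output neuron
conv = nn.Conv2d(3, 1, kernel_size=3)  # one 3x3 filter over 3 channels

# Count every learnable parameter (weights + bias) in each layer.
fc_params = sum(p.numel() for p in fc.parameters())
conv_params = sum(p.numel() for p in conv.parameters())

print(fc_params)    # 150529  (150,528 weights + 1 bias)
print(conv_params)  # 28      (27 weights + 1 bias)
```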
✔ 5. Hierarchical feature extraction
Deep Conv2d layers naturally build up patterns:
Edges → Shapes → Textures → Object parts → Objects
This is why networks like AlexNet, VGG, ResNet, and MobileNet are based on convolution.
Even Vision Transformers (ViT) still use a convolution-like patch embedding at the input.
🔥 Part 4 — Understanding Conv2d Parameters
nn.Conv2d has several parameters, but the most important are:
nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
| Parameter | Meaning |
|---|---|
| `in_channels` | input depth (e.g., 1 = grayscale, 3 = RGB, 32 = feature maps) |
| `out_channels` | number of filters to learn |
| `kernel_size` | size of each filter (3×3, 5×5, etc.) |
| `stride` | how far the filter moves each step |
| `padding` | zeros added around the image to keep the output size the same |
A common Conv layer:
nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
Means:
- RGB input (3 channels)
- 64 learned 3×3 filters out
- padding=1 keeps the spatial size unchanged
This is the core building block of modern CNNs.
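A quick shape check confirms that this layer preserves spatial size (a minimal sketch; the 224×224 input is an arbitrary example):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 3, 224, 224)  # one RGB image, 224x224
y = conv(x)

print(y.shape)  # torch.Size([1, 64, 224, 224]) -- padding=1 keeps H and W
```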
⚡ Part 5 — Why ReLU Must Come After Conv2d
After every Conv2d, we apply ReLU:
nn.Conv2d(...)
nn.ReLU()
But why?
✔ 1. Convolution is linear
Without nonlinear activation, stacking many Conv layers = one single linear function.
Such a model cannot learn:
- edges
- shapes
- textures
- classifications
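You can verify the collapse directly: two stacked bias-free convolutions behave like a single linear map (checked here via the property f(−x) = −f(x)), and inserting a ReLU between them breaks it (a small sketch; the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked convolutions with no nonlinearity between them (bias off).
conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1, bias=False)
conv2 = nn.Conv2d(4, 1, kernel_size=3, padding=1, bias=False)

x = torch.randn(1, 1, 8, 8)

# Linear: negating the input simply negates the output.
linear = lambda t: conv2(conv1(t))
print(torch.allclose(linear(-x), -linear(x), atol=1e-6))        # True

# With ReLU in between, that symmetry is gone: the map is nonlinear.
nonlinear = lambda t: conv2(torch.relu(conv1(t)))
print(torch.allclose(nonlinear(-x), -nonlinear(x), atol=1e-6))  # False
```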
✔ 2. ReLU introduces nonlinearity
ReLU(x) = \max(0, x)
This breaks linearity and lets layers combine features in complex ways.
✔ 3. ReLU prevents vanishing gradients
Unlike sigmoid/tanh, ReLU keeps gradient strong for positive values → faster training.
✔ 4. ReLU produces clean, activated features
Negative values disappear → edges and textures become sharp and meaningful.
This combination is simple but incredibly powerful:
Conv → ReLU → Conv → ReLU → Conv → ReLU → …
This is the basic structure of almost every successful CNN.
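As a concrete sketch of that motif (the channel counts here are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# The repeating Conv -> ReLU motif, stacked three times.
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)
print(features(x).shape)  # torch.Size([1, 128, 32, 32])
```

Each Conv/ReLU pair deepens the channel dimension while padding keeps the spatial size, letting later layers combine earlier features into more complex patterns.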
🧠 Part 6 — Putting It All Together: How Deep Networks Learn
A neural network learns by:
1. Forward pass: compute predictions.
2. Loss function: measure the error.
3. Backpropagation: compute gradients:
\frac{\partial L}{\partial W}
4. Weight update, using an optimizer like SGD or Adam:
W \leftarrow W - \eta \,\frac{\partial L}{\partial W}
5. Repeat across many layers: each layer adjusts its weights to help reduce the loss.
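Put together, these steps form the standard training loop (a toy sketch; the model, data shapes, and labels are made up purely for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and fake data, just to make the loop concrete.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 8 * 8, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 8, 8)           # batch of 4 tiny "images"
targets = torch.tensor([0, 1, 0, 1])  # fake labels

initial_loss = None
for step in range(20):
    logits = model(x)                  # 1. forward pass
    loss = loss_fn(logits, targets)    # 2. loss function
    optimizer.zero_grad()
    loss.backward()                    # 3. backpropagation: dL/dW
    optimizer.step()                   # 4. weight update: W <- W - lr * dL/dW
    if initial_loss is None:
        initial_loss = loss.item()

print(initial_loss, loss.item())  # the loss shrinks as the filters adapt
```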
This automatic process is what creates:
- edge filters
- shape detectors
- texture patterns
- object detectors
You never program these manually.
They emerge naturally because they help minimize the final loss.
🎯 Final Thoughts
Neural networks seem magical, but they work because of powerful ideas:
- Convolution extracts local patterns.
- ReLU introduces nonlinearity.
- Backpropagation trains filters automatically.
- Deep layers build hierarchical representations.
- Edges → shapes → textures → objects is a natural consequence of how networks process information.
Once you understand these principles, you understand the core of modern deep learning.