LeNet-5: Ancestor of CNN architectures
Table of Contents
- Introduction
- Historical Context
- Overview of LeNet-5 Architecture
- Detailed Layer-wise Architecture
- Design Choices and Rationale
- Training Details
- Key Innovations and Insights
- Impact on the Deep Learning Field
- Criticisms and Limitations
- Conclusion
1. Introduction
LeNet-5, developed by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in 1998, was one of the first convolutional neural networks (CNNs). It was designed to recognize handwritten digits (0–9) from the MNIST dataset. Although LeNet-5 might seem primitive today, it laid the groundwork for nearly all modern CNN architectures.
It introduced ideas like convolutional layers, subsampling (pooling), parameter sharing, and local receptive fields, forming the core of today’s CNNs.
2. Historical Context
At the time LeNet was developed:
- Deep learning was not popular.
- Computers had limited processing power and memory.
- Most recognition systems relied on handcrafted features.
LeNet-5 changed the game by demonstrating:
End-to-end learning directly from raw pixels to classification output, using gradient-based optimization.
The architecture was deployed in systems that processed millions of checks per day in banks across the US.
3. Overview of LeNet-5 Architecture
LeNet-5 is a 7-layer network, excluding the input, with the following types of layers:
- Convolutional layers (C1, C3)
- Subsampling layers (S2, S4)
- Fully connected layers (C5, F6)
- Output layer (10 units, one per digit 0–9; the original paper used Euclidean RBF output units, whereas modern reimplementations typically use softmax)
The input is a 32×32 grayscale image (not 28×28 like standard MNIST). The digits are zero-padded to 32×32 so that strokes and corners near the image border still fall well within the 5×5 receptive fields of the first convolutional layer.
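As a concrete illustration, here is a minimal preprocessing sketch that pads standard 28×28 MNIST images to 32×32, assuming a modern PyTorch/torchvision setup (the original 1998 system, of course, used none of these libraries):

```python
# Hypothetical preprocessing sketch: pad 28x28 MNIST digits to the 32x32
# input size LeNet-5 expects. The library choice is an assumption of a
# modern reimplementation, not part of the original system.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Pad(2),       # 28x28 -> 32x32 (2 zero pixels on each side)
    transforms.ToTensor(),   # grayscale image -> 1x32x32 float tensor in [0, 1]
])

train_set = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transform)
```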
4. Detailed Layer-wise Architecture
| Layer | Type | Input Size | Filter/Units | Output Size | Description |
|---|---|---|---|---|---|
| Input | - | 32×32×1 | - | 32×32×1 | Padded input image |
| C1 | Convolution | 32×32×1 | 6 @ 5×5 | 28×28×6 | Feature maps with local receptive fields |
| S2 | Subsampling | 28×28×6 | 2×2 avg pool | 14×14×6 | Downsampling with a trainable coefficient and bias per map |
| C3 | Convolution | 14×14×6 | 16 @ 5×5 | 10×10×16 | Not all 6 input maps are connected to all 16 output maps |
| S4 | Subsampling | 10×10×16 | 2×2 avg pool | 5×5×16 | Further dimensionality reduction |
| C5 | Fully Connected | 5×5×16 | 120 units | 1×1×120 | Labeled convolutional in the paper, but its 5×5 kernels over a 5×5 input make it effectively fully connected |
| F6 | Fully Connected | 120 | 84 units | 84 | Classic MLP-style layer |
| Output | Fully Connected | 84 | 10 units | 10 | Final classification layer (Euclidean RBF units in the paper; softmax in modern reimplementations) |
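The sizes in the table can be reproduced with a short PyTorch sketch. Note that this is a simplified modern reimplementation: it uses plain average pooling, full C3 connectivity, and a linear output head instead of the paper's trainable pooling coefficients, sparse connection table, and RBF units.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Simplified LeNet-5 sketch: plain average pooling and full C3
    connectivity instead of the paper's trainable coefficients and
    sparse connection table."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5)    # 32x32x1 -> 28x28x6
        self.s2 = nn.AvgPool2d(kernel_size=2)       # 28x28x6 -> 14x14x6
        self.c3 = nn.Conv2d(6, 16, kernel_size=5)   # 14x14x6 -> 10x10x16
        self.s4 = nn.AvgPool2d(kernel_size=2)       # 10x10x16 -> 5x5x16
        self.c5 = nn.Linear(16 * 5 * 5, 120)        # flattened 5x5x16 -> 120
        self.f6 = nn.Linear(120, 84)
        self.out = nn.Linear(84, num_classes)
        self.act = nn.Tanh()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.s2(self.act(self.c1(x)))
        x = self.s4(self.act(self.c3(x)))
        x = torch.flatten(x, 1)
        x = self.act(self.c5(x))
        x = self.act(self.f6(x))
        return self.out(x)                          # logits for the 10 digits

# Quick shape check against the table above.
x = torch.zeros(1, 1, 32, 32)
print(LeNet5()(x).shape)   # torch.Size([1, 10])
```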
5. Design Choices and Rationale
🧠 Local Receptive Fields
- Each neuron in a conv layer only connects to a small region in the input.
- Inspired by how neurons in the visual cortex process stimuli.
🧠 Parameter Sharing
- All neurons in a feature map share the same weights, reducing total parameters and improving generalization.
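A quick calculation, using the C1 sizes from the table above, shows how large the savings from weight sharing are:

```python
# C1 with weight sharing: each of the 6 feature maps reuses one 5x5 kernel
# plus one bias at every one of its 28x28 positions.
shared = 6 * (5 * 5 + 1)              # = 156 trainable parameters
# Without sharing, every position in every map would need its own
# 5x5 kernel and bias.
unshared = 6 * 28 * 28 * (5 * 5 + 1)  # = 122,304 trainable parameters
print(shared, unshared)
```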
🧠 Subsampling (Pooling)
- Subsampling layers reduce resolution while retaining important spatial information.
- LeNet-5 used average pooling whose output is scaled by a trainable coefficient and shifted by a trainable bias per feature map, rather than the max pooling that later became standard (see the sketch below).
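Here is a minimal sketch of such a subsampling unit, assuming PyTorch; the module name and initialization are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNetSubsample(nn.Module):
    """Sketch of an S2/S4-style unit: 2x2 average pooling scaled by a
    trainable per-map coefficient plus a trainable per-map bias, then a
    squashing nonlinearity. (The paper sums the 2x2 inputs; averaging
    differs only by a constant factor the coefficient can absorb.)"""
    def __init__(self, channels: int):
        super().__init__()
        self.coeff = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = F.avg_pool2d(x, kernel_size=2)   # average of each 2x2 block
        return torch.tanh(self.coeff * pooled + self.bias)
```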
🧠 Selective Connectivity in C3
- C3 doesn’t connect each input map to every output map.
- This allowed the network to learn combinations of features and break symmetry without increasing parameters excessively.
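To make the idea concrete, here is a sketch of a sparsely connected C3 with one small convolution per output map, each applied to a subset of the six S2 maps. The subsets follow the pattern described in the paper (six maps see 3 contiguous inputs, six see 4 contiguous inputs, three see 4 non-contiguous inputs, one sees all 6); consult Table I of the original paper for the authoritative assignment.

```python
import torch
import torch.nn as nn

# Which S2 maps each of the 16 C3 maps reads from (pattern per the paper;
# the exact assignment is in Table I of the original paper).
C3_TABLE = [
    [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 0], [5, 0, 1],
    [0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 0], [4, 5, 0, 1], [5, 0, 1, 2],
    [0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5],
    [0, 1, 2, 3, 4, 5],
]

class SparseC3(nn.Module):
    """One 5x5 convolution per output map, applied only to its input subset."""
    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(len(subset), 1, kernel_size=5) for subset in C3_TABLE
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, 6, 14, 14)
        outputs = [conv(x[:, subset]) for conv, subset in zip(self.convs, C3_TABLE)]
        return torch.cat(outputs, dim=1)                   # (N, 16, 10, 10)
```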
🧠 Use of Fully Connected Layers
- The deeper layers (C5, F6) are fully connected, making LeNet a hybrid between CNN and classic MLP.
6. Training Details
- Dataset: MNIST and similar datasets with grayscale digits.
- Input: 32×32 images (28×28 images are zero-padded).
- Loss Function: Mean squared error (not cross-entropy)
- Optimizer: Stochastic Gradient Descent (SGD); the paper used a stochastic diagonal Levenberg–Marquardt variant with per-weight learning rates
- Activation Function: Sigmoid-style squashing, specifically a scaled tanh (ReLU wasn't common yet)
- Hardware: Trained on CPUs (GPU computing for machine learning didn't exist yet)
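An illustrative modern training loop under these settings, reusing the LeNet5 model and train_set from the sketches above; the learning rate, batch size, and epoch count are assumptions, not the paper's values:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

model = LeNet5()                     # from the architecture sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

for epoch in range(5):
    for images, labels in loader:
        targets = F.one_hot(labels, num_classes=10).float()  # one-hot digit codes
        loss = F.mse_loss(model(images), targets)             # squared-error loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```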
7. Key Innovations and Insights
| Feature | Contribution |
|---|---|
| ✅ Convolution | Allowed feature extraction from raw pixels |
| ✅ Pooling | Added translational invariance |
| ✅ Weight sharing | Reduced number of parameters |
| ✅ Selective connections | Introduced hierarchical feature composition |
| ✅ End-to-end learning | Trained all layers via backpropagation |
| ✅ Practical impact | Deployed in real-world bank check systems |
LeNet-5 was among the first architectures to show that a neural network could learn hierarchical visual features directly from pixels and outperform handcrafted-feature methods on a practical vision task.
8. Impact on the Deep Learning Field
Although it was largely ignored for a decade (due to hardware and data limitations), LeNet-5 became hugely influential:
- Blueprint for CNNs like AlexNet, VGG, ResNet
- Sparked modern computer vision revolution
- Inspired autonomous learning directly from raw inputs
- Foundation for deep learning in NLP and other fields
9. Criticisms and Limitations
| Limitation | Description |
|---|---|
| ❌ Small scale | Designed for digits, not natural images |
| ❌ No ReLU or BatchNorm | Used sigmoid/tanh and no normalization |
| ❌ Limited generalization | Architecture was dataset-specific |
| ❌ Manual design choices | Lacked automation or search mechanisms |
| ❌ Training difficulty | Required careful initialization and tuning |
10. Conclusion
LeNet-5 was ahead of its time.
Despite its simplicity and limitations, it introduced core ideas that power almost all deep learning architectures today. It showed that convolution + pooling + fully connected layers + end-to-end training could solve real-world problems like digit recognition with high accuracy.
In many ways, LeNet-5 is the “ancestor” of modern deep learning models, and its influence is felt every time you use CNNs on an image task.