
LeNet-5: Ancestor of CNN architectures

Table of Contents

  1. Introduction
  2. Historical Context
  3. Overview of LeNet-5 Architecture
  4. Detailed Layer-wise Architecture
  5. Design Choices and Rationale
  6. Training Details
  7. Key Innovations and Insights
  8. Impact on the Deep Learning Field
  9. Criticisms and Limitations
  10. Conclusion

1. Introduction

LeNet-5, developed by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in 1998, was one of the first convolutional neural networks (CNNs). It was designed to recognize handwritten digits (0–9) from the MNIST dataset. Although LeNet-5 might seem primitive today, it laid the groundwork for nearly all modern CNN architectures.

It introduced ideas like convolutional layers, subsampling (pooling), parameter sharing, and local receptive fields, forming the core of today’s CNNs.


2. Historical Context

At the time LeNet was developed:

  • Deep learning was not popular.
  • Computers had limited processing power and memory.
  • Most recognition systems relied on handcrafted features.

LeNet-5 changed the game by demonstrating:

End-to-end learning directly from raw pixels to classification output, using gradient-based optimization.

The architecture was deployed in systems that processed millions of checks per day in banks across the US.


3. Overview of LeNet-5 Architecture

LeNet-5 is a 7-layer network, excluding the input, with the following types of layers:

  • Convolutional layers (C1, C3)
  • Subsampling layers (S2, S4)
  • Fully connected layers (C5, F6)
  • Output layer (10 units, one per digit 0–9; the original paper used Euclidean RBF output units, while modern reimplementations typically use softmax)

The input is a 32×32 grayscale image rather than the standard 28×28 MNIST format: the digits are zero-padded to 32×32 so that strokes and corners near the image border still fall well inside the receptive fields of the first convolutional layer.
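
As a small illustrative sketch (in PyTorch, which the original 1998 work obviously did not use), padding a standard MNIST digit to LeNet-5's input size looks like this:

```python
import torch
import torch.nn.functional as F

# Hypothetical preprocessing step: a standard 28x28 MNIST digit is
# zero-padded by 2 pixels on each side to match LeNet-5's 32x32 input.
x = torch.randn(1, 1, 28, 28)        # batch of one grayscale digit
x_padded = F.pad(x, (2, 2, 2, 2))    # pad (left, right, top, bottom) with zeros
print(x_padded.shape)                # torch.Size([1, 1, 32, 32])
```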


4. Detailed Layer-wise Architecture

| Layer  | Type            | Input Size | Filter/Units | Output Size | Description |
|--------|-----------------|------------|--------------|-------------|-------------|
| Input  | –               | 32×32×1    | –            | 32×32×1     | Padded input image |
| C1     | Convolution     | 32×32×1    | 6 @ 5×5      | 28×28×6     | Feature maps with local receptive fields |
| S2     | Subsampling     | 28×28×6    | 2×2 avg pool | 14×14×6     | Downsampling with trainable coefficients |
| C3     | Convolution     | 14×14×6    | 16 @ 5×5     | 10×10×16    | Not all 6 input maps are connected to all 16 output maps |
| S4     | Subsampling     | 10×10×16   | 2×2 avg pool | 5×5×16      | Further dimensionality reduction |
| C5     | Fully Connected | 5×5×16     | 120 units    | 1×1×120     | Flattened and fully connected |
| F6     | Fully Connected | 120        | 84 units     | 84          | Classic MLP-style layer |
| Output | Fully Connected | 84         | 10 units     | 10          | Final classification layer (RBF units in the paper; softmax in modern variants) |
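
To make these shapes concrete, here is a minimal PyTorch sketch of the architecture. It is a simplified modern reimplementation, not the original 1998 model: it uses plain average pooling without the trainable coefficients, connects all of C3's inputs to all of its outputs, and replaces the RBF output units with an ordinary linear layer.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Simplified LeNet-5 sketch; layer names and shapes follow the table above."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5)    # 32x32x1 -> 28x28x6
        self.s2 = nn.AvgPool2d(kernel_size=2)       # 28x28x6 -> 14x14x6
        self.c3 = nn.Conv2d(6, 16, kernel_size=5)   # 14x14x6 -> 10x10x16
        self.s4 = nn.AvgPool2d(kernel_size=2)       # 10x10x16 -> 5x5x16
        self.c5 = nn.Linear(16 * 5 * 5, 120)        # flattened 5x5x16 -> 120
        self.f6 = nn.Linear(120, 84)                # 120 -> 84
        self.out = nn.Linear(84, num_classes)       # 84 -> 10
        self.act = nn.Tanh()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.s2(self.act(self.c1(x)))
        x = self.s4(self.act(self.c3(x)))
        x = torch.flatten(x, 1)
        x = self.act(self.c5(x))
        x = self.act(self.f6(x))
        return self.out(x)                          # raw class scores

model = LeNet5()
print(model(torch.randn(1, 1, 32, 32)).shape)       # torch.Size([1, 10])
```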

5. Design Choices and Rationale

🧠 Local Receptive Fields

  • Each neuron in a conv layer only connects to a small region in the input.
  • Inspired by how neurons in the visual cortex process stimuli.

🧠 Parameter Sharing

  • All neurons in a feature map share the same weights, reducing the total parameter count and improving generalization (a rough count is sketched below).
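
To give a rough sense of the savings, the arithmetic below compares C1's shared weights with a hypothetical dense layer producing the same 28×28×6 output from the 32×32 input (illustrative numbers only):

```python
# C1: 6 feature maps, each defined by one shared 5x5x1 kernel plus a bias.
c1_params = 6 * (5 * 5 * 1 + 1)
print(c1_params)        # 156

# A dense layer mapping all 32*32 input pixels to all 28*28*6 outputs would need:
dense_params = (32 * 32) * (28 * 28 * 6) + 28 * 28 * 6
print(dense_params)     # 4,821,600 -> roughly 4.8 million parameters
```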

🧠 Subsampling (Pooling)

  • Subsampling layers reduce resolution while retaining important spatial information.
  • LeNet-5 used average pooling with a trainable coefficient and bias per feature map, rather than the max pooling common today (a sketch follows this list).
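
A minimal sketch of that behaviour, assuming one trainable coefficient and bias per feature map and tanh as the squashing function, might look like this:

```python
import torch
import torch.nn as nn

class LeNetSubsample(nn.Module):
    """LeNet-style subsampling: a 2x2 average pool scaled by a per-map
    trainable coefficient, plus a per-map trainable bias, then squashed."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2)
        self.coeff = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.coeff * self.pool(x) + self.bias)

s2 = LeNetSubsample(channels=6)
print(s2(torch.randn(1, 6, 28, 28)).shape)   # torch.Size([1, 6, 14, 14])
```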

🧠 Selective Connectivity in C3

  • C3 doesn’t connect each input map to every output map.
  • This allowed the network to learn different combinations of features and break symmetry without increasing the parameter count excessively (see the sketch after this list).
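
The exact S2-to-C3 connection pattern is given as a table in the original paper; the sketch below only illustrates the mechanism, using made-up channel subsets and fewer output maps than the real 16.

```python
import torch
import torch.nn as nn

class PartialC3(nn.Module):
    """Illustrative C3-style partial connectivity: each output map sees only
    a subset of the 6 input maps (subsets here are made up, not the paper's)."""
    def __init__(self):
        super().__init__()
        self.groups = [[0, 1, 2], [1, 2, 3], [2, 3, 4],
                       [3, 4, 5], [0, 4, 5], [0, 1, 5]]   # the paper defines 16
        self.convs = nn.ModuleList(
            [nn.Conv2d(len(g), 1, kernel_size=5) for g in self.groups]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = [conv(x[:, g]) for g, conv in zip(self.groups, self.convs)]
        return torch.cat(outs, dim=1)

c3 = PartialC3()
print(c3(torch.randn(1, 6, 14, 14)).shape)   # torch.Size([1, 6, 10, 10])
```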

🧠 Use of Fully Connected Layers

  • The deeper layers (C5, F6) are effectively fully connected (the paper labels C5 a convolutional layer, but a 5×5 kernel applied to a 5×5 input behaves like a fully connected one), making LeNet-5 a hybrid between a CNN and a classic MLP.

6. Training Details

  • Dataset: MNIST and similar datasets with grayscale digits.
  • Input: 32×32 images (standard 28×28 MNIST digits are zero-padded to 32×32)
  • Loss Function: Mean squared error against target output codes (not today’s standard cross-entropy); a minimal training-step sketch follows this list
  • Optimizer: Stochastic Gradient Descent (SGD)
  • Activation Function: tanh (a scaled tanh in the original paper; ReLU was not yet in use)
  • Hardware: Trained on CPUs (GPU training for machine learning did not yet exist)
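
The sketch below wires these choices into one illustrative training step. Random tensors stand in for padded MNIST batches, the compact Sequential model is only a stand-in for the full LeNet-5 shown earlier, and most modern reimplementations would swap the MSE loss for cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Compact stand-in for LeNet-5 (tanh activations, average pooling).
model = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()                  # paper-era choice; cross-entropy is standard today

# One training step on random data standing in for a padded MNIST batch.
images = torch.randn(8, 1, 32, 32)
targets = F.one_hot(torch.randint(0, 10, (8,)), num_classes=10).float()

optimizer.zero_grad()
loss = loss_fn(model(images), targets)
loss.backward()
optimizer.step()
print(loss.item())
```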

7. Key Innovations and Insights

| Feature | Contribution |
|---------|--------------|
| ✅ Convolution | Allowed feature extraction from raw pixels |
| ✅ Pooling | Added translational invariance |
| ✅ Weight sharing | Reduced the number of parameters |
| ✅ Selective connections | Introduced hierarchical feature composition |
| ✅ End-to-end learning | Trained all layers via backpropagation |
| ✅ Practical impact | Deployed in real-world bank check reading systems |

LeNet-5 was among the first architectures to show that a neural network can learn hierarchical visual features end to end and outperform handcrafted feature pipelines on a practical vision task.


8. Impact on the Deep Learning Field

Although it was largely ignored for a decade (due to hardware and data limitations), LeNet-5 became hugely influential:

  • Blueprint for CNNs like AlexNet, VGG, ResNet
  • Sparked modern computer vision revolution
  • Inspired autonomous learning directly from raw inputs
  • Foundation for deep learning in NLP and other fields

9. Criticisms and Limitations

| Limitation | Description |
|------------|-------------|
| ❌ Small scale | Designed for digits, not natural images |
| ❌ No ReLU or BatchNorm | Used sigmoid/tanh activations and no normalization |
| ❌ Limited generalization | Architecture was dataset-specific |
| ❌ Manual design choices | Lacked automation or search mechanisms |
| ❌ Training difficulty | Required careful initialization and tuning |

10. Conclusion

LeNet-5 was ahead of its time.

Despite its simplicity and limitations, it introduced core ideas that power almost all deep learning architectures today. It showed that convolution + pooling + fully connected layers + end-to-end training could solve real-world problems like digit recognition with high accuracy.

In many ways, LeNet-5 is the “ancestor” of modern deep learning models, and its influence is felt every time you use CNNs on an image task.


This post is licensed under CC BY 4.0 by the author.