
A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS


Introduction

This post summarizes the YOLO (You Only Look Once) family of object detection models and traces its evolution from YOLOv1 to YOLO-NAS. YOLO is one of the most widely used object detection frameworks for real-time applications such as autonomous vehicles, video surveillance, and robotics. Each section below gives a snapshot of one model: who developed it, its key innovations, and how it improved on earlier versions. Let’s dive into the world of YOLO!

1️⃣ YOLOv1 (2016)

  • Developers: Joseph Redmon et al.
  • Key Features:
    • First real-time object detection model using a single neural network pass.
    • Divides image into grids to predict bounding boxes and class probabilities.
  • Limitations: Struggles to detect small objects and objects that are close together.
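
To make the grid idea concrete, here is a minimal Python sketch. It assumes the settings reported in the original YOLOv1 paper for PASCAL VOC (a 7×7 grid, 2 boxes per cell, 20 classes); the function names are hypothetical and chosen only for illustration.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the grid cell that contains the box center (cx, cy), in pixels."""
    col = min(int(cx / img_w * s), s - 1)  # clamp so a center on the right edge stays in range
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def output_shape(s=7, b=2, c=20):
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (s, s, b * 5 + c)


print(responsible_cell(224, 100, 448, 448))  # (1, 3)
print(output_shape())                        # (7, 7, 30)
```

Because every cell outputs a single set of class probabilities, two objects whose centers fall into the same cell compete for it, which is one way to read the limitation listed above.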

2️⃣ YOLOv2 (2017)

  • Developers: Joseph Redmon and Ali Farhadi.
  • Key Features:
    • Introduced batch normalization and anchor boxes for better bounding box predictions.
    • Improved accuracy with multi-scale training.
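
The anchor box mechanism can be sketched as follows. This is a simplified, single-box illustration of the decoding rule described in the YOLOv2 paper, where the network predicts offsets relative to a grid cell and an anchor prior. The function and argument names are assumptions made for this example, and all values are in grid-cell units.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Decode raw offsets into (center x, center y, width, height) in grid units."""
    bx = cell_x + sigmoid(tx)     # sigmoid keeps the center inside the predicting cell
    by = cell_y + sigmoid(ty)
    bw = anchor_w * math.exp(tw)  # anchor prior scaled by the predicted log-ratio
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh


# Zero offsets place the center mid-cell and keep the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, cell_x=4, cell_y=6, anchor_w=3.0, anchor_h=2.0))
# (4.5, 6.5, 3.0, 2.0)
```

Constraining the center with a sigmoid means each prediction can only adjust a box within its own cell, which the YOLOv2 authors report makes training more stable than unconstrained offsets.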

3️⃣ YOLOv3 (2018)

  • Developers: Joseph Redmon and Ali Farhadi.
  • Key Features:
    • New Darknet-53 backbone for better feature extraction.
    • Multi-scale predictions for improved detection of small objects.
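
As a rough illustration of what multi-scale prediction means in numbers, the sketch below counts the candidate boxes YOLOv3 produces. It assumes the commonly cited configuration of a 416×416 input, three output scales with strides 8, 16, and 32, and three anchors per scale; the function name is hypothetical.

```python
def yolov3_prediction_count(input_size=416, strides=(8, 16, 32), anchors_per_scale=3):
    """Return the grid size at each scale and the total number of predicted boxes."""
    grids = [input_size // s for s in strides]
    total = sum(g * g * anchors_per_scale for g in grids)
    return grids, total


print(yolov3_prediction_count())  # ([52, 26, 13], 10647)
```

The finest grid (stride 8, 52×52 cells) has the most cells per image, so it is the scale best placed to pick up small objects.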

4️⃣ YOLOv4 (2020)

  • Developers: Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao.
  • Key Features:
    • Introduced CSPDarknet53 and PANet for feature fusion.
    • Innovations like Mosaic augmentation and CIoU loss for improved training.
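
The CIoU loss mentioned above extends plain IoU with a center-distance penalty and an aspect-ratio penalty. Below is a minimal scalar sketch in Python for two boxes in (x1, y1, x2, y2) format. It is an illustration under stated assumptions, not the Darknet implementation: real training code works on batched tensors, and the `eps` guard is an addition for numerical safety.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between a predicted box and a ground-truth box, both (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between centers, normalized by the squared diagonal
    # of the smallest box that encloses both boxes.
    rho2 = (((x1 + x2) - (gx1 + gx2)) ** 2 + ((y1 + y2) - (gy1 + gy2)) ** 2) / 4
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # close to 0 for identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))      # greater than 1 for disjoint boxes
```

Unlike plain IoU, the loss keeps growing as disjoint boxes move further apart, so it still gives a useful training signal when the prediction does not overlap the target at all.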

5️⃣ YOLOv5 (2020)

  • Developer: Glenn Jocher at Ultralytics.
  • Key Features:
    • Developed in PyTorch, easy to use and deploy.
    • Scalable models from nano to extra-large, optimized for speed and accuracy.

6️⃣ Scaled YOLOv4 (2021)

  • Developers: Same team as YOLOv4.
  • Key Features:
    • Introduced scaling for lightweight and high-performance models.
    • YOLOv4-tiny and YOLOv4-large for edge devices and cloud GPUs.

7️⃣ YOLOR (2021)

  • Developers: Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao (overlapping with the YOLOv4 team).
  • Key Features:
    • Multi-task learning for tasks like detection, classification, and pose estimation.
    • Uses implicit knowledge to boost model performance.

8️⃣ YOLOX (2021)

  • Developers: Megvii Technology.
  • Key Features:
    • Anchor-free architecture for simplified training.
    • Decoupled head for better accuracy in classification and regression.
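
To show what "anchor-free" changes in practice, here is a small sketch of how a single prediction can be decoded without an anchor prior. It follows the decoding scheme used in the public YOLOX code as far as I am aware (offsets are added to the grid location and scaled by the stride); the function and argument names are hypothetical.

```python
import math


def decode_anchor_free(px, py, pw, ph, grid_x, grid_y, stride):
    """Decode one anchor-free prediction into (cx, cy, w, h) in input-image pixels."""
    cx = (grid_x + px) * stride  # offset from the grid location, scaled to pixels
    cy = (grid_y + py) * stride
    w = math.exp(pw) * stride    # size is predicted directly, with no anchor width/height
    h = math.exp(ph) * stride
    return cx, cy, w, h


print(decode_anchor_free(0.0, 0.0, 0.0, 0.0, grid_x=4, grid_y=6, stride=8))
# (32.0, 48.0, 8.0, 8.0)
```

Each grid location produces one box instead of one per anchor, which removes the need to choose or cluster anchor sizes for a dataset.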

9️⃣ YOLOv6 (2022)

  • Developers: Meituan Vision AI Department.
  • Key Features:
    • New EfficientRep backbone based on RepVGG.
    • Improved quantization and task alignment for faster and more accurate detection.
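
The RepVGG idea behind this backbone is structural re-parameterization: train with parallel branches, then fold them into a single 3×3 convolution for inference. The sketch below demonstrates the folding step in pure Python under simplifying assumptions that are not taken from the YOLOv6 code: a single channel, no bias, and no batch normalization (RepVGG folds batch normalization into each branch before merging). All names are hypothetical.

```python
def conv2d_same(img, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += img[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def fuse_branches(k3, k1):
    """Merge a 3x3 branch, a 1x1 branch, and an identity branch into one 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1[0][0] + 1.0  # the 1x1 weight and the identity both sit at the center
    return fused


img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.0], [2.0, 1.0, -0.5], [0.0, 1.5, 1.0]]
k1 = [[3.0]]

# Training-time form: three parallel branches whose outputs are summed.
a, b = conv2d_same(img, k3), conv2d_same(img, k1)
multi = [[a[i][j] + b[i][j] + img[i][j] for j in range(3)] for i in range(3)]

# Inference-time form: one 3x3 convolution with the fused kernel.
single = conv2d_same(img, fuse_branches(k3, k1))

print(all(abs(multi[i][j] - single[i][j]) < 1e-9 for i in range(3) for j in range(3)))
```

Because convolution is linear, the two forms give the same output, so the deployed network pays for only one convolution per block.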

🔟 YOLOv7 (2022)

  • Developers: Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao.
  • Key Features:
    • Introduced E-ELAN blocks for efficient learning.
    • Optimized for small objects with real-time performance improvements.

1️⃣1️⃣ YOLOv8 (2023)

  • Developer: Ultralytics.
  • Key Features:
    • Anchor-free with a decoupled head for objectness, classification, and regression tasks.
    • Supports multiple tasks like segmentation, detection, and pose estimation.

1️⃣2️⃣ YOLO-NAS (2023)

  • Developer: Deci.
  • Key Features:
    • Designed using AutoNAC, an automatic architecture search tool for real-time applications.
    • Enhanced for small object detection and edge-device deployments.

📈 Conclusion

YOLO has come a long way from its inception, balancing real-time performance with increased accuracy across different tasks. Each version builds on its predecessor, making YOLO the go-to framework for object detection in diverse applications.


🔗 References:

Explore more in the original paper: A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS.


This post is licensed under CC BY 4.0 by the author.