YOLO Algorithm for Object Detection Explained [+Examples] (2023)

What is object detection?

Object detection is a computer vision task that involves identifying and locating objects in images or videos. It is an important part of many applications, such as surveillance, self-driving cars, or robotics. Object detection algorithms can be divided into two main categories: single-shot detectors and two-stage detectors.

One of the earliest successful attempts to address the object detection problem using deep learning was the R-CNN (Regions with CNN features) model, developed by Ross Girshick and his colleagues at UC Berkeley in 2014. This model combined region proposal algorithms with convolutional neural networks (CNNs) to detect and localize objects in images.

Object detection algorithms are broadly classified into two categories based on how many times the same input image is passed through a network.

Single-shot object detection

Single-shot object detection makes predictions about the presence and location of objects in a single pass over the input image. Because the entire image is processed only once, these detectors are computationally efficient.

However, single-shot object detection is generally less accurate than other methods, and it’s less effective in detecting small objects. Such algorithms can be used to detect objects in real time in resource-constrained environments.

YOLO is a single-shot detector that uses a convolutional neural network (CNN) to process an image. We will dive deeper into the YOLO model in a later section.

Two-shot object detection

Two-shot object detection, more commonly called two-stage detection, uses two passes of the input image to make predictions about the presence and location of objects. The first pass is used to generate a set of proposals or potential object locations, and the second pass is used to refine these proposals and make final predictions. This approach is more accurate than single-shot object detection but is also more computationally expensive.

Overall, the choice between single-shot and two-shot object detection depends on the specific requirements and constraints of the application.

Generally, single-shot object detection is better suited for real-time applications, while two-shot object detection is better for applications where accuracy is more important.

Performance evaluation metrics for object detection models

To determine and compare the predictive performance of different object detection models, we need standard quantitative metrics.

The two most common evaluation metrics are Intersection over Union (IoU) and the Average Precision (AP) metrics.

Intersection over Union (IoU)

Intersection over Union is a popular metric to measure localization accuracy and calculate localization errors in object detection models.

To calculate the IoU between a predicted and a ground truth bounding box for the same object, we first compute the area where the two boxes overlap, called the “Intersection.” We then compute the total area covered by the two boxes together, called the “Union.”

The intersection divided by the Union gives us the ratio of the overlap to the total area, providing a good estimate of how close the prediction bounding box is to the original bounding box.
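The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function name `iou` and the box format are choices made for this example, not something the article prescribes.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give an empty intersection.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
print(iou(ground_truth, prediction))  # overlap 50, union 150
```

In the example, the two boxes share half of their area, so the intersection is 50 and the union is 150, giving an IoU of one third.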

Average Precision (AP)

Average Precision (AP) is calculated as the area under a precision vs. recall curve for a set of predictions.

Recall is the ratio of true positives to the total number of ground truth labels for a class. Precision is the ratio of true positives to the total number of predictions made by the model for that class.

Recall and precision trade off against each other, and the trade-off can be plotted as a curve by varying the confidence threshold. The area under this precision vs. recall curve gives us the Average Precision per class for the model. The average of this value, taken over all classes, is called mean Average Precision (mAP).

In object detection, precision and recall are computed over predicted bounding boxes rather than over class labels alone. A predicted box counts as a true positive when its IoU with a ground truth box of the same class exceeds a threshold, commonly 0.5; a prediction that falls below the threshold counts as a false positive.
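To make the area-under-the-curve idea concrete, here is a small sketch of Average Precision for one class. It assumes each detection has already been matched against the ground truth (for example with an IoU threshold of 0.5) and is passed in as a `(confidence, is_true_positive)` pair; that input format and the function name are assumptions for illustration. It uses a plain rectangle sum over the precision-recall points, whereas benchmarks such as PASCAL VOC and COCO use interpolated variants, so the numbers will not match official evaluation tools exactly.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of labeled objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    true_positives = 0
    false_positives = 0
    previous_recall = 0.0
    ap = 0.0
    for _, is_true_positive in ranked:
        if is_true_positive:
            true_positives += 1
        else:
            false_positives += 1
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / num_ground_truth
        # Add the rectangle: precision times the recall gained at this step.
        ap += precision * (recall - previous_recall)
        previous_recall = recall
    return ap


detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
print(average_precision(detections, num_ground_truth=4))
```

Averaging this value over all classes gives the mAP described above.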

What is YOLO?

You Only Look Once (YOLO) proposes using an end-to-end neural network that makes predictions of bounding boxes and class probabilities all at once. It differs from the approach taken by previous object detection algorithms, which repurposed classifiers to perform detection.

Following a fundamentally different approach to object detection, YOLO achieved state-of-the-art results, beating other real-time object detection algorithms by a large margin.

While algorithms like Faster R-CNN work by detecting possible regions of interest using the Region Proposal Network and then performing recognition on those regions separately, YOLO makes all of its predictions in a single forward pass, with one final fully connected layer producing the output.

Methods that use Region Proposal Networks perform multiple iterations for the same image, while YOLO gets away with a single iteration.

Several new versions of the same model have been proposed since the initial release of YOLO in 2015, each building on and improving its predecessor. Here's a timeline showcasing YOLO's development in recent years.

Figure: timeline of YOLO versions.

How does YOLO work? YOLO Architecture

The YOLO algorithm takes an image as input and then uses a simple deep convolutional neural network to detect objects in the image. The architecture of the CNN model that forms the backbone of YOLO is shown below.

Figure: architecture of the CNN model that forms the backbone of YOLO.

The first 20 convolutional layers of the model are pre-trained using ImageNet by plugging in a temporary average pooling and fully connected layer. Then, this pre-trained model is converted to perform detection, since previous research showed that adding convolutional and fully connected layers to a pre-trained network improves performance. YOLO’s final fully connected layer predicts both class probabilities and bounding box coordinates.

YOLO divides an input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the predicted box is.
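The grid assignment can be illustrated with a short sketch. The values S = 7, B = 2, and C = 20 classes are the settings reported for the original YOLO on PASCAL VOC, not numbers stated in this article, and the function names are hypothetical. Each box carries five values (x, y, width, height, confidence), so a cell predicts B * 5 + C numbers.

```python
def responsible_cell(x_center, y_center, image_w, image_h, S=7):
    """Return (row, col) of the grid cell that contains the object's center."""
    # Scale the center to grid units and truncate; clamp so that a center
    # lying exactly on the right or bottom edge stays inside the grid.
    col = min(int(x_center / image_w * S), S - 1)
    row = min(int(y_center / image_h * S), S - 1)
    return row, col


def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: S x S cells, B boxes of 5 values plus C class scores."""
    return (S, S, B * 5 + C)


# An object centered at (224, 100) in a 448 x 448 image.
print(responsible_cell(224, 100, 448, 448))  # (1, 3)
print(output_shape())                        # (7, 7, 30)
```

With these assumed settings the network output is a 7 x 7 x 30 tensor, and only the cell at row 1, column 3 is responsible for the example object.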

YOLO predicts multiple bounding boxes per grid cell. At training time, we only want one bounding box predictor to be responsible for each object. YOLO assigns one predictor to be “responsible” for predicting an object based on which prediction has the highest current IoU with the ground truth. This leads to specialization between the bounding box predictors. Each predictor gets better at predicting certain sizes, aspect ratios, or classes of objects, improving the overall recall score.

One key technique used in the YOLO models is non-maximum suppression (NMS). NMS is a post-processing step that is used to improve the accuracy and efficiency of object detection. In object detection, it is common for multiple bounding boxes to be generated for a single object in an image. These bounding boxes may overlap or be located at different positions, but they all represent the same object. NMS is used to identify and remove redundant or incorrect bounding boxes and to output a single bounding box for each object in the image.
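A common greedy form of NMS can be sketched as follows. This is an illustrative implementation rather than the code used by any particular YOLO release: boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples for a single class, and the 0.5 IoU threshold is an assumed default.

```python
def iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    # Candidate indices sorted by descending confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The first two boxes overlap heavily (IoU of about 0.82), so only the higher-scoring one survives, while the third box, which covers a different object, is kept.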

Now, let us look into the improvements that the later versions of YOLO have brought to the parent model.


YOLO v2

YOLO v2, also known as YOLO9000, was introduced in 2016 as an improvement over the original YOLO algorithm. It was designed to be faster and more accurate than YOLO and to be able to detect a wider range of object classes. This updated version also uses a different CNN backbone called Darknet-19, which resembles the VGG architecture in its simple progression of convolution and pooling layers.

One of the main improvements in YOLO v2 is the use of anchor boxes. Anchor boxes are a set of predefined bounding boxes of different aspect ratios and scales. When predicting bounding boxes, YOLO v2 uses a combination of the anchor boxes and the predicted offsets to determine the final bounding box. This allows the algorithm to handle a wider range of object sizes and aspect ratios.
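The way an anchor box and predicted offsets combine can be sketched as below. The formulas follow the parameterization described in the YOLO v2 paper (a sigmoid keeps the center inside its grid cell, and an exponential scales the anchor's width and height); the function and variable names are chosen for this example, and all quantities are in grid-cell units.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(offsets, cell, anchor):
    """Turn raw network offsets into a box (center x, center y, width, height).

    offsets: (tx, ty, tw, th) predicted by the network.
    cell:    (cx, cy) top-left corner of the grid cell, in grid units.
    anchor:  (pw, ph) width and height of the anchor box, in grid units.
    """
    tx, ty, tw, th = offsets
    cx, cy = cell
    pw, ph = anchor
    bx = cx + sigmoid(tx)   # center stays inside the cell
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)  # anchor size scaled by a positive factor
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets place the box at the cell center with the anchor's own size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(1.5, 2.0)))
# (3.5, 4.5, 1.5, 2.0)
```

Because the network only has to predict small corrections to a reasonable starting shape, training tends to be more stable than predicting box sizes from scratch.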

Another improvement in YOLO v2 is the use of batch normalization, which helps to improve the accuracy and stability of the model. YOLO v2 also uses a multi-scale training strategy: the input resolution is changed periodically during training, so the same network learns to predict well across a range of image sizes. This makes the model more robust to objects and images of different sizes.

YOLO v2 is trained with a loss function based on the sum of the squared errors between the predicted and ground truth bounding boxes and class probabilities.

The results obtained by YOLO v2 compared to the original version and other contemporary models are shown below.

Figure: YOLO v2 results compared with the original version and other contemporary models.


YOLO v3

YOLO v3 is the third version of the YOLO object detection algorithm. It was introduced in 2018 as an improvement over YOLO v2, aiming to increase the accuracy and speed of the algorithm.

One of the main improvements in YOLO v3 is the use of a new CNN architecture called Darknet-53. Darknet-53 is a variant of the ResNet architecture and is designed specifically for object detection tasks. It has 53 convolutional layers and is able to achieve state-of-the-art results on various object detection benchmarks.

Another improvement in YOLO v3 is the way anchor boxes are used. YOLO v2 predicted all boxes from a single feature map using five anchor boxes, which limited its ability to detect objects of very different sizes. YOLO v3 uses nine anchor boxes with varied scales and aspect ratios, three for each of its three prediction scales, to better match the size and shape of the objects being detected.

YOLO v3 also adopts an idea similar to "feature pyramid networks" (FPN). FPNs are a CNN architecture used to detect objects at multiple scales. They construct a pyramid of feature maps, with each level of the pyramid being used to detect objects at a different scale. This helps to improve the detection performance on small objects, as the model is able to see the objects at multiple scales.

In addition to these improvements, YOLO v3 can handle a wider range of object sizes and aspect ratios. It is also more accurate and stable than the previous versions of YOLO.


YOLO v4

Note: Joseph Redmon, the original creator of YOLO, left computer vision research a few years ago, so YOLOv4 and later versions are not his official work. Some of them are maintained by co-authors, but none of the releases past YOLOv3 is considered the "official" YOLO.

YOLO v4 is the fourth version of the YOLO object detection algorithm introduced in 2020 by Bochkovskiy et al. as an improvement over YOLO v3.

The primary improvement in YOLO v4 over YOLO v3 is a new backbone, CSPDarknet53, which applies the CSPNet design (shown below) to Darknet-53. CSPNet stands for "Cross Stage Partial Network": it splits the feature map at each stage into two parts, passes only one part through the stage's blocks, and merges the two afterwards, which reduces computation while preserving accuracy. With this backbone, YOLO v4 achieved state-of-the-art results among real-time detectors at the time of its release.

Figure: CSPNet architecture.

Both YOLO v3 and YOLO v4 use anchor boxes with different scales and aspect ratios to better match the size and shape of the detected objects. Like earlier versions since YOLO v2, YOLO v4 derives its anchor boxes with k-means clustering. This involves using a clustering algorithm to group the ground truth bounding boxes into clusters and then using the centroids of the clusters as the anchor boxes. This allows the anchor boxes to be more closely aligned with the detected objects' size and shape.
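The clustering step can be sketched in plain Python. This is a simplified illustration, not the code from any YOLO release: boxes are reduced to `(width, height)` pairs, similarity is the IoU of two boxes aligned at a common corner (so 1 - IoU acts as the distance, as proposed in the YOLO v2 paper), and the function names, seed, and iteration limit are assumptions.

```python
import random


def wh_iou(a, b):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs and return k anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        # Assign each box to the centroid it overlaps most (smallest 1 - IoU).
        clusters = [[] for _ in range(k)]
        for box in boxes:
            nearest = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[nearest].append(box)
        # Move each centroid to the mean width and height of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


boxes = [(10, 10), (12, 11), (11, 12), (100, 50), (110, 55), (90, 45)]
print(kmeans_anchors(boxes, k=2))
```

On this toy data the two anchors settle near 11 x 11 for the small boxes and 100 x 50 for the wide ones. Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering.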

While both YOLO v3 and YOLO v4 use a similar loss function for training the model, YOLO v4 replaces the bounding box regression term with "CIoU loss," a variant of the IoU loss that also accounts for the distance between box centers and the difference in aspect ratios. YOLO v4 also improves on the FPN used in YOLO v3 by adding a spatial pyramid pooling (SPP) block and a PANet-style path aggregation neck.


YOLO v5

YOLO v5 was introduced in 2020 as an open-source project by Ultralytics, which also maintains it; it was not developed by the authors of the original YOLO algorithm. YOLO v5 builds upon the success of previous versions and adds several new features and improvements.

Unlike the original YOLO, YOLO v5 uses a more complex architecture (shown below) that combines a CSPDarknet backbone with a PANet-style neck. This more complex architecture allows it to achieve higher accuracy and better generalization to a wider range of object categories.

Figure: YOLO v5 architecture.

Another difference between YOLO and YOLO v5 is the training data used to learn the object detection model. YOLO was trained on the PASCAL VOC dataset, which consists of 20 object categories. The released YOLO v5 models, on the other hand, were trained on the larger and more diverse COCO dataset, which includes 80 object categories.

YOLO v5 automates anchor box selection with a procedure Ultralytics calls "AutoAnchor." It checks how well the current anchor boxes fit the training labels and, if the fit is poor, uses a clustering algorithm to group the ground truth bounding boxes into clusters and takes the centroids of the clusters as the starting point for new anchor boxes. This allows the anchor boxes to be more closely aligned with the detected objects' size and shape.

YOLO v5 also uses "spatial pyramid pooling" (SPP), a block that pools the same feature map with several kernel sizes and concatenates the results. SPP enlarges the receptive field and allows the model to see the objects at multiple scales. YOLO v4 also uses SPP, but YOLO v5 includes several improvements to the SPP architecture that allow it to achieve better results.

YOLO v4 and YOLO v5 use a similar loss function to train the model. For bounding box regression, YOLO v5 uses "CIoU loss," a variant of the IoU loss that also penalizes the distance between box centers and mismatched aspect ratios, which gives a useful training signal even when boxes overlap poorly.
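For readers who want to see what CIoU measures, here is a small sketch following the published Complete IoU definition: plain IoU, minus a normalized center-distance term, minus an aspect-ratio term. The function name and the `(x_min, y_min, x_max, y_max)` box format are assumptions for this example, and boxes are assumed to have positive width and height. The training loss is then 1 minus the returned value.

```python
import math


def ciou(box_a, box_b):
    """Complete IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    w_a, h_a = ax2 - ax1, ay2 - ay1
    w_b, h_b = bx2 - bx1, by2 - by1
    iou = inter / (w_a * h_a + w_b * h_b - inter)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(w_b / h_b) - math.atan(w_a / h_a)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0

    return iou - rho2 / c2 - alpha * v


# Two same-shaped boxes that do not overlap: IoU is 0, but CIoU still
# reflects how far apart they are (here -9/29).
print(ciou((0, 0, 2, 2), (3, 0, 5, 2)))
```

Unlike plain IoU, which is zero for every pair of disjoint boxes, this value keeps changing as the boxes move closer together, which is what makes it usable as a regression loss.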


YOLO v6

YOLO v6 was proposed in 2022 by Li et al. as an improvement over previous versions. One of the main differences between YOLO v5 and YOLO v6 is the CNN architecture used. YOLO v6 uses a backbone called EfficientRep, built from re-parameterizable RepVGG-style blocks that are designed to run efficiently on hardware. It can achieve state-of-the-art results on various object detection benchmarks. The framework of the YOLO v6 model is shown below.

Figure: framework of the YOLO v6 model.

YOLO v6 also changes how boxes are predicted: instead of relying on predefined anchor boxes, it uses an anchor-free detection head.

The results obtained by YOLO v6 compared to other state-of-the-art methods are shown below.

Figure: YOLO v6 results compared with other state-of-the-art methods.

What’s new with YOLO v7?

YOLO v7, released in 2022, has several improvements over the previous versions. Like most of its predecessors, it relies on anchor boxes.

Anchor boxes are a set of predefined boxes with different aspect ratios that are used to detect objects of different shapes. YOLO v7 uses nine anchor boxes, which allows it to detect a wide range of object shapes and sizes, thus helping to reduce the number of false positives.

YOLO v7's main contributions are an extended efficient layer aggregation network (E-ELAN) and a collection of training-time techniques that its authors call a "trainable bag-of-freebies." A related technique for hard-to-detect objects is “focal loss.” A standard cross-entropy loss treats all examples alike, so the many easy examples can dominate training. Focal loss addresses this issue by down-weighting the loss for well-classified examples and focusing on the hard examples, that is, the objects that are hard to detect.
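The down-weighting effect of focal loss is easy to see numerically. The sketch below implements the binary focal loss as defined in the paper that introduced it (Lin et al., 2017); the defaults gamma = 2 and alpha = 0.25 are the values recommended there, not settings taken from any YOLO configuration, and the function name is chosen for this example.

```python
import math


def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p:      predicted probability of the positive class, between 0 and 1.
    target: 1 for a positive example, 0 for a negative one.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, target=1)  # well-classified example
hard = focal_loss(0.1, target=1)  # badly misclassified example
print(easy, hard, hard / easy)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which is a convenient sanity check.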

YOLO v7 also works at a higher resolution than some earlier versions. By default it processes images at 640 by 640 pixels, which is higher than the 416 by 416 resolution commonly used with YOLO v3. This higher resolution helps YOLO v7 to detect smaller objects and to have a higher accuracy overall.

One of the main advantages of YOLO v7 is its speed. According to its authors, the YOLO v7 family covers a range from about 5 to 160 frames per second on a V100 GPU, depending on the model variant. For comparison, the original baseline YOLO model processed 45 frames per second. This makes YOLO v7 suitable for sensitive real-time applications such as surveillance and self-driving cars, where higher processing speeds are crucial.

Regarding accuracy, YOLO v7 performs well compared to other object detection algorithms. The base model reports an average precision of about 51.4% on the popular COCO dataset, averaged over IoU (intersection over union) thresholds from 0.5 to 0.95, which is comparable to or better than other real-time object detection algorithms. The quantitative comparison of the performance is shown below.

Figure: quantitative performance comparison of YOLO v7.

However, it should be noted that YOLO v7 is not always the most accurate choice: larger detectors, including some two-stage models in the R-CNN family, can achieve higher average precision on the COCO dataset but also require longer inference times.

Limitations of YOLO v7

YOLO v7 is a powerful and effective object detection algorithm, but it does have a few limitations.

  1. YOLO v7, like many object detection algorithms, struggles to detect small objects. It might fail to accurately detect objects in crowded scenes or when objects are far away from the camera.
  2. YOLO v7 is also not perfect at detecting objects at different scales. This can make it difficult to detect objects that are either very large or very small compared to the other objects in the scene.
  3. YOLO v7 can be sensitive to changes in lighting or other environmental conditions, so it may be less reliable in real-world applications where lighting conditions vary.
  4. YOLO v7 can be computationally intensive, which can make it difficult to run in real-time on resource-constrained devices like smartphones or other edge devices.


YOLO v8

At the time of writing this article, Ultralytics has confirmed the release of YOLO v8, which promises new features and improved performance over its predecessors. YOLO v8 comes with a new API that is intended to make training and inference much easier on both CPU and GPU devices, and the framework will support previous YOLO versions. The developers are still working on a scientific paper that will include a detailed description of the model architecture and performance.

Key takeaways

YOLO (You Only Look Once) is a popular object detection algorithm that has revolutionized the field of computer vision. It is fast and efficient, making it an excellent choice for real-time object detection tasks. It has achieved state-of-the-art performance on various benchmarks and has been widely adopted in various real-world applications.

One of the main advantages of YOLO is its fast inference speed, which allows it to process images in real time. It’s well-suited for applications such as video surveillance, self-driving cars, and augmented reality. Additionally, YOLO has a simple architecture and requires minimal training data, making it easy to implement and adapt to new tasks.

Despite limitations such as struggling with small objects and the inability to perform fine-grained object classification, YOLO has proven to be a valuable tool for object detection and has opened up many new possibilities for researchers and practitioners. As the field of computer vision continues to advance, it will be interesting to see how YOLO and other object detection algorithms evolve and improve.


How does the YOLO algorithm work step by step?

Exact dimensions and steps that the YOLO algorithm follows:
  1. Takes an input image of shape (608, 608, 3).
  2. Passes this image to a convolutional neural network (CNN), which returns a (19, 19, 5, 85) dimensional output.
  3. The last two dimensions of the above output are flattened to get an output volume of (19, 19, 425).
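The shape arithmetic in these steps can be checked with a short sketch. The reading of the dimensions used here is an assumption consistent with an 80-class model: 5 anchor boxes per grid cell, each with 85 values (4 box coordinates, 1 objectness score, and 80 class scores). The helper names are made up for this example.

```python
def flatten_last_two(shape):
    """Merge the last two dimensions of a shape tuple into one."""
    *leading, a, b = shape
    return (*leading, a * b)


def flatten_cell(cell):
    """Flatten one grid cell's predictions: a list of per-anchor value lists."""
    return [value for anchor in cell for value in anchor]


print(flatten_last_two((19, 19, 5, 85)))  # (19, 19, 425)

# One grid cell with 5 anchors of 85 values each becomes a 425-value vector.
cell = [[0.0] * 85 for _ in range(5)]
print(len(flatten_cell(cell)))            # 425
```

The 19 x 19 grid itself follows from the 608 x 608 input if the network downsamples by a factor of 32, since 608 / 32 = 19.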

How do you use YOLO for object detection?

YOLO (You Only Look Once) is a method for object detection. It is the algorithm behind how the code detects objects in the image. The official implementation of this idea is available through Darknet (a neural network implementation written from the ground up in C by the author).

How does YOLO detection work?

The YOLO framework (You Only Look Once) on the other hand, deals with object detection in a different way. It takes the entire image in a single instance and predicts the bounding box coordinates and class probabilities for these boxes.

Is YOLO a model or an algorithm?

YOLO (“You Only Look Once”) is an effective real-time object recognition algorithm, first described in the seminal 2015 paper by Joseph Redmon et al.

What algorithm does YOLO use?

YOLO is a regression-based algorithm: instead of selecting the interesting parts of an image, it predicts classes and bounding boxes for the whole image in one run. To understand the YOLO algorithm, we first need to understand what is actually being predicted.

How many images are needed for YOLO?

To achieve a robust YOLOv5 model, it is recommended to train with over 1500 images per class and more than 10,000 instances per class. It is also recommended to add up to 10% background images to reduce false positives.

How do you collect images for object detection?

  1. From the cluster management console, select Workload > Spark > Deep Learning.
  2. Select the Datasets tab.
  3. Click New.
  4. Create a dataset from Images for Object Detection.
  5. Provide a dataset name.
  6. Specify a Spark instance group.
  7. Provide a training folder. ...
  8. Provide the percentage of training images for validation.

What is the best algorithm for object detection?

Popular algorithms used to perform object detection include R-CNN (Region-Based Convolutional Neural Networks), Fast R-CNN, and YOLO (You Only Look Once). R-CNN and Fast R-CNN belong to the R-CNN family of two-stage detectors, while YOLO is part of the single-shot detector family.

How many objects can be detected by YOLO?

When trained on the COCO dataset, YOLO can detect the 80 COCO object classes, including person, bicycle, car, motorbike, aeroplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, and bench.

What objects can YOLO detect?

You only look once (YOLO) is a system for detecting objects on the Pascal VOC 2012 dataset.
It can detect the 20 Pascal object classes:
  • person.
  • bird, cat, cow, dog, horse, sheep.
  • aeroplane, bicycle, boat, bus, car, motorbike, train.
  • bottle, chair, dining table, potted plant, sofa, tv/monitor.

How does an object detection algorithm work?

Object detection is a computer vision technique that works to identify and locate objects within an image or video. Specifically, object detection draws bounding boxes around these detected objects, which allow us to locate where said objects are in (or how they move through) a given scene.

Can YOLO detect people?

Yes. Systems like R-CNN and Faster R-CNN make multiple assessments for a single image, whereas YOLO makes one, which makes it extremely fast and able to run in real time on a capable GPU. The YOLOv3 algorithm has been used to detect people and has been shown to identify them with high accuracy.

What is the YOLO data format?

In the YOLO labeling format, a .txt file with the same name is created for each image file in the same directory. Each .txt file contains the annotations for the corresponding image file, one object per line: the object class followed by the box center coordinates, width, and height, normalized by the image dimensions.
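As a concrete illustration of the label format, the sketch below converts a pixel-coordinate box into one line of a YOLO .txt file. It assumes the common convention of `class x_center y_center width height`, with all four box values divided by the image size so they fall between 0 and 1; the function name and the six-decimal formatting are choices made for this example.

```python
def to_yolo_line(class_id, box, image_w, image_h):
    """Format one annotation as a YOLO label line.

    box: (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    # Box center and size, normalized by the image dimensions.
    x_center = (x_min + x_max) / 2 / image_w
    y_center = (y_min + y_max) / 2 / image_h
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200 x 200 pixel box in a 640 x 480 image, labeled as class 0.
print(to_yolo_line(0, (100, 200, 300, 400), 640, 480))
# 0 0.312500 0.625000 0.312500 0.416667
```

Because the values are normalized, the same label file remains valid if the image is later resized.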

How does YOLO prepare data?

How to Train YOLO v5 on a Custom Dataset
  1. Set up the code.
  2. Download the Data.
  3. Convert the Annotations into the YOLO v5 Format. Partition the Dataset.
  4. Training Options. Data Config File. Hyperparameter Config File. ...
  5. Inference. Computing the mAP on the test dataset.
  6. Conclusion... and a bit about the naming saga.

How does a YOLO model measure accuracy?

To evaluate object detection models like R-CNN and YOLO, the mean average precision (mAP) is used. The mAP compares the ground-truth bounding box to the detected box and returns a score. The higher the score, the more accurate the model is in its detections.

What architecture is used in YOLO?

The YOLO architecture is similar to GoogLeNet. It has 24 convolutional layers, four max-pooling layers, and two fully connected layers. The input image is resized to 448x448 before going through the convolutional network.

Which YOLO is fastest?

In general, YOLOv7 surpasses all previous object detectors in terms of both speed and accuracy, ranging from 5 FPS to as much as 160 FPS. The YOLO v7 algorithm achieves the highest accuracy among all other real-time object detection models – while achieving 30 FPS or higher using a GPU V100.

Can YOLO detect faces?

The original Yolo model can detect 80 different object classes with high accuracy. We used this Yolo facial recognition model for detecting only one object - the face. We trained this algorithm on WiderFace (image dataset containing 393,703 face labels) dataset.

How much RAM does YOLO use?

Furthermore, to run the full YOLOv3-Tiny network, this implementation requires only 17.72 MB of off-chip memory for parameters and 896 KB of on-chip memory, which provides a 49.5× to 64.8× memory reduction compared with the GPU implementations.

How many layers does YOLO have?

YOLO has 24 convolutional layers followed by 2 fully connected layers (FC).

How many samples are needed for object detection?

Normally, each object class should have at least 200 bounding box annotations.

How many images are required for object detection?

For each label you must have at least 10 images, each with at least one annotation (bounding box and the label). However, for model training purposes it's recommended you use about 1000 annotations per label. In general, the more images per label you have the better your model will perform.

What is a good dataset for object detection?

DOTA is a highly popular dataset for object detection in aerial images, collected from a variety of sources, sensors and platforms. The images range from 800x800 to 20,000x20,000 pixels in resolution and contain objects of many different types, shapes and sizes.

What is a real-time example of object detection?

Face Detection and Face Recognition

Face detection and recognition are perhaps the most widely used applications of computer vision. Every time you upload a picture on Facebook, Instagram or Google Photos, it automatically detects the people in the images. This is the power of computer vision at work.

Which algorithm is best for image recognition?

The convolutional neural network (CNN) is a powerful algorithm for image processing. CNNs are currently the best algorithms we have for the automated processing of images. Many companies use them for tasks such as identifying the objects in an image. Images are represented as combinations of red, green, and blue (RGB) values.

What is the image size in YOLO?

The original YOLOv3 uses an input resolution of 416 × 416 pixels.

Can YOLO detect small objects?

In addition, the AIE-YOLO network is more capable of small object detection tasks under complex conditions.

Can YOLO detect fruits?

Several studies have utilized YOLO-based models for fruit detection and have demonstrated that YOLO models have a huge potential in accurate real time detection of fruits in an orchard [6,7,8,9,10,11,12,13,14,15,16].

Can YOLO detect multiple objects?

Yes. A single YOLO model can detect objects of multiple classes without sacrificing much speed or accuracy.

What is the purpose of YOLO?

YOLO is an algorithm that uses neural networks to provide real-time object detection. This algorithm is popular because of its speed and accuracy. It has been used in various applications to detect traffic signals, people, parking meters, and animals.

What are the three stages of object recognition?

The model developed here distinguishes three characteristic levels of visual perception: (i) the level of basic visual information processing, (ii) the level of perceptual content and (iii) the level of higher-order perceptual cognition.

How do you implement an object detection model?

In order to build our object detection system in a more structured way, we can follow the steps below:
  1. Divide the image into a 10×10 grid.
  2. Define the centroids for each patch.
  3. For each centroid, take three different patches of different heights and aspect ratios.

Why does YOLO look only once?

YOLO stands for You Only Look Once. As the name says, the algorithm looks at an image or frame only once and detects all the objects in it in a single shot.

Is YOLO deep learning?

The “You Only Look Once,” or YOLO, family of models are a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon, et al. and first described in the 2015 paper titled “You Only Look Once: Unified, Real-Time Object Detection.”

Can YOLO be used for text recognition?

Using YOLO (You Only Look Once) for Text Detection

YOLO is a state-of-the-art, real-time object detection network. There are many versions of it. YOLOv3 was the most recent and the fastest version when this answer was written. YOLOv3 uses Darknet-53 as its feature extractor.

How does YOLO calculate mAP?

The mAP is calculated by finding Average Precision(AP) for each class and then average over a number of classes. The mAP incorporates the trade-off between precision and recall and considers both false positives (FP) and false negatives (FN).

How is YOLO loss calculated?

YOLO uses the sum-squared error between the predictions and the ground truth to calculate loss. The loss function is composed of the classification loss, the localization loss (errors between the predicted bounding box and the ground truth), and the confidence loss (the objectness of the box).

How do you train a model using YOLO?

To kick off training, we run the training command with the following options:
  1. img: define input image size.
  2. batch: determine batch size.
  3. epochs: define the number of training epochs. ...
  4. data: set the path to our yaml file.
  5. cfg: specify our model configuration.
  6. weights: specify a custom path to weights. ...
  7. name: result names.

What is the output of YOLO?

The YOLO network has 3 outputs: 507 (13 x 13 x 3) for large objects, 2028 (26 x 26 x 3) for medium objects, and 8112 (52 x 52 x 3) for small objects.
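These counts can be reproduced with a few lines of arithmetic. The sketch assumes the YOLOv3 configuration that matches the numbers above: a 416 x 416 input, three detection scales with strides of 32, 16, and 8, and three anchor boxes per grid cell. The strides and the function name are assumptions added for this example.

```python
def prediction_counts(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Number of predicted boxes at each detection scale, keyed by grid size."""
    counts = {}
    for stride in strides:
        grid = input_size // stride  # cells along one side of the feature map
        counts[grid] = grid * grid * anchors_per_cell
    return counts


counts = prediction_counts()
print(counts)                # {13: 507, 26: 2028, 52: 8112}
print(sum(counts.values()))  # 10647 candidate boxes before NMS
```

The coarse 13 x 13 grid has the largest receptive field per cell, which is why it handles large objects, while the fine 52 x 52 grid handles small ones.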

What is precision and recall in YOLO?

Precision = (True Positive)/(True Positive + False Positive) Recall—Recall is the ratio of the number of true positives to the total number of actual (relevant) objects. For example, if the model correctly detects 75 trees in an image, and there are actually 100 trees in the image, the recall is 75 percent.

How do you calculate accuracy?

Mathematically, this can be stated as:
  1. Accuracy = (TP + TN) / (TP + TN + FP + FN).
  2. Sensitivity = TP / (TP + FN).
  3. Specificity = TN / (TN + FP).
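The three formulas translate directly into code. The counts in the example are made-up numbers used only to show the arithmetic, and the function name is chosen for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts: 40 true positives, 45 true negatives,
# 5 false positives, 10 false negatives.
accuracy, sensitivity, specificity = classification_metrics(40, 45, 5, 10)
print(accuracy, sensitivity, specificity)  # 0.85 0.8 0.9
```

Note that object detectors rarely report accuracy in this form, because true negatives (every location where there is no object) are not well defined; precision, recall, and mAP are used instead.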

What is a good model accuracy score?

So, What Exactly Does Good Accuracy Look Like? Good accuracy in machine learning is subjective. But in our opinion, anything greater than 70% is a great model performance. In fact, an accuracy measure of anything between 70%-90% is not only ideal, it's realistic.


Article information

Author: Tish Haag

Last Updated: 03/23/2023
