Object detection from scratch with PyTorch: YOLOv2
This article discusses the innovative improvements over YOLOv1 and introduces YOLOv2 and YOLO9000 (right on schedule).
For those interested in v1:
1. Improvements over YOLOv1
Backbone
Darknet-19 is the successor to the YOLOv1 backbone (Extraction). There are four differences from the latter:
- batch normalization
- a convolutional prediction head instead of fully connected layers
- a passthrough that combines features from different layers: the earlier feature map is compressed to one eighth of its channels, then folded to one fourth of its spatial size (see the sketch after this list)
- one pooling layer removed, which yields more output boxes: the downsampling factor becomes 32, half of what it was before
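To make the passthrough concrete, here is a minimal PyTorch sketch. The 512-channel input and the use of `pixel_unshuffle` are my assumptions; `pixel_unshuffle` orders channels differently from Darknet's reorg layer, which only matters if you load converted weights.

```python
import torch.nn as nn
import torch.nn.functional as F

class Passthrough(nn.Module):
    """Sketch of the passthrough: compress channels, then fold space into depth."""
    def __init__(self, in_channels=512):
        super().__init__()
        # a 1x1 conv compresses to one eighth of the channels (512 -> 64)
        self.reduce = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)

    def forward(self, x):
        x = self.reduce(x)
        # fold each 2x2 spatial block into channels: 26x26x64 -> 13x13x256,
        # i.e. one fourth of the spatial size; the result is concatenated
        # with the deeper 13x13 feature map before the prediction head
        return F.pixel_unshuffle(x, downscale_factor=2)
```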
Prediction output
Compared to YOLOv1, YOLOv2 transforms its outputs into a reasonable range; they are finally all nonnegative!
Most worth mentioning are the width and height: the square-root trick is removed and replaced by the log ratio between ground truths and anchors, just like SSD does.
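In the paper's notation, the decoding is bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·exp(tw), bh = ph·exp(th), so the regression target for width is tw = log(w_gt / pw). Below is a minimal decoding sketch; the tensor layout (N, A, 4, H, W) and anchors given in cell units are my assumptions:

```python
import torch

def decode(raw, anchors, stride=32):
    # raw: (N, A, 4, H, W) holding (tx, ty, tw, th); anchors: (A, 2) in cell units
    tx, ty, tw, th = raw.unbind(dim=2)
    H, W = raw.shape[-2:]
    cy, cx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    # sigmoid keeps each center inside its cell, exp keeps w/h positive:
    # everything ends up nonnegative
    bx = (torch.sigmoid(tx) + cx) * stride
    by = (torch.sigmoid(ty) + cy) * stride
    bw = anchors[:, 0].view(1, -1, 1, 1) * torch.exp(tw) * stride
    bh = anchors[:, 1].view(1, -1, 1, 1) * torch.exp(th) * stride
    return torch.stack((bx, by, bw, bh), dim=2)  # centers and sizes in pixels
```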
Dimension prior
YOLOv2 uses a different strategy to get anchor shapes: k-means clustering rather than hand-crafted priors. I got a cluster-averaged IOU between 0.61 and 0.64.
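A sketch of the clustering, using d(box, centroid) = 1 − IOU from the paper as the distance; boxes are compared by (w, h) only, as if they shared a center. The NumPy layout and the unguarded empty-cluster case are simplifications of mine.

```python
import numpy as np

def iou_wh(boxes, centroids):
    # IOU of (w, h) pairs, assuming both boxes share the same center
    inter = np.minimum(boxes[:, None, 0], centroids[None, :, 0]) \
          * np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    areas = boxes[:, 0] * boxes[:, 1]
    c_areas = centroids[:, 0] * centroids[:, 1]
    return inter / (areas[:, None] + c_areas[None, :] - inter)

def kmeans_anchors(wh, k=5, iters=100):
    # k-means with d = 1 - IOU: minimizing distance = maximizing IOU
    centroids = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = iou_wh(wh, centroids).argmax(axis=1)
        centroids = np.array([wh[assign == i].mean(axis=0) for i in range(k)])
    return centroids  # anchor (w, h) priors
```

The mean IOU of each box with its assigned centroid is the quality metric; that is the 0.61~0.64 figure above.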
Loss function
The formula itself doesn't change much, but the matching strategy does. This time the assumption of only one object per cell is relaxed: a cell's boxes can match different ground truths, so a cell can produce more than one positive prediction.
1. No-object loss: for boxes that don't match any ground truth with an IOU above the threshold.
2. Object loss: for positive anchors; each matched box regresses toward its own best-matched ground truth.
3. Prior loss: for the first 12800 images seen during training, gradients are also computed for a prior loss on negative anchors (pulling predictions toward the anchor shapes). Quite heuristic, and not seen in later versions of the YOLO series.
4. Coefficients of the loss components: the object loss is weighted 5, the others 1. (In YOLOv1, the location loss had weight 5 and the no-object loss 0.5.)
Since there are now several anchors per cell and the probability map is coupled with anchors instead of cells, we won't see the classic "class probability map" visualization afterwards.
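A sketch of the matching logic as I understand it; the shape of `ious`, the 0.6 threshold, and the globally computed argmax (rather than per cell) are my assumptions:

```python
import torch

def match(ious, iou_thresh=0.6):
    # ious: (num_boxes, num_gt), IOU of every predicted box with every truth
    # boxes below the threshold against every truth pay the no-object loss
    noobj = ious.max(dim=1).values < iou_thresh
    # each ground truth claims its best-matching box as a positive anchor
    pos = torch.zeros(ious.shape[0], dtype=torch.bool)
    pos[ious.argmax(dim=0)] = True
    noobj &= ~pos  # a positive box never pays the no-object loss
    # each positive box regresses toward its own best-matched truth
    best_gt = ious.argmax(dim=1)
    return pos, noobj, best_gt
```

The weighted sum then multiplies the object terms by 5 and everything else, including the early prior loss, by 1.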
Training strategy
1. High-resolution finetuned classifier
I gave up training this myself, since it is impractical to train a modern classifier on ImageNet in a day with a single consumer GPU (even one epoch of ResNet-18 takes 14 hours). Instead, I converted weights from Darknet to PyTorch to test this strategy.
2. Multiscale training for the detector
I randomly picked a new resolution once per epoch (see the sketch below); however, the model performed even worse.
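A minimal sketch of that per-epoch resolution pick; the scale set {320, 352, …, 608} is from the paper, while resizing the already-loaded batch with `interpolate` is a shortcut of mine:

```python
import random
import torch.nn.functional as F

SCALES = [320 + 32 * i for i in range(10)]  # multiples of 32: 320 .. 608

def resize_batch(images, size):
    # the network is fully convolutional with stride 32, so any multiple
    # of 32 works; the output grid becomes size // 32
    return F.interpolate(images, size=(size, size),
                         mode='bilinear', align_corners=False)

# once per epoch (the paper resamples every 10 batches):
# size = random.choice(SCALES)
# images = resize_batch(images, size)
```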
3. Part supervision
According to the paper, YOLO9000 is trained in two stages: first train the classifier, then jointly train classification and detection. The matching strategy and the loss function are adjusted for this.
Hierarchical softmax tree
This method can extend the detection label set using classification data alone.
Under construction…
2. Results
Evaluation
classifier, top-1 accuracy (converted weights vs. reported):
- Darknet-19: 69.2% vs 72.9%
- Darknet-19 (448): 75.9% vs 76.2%
Weights are converted from https://pjreddie.com/darknet/imagenet/.
detector, all of mine trained for 100 epochs:
- In the paper, Darknet-19 (448): 76.8 mAP
- ResNet-18 (no high-res finetune or multiscale): 57.1 mAP
- ResNet-34 (no high-res finetune or multiscale): 60.2 mAP
=> Not much improvement over the last version.
Demo
3. Implementation differences
- Multiscale training per epoch rather than per 10 batches (not used in the end).
- No learning rate schedule.
- Data augmentation might differ from the paper's.
- ResNet-18 isn't high-resolution finetuned, but the detector with this backbone still performs as well as the finetuned Darknet-19 one.
Code
Reference