PyTorch_YOLOv4 - detect.py performs poorly compared with detect.py from YOLOv5 & 7; How to set optimal values for conf-thres & iou-thres #51
Comments
@valentinitnelav I really hoped we could compare the predictions with the same parameters (e.g. `conf-thres`, `iou-thres`), but in the end we should compare the best possible results, which would mean predicting with all possible combinations... that would be a lot of work.
@stark-t , what if we define, as objectively as possible, what "best" means? For each case we do a "grid search": I think I can write a bash script to run detect.py hundreds of times on a GPU. But I need your help to decide on the evaluation metrics. Would this work? Or does YOLO already have some suggestions on these? I saw that one could look at F1_curve.png (for YOLOv5 & 7) and get an optimal confidence. However, I didn't see a graph like this for YOLOv4. Could be that I should use the
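The F1-based choice of confidence mentioned above could be sketched like this. Assumptions: a hypothetical CSV `pr_curve.csv` with columns `conf,precision,recall` (neither repo exports such a file directly; it would have to be produced by an evaluation run). The sample rows below are made up for illustration.

```shell
# Write a tiny made-up precision/recall-vs-confidence table (assumption:
# in practice this would come from an evaluation script, not be hardcoded).
cat > pr_curve.csv <<'EOF'
conf,precision,recall
0.10,0.62,0.81
0.25,0.71,0.74
0.40,0.80,0.55
EOF

# Pick the confidence threshold that maximizes F1 = 2PR / (P + R).
best_conf=$(awk -F, 'NR > 1 && ($2 + $3) > 0 {
  f1 = 2 * $2 * $3 / ($2 + $3)          # harmonic mean of precision, recall
  if (f1 > best) { best = f1; conf = $1 }
} END { print conf }' pr_curve.csv)
echo "best conf-thres by F1: $best_conf"
```

With the sample table, the middle row wins because its precision/recall trade-off gives the highest harmonic mean.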
FYI: we did not find an answer in the links below, but the general idea is that the "best" approach is to find the optimal values for these parameters. We could see them as hyperparameters at inference time, especially since they will also impact the rate of false positives on our field images.
So, I just ran detect.py for YOLOv4 with:

```shell
--img-size 640 \
--conf-thres 0.25 \
--iou-thres 0.45
```

My hopes were ruined, because it again produced only 238 txt prediction files for the 1680 test image files, similar to using `best_overall.pt`:

```shell
# Number of txt files generated
cd ~/PAI/detectors/PyTorch_YOLOv4/runs/detect/
cd 3265637_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 238

# Number of jpg files in the test dataset
cd ~/datasets/P1_Data_sampled/test/images
ls *jpg | wc -l  # this will not catch png or jpeg files, but 1680 is the right number
# 1680
```

I will try all the other best-weights options and see what I get - see #50
Overview of weights trials using:

```shell
--img-size 640 \
--conf-thres 0.25 \
--iou-thres 0.45
```

Unfortunately, none of the weights options generated a number of detection txt files close to the total number of images in the test dataset. The best results were 238 out of 1680.

**best.pt** - Job id 3265637

```shell
# Number of txt files generated
cd ~/PAI/detectors/PyTorch_YOLOv4/runs/detect/
cd 3265637_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 238
```

**best_overall.pt** - Job id 3265668

```shell
cd ..
cd 3265668_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 238
```

**best_ap50.pt** - Job id 3265661

```shell
cd ..
cd 3265661_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 238
```

**best_ap.pt** - Job id 3265663

```shell
cd ..
cd 3265663_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 238
```

**best_f.pt** - Job id 3265662

```shell
cd ..
cd 3265662_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 92
```

**best_p.pt** - Job id 3265665

```shell
cd ..
cd 3265665_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 4
```

**best_r.pt** - Job id 3265666

```shell
cd ..
cd 3265666_detection_using_3217130_yolov4_pacsp_s_b8_e300_img640_hyp_custom
ls *txt | wc -l
# 238
```
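The per-folder counts above could also be gathered in one loop instead of repeating `cd`/`ls` per directory. A minimal sketch; `runs_demo` below is a throwaway stand-in for `runs/detect/` so the snippet is self-contained:

```shell
# Build a tiny demo layout standing in for runs/detect/ (assumption for
# illustration only; on the cluster you would loop over the real folders).
mkdir -p runs_demo/run_a runs_demo/run_b
touch runs_demo/run_a/img1.txt runs_demo/run_a/img2.txt runs_demo/run_b/img1.txt

# Print "<run dir> <number of prediction txt files>" for every run.
for d in runs_demo/*/; do
  echo "$d $(ls "$d"*.txt 2>/dev/null | wc -l)"
done
```

On the real tree, replacing `runs_demo/*/` with the `runs/detect/` path gives all seven counts in one pass.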
Hi @stark-t , given the results above, I think we need to do a grid search for the optimal values of `conf-thres` & `iou-thres`. Such a script will produce a dozen detection folders with txt label files that you can run through an evaluation script to compute performance metrics (e.g. precision, recall, average precision, F1, IoU). Then we can plot these values on the two axes of `conf-thres` & `iou-thres`. Is there a simpler approach to this issue?
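A hedged sketch of such a grid-search driver. It only prints the detect.py command for every (conf, iou) combination instead of executing it, so the grid can be reviewed before burning GPU hours; the weights path and run-name pattern are assumptions for illustration.

```shell
# Hypothetical weights path (assumption; adjust to the real checkpoint).
WEIGHTS="runs/train/exp/weights/best.pt"

n=0
for conf in 0.1 0.2 0.3 0.4 0.5; do
  for iou in 0.3 0.4 0.5 0.6; do
    # Echo instead of run, so the full grid can be inspected first.
    echo "python detect.py --img-size 640 --conf-thres $conf --iou-thres $iou --weights $WEIGHTS --name grid_c${conf}_i${iou}"
    n=$((n + 1))
  done
done
echo "total runs: $n"  # 5 conf values x 4 iou values = 20 combinations
```

Dropping the `echo` around the python command (or piping the output into `sh`) would execute the grid for real.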
@valentinitnelav maybe we can narrow down the steps from 10%, 20%, ..., 90%, since we already have some insight that lower thresholds work better, right?
Hi @stark-t , should I go ahead and close all the issues related to YOLOv4, since we dropped it from the results-comparison pipeline? I don't think I will get more time to investigate these issues at the moment.
We don't implement YOLOv4 any longer. See also the other related issues linked above. |
Hi @stark-t ,
Running inference with detect.py for YOLOv7 was very similar to YOLOv5. However, for PyTorch_YOLOv4, things got a bit less smooth.
This is the detection script which I just tried for PyTorch_YOLOv4.
These arguments are the same as for YOLOv7 (for which I sent you already the txt prediction files).
PyTorch_YOLOv4 doesn't have the `--nosave` option, so it also saves the images at inference time, and I didn't find an argument to stop this. It also has two new arguments, `--cfg` & `--names`, which are not used in detect.py of YOLOv7 or 5. The `pai.names` file must contain the label names.

Most disturbing is that the run of detect.py produces only 238 txt prediction files for the 1680 test image files.
Also, the .err files usually produced when running a cluster job are empty (as opposed to YOLOv5, which prints the time needed and extra info for each image).
I am not sure at this point what argument to change in detect.py of PyTorch_YOLOv4 to increase the number of detections.
I can reduce the values for `--conf-thres` & `--iou-thres`, but then the runs are no longer comparable with how I ran YOLOv7 & 5. Actually, the values above are the default values for v5 & 7. For v4 the defaults are `--conf-thres 0.4` & `--iou-thres 0.5` - see https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/detect.py

Out of curiosity, I reduced these values, and I got 1443 txt prediction files for the 1680 test image files. Still a lower number than what I got for YOLOv7 with the default values (1668 txt files).
EDIT: However, I just saw that this creates too many prediction boxes per image. I will send you the results.
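One quick way to quantify "too many prediction boxes per image" is to count lines per label file, since in YOLO-format txt output one line is one box. A sketch with throwaway demo data; `demo_labels` stands in for the run's label folder (an assumption so the snippet is self-contained):

```shell
# Fake label files standing in for a detection run's txt output
# (class x_center y_center width height per line; values are made up).
mkdir -p demo_labels
printf '0 0.5 0.5 0.1 0.1\n0 0.2 0.2 0.1 0.1\n' > demo_labels/img1.txt
printf '1 0.7 0.7 0.2 0.2\n' > demo_labels/img2.txt

# One line = one box, so total lines / number of files = boxes per image.
files=$(ls demo_labels/*.txt | wc -l)
total=$(cat demo_labels/*.txt | wc -l)
echo "files=$files total_boxes=$total avg_boxes_per_image=$((total / files))"
```

Running this on the real run folders for each threshold combination would show which settings over-predict.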
All in all, how do we find a comparable situation when running detect.py on the test dataset between YOLO versions?