
Fix Bug when reporting Recall / Precision / mAP #8686

Closed

Conversation

@pourmand1376 (Contributor) commented Jul 23, 2022

There is a bug which I fully described in the linked issue.

This PR fixes that bug. I have added a list called target_cls which keeps track of all labels. It must not be inside the for loop that examines out, because that loop only sees predictions that survived filtering. Afterwards we combine target_cls back into stats, both for code consistency and to make sure nothing else breaks.

The core of the problem is that we are looping over filtered results, so images whose predictions were all filtered out contribute no ground-truth labels, which makes our metrics wrong.
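Here is a toy NumPy illustration of the effect (simplified, not the actual val.py code; the numbers are made up) showing how skipping images whose predictions were all filtered out inflates recall:

import numpy as np

# Two images with 3 ground-truth boxes each. Image 0 keeps its predictions
# after confidence filtering / NMS; image 1 loses all of them.
tp_per_image = [np.array([1, 1, 0]), np.array([])]  # true-positive flag per kept prediction
labels_per_image = [3, 3]                           # ground-truth box count per image

# Buggy pattern: only images with surviving predictions contribute their labels.
tp_seen = sum(t.sum() for t in tp_per_image if len(t))
labels_seen = sum(n for t, n in zip(tp_per_image, labels_per_image) if len(t))
print('recall (buggy):', tp_seen / labels_seen)  # 2/3 = 0.67, too high

# Correct pattern: every image's labels are counted, even with zero predictions.
print('recall (fixed):', sum(t.sum() for t in tp_per_image) / sum(labels_per_image))  # 2/6 = 0.33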

If you want more explanation, do not hesitate to ask.

Also fixes:

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced validation metrics logging for better model evaluation.

📊 Key Changes

  • Added collection of target class IDs during model validation to obtain per-class metrics.
  • Updated statistics calculation to include the newly collected target class information.
  • Refined recorded statistics by removing a redundant element (the target class was recorded twice).

🎯 Purpose & Impact

  • ✅ The changes aim to improve the model evaluation process by providing more detailed information on class-wise performance, which can be critical for fine-tuning and understanding model behavior.
  • 🚀 Potential impact includes more informed decision-making for model improvements and clearer insights into how well the model is performing across different classes. This is especially useful when dealing with datasets that have a large number of classes or imbalanced classes.

@pourmand1376 pourmand1376 changed the title Fix Bug when reporting Recall/Precision/mAP Fix Bug when reporting Recall / Precision / mAP Jul 23, 2022
@pourmand1376 pourmand1376 marked this pull request as ready for review July 23, 2022 09:15
@pourmand1376 pourmand1376 marked this pull request as draft July 23, 2022 09:39
@pourmand1376 pourmand1376 marked this pull request as ready for review July 23, 2022 10:11
@glenn-jocher (Member)

@pourmand1376 thanks for the PR! Have you quantified the effect of this change on COCO mAP?

@glenn-jocher (Member)

@pourmand1376 I tested this PR against master with the following two commands and observed the exact same results. Are you sure this PR is changing any results? Can you provide reproducible code that illustrates the effect of the PR please? Thanks!

# Download COCO val
import torch
torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')
!unzip -q tmp.zip -d ../datasets && rm tmp.zip

# master
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.2

# PR
%cd ..
!git clone https://github.com/pourmand1376/yolov5 -b fix_bug_validation yolov5-pr  # clone
%cd yolov5-pr
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.2

@pourmand1376 (Contributor, Author) commented Jul 23, 2022

If you test with a higher conf-thres like 0.5, you can see my point.

Also, I think you have set the parameters wrong! You should set them according to this:

yolov5/val.py, Lines 336 to 337 in 1c5e92a:

parser.add_argument('--conf-thres', type=float, default=0.001, help='confidence threshold')
parser.add_argument('--iou-thres', type=float, default=0.6, help='NMS IoU threshold')

# Download COCO val
import torch
torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')
!unzip -q tmp.zip -d ../datasets && rm tmp.zip

# master
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.2

# PR
%cd ..
!git clone https://github.com/pourmand1376/yolov5 -b fix_bug_validation yolov5-pr  # clone
%cd yolov5-pr
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou-thres 0.65 --half --conf-thres 0.2

I will publish COCO results soon ...

@glenn-jocher (Member)

@pourmand1376 argparse arguments can be shortened as long as there are no conflicts, i.e. --conf and --conf-thres are the same. I'll retest at a higher --conf, but if moving to 0.2 had no effect, I doubt that moving to 0.5 will have one.
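For reference, Python's argparse accepts any unambiguous prefix of a long option by default (allow_abbrev=True), so both spellings parse to the same value:

import argparse

# Mirrors the two val.py options in question.
parser = argparse.ArgumentParser()
parser.add_argument('--conf-thres', type=float, default=0.001)
parser.add_argument('--iou-thres', type=float, default=0.6)

print(parser.parse_args(['--conf-thres', '0.2']))  # Namespace(conf_thres=0.2, iou_thres=0.6)
print(parser.parse_args(['--conf', '0.2']))        # same: '--conf' is an unambiguous prefix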

@glenn-jocher (Member) commented Jul 23, 2022

@pourmand1376 updated test also shows no change:

Input

# install
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install

# Download COCO val
import torch
torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')
!unzip -q tmp.zip -d ../datasets && rm tmp.zip

# master
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.5

# PR
%cd ..
!git clone https://github.com/pourmand1376/yolov5 -b fix_bug_validation yolov5-pr  # clone
%cd yolov5-pr
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.001
!python val.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --half --conf 0.5

Output

Cloning into 'yolov5'...
remote: Enumerating objects: 13039, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (101/101), done.
remote: Total 13039 (delta 133), reused 187 (delta 113), pack-reused 12825
Receiving objects: 100% (13039/13039), 12.46 MiB | 11.13 MiB/s, done.
Resolving deltas: 100% (8959/8959), done.
/content/yolov5
     |████████████████████████████████| 596 kB 14.8 MB/s 
100% 780M/780M [00:02<00:00, 308MB/s]
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.1-314-g7f7bd6f Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5x.pt to yolov5x.pt...
100% 166M/166M [00:08<00:00, 21.4MB/s]

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
Downloading https://ultralytics.com/assets/Arial.ttf to /root/.config/Ultralytics/Arial.ttf...
100% 755k/755k [00:00<00:00, 46.8MB/s]
val: Scanning '/content/datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<00:00, 10859.23it/s]
val: New cache created: /content/datasets/coco/val2017.cache
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:10<00:00,  2.23it/s]
                 all       5000      36335      0.743      0.625      0.683      0.504
Speed: 0.1ms pre-process, 4.7ms inference, 1.2ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.40s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.81s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=74.53s).
Accumulating evaluation results...
DONE (t=16.56s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.688
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.549
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.631
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.528
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.737
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.833
Results saved to runs/val/exp
val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
WARNING: confidence threshold 0.5 > 0.001 produces invalid results ⚠️
YOLOv5 🚀 v6.1-314-g7f7bd6f Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [00:59<00:00,  2.65it/s]
                 all       5000      36335      0.803      0.582      0.707      0.572
Speed: 0.1ms pre-process, 4.7ms inference, 0.8ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.69s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.26s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.74s).
Accumulating evaluation results...
DONE (t=2.22s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.424
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.548
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.483
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.532
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655
Results saved to runs/val/exp2
/content
Cloning into 'yolov5-pr'...
remote: Enumerating objects: 14479, done.
remote: Counting objects: 100% (144/144), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 14479 (delta 89), reused 114 (delta 67), pack-reused 14335
Receiving objects: 100% (14479/14479), 12.68 MiB | 21.46 MiB/s, done.
Resolving deltas: 100% (10050/10050), done.
/content/yolov5-pr
val: data=/content/yolov5-pr/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.1-326-g1a956c0 Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5x.pt to yolov5x.pt...
100% 166M/166M [00:00<00:00, 350MB/s]

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:09<00:00,  2.26it/s]
                 all       5000      36335      0.743      0.625      0.683      0.504
Speed: 0.1ms pre-process, 4.6ms inference, 1.2ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=5.78s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=74.49s).
Accumulating evaluation results...
DONE (t=14.65s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.688
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.549
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.631
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.528
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.737
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.833
Results saved to runs/val/exp
val: data=/content/yolov5-pr/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.5, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
WARNING: confidence threshold 0.5 > 0.001 produces invalid results ⚠️
YOLOv5 🚀 v6.1-326-g1a956c0 Python-3.7.13 torch-1.12.0+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [00:59<00:00,  2.65it/s]
                 all       5000      36335      0.803      0.582      0.707      0.572
Speed: 0.1ms pre-process, 4.7ms inference, 0.8ms NMS per image at shape (32, 3, 640, 640)

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
loading annotations into memory...
Done (t=0.68s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.25s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=12.61s).
Accumulating evaluation results...
DONE (t=2.20s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.424
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.548
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.483
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.249
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.532
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655
Results saved to runs/val/exp2

@pourmand1376 (Contributor, Author) commented Jul 23, 2022

We cannot see this on COCO since it has many classes; however we try, there will always be at least one class that actually goes through the for loop, and from there the result comes out correct. I am talking about datasets which have only one class, or maybe two.

Do you know any?

I am thinking about trying the COCO dataset with only one label ...

@pourmand1376 (Contributor, Author) commented Jul 23, 2022

I can confirm that the error cannot be reproduced. The problem was that I hadn't updated my forked repo for two months; in the meantime, this problem had already been fixed.

Now I think we should inform users to update their repo. Currently, the check just compares against the master branch, but that's not enough. We should add a remote for https://github.com/ultralytics/yolov5 if one doesn't exist, and compare against that remote's master.
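Something like this, for example (just a sketch of the idea; 'upstream' is an illustrative remote name):

# add the official repo as a remote if it is missing, then compare against it
!git remote add upstream https://github.com/ultralytics/yolov5  # errors harmlessly if it already exists
!git fetch upstream --quiet
!git rev-list --count HEAD..upstream/master  # number of commits the local checkout is behind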

You can still merge this PR if you want, but it doesn't change any results!

@glenn-jocher (Member)

@pourmand1376 awesome! Thanks for submitting the PR regardless and thank you for confirming everything is working correctly now.
