
Only use annotated images to create dataset in autosplit() #2331

Closed
wants to merge 41 commits into from

Conversation

@kinoute kinoute (Contributor) commented Mar 2, 2021

After reading your message in #2313 (comment) about getting the best results from YOLOv5, where you talk about background images, and after our discussion about autosplit() and annotated files in #2228, I decided to add two functionalities to autosplit():

  • The first one is the ability to create a dataset/splits only from images that have an annotation file, i.e. an associated .txt file. As we discussed, the absence of a .txt file could mean two things:

    • the image hasn't been labelled yet, or
    • there is no object to detect.

    Small datasets are easy to manage, but when you have to build datasets with thousands of images (and more coming), it's hard to keep track of where you are, and you don't want to wait until everything is annotated before starting to train. That means some images would lack .txt files and annotations, resulting in the label inconsistency you describe in Hunt for the highest mAP #2313. By adding the annotated_only argument to the function, people can, if they want to, create datasets/splits only from images that are known to be labelled.

  • The second functionality is the ability to fill the dataset with background images automatically. We provide a path in the bg_imgs_path argument, and images from that folder are picked and added to the splits. The number, or more precisely the ratio, of background images can be configured with bg_imgs_ratio. I followed your advice from the other issue and limited it to a float between 0 and 0.1 (0% and 10%). This ratio is used to calculate how many background images should be added to each split: if my training split has 1000 images and I set bg_imgs_ratio to 0.1, then, provided there are enough background images available, around 100 background images will be added to the training split. A short sketch of both options follows after this list.
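Here is a minimal sketch of how the two options could fit together in autosplit(); it is not the exact diff in this PR. The IMG_FORMATS tuple, the default path, and the assumption that each label .txt sits next to its image are simplifications (YOLOv5 itself maps .../images/... paths to .../labels/... paths).

```python
import random
from pathlib import Path

IMG_FORMATS = ('.bmp', '.jpg', '.jpeg', '.png')  # simplified extension list


def autosplit(path='../coco128/images', weights=(0.9, 0.1, 0.0),
              annotated_only=False, bg_imgs_path=None, bg_imgs_ratio=0.0):
    """Split images under `path` into autosplit_*.txt files, optionally keeping
    only annotated images and padding each split with background images."""
    path = Path(path)
    files = sorted(p for p in path.rglob('*.*') if p.suffix.lower() in IMG_FORMATS)

    if annotated_only:
        # keep an image only if its label file exists (sibling .txt assumed here)
        files = [f for f in files if f.with_suffix('.txt').exists()]

    # assign every annotated image to train/val/test according to `weights`
    indices = random.choices([0, 1, 2], weights=weights, k=len(files))
    splits = [[], [], []]
    for i, img in zip(indices, files):
        splits[i].append(img)

    # optionally top each split up with background (label-free) images
    if bg_imgs_path is not None and bg_imgs_ratio > 0:
        assert 0 < bg_imgs_ratio <= 0.1, 'bg_imgs_ratio must be in (0, 0.1]'
        bg = [p for p in Path(bg_imgs_path).rglob('*.*') if p.suffix.lower() in IMG_FORMATS]
        random.shuffle(bg)
        for split in splits:
            n = min(round(len(split) * bg_imgs_ratio), len(bg))  # e.g. 1000 images * 0.1 -> ~100
            split.extend(bg[:n])
            bg = bg[n:]  # never reuse a background image across splits

    # write one image path per line, overwriting any previous split files
    for name, split in zip(['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt'], splits):
        with open(path / name, 'w') as f:
            f.writelines(f'{img}\n' for img in split)
```

For example, autosplit('datasets/mydata/images', annotated_only=True, bg_imgs_path='datasets/backgrounds', bg_imgs_ratio=0.1) would keep only labelled images and add roughly 10% background images to each split (the paths here are placeholders).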

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Updated the YOLOv5 Dockerfile and README, and made various performance improvements.

📊 Key Changes

  • Docker base image updated to a newer PyTorch version.
  • Installation of Python packages now includes the notebook package and uses a pip cache.
  • Environment variable HOME set for Docker.
  • Simplified PyTorch Hub models loading code.
  • Argoverse-HD dataset support added with scripts for preparation.
  • Default pretrained weights usage is now consistent across model types.
  • Performance improvements in data loading functions.

🎯 Purpose & Impact

  • 🚀 Better base environment due to newer Docker image.
  • ⏱ Reduced Python package installation time via pip caching.
  • 🛠 Simplified usage for users loading models with PyTorch Hub (see the example after this list).
  • 📈 Enhanced dataset support enables users to work with Argoverse-HD, expanding the range of applicable real-world scenarios.
  • 🤖 Ensures pretrained models are now the default, providing better out-of-the-box performance.
  • ✅ General optimizations lead to faster data preprocessing, benefiting all users in terms of efficiency.
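
As an illustration of the simplified PyTorch Hub loading mentioned above, a typical usage pattern looks like this (the model name and image URL are examples, not part of this PR's diff):

```python
import torch

# load a pretrained YOLOv5s model from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# run inference on an example image URL and print the detections
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()
```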

kinoute and others added 30 commits February 18, 2021 22:43
* EMA bug fix 2

* update
* Resume with custom anchors fix

* Update train.py
* faster random index generator for mosaic augmentation

We don't need to access a list to generate a random index; doing so makes the augmentation slower.

* Update datasets.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
…augmentation (#2383)

image weights compatible faster random index generator v2 for mosaic augmentation
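
As a rough, hypothetical illustration of the idea behind these commits (not the actual datasets.py diff), generating the extra mosaic indices directly avoids an unnecessary list lookup when the index list is simply 0..n-1:

```python
import random

indices = list(range(100_000))  # stand-in for the dataset's index list

# before: index into the list to pick three extra mosaic images
extra = [indices[random.randint(0, len(indices) - 1)] for _ in range(3)]

# after: generate the random indices directly, skipping the list access
extra = [random.randint(0, len(indices) - 1) for _ in range(3)]
```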
* option to skip last layer and CUDA export support

* added parameter device

* fix import

* cleanup 1

* cleanup 2

* opt-in grid

--grid will export with grid computation, default export will skip grid (same as current)

* default --device cpu

GPU export causes ONNX and CoreML errors.

Co-authored-by: Jan Hajek <jan.hajek@gmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* GCP sudo docker

* cleanup
* added argoverse-download ability

* bugfix

* add support for Argoverse dataset

* Refactored code

* renamed to argoverse-HD

* unzip -q and YOLOv5

small cleanup items

* add image counts

Co-authored-by: Kartikeya Sharma <kartikes@trinity.vision.cs.cmu.edu>
Co-authored-by: Kartikeya Sharma <kartikes@trinity-0-32.eth>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Integer printout

* test.py 'Labels'

* Update train.py
* Update test.py --task train val study

* update argparser --task
* labels.png class names

* fontsize=10
@kinoute kinoute changed the title Add background images and annotated only features to autosplit() Only use annotated images to create dataset in autosplit() Mar 14, 2021
@kinoute kinoute closed this Mar 14, 2021