
Only use annotated images to create dataset in autosplit() #2331

Closed
wants to merge 41 commits into from

Conversation

@kinoute kinoute (Contributor) commented Mar 2, 2021

After reading your message in #2313 (comment) about getting the best results from YOLOv5, where you talk about background images, and after our discussion about autosplit() and annotated files in #2228, I decided to add two functionalities to autosplit():

  • The first one is the ability to create a dataset/splits only from images that have an annotation file, i.e. an associated .txt file. As we discussed, the absence of a .txt file could mean two things:

    • the image hasn't been labelled yet, or
    • there is no object to detect.

    Small datasets are easy to manage, but when you have to build datasets with thousands of images (and more coming), it's hard to keep track of where you are, and you don't want to wait until everything is annotated before starting to train. That means some images would lack .txt files and annotations, resulting in the label inconsistency you describe in Hunt for the highest mAP #2313. By adding the annotated_only argument to the function, people can, if they want to, create datasets/splits only from images that are known to be labelled.

  • The second functionality is the ability to fill the dataset with background images automatically. We provide a path in the bg_imgs_path argument, and images from that folder are picked and added to the splits. The number, or more precisely the ratio, of background images can be configured with bg_imgs_ratio. I followed your advice from the other issue and limited it to a float between 0 and 0.1 (0% and 10%). This ratio is used to calculate how many background images should be added to each split: if my training split has 1000 images and I set bg_imgs_ratio to 0.1, then, provided there are enough background images available, around 100 background images will be added to the training split. A short sketch of both options follows after this list.
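Here is a minimal sketch of how the two options could fit together in autosplit(); it is not the exact diff in this PR. The IMG_FORMATS tuple, the default path, and the assumption that each label .txt sits next to its image are simplifications (YOLOv5 itself maps .../images/... paths to .../labels/... paths).

```python
import random
from pathlib import Path

IMG_FORMATS = ('.bmp', '.jpg', '.jpeg', '.png')  # simplified extension list


def autosplit(path='../coco128/images', weights=(0.9, 0.1, 0.0),
              annotated_only=False, bg_imgs_path=None, bg_imgs_ratio=0.0):
    """Split images under `path` into autosplit_*.txt files, optionally keeping
    only annotated images and padding each split with background images."""
    path = Path(path)
    files = sorted(p for p in path.rglob('*.*') if p.suffix.lower() in IMG_FORMATS)

    if annotated_only:
        # keep an image only if its label file exists (sibling .txt assumed here)
        files = [f for f in files if f.with_suffix('.txt').exists()]

    # assign every annotated image to train/val/test according to `weights`
    indices = random.choices([0, 1, 2], weights=weights, k=len(files))
    splits = [[], [], []]
    for i, img in zip(indices, files):
        splits[i].append(img)

    # optionally top each split up with background (label-free) images
    if bg_imgs_path is not None and bg_imgs_ratio > 0:
        assert 0 < bg_imgs_ratio <= 0.1, 'bg_imgs_ratio must be in (0, 0.1]'
        bg = [p for p in Path(bg_imgs_path).rglob('*.*') if p.suffix.lower() in IMG_FORMATS]
        random.shuffle(bg)
        for split in splits:
            n = min(round(len(split) * bg_imgs_ratio), len(bg))  # e.g. 1000 images * 0.1 -> ~100
            split.extend(bg[:n])
            bg = bg[n:]  # never reuse a background image across splits

    # write one image path per line, overwriting any previous split files
    for name, split in zip(['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt'], splits):
        with open(path / name, 'w') as f:
            f.writelines(f'{img}\n' for img in split)
```

For example, autosplit('datasets/mydata/images', annotated_only=True, bg_imgs_path='datasets/backgrounds', bg_imgs_ratio=0.1) would keep only labelled images and add roughly 10% background images to each split (the paths here are placeholders).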

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Updated the YOLOv5 Dockerfile and README, and made various performance improvements.

📊 Key Changes

  • Docker base image updated to a newer PyTorch version.
  • Installation of Python packages now includes the notebook package and uses a pip cache.
  • Environment variable HOME set for Docker.
  • Simplified PyTorch Hub models loading code.
  • Argoverse-HD dataset support added with scripts for preparation.
  • Default pretrained weights usage is now consistent across model types.
  • Performance improvements in data loading functions.

🎯 Purpose & Impact

  • 🚀 Better base environment due to newer Docker image.
  • ⏱ Reduced Python package installation time via pip caching.
  • 🛠 Simplified usage for users loading models with PyTorch Hub (see the example after this list).
  • 📈 Enhanced dataset support enables users to work with Argoverse-HD, expanding the range of applicable real-world scenarios.
  • 🤖 Ensures pretrained models are now the default, providing better out-of-the-box performance.
  • ✅ General optimizations lead to faster data preprocessing, benefiting all users in terms of efficiency.
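
As an illustration of the simplified PyTorch Hub loading mentioned above, a typical usage pattern looks like this (the model name and image URL are examples, not part of this PR's diff):

```python
import torch

# load a pretrained YOLOv5s model from PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# run inference on an example image URL and print the detections
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()
```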

kinoute and others added 30 commits February 18, 2021 22:43
* EMA bug fix 2

* update
* Resume with custom anchors fix

* Update train.py
* faster random index generator for mosaic augmentation

We don't need to access a list to generate a random index; doing so makes the augmentation slower.

* Update datasets.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
…augmentation (#2383)

image weights compatible faster random index generator v2 for mosaic augmentation
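
As a rough, hypothetical illustration of the idea behind these commits (not the actual datasets.py diff), generating the extra mosaic indices directly avoids an unnecessary list lookup when the index list is simply 0..n-1:

```python
import random

indices = list(range(100_000))  # stand-in for the dataset's index list

# before: index into the list to pick three extra mosaic images
extra = [indices[random.randint(0, len(indices) - 1)] for _ in range(3)]

# after: generate the random indices directly, skipping the list access
extra = [random.randint(0, len(indices) - 1) for _ in range(3)]
```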
* option to skip last layer and CUDA export support

* added parameter device

* fix import

* cleanup 1

* cleanup 2

* opt-in grid

--grid will export with grid computation, default export will skip grid (same as current)

* default --device cpu

GPU export causes ONNX and CoreML errors.

Co-authored-by: Jan Hajek <jan.hajek@gmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* GCP sudo docker

* cleanup
* added argoverse-download ability

* bugfix

* add support for Argoverse dataset

* Refactored code

* renamed to argoverse-HD

* unzip -q and YOLOv5

small cleanup items

* add image counts

Co-authored-by: Kartikeya Sharma <kartikes@trinity.vision.cs.cmu.edu>
Co-authored-by: Kartikeya Sharma <kartikes@trinity-0-32.eth>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Integer printout

* test.py 'Labels'

* Update train.py
* Update test.py --task train val study

* update argparser --task
* labels.png class names

* fontsize=10
@kinoute kinoute changed the title Add background images and annotated only features to autosplit() Only use annotated images to create dataset in autosplit() Mar 14, 2021
@kinoute kinoute closed this Mar 14, 2021