
AutoBatch checks against failed solutions #8159

Merged · 3 commits · Jun 9, 2022

Conversation

@glenn-jocher (Member) commented Jun 9, 2022

@kalenmike this is a simple improvement to AutoBatch to verify that returned solutions have not already failed, i.e. to avoid returning batch-size 8 when 8 has already produced a CUDA out-of-memory error.

This is a halfway fix until I can implement a 'final solution' that will actively verify the solved-for batch size rather than passively assume it works.
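The check described above can be sketched as follows. This is a minimal illustration, not the actual autobatch.py code: the function and argument names are hypothetical, and it assumes failed profile runs are recorded as `None`:

```python
def clip_to_safe(b, sizes, results):
    """Reject a solved-for batch size that is at or above a known failure.

    sizes:   profiled batch sizes, e.g. [1, 2, 4, 8, 16]
    results: measured memory per size; None marks a CUDA out-of-memory failure
    """
    if None in results:
        i = results.index(None)       # first batch size that failed
        if b >= sizes[i]:             # solution already known to OOM
            b = sizes[max(i - 1, 0)]  # fall back to the last safe size
    return b
```

With sizes [1, 2, 4, 8, 16] where 8 and 16 failed, a solved-for size of 12 would be clipped down to 4, while a size of 6 (below the first failure) passes through unchanged.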

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved GPU memory management for batch size determination in YOLOv5.

📊 Key Changes

  • Added emojis import to autobatch.py for enriched logging.
  • Included device check before profiling to handle situations without CUDA.
  • Enhanced display of CUDA memory statistics including name, total, reserved, allocated, and free memory.
  • Revised profiling of batch sizes to store results more effectively.
  • Implemented a polynomial fit to determine optimal batch size based on available memory.
  • Added a catch for profiling failures to ensure the chosen batch size has successfully passed memory profiling.
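The polynomial-fit step in the list above can be sketched as a first-degree fit of measured memory use against batch size, solved for a target fraction of free memory. This is a simplified illustration under assumed names and numbers, not the actual autobatch.py implementation:

```python
import numpy as np

def fit_batch_size(sizes, mem_used, free_mem, fraction=0.8):
    """Solve a linear memory-vs-batch-size fit for a target utilization.

    sizes:    profiled batch sizes
    mem_used: measured memory in GiB for each profiled size
    free_mem: free GiB on the device
    fraction: target share of free memory to occupy
    """
    p = np.polyfit(sizes, mem_used, deg=1)           # mem ≈ p[0]*b + p[1]
    return int((free_mem * fraction - p[1]) / p[0])  # solve for batch size b
```

For example, if profiling sizes 2, 4, and 8 measured 1.1, 2.1, and 4.1 GiB, the fit gives roughly 0.5 GiB per image plus 0.1 GiB of overhead, so with 10 GiB free and a 0.8 target fraction the solved size is 15.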

🎯 Purpose & Impact

  • 🔍 Clearer insights into memory usage: Users gain better visibility about their GPU memory, aiding in troubleshooting and performance optimization.
  • 💡 Better batch size predictions: The system more accurately predicts batch sizes for training, helping avoid memory-related crashes and improving utilization.
  • 🛡 Increased robustness: Falling back when certain batch sizes fail during profiling leads to more stable and reliable model training sessions.

@glenn-jocher glenn-jocher self-assigned this Jun 9, 2022
@glenn-jocher glenn-jocher merged commit 6e46617 into master Jun 9, 2022
@glenn-jocher glenn-jocher deleted the update/autobatch branch June 9, 2022 15:15
tdhooghe pushed a commit to tdhooghe/yolov5 that referenced this pull request Jun 10, 2022
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022