
AutoBatch checks against failed solutions #8159

Merged · 3 commits · Jun 9, 2022

Conversation

@glenn-jocher (Member) commented Jun 9, 2022

@kalenmike this is a simple improvement to AutoBatch to verify that returned solutions have not already failed, i.e. to avoid returning batch-size 8 when 8 has already produced a CUDA out-of-memory error.

This is a halfway fix until I can implement a 'final solution' that will actively verify the solved-for batch size rather than passively assume it works.
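The check described above can be sketched as follows. This is a minimal illustration, not the actual autobatch.py code: the function and argument names are hypothetical, and it assumes failed profile runs are recorded as `None`:

```python
def clip_to_safe(b, sizes, results):
    """Reject a solved-for batch size that is at or above a known failure.

    sizes:   profiled batch sizes, e.g. [1, 2, 4, 8, 16]
    results: measured memory per size; None marks a CUDA out-of-memory failure
    """
    if None in results:
        i = results.index(None)       # first batch size that failed
        if b >= sizes[i]:             # solution already known to OOM
            b = sizes[max(i - 1, 0)]  # fall back to the last safe size
    return b
```

With sizes [1, 2, 4, 8, 16] where 8 and 16 failed, a solved-for size of 12 would be clipped down to 4, while a size of 6 (below the first failure) passes through unchanged.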

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved GPU memory management for batch size determination in YOLOv5.

📊 Key Changes

  • Added emojis import to autobatch.py for enriched logging.
  • Included device check before profiling to handle situations without CUDA.
  • Enhanced display of CUDA memory statistics including name, total, reserved, allocated, and free memory.
  • Revised profiling of batch sizes to store results more effectively.
  • Implemented a polynomial fit to determine optimal batch size based on available memory.
  • Added a catch for profiling failures to ensure the chosen batch size has successfully passed memory profiling.
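The polynomial-fit step in the list above can be sketched as a first-degree fit of measured memory use against batch size, solved for a target fraction of free memory. This is a simplified illustration under assumed names and numbers, not the actual autobatch.py implementation:

```python
import numpy as np

def fit_batch_size(sizes, mem_used, free_mem, fraction=0.8):
    """Solve a linear memory-vs-batch-size fit for a target utilization.

    sizes:    profiled batch sizes
    mem_used: measured memory in GiB for each profiled size
    free_mem: free GiB on the device
    fraction: target share of free memory to occupy
    """
    p = np.polyfit(sizes, mem_used, deg=1)           # mem ≈ p[0]*b + p[1]
    return int((free_mem * fraction - p[1]) / p[0])  # solve for batch size b
```

For example, if profiling sizes 2, 4, and 8 measured 1.1, 2.1, and 4.1 GiB, the fit gives roughly 0.5 GiB per image plus 0.1 GiB of overhead, so with 10 GiB free and a 0.8 target fraction the solved size is 15.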

🎯 Purpose & Impact

  • 🔍 Clearer insights into memory usage: Users gain better visibility about their GPU memory, aiding in troubleshooting and performance optimization.
  • 💡 Better batch size predictions: The system more accurately predicts batch sizes for training, helping avoid memory-related crashes and improving utilization.
  • 🛡 Increased robustness: Falling back when certain batch sizes fail during profiling leads to more stable and reliable model training sessions.

@glenn-jocher glenn-jocher self-assigned this Jun 9, 2022
@glenn-jocher glenn-jocher merged commit 6e46617 into master Jun 9, 2022
@glenn-jocher glenn-jocher deleted the update/autobatch branch June 9, 2022 15:15
tdhooghe pushed a commit to tdhooghe/yolov5 that referenced this pull request Jun 10, 2022
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022