
Add PyTorch AMP check #7917

Merged
merged 11 commits into from
May 22, 2022

Conversation


@glenn-jocher (Member) commented May 21, 2022

Helps identify AMP issues before training starts. Addresses the issue raised in #7908.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced AutoShape initialization with verbosity control and improved Automatic Mixed Precision (AMP) support in training.

📊 Key Changes

  • Added verbose flag to AutoShape class for optional logging.
  • Introduced check_amp function to verify AMP compatibility and functionality.
  • Replaced direct amp import with torch.cuda.amp for AMP contexts.
  • Adjusted model training to use AMP based on model compatibility check.
  • Removed redundant import statements.
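
The core idea behind an AMP check is to run the same input through the model once in full precision and once under mixed precision, then confirm the outputs agree within a tolerance. A minimal pure-Python sketch of the comparison logic (names like `amp_allclose` are illustrative, not the actual YOLOv5 implementation):

```python
def amp_allclose(fp32_out, amp_out, rel_tol=0.1):
    """Return True if every AMP output is within rel_tol of its FP32 value."""
    for a, b in zip(fp32_out, amp_out):
        denom = max(abs(a), 1e-6)  # avoid division by zero for near-zero values
        if abs(a - b) / denom > rel_tol:
            return False
    return True

# In spirit, an AMP check would then do something like:
#   fp32_out = model(im)                   # full-precision inference
#   with torch.cuda.amp.autocast(True):
#       amp_out = model(im)                # mixed-precision inference
#   assert amp_allclose(fp32_out, amp_out), "AMP checks failed"
```

On hardware with broken half-precision support (such as the GTX 16xx cards in the linked issue), the AMP outputs diverge or become NaN, so a comparison like this fails before any training time is wasted.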

🎯 Purpose & Impact

  • Verbosity Control: Allows quieter model initialization, reducing console clutter for users.
  • AMP Verification: Ensures AMP works correctly with the model, which can help avoid training issues and support debugging.
  • AMP Utilization: More precise handling of when to enable AMP, potentially improving training speed and memory efficiency.
  • Code Cleanup: Streamlined codebase for better maintainability and clarity, yielding an easier-to-understand setup for developers and users alike. 🧹💻

🚀 For users, expect potentially faster and more efficient model training with the added comfort of toggling informational messages. For developers, this is a step towards a cleaner and more robust codebase.
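
The verbosity control described above follows a common pattern: gate initialization logging behind a constructor flag. A toy stand-in for the `AutoShape` wrapper (class and logger names here are hypothetical, not the Ultralytics code):

```python
import logging

LOGGER = logging.getLogger("autoshape_demo")

class AutoShapeDemo:
    """Toy stand-in for an AutoShape-style wrapper, showing a verbose flag."""

    def __init__(self, model, verbose=True):
        if verbose:
            LOGGER.info("Adding AutoShape...")  # suppressed when verbose=False
        self.model = model

# Quiet initialization: no log line is emitted.
quiet = AutoShapeDemo(model=None, verbose=False)
```

Defaulting `verbose=True` preserves the existing behavior for current users while letting downstream code opt into silence.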

@glenn-jocher glenn-jocher self-assigned this May 21, 2022
@glenn-jocher glenn-jocher linked an issue May 21, 2022 that may be closed by this pull request
utils/general.py: review comments resolved
@glenn-jocher (Member, Author)

@MarkDeia I've applied the changes. Can you check again and make sure it fails on your system?

@YipKo commented May 22, 2022

> @MarkDeia I've applied the changes. Can you check again and make sure it fails on your system?

Sure, I've run it locally with the CUDA 11 build of PyTorch and the check failed; on Colab it says the check passed. Seems it works as intended.

YipKo previously approved these changes May 22, 2022
@glenn-jocher (Member, Author)

@MarkDeia really strange. Are you sure the issue originates in this PR, i.e. PR fails with torchvision==0.11.2 but master is ok?

@YipKo commented May 22, 2022

> @MarkDeia really strange. Are you sure the issue originates in this PR, i.e. PR fails with torchvision==0.11.2 but master is ok?

@glenn-jocher I think there is something wrong with my local environment, since master fails to run too.
I've just found the cause: the Pillow version that pip installs automatically is too new.
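
Dependency problems like this, where pip resolves a newer package than the code supports, can be caught early with a simple version comparison. A hedged sketch of the idea (helper names are hypothetical, not the YOLOv5 `check_version` utility):

```python
def _vtuple(v):
    """Parse 'X.Y.Z' into a comparable tuple of ints; non-numeric suffixes dropped."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def within_max(installed, maximum):
    """True if the installed version does not exceed the known-good maximum."""
    return _vtuple(installed) <= _vtuple(maximum)

# A too-new install would fail a check like: within_max(PIL.__version__, "8.4.0")
```

Pinning an upper bound in requirements.txt (e.g. `Pillow<=8.4.0`, illustrative version only) is the usual fix once an incompatibility like this is confirmed.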

@glenn-jocher glenn-jocher merged commit eb1217f into master May 22, 2022
@glenn-jocher glenn-jocher deleted the amp_check branch May 22, 2022 11:41
@glenn-jocher (Member, Author)

@MarkDeia PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

tdhooghe pushed a commit to tdhooghe/yolov5 that referenced this pull request Jun 10, 2022
* Add PyTorch AMP check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

* Cleanup

* Cleanup

* Robust for DDP

* Fixes

* Add amp enabled boolean to check_train_batch_size

* Simplify

* space to prefix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022

Successfully merging this pull request may close these issues.

NaN tensor values problem for GTX16xx users (no problem on other devices)