
Add PyTorch AMP check #7917

Merged
merged 11 commits into from
May 22, 2022

Conversation


@glenn-jocher (Member) commented May 21, 2022

Helps identify AMP issues before training starts. Addresses the issue raised in #7908.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced AutoShape initialization with verbosity control and improved Automatic Mixed Precision (AMP) support in training.

📊 Key Changes

  • Added verbose flag to AutoShape class for optional logging.
  • Introduced check_amp function to verify AMP compatibility and functionality.
  • Replaced direct amp import with torch.cuda.amp for AMP contexts.
  • Adjusted model training to use AMP based on model compatibility check.
  • Removed redundant import statements.
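
The core idea behind an AMP check is to run the same input through the model once in full precision and once under mixed precision, then confirm the outputs agree within a tolerance. A minimal pure-Python sketch of the comparison logic (names like `amp_allclose` are illustrative, not the actual YOLOv5 implementation):

```python
def amp_allclose(fp32_out, amp_out, rel_tol=0.1):
    """Return True if every AMP output is within rel_tol of its FP32 value."""
    for a, b in zip(fp32_out, amp_out):
        denom = max(abs(a), 1e-6)  # avoid division by zero for near-zero values
        if abs(a - b) / denom > rel_tol:
            return False
    return True

# In spirit, an AMP check would then do something like:
#   fp32_out = model(im)                   # full-precision inference
#   with torch.cuda.amp.autocast(True):
#       amp_out = model(im)                # mixed-precision inference
#   assert amp_allclose(fp32_out, amp_out), "AMP checks failed"
```

On hardware with broken half-precision support (such as the GTX 16xx cards in the linked issue), the AMP outputs diverge or become NaN, so a comparison like this fails before any training time is wasted.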

🎯 Purpose & Impact

  • Verbosity Control: Allows quieter model initialization, reducing console clutter for users.
  • AMP Verification: Ensures AMP works correctly with the model, which can help avoid training issues and support debugging.
  • AMP Utilization: More precise handling of when to enable AMP, potentially improving training speed and memory efficiency.
  • Code Cleanup: Streamlined codebase for better maintainability and clarity, yielding an easier-to-understand setup for developers and users alike. 🧹💻

🚀 For users, expect potentially faster and more efficient model training with the added comfort of toggling informational messages. For developers, this is a step towards a cleaner and more robust codebase.
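
The verbosity control described above follows a common pattern: gate initialization logging behind a constructor flag. A toy stand-in for the `AutoShape` wrapper (class and logger names here are hypothetical, not the Ultralytics code):

```python
import logging

LOGGER = logging.getLogger("autoshape_demo")

class AutoShapeDemo:
    """Toy stand-in for an AutoShape-style wrapper, showing a verbose flag."""

    def __init__(self, model, verbose=True):
        if verbose:
            LOGGER.info("Adding AutoShape...")  # suppressed when verbose=False
        self.model = model

# Quiet initialization: no log line is emitted.
quiet = AutoShapeDemo(model=None, verbose=False)
```

Defaulting `verbose=True` preserves the existing behavior for current users while letting downstream code opt into silence.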

@glenn-jocher glenn-jocher self-assigned this May 21, 2022
@glenn-jocher glenn-jocher linked an issue May 21, 2022 that may be closed by this pull request
utils/general.py: review comments resolved
@glenn-jocher (Member, Author)

@MarkDeia I've applied the changes. Can you check again and make sure it fails on your system?

@YipKo commented May 22, 2022

> @MarkDeia I've applied the changes. Can you check again and make sure it fails on your system?

Sure, I've run it locally with the CUDA 11 build of PyTorch and the check failed; on Colab it says the check passed. Seems it works as intended.

YipKo previously approved these changes May 22, 2022
@glenn-jocher (Member, Author)

@MarkDeia really strange. Are you sure the issue originates in this PR, i.e. PR fails with torchvision==0.11.2 but master is ok?

@YipKo commented May 22, 2022

> @MarkDeia really strange. Are you sure the issue originates in this PR, i.e. PR fails with torchvision==0.11.2 but master is ok?

@glenn-jocher I think there is something wrong with my local environment, since master fails to run too.
I've just found the cause: the Pillow version that pip installs automatically is too new.
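
Dependency problems like this, where pip resolves a newer package than the code supports, can be caught early with a simple version comparison. A hedged sketch of the idea (helper names are hypothetical, not the YOLOv5 `check_version` utility):

```python
def _vtuple(v):
    """Parse 'X.Y.Z' into a comparable tuple of ints; non-numeric suffixes dropped."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def within_max(installed, maximum):
    """True if the installed version does not exceed the known-good maximum."""
    return _vtuple(installed) <= _vtuple(maximum)

# A too-new install would fail a check like: within_max(PIL.__version__, "8.4.0")
```

Pinning an upper bound in requirements.txt (e.g. `Pillow<=8.4.0`, illustrative version only) is the usual fix once an incompatibility like this is confirmed.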

@glenn-jocher glenn-jocher merged commit eb1217f into master May 22, 2022
@glenn-jocher glenn-jocher deleted the amp_check branch May 22, 2022 11:41
@glenn-jocher (Member, Author)

@MarkDeia PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

tdhooghe pushed a commit to tdhooghe/yolov5 that referenced this pull request Jun 10, 2022
* Add PyTorch AMP check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Cleanup

* Cleanup

* Cleanup

* Robust for DDP

* Fixes

* Add amp enabled boolean to check_train_batch_size

* Simplify

* space to prefix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022

Successfully merging this pull request may close these issues.

NaN tensor values problem for GTX16xx users (no problem on other devices)