Training with DistributedDataParallel

This is an end-to-end example of training a simple logistic regression PyTorch model with DistributedDataParallel (DDP; single-node, multi-GPU data-parallel training) on a fake dataset. The dataset is sharded across the GPUs by DistributedSampler. This builds on this tutorial and the PyTorch DDP tutorial.
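A minimal sketch of what such a per-process training function might look like (the actual main.py may differ; the worker name `run_worker`, the fake dataset sizes, and the hyperparameters below are illustrative assumptions):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def run_worker(rank, world_size):
    # One process per GPU; rank indexes into the visible devices.
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Fake dataset: 10,000 samples, 20 features, binary labels (illustrative sizes).
    features = torch.randn(10_000, 20)
    labels = torch.randint(0, 2, (10_000,))
    dataset = TensorDataset(features, labels)

    # DistributedSampler gives each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Logistic regression = a single linear layer trained with cross-entropy loss.
    model = nn.Linear(20, 2).cuda(rank)
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(5):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(rank), y.cuda(rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP all-reduces gradients across ranks here
            optimizer.step()

    dist.destroy_process_group()
```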

Suppose you have 8 GPUs and want to run on GPUs 5, 6, and 7 because GPUs 0-4 are in use by others. Then run: CUDA_VISIBLE_DEVICES=5,6,7 python3 main.py
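With that environment variable set, PyTorch only sees the three listed GPUs and reindexes them as devices 0, 1, and 2, so a launcher can simply spawn one worker per visible device. A hedged sketch of such a launcher, reusing the hypothetical `run_worker` from the sketch above:

```python
import torch
import torch.multiprocessing as mp

if __name__ == "__main__":
    # With CUDA_VISIBLE_DEVICES=5,6,7, torch.cuda.device_count() returns 3,
    # so rank 0 runs on physical GPU 5, rank 1 on GPU 6, and rank 2 on GPU 7.
    world_size = torch.cuda.device_count()
    mp.spawn(run_worker, args=(world_size,), nprocs=world_size, join=True)
```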

Additional resources
