
Support for Multi-GPU Parallel Training in chargpt.py #130

Open
JinXiaofeng1234 opened this issue Feb 6, 2024 · 0 comments
Hello minGPT Team,

I recently rented a cloud server with four NVIDIA RTX 4090 GPUs, aiming to use them to train models with your chargpt.py script. However, the script appears to utilize only a single GPU (24 GB of memory), which is insufficient for my training requirements.
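
For what it's worth, all four devices are visible to PyTorch; the script just never places work on the other three, which a quick check like this (plus watching `nvidia-smi` during training) confirms:

```python
import torch

# All four cards are visible to PyTorch...
print(torch.cuda.device_count())  # should print 4 on this machine

# ...but during training, `nvidia-smi` shows memory allocated
# on only one of them.
```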

Given the potential of multi-GPU training to significantly reduce training time and handle larger models or datasets, I'm interested in modifying chargpt.py to support multi-GPU parallel training. Could you provide guidance or suggestions on how to achieve this? Specifically, I'm looking for advice on integrating PyTorch's DataParallel or DistributedDataParallel functionalities into the script.
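
For concreteness, here is the kind of change I have in mind, a rough, untested sketch rather than a working patch. `build_model()` and `build_dataset()` are hypothetical stand-ins for the GPT and CharDataset construction that chargpt.py already performs, and I am assuming minGPT's forward pass returns `(logits, loss)` when targets are given:

```python
# ddp_chargpt_sketch.py -- illustrative only, not part of minGPT.
# Launch with: torchrun --nproc_per_node=4 ddp_chargpt_sketch.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

NUM_EPOCHS = 10  # placeholder; chargpt.py trains by iteration count instead


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment,
    # so the default env:// initialization works out of the box.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Hypothetical helpers standing in for the setup chargpt.py already does.
    model = build_model().cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = build_dataset()
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler, pin_memory=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for epoch in range(NUM_EPOCHS):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            logits, loss = model(x, y)  # assuming minGPT returns (logits, loss)
            optimizer.zero_grad(set_to_none=True)
            loss.backward()  # DDP averages gradients across ranks here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=4`, each process would own one GPU and DDP would synchronize gradients across them. The one-line `model = torch.nn.DataParallel(model)` wrap would be simpler and needs no launcher, but my understanding is that it is generally slower (single process, scatter/gather overhead), so I'd prefer guidance on the DDP route if possible.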

I appreciate any help or pointers you can provide. Thank you for your time and for the great work on the minGPT project.

Best regards
