Improve documentation and add distributed.rst #84
base: thd
Conversation
I don't think the tutorial ought to look quite like that. The users have no need to know anything about the backend, except that they have to choose its implementation.
> Modes
> =====
>
> There are two modes to control the way calculations are distributed: master-worker mode and process-group mode.
the m-w mode and the p-g mode
> There are two modes to control the way calculations are distributed: master-worker mode and process-group mode.
> Each of them is designed to resemble interfaces and APIs familiar to PyTorch users.
>
> The process-group mode API is made to look like the MPI API.
I don't think the MPI API is a great reference in a tutorial - maybe a little explanation of the characteristics instead?
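Since the section under review still carries a TODO for code examples, here is a minimal sketch of the process-group style being described, assuming the Python-level torch.distributed front end; the backend choice, address, and tensor contents are only illustrative.

```python
import torch
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Every participating process runs the same code and joins one process group.
    dist.init_process_group(backend="gloo",
                            init_method="tcp://127.0.0.1:29500",
                            world_size=world_size, rank=rank)

    t = torch.ones(3) * (rank + 1)

    # Collective communication: every rank contributes its tensor and
    # receives the element-wise sum in place.
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    # Point-to-point communication in the same SPMD style.
    if world_size > 1:
        if rank == 0:
            dist.send(t, dst=1)
        elif rank == 1:
            dist.recv(t, src=0)

    dist.destroy_process_group()
```

Each process would call run() with its own rank, for example via torch.multiprocessing.spawn(run, args=(2,), nprocs=2).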
> .. TODO: code examples
>
> The master-worker mode on the other hand is dedicated to the users familiar with Nvidia CUDA.
I think it's dedicated to users familiar with PyTorch's CUDA API
> The master-worker mode on the other hand is dedicated to the users familiar with Nvidia CUDA.
> The API is simple and makes it easy to start using it right away.
> This model does not scale well due to the bottleneck in the master node which is responsible for all the planning job queuing in all the worker nodes.
does not scale well -> is not designed for great scalability?
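For readers who want the analogy spelled out, the CUDA idiom referenced above looks like the sketch below: a single script drives the computation and device placement happens behind the scenes. The master-worker mode is described as offering the same single-driver feel for remote workers; its actual API is not shown here.

```python
import torch

# The familiar PyTorch CUDA idiom: one process (the "master" in the analogy)
# issues all operations, and the heavy lifting happens on another device.
x = torch.randn(4, 4)
if torch.cuda.is_available():
    x = x.cuda()              # place the tensor on the accelerator
y = (x @ x).sum()             # the kernel is queued on the device asynchronously
print(y.item())               # .item() synchronizes and fetches the scalar result
```

The scalability remark then follows naturally: with a single driver issuing every operation, the master becomes the serialization point as the number of workers grows.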
> Distributed computing
> =====================
>
> In order to configure connections for a program using THD it is required to set a couple of environmental variables.
I think THD is the C++ library, which doesn't concern the users
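As an illustration of the kind of connection configuration the quoted sentence refers to, here is a minimal sketch assuming the conventional MASTER_ADDR / MASTER_PORT / WORLD_SIZE / RANK variables and the Python-level torch.distributed front end (rather than THD directly, per the comment above); the concrete values are placeholders.

```python
import os
import torch.distributed as dist

# These are normally exported in the shell before launching each process;
# setting them here keeps the single-process illustration self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # address of the rank-0 process
os.environ.setdefault("MASTER_PORT", "29500")      # free TCP port on the master
os.environ.setdefault("WORLD_SIZE", "1")           # total number of processes
os.environ.setdefault("RANK", "0")                 # this process's id, 0 <= RANK < WORLD_SIZE

# 'env://' tells init_process_group to read the variables above.
dist.init_process_group(backend="gloo", init_method="env://")
print("initialized rank", dist.get_rank(), "of", dist.get_world_size())
dist.destroy_process_group()
```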
Fixes rebase madness (#76).
TODO: