Improve documentation and add distributed.rst #84

Open · wants to merge 1 commit into thd

Conversation

@0mp (Collaborator) commented May 6, 2017

Fixes rebase madness (#76).

TODO:

  • Explain less. This is a manual, not a paper.

@jytug (Collaborator) left a comment

I don't think the tutorial ought to look quite like that. The users have no need to know anything about the backend, except that they have to choose its implementation.
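
Picking the implementation could then be the only backend-facing line in the tutorial. A minimal sketch, assuming the ``torch.distributed`` module and that a backend name such as ``'tcp'`` is among the accepted values::

    import torch.distributed as dist

    # The user only names a backend implementation; everything else
    # about the backend stays hidden behind the torch.distributed API.
    dist.init_process_group(backend='tcp')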

Modes
=====

There are two modes to control the way calculations are distributed: master-worker mode and process-group mode.

@jytug: the m-w mode and the p-g mode

Each of them is designed to resemble interfaces and APIs familiar to PyTorch users.

The process-group mode API is made to look like the MPI API.

@jytug: I don't think the MPI API is a great reference in a tutorial - maybe a little explanation of the characteristics instead?
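
As a possible shape for those examples, a process-group snippet could mirror a classic MPI collective. This is only a sketch: it assumes ``torch.distributed`` exposes ``init_process_group``, ``get_rank``, and ``all_reduce``, and that the environment-variable rendezvous described below is in place::

    import torch
    import torch.distributed as dist

    # Every process runs the same script; ranks are assigned at init.
    dist.init_process_group(backend='tcp', init_method='env://')
    rank = dist.get_rank()

    # Each rank contributes its own tensor; all_reduce sums them in
    # place on every participant, much like MPI_Allreduce.
    t = torch.ones(4) * rank
    dist.all_reduce(t)
    print('rank', rank, 'sees', t)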


.. TODO: code examples

The master-worker mode on the other hand is dedicated to the users familiar with Nvidia CUDA.

@jytug: I think it's dedicated to users familiar with PyTorch's CUDA API


The API is simple and makes it easy to start using it right away.
This model does not scale well due to the bottleneck in the master node, which is responsible for all the planning and for queuing jobs on all the worker nodes.

@jytug: does not scale well -> is not designed for great scalability?
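
The CUDA parallel can be shown with nothing but PyTorch's existing CUDA API; the master-worker calls themselves are left out here because their exact Python surface is not given in this document. A sketch of the analogy::

    import torch

    # In PyTorch's CUDA API, one host process enqueues work on a device:
    x = torch.randn(1000, 1000).cuda()
    y = torch.mm(x, x)  # queued on the GPU, driven by the host

    # Master-worker mode follows the same shape: the master node plans
    # and enqueues every operation, with worker nodes in the role the
    # GPU plays here. That central queue is also why the master node
    # becomes the scalability bottleneck.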

Distributed computing
=====================

In order to configure connections for a program using THD, a couple of environment variables have to be set.

@jytug: I think THD is the C++ library, which doesn't concern the users
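
For what that configuration typically looks like: the conventional rendezvous variables are ``MASTER_ADDR``, ``MASTER_PORT``, ``WORLD_SIZE``, and ``RANK``, read by an ``env://`` init method. The exact names this project uses should be verified; the sketch below sets them inline for illustration, though they would normally be exported in the shell::

    import os
    import torch.distributed as dist

    os.environ['MASTER_ADDR'] = '192.168.0.10'  # address of rank 0
    os.environ['MASTER_PORT'] = '29500'         # open TCP port on rank 0
    os.environ['WORLD_SIZE'] = '4'              # total number of processes
    os.environ['RANK'] = '0'                    # this process's id (0..3)

    # env:// tells init_process_group to read the variables above.
    dist.init_process_group(backend='tcp', init_method='env://')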
