[PP] Fix PP meta init #582
base: gh/wconstab/39/base
Conversation
Uses meta device for tensors/model used before pipeline splitting.

*Important:* Relies on pytorch/pytorch#136243 to make PipelineStage avoid materializing the model and the input/output buffers eagerly. Relies on the existing .to(device) in train.py to finally materialize the model.

ghstack-source-id: 66fa9f1f78dff0b1af753dc4b2afcc09d897751d
Pull Request resolved: #582
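The flow described above can be sketched as follows. This is a minimal toy illustration, not the actual torchtitan code: the model here is hypothetical, and the real materialization happens via the existing `.to(device)` call in train.py.

```python
import torch
import torch.nn as nn

# Build the model on the meta device: shapes and module structure exist,
# but no parameter storage is allocated yet.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Every parameter is a meta tensor at this point.
assert all(p.is_meta for p in model.parameters())

# After pipeline splitting, materialize on the real device
# ("cpu" here for illustration; train.py uses the actual device).
model = model.to_empty(device="cpu")
assert all(p.device.type == "cpu" for p in model.parameters())
```

Note that `to_empty()` allocates uninitialized storage, so weight initialization still has to run afterwards.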
```python
model.to_empty(device=device)
stage = PipelineStage(
```
Just to understand: the `device` arg for `PipelineStage` still needs to be the actual device, e.g. `cuda`, correct?
Correct. And I want to remove that too in PipelineStage but I didn't do it yet.
LGTM.

Curious -- why would train.py make a `.to` call? Should `init_weight` create tensors on the right device directly? Or, if we are loading from DCP, would DCP return a state dict with DTensors on the target device, or just a state dict with DTensors on CPU?
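One reason the `.to` call (or rather `to_empty()`) cannot stand alone: it allocates storage without initializing it, so weight init must run on the materialized module afterwards. A small sketch (toy `nn.Linear`, not the torchtitan model; whether `init_weight` should instead construct tensors on-device directly is exactly the open question above):

```python
import torch
import torch.nn as nn

# Construct on meta: no storage allocated.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

# Materialize: storage is allocated but holds uninitialized memory.
layer = layer.to_empty(device="cpu")

# Initialization has to happen on the real device after materialization.
layer.reset_parameters()

# Now the weights hold valid (finite) values.
assert torch.isfinite(layer.weight).all()
```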