[Sync] Merge feature/colossal-infer with main #5568

yuanheng-zhao · 2024-04-08T08:36:33Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

Merge main into feature/colossal-infer to resolve the automatic modifications made by pre-commit CI.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

* benchmark gpt2 * fix fix fix fix * [doc] fix typo in Colossal-LLaMA-2/README.md (hpcaitech#5247) * [workflow] fixed build CI (hpcaitech#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (hpcaitech#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed ddp test (hpcaitech#5254) * [ci] fixed ddp test * polish * fix typo in applications/ColossalEval/README.md (hpcaitech#5250) * [ci] fix shardformer tests. (hpcaitech#5255) * fix ci fix * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [doc] fix doc typo (hpcaitech#5256) * [doc] fix annotation display * [doc] fix llama2 doc * [hotfix]: add pp sanity check and fix mbs arg (hpcaitech#5268) * fix: fix misleading mbs arg * feat: add pp sanity check * fix: fix 1f1b sanity check * [workflow] fixed incomplete bash command (hpcaitech#5272) * [workflow] fixed oom tests (hpcaitech#5275) * [workflow] fixed oom tests * polish * polish * polish * [ci] fix test_hybrid_parallel_plugin_checkpoint_io.py (hpcaitech#5276) * fix ci fix * fix test * revert: revert p2p * feat: add enable_metadata_cache option * revert: enable t5 tests * fix --------- Co-authored-by: Wenhao Chen <cwher@outlook.com> * [shardformer] hybridparallelplugin support gradients accumulation. 
(hpcaitech#5246) * support gradients acc fix fix fix fix fix fix fix fix fix fix fix fix fix * fix fix * fix fix fix * [hotfix] Fix ShardFormer test execution path when using sequence parallelism (hpcaitech#5230) * fix auto loading gpt2 tokenizer (hpcaitech#5279) * [doc] add llama2-13B disyplay (hpcaitech#5285) * Update README.md * fix 13b typo --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com> * fix llama pretrain (hpcaitech#5287) * fix * fix * fix fix * fix fix fix * fix fix * benchmark gpt2 * fix fix fix fix * [workflow] fixed build CI (hpcaitech#5240) * [workflow] fixed build CI * polish * polish * polish * polish * polish * [ci] fixed booster test (hpcaitech#5251) * [ci] fixed booster test * [ci] fixed booster test * [ci] fixed booster test * fix fix * fix fix fix * fix * fix fix fix fix fix * fix * Update shardformer.py --------- Co-authored-by: digger yu <digger-yu@outlook.com> Co-authored-by: Frank Lee <somerlee.9@gmail.com> Co-authored-by: Wenhao Chen <cwher@outlook.com> Co-authored-by: binmakeswell <binmakeswell@gmail.com> Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com> Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com> Co-authored-by: Desperado-Jia <502205863@qq.com>

…aitech#5431) * fix * padding vocab_size when using pipeline parallellism padding vocab_size when using pipeline parallellism fix fix * fix * fix fix fix * fix gather output * fix * fix * fix fix resize embedding fix resize embedding * fix resize embedding fix * revert * revert * revert

* fix * padding vocab_size when using pipeline parallellism padding vocab_size when using pipeline parallellism fix fix * fix * fix fix fix * fix gather output * fix * fix * fix fix resize embedding fix resize embedding * fix resize embedding fix * revert * revert * revert * fix lm forward distribution * fix * test ci * fix

…#5510) * [feature] refactor colo attention (hpcaitech#5462) * [extension] update api * [feature] add colo attention * [feature] update sdpa * [feature] update npu attention * [feature] update flash-attn * [test] add flash attn test * [test] update flash attn test * [shardformer] update modeling to fit colo attention (hpcaitech#5465) * [misc] refactor folder structure * [shardformer] update llama flash-attn * [shardformer] fix llama policy * [devops] update tensornvme install * [test] update llama test * [shardformer] update colo attn kernel dispatch * [shardformer] update blip2 * [shardformer] update chatglm * [shardformer] update gpt2 * [shardformer] update gptj * [shardformer] update opt * [shardformer] update vit * [shardformer] update colo attention mask prep * [shardformer] update whisper * [test] fix shardformer tests (hpcaitech#5514) * [test] fix shardformer tests * [test] fix shardformer tests

… is used (hpcaitech#5189) * Use self.[distribute_layers|get_stage_index] to exploit custom layer distribution * Change static methods for t5 layer distribution to member functions * Change static methods for whisper layer distribution to member functions * Replace whisper policy usage with self one * Fix test case to use non-static layer distribution methods * fix: fix typo --------- Co-authored-by: Wenhao Chen <cwher@outlook.com>

* Add dpo. Fix sft, ppo, lora. Refactor all * fix and tested ppo * 2 nd round refactor * add ci tests * fix ci * fix ci * fix readme, style * fix readme style * fix style, fix benchmark * reproduce benchmark result, remove useless files * rename to ColossalChat * use new image * fix ci workflow * fix ci * use local model/tokenizer for ci tests * fix ci * fix ci * fix ci * fix ci timeout * fix rm progress bar. fix ci timeout * fix ci * fix ci typo * remove 3d plugin from ci temporary * test environment * cannot save optimizer * support chat template * fix readme * fix path * test ci locally * restore build_or_pr * fix ci data path * fix benchmark * fix ci, move ci tests to 3080, disable fast tokenizer * move ci to 85 * support flash attention 2 * add all-in-one data preparation script. Fix colossal-llama2-chat chat template * add hardware requirements * move ci test data * fix save_model, add unwrap * fix missing bos * fix missing bos; support grad accumulation with gemini * fix ci * fix ci * fix ci * fix llama2 chat template config * debug sft * debug sft * fix colossalai version requirement * fix ci * add sanity check to prevent NaN loss * fix requirements * add dummy data generation script * add dummy data generation script * add dummy data generation script * add dummy data generation script * update readme * update readme * update readme and ignore * fix logger bug * support parallel_output * modify data preparation logic * fix tokenization * update lr * fix inference * run pre-commit --------- Co-authored-by: Tong Li <tong.li352711588@gmail.com>

…genous shard policy for llama (hpcaitech#5508) * feat: add `GradientCheckpointConfig` and `PipelineGradientCheckpointConfig` * feat: apply `GradientCheckpointConfig` to policy and llama_forward * feat: move `distribute_layer` and `get_stage_index` to PipelineStageManager * fix: add optional args for `distribute_layer` and `get_stage_index` * fix: fix changed API calls * test: update llama tests * style: polish `GradientCheckpointConfig` * fix: fix pipeline utils tests

* sequence parallel optimization * validate sequence parallel in llama (code to be polished) * shardformer api writing * integrate sequence parallel in ShardFormer * fix pp bugs and sp bugs for LlaMa model * integrating ring-based sequence parallelism into ShardFormer * [sequence parallelism]: Add fused megatron function * integrating ring-based sequence parallelism into ShardFormer --------- Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn> * fix bugs when useing sp and flashattention together * fix operation function name * support flash attention for ulysses-style sp * clarify sp process group * fix compatibility bugs in moe plugin * fix fused linear bugs * fix linear layer test * support gpt model all-to-all sp * modify shard data dimension (meant to be dim=-1) * support megtron-style sp and distributed attn for llama model * [shardformer] add megatron sp to llama * support llama7B 128k with distributed attention * [shardformer] robustness enhancement * add block attn * sp mode 1: keep input as a complete sequence * fix sp compatability * finish sp mode 3 support for gpt * using all_to_all_single when batch size is 1 * support mode 2 sp in gpt2 (hpcaitech#5) * [shardformer] add megatron sp to llama * support llama7B 128k with distributed attention * [shardformer] robustness enhancement * add block attn * sp mode 1: keep input as a complete sequence * fix sp compatability * refactor ring implementation * support mode 2 sp in gpt2 * polish code * enable distributed attn mask when using sp mode 2 and 3 in llama * automatically enable flash attn when using sp mode 2 and 3 in llama * inplace attn mask * add zero2 support for sequence parallel * polish code * fix bugs * fix gemini checkpoint io * loose tensor checking atol and rtol * add comment * fix llama layernorm grad * fix zero grad * fix zero grad * fix conflict * update split and gather auto grad func * sequence parallel: inside text split (hpcaitech#6) * polish code (part 1) * polish code (part 2) * 
polish code (part 2.5) * polish code (part 3) * sequence parallel: inside text split * miscellaneous minor fixes * polish code * fix ulysses style ZeRO * sequence parallel: inside text split * miscellaneous minor fixes * disaggregate sp group and dp group for sp * fix llama and gpt sp * polish code * move ulysses grad sync to ddp (hpcaitech#9) * remove zero_stage and unbind the grad sync for alltoall sp * add 2d group creation test * move ulysses grad sync to ddp * add 2d group creation test * remove useless code * change shard config not to enable sp when enable_all_optimizations * add sp warnings for several model * remove useless code --------- Co-authored-by: linsj20 <linsj20@mails.tsinghua.edu.cn>

* [devops] remove post commit ci * [misc] run pre-commit on all files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

yuanheng-zhao · 2024-04-08T12:51:42Z

Issues-translate-bot · 2024-04-08T12:51:52Z

Bot detected that the issue body's language is not English and translated it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿

yuanheng-zhao · 2024-04-09T02:03:37Z

Dependency issue in inference tests will be fixed in another PR.

flybird11111 and others added 30 commits March 4, 2024 16:18

[doc] sora release (hpcaitech#5425)

822241a

* [doc] sora release * [doc] sora release * [doc] sora release * [doc] sora release

[devops] fix extention building (hpcaitech#5427)

070df68

[hotfix] fix sd vit import error (hpcaitech#5420)

e304e4d

* fix import error * Update dpt_depth.py --------- Co-authored-by: binmakeswell <binmakeswell@gmail.com>

[hotfix] fix typo of openmoe model source (hpcaitech#5403)

e239cf9

[doc] update some translations with README-zh-Hans.md (hpcaitech#5382)

70cce5c

[hotfix] fix typo change _descrption to _description (hpcaitech#5331)

16c96d4

[hotfix] fix typo change enabel to enable under colossalai/shardformer/ (hpcaitech#5317)

049121d

[eval-hotfix] set few_shot_data to None when few shot is disabled (hpcaitech#5422)

a7ae2b5

[hotfix] fix typo change MoECheckpintIO to MoECheckpointIO (hpcaitech#5335)

5e1c93d

Co-authored-by: binmakeswell <binmakeswell@gmail.com>

[doc] Fix typo s/infered/inferred/ (hpcaitech#5288)

c8003d4

Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>

[hotfix] fix stable diffusion inference bug. (hpcaitech#5289)

68f55a7

* Update train_ddp.yaml delete "strategy" to fix DDP config loading bug in "main.py" * Update train_ddp.yaml fix inference with scripts/txt2img.py config file load bug. * Update README.md add pretrain model test code.

[colossal-llama2] add stream chat examlple for chat version model (hpcaitech#5428)

743e7fa

* add stream chat for chat version * remove os.system clear * modify function name

[release] update version (hpcaitech#5411)

8020f42

fix tensor data update for gemini loss caluculation (hpcaitech#5442)

da885ed

[hotfix] fix typo s/keywrods/keywords etc. (hpcaitech#5429)

385e85a

[devops] fix compatibility (hpcaitech#5444)

f2e8b9e

* [devops] fix compatibility * [hotfix] update compatibility test on pr * [devops] fix compatibility * [devops] record duration during comp test * [test] decrease test duration * fix falcon

[doc] release Open-Sora 1.0 with model weights (hpcaitech#5468)

bd998ce

* [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights * [doc] release Open-Sora 1.0 with model weights

[doc] update open-sora demo (hpcaitech#5479)

d158fc0

* [doc] update open-sora demo * [doc] update open-sora demo * [doc] update open-sora demo

[example] add grok-1 inference (hpcaitech#5485)

848a574

* [misc] add submodule * remove submodule * [example] support grok-1 tp inference * [example] add grok-1 inference script * [example] refactor code * [example] add grok-1 readme * [exmaple] add test ci * [exmaple] update readme

[release] grok-1 314b inference (hpcaitech#5490)

6df844b

* [release] grok-1 inference * [release] grok-1 inference * [release] grok-1 inference

[example] update Grok-1 inference (hpcaitech#5495)

5fcd779

* revise grok-1 example * remove unused arg in scripts * prevent re-installing torch * update readme * revert modifying colossalai requirements * add perf * trivial * add tokenizer url

[hotfix] set return_outputs=False in examples and polish code (hpcaitech#5404)

bb0a668

* fix: simplify merge_batch * fix: use return_outputs=False to eliminate extra memory consumption * feat: add return_outputs warning * style: remove `return_outputs=False` as it is the default value

[release] grok-1 inference benchmark (hpcaitech#5500)

34e9092

* [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark * [release] grok-1 inference benchmark

[fix] fix grok-1 example typo (hpcaitech#5506)

131f32a

[devops] fix example test ci (hpcaitech#5504)

a7790a9

Fix ColoTensorSpec for py11 (hpcaitech#5440)

cbe34c5

fixed layout converter caching and updated tester

61da3fb

Edenzzzz and others added 16 commits March 26, 2024 19:50

Empty-Commit

18edcd5

Merge pull request hpcaitech#5515 from Edenzzzz/fix_layout_convert

9a3321e

Fix layout convertor caching

[format] applied code formatting on changed files in pull request 5510 (hpcaitech#5517)

e6707a6

Co-authored-by: github-actions <github-actions@github.com>

[Fix] Grok-1 use tokenizer from the same pretrained path (hpcaitech#5532)

36c4bb2

* [fix] use tokenizer from the same pretrained path * trust remote code

fix incorrect sharding without zero (hpcaitech#5545)

7e0ec5a

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

[hotfix] quick fixes to make legacy tutorials runnable (hpcaitech#5559)

15055f9

Co-authored-by: Edenzzzz <wtan45@wisc.edu>

[fix] fix typo s/muiti-node /multi-node etc. (hpcaitech#5448)

a799ca3

[hotfix] fix typo s/get_defualt_parser /get_default_parser (hpcaitech#5548)

341263d

[Fix] resolve conflicts of merging main

ed5ebd1

remove unused triton kernels

ce9401a

yuanheng-zhao changed the title ~~[Fix] Merge feature/colossal-infer with main~~ [Sync] Merge feature/colossal-infer with main Apr 8, 2024

pre-commit-ci bot and others added 2 commits April 8, 2024 08:41

[pre-commit.ci] auto fixes from pre-commit.com hooks

d788175

for more information, see https://pre-commit.ci

remove outdated triton test

7ca1d1c

yuanheng-zhao marked this pull request as ready for review April 8, 2024 09:06

yuanheng-zhao requested a review from a team as a code owner April 8, 2024 09:06

FrankLeeeee approved these changes Apr 9, 2024

yuanheng-zhao merged commit d56c963 into hpcaitech:feature/colossal-infer Apr 9, 2024
8 of 10 checks passed

yuanheng-zhao deleted the infer/merge/main branch April 9, 2024 02:10
