Commit
Merge branch 'main' into cogvideo-doc
feifeibear authored Sep 26, 2024
2 parents d506726 + 4a31937 commit a152ff3
Showing 102 changed files with 170 additions and 15,001 deletions.
21 changes: 10 additions & 11 deletions README.md
@@ -82,17 +82,16 @@ The overview of xDiT is shown as follows.

<h2 id="updates">📢 Updates</h2>

- * 🎉**September 26, 2024**: xDiT has been officially used by [CogVideo](https://github.com/THUDM/CogVideo)! The inference scripts are placed in [parallel_inference/](https://github.com/THUDM/CogVideo/blob/main/tools/parallel_inference) at their repository.
- * 🎉**September 23, 2024**: Support CogVideoX sequence parallel version. The inference script is [examples/cogvideox_example](examples/cogvideox_example.py).
+ * 🎉**September 26, 2024**: xDiT has been officially used by [THUDM/CogVideo](https://github.com/THUDM/CogVideo)! The inference scripts are placed in [parallel_inference/](https://github.com/THUDM/CogVideo/blob/main/tools/parallel_inference) at their repository.
+ * 🎉**September 23, 2024**: Support CogVideoX. The inference scripts are [examples/cogvideox_example.py](examples/cogvideox_example.py).
* 🎉**August 26, 2024**: We apply torch.compile and [onediff](https://github.com/siliconflow/onediff) nexfort backend to accelerate GPU kernels speed.
- * 🎉**August 9, 2024**: Support Latte sequence parallel version. The inference script is [examples/latte_example.py](examples/latte_example.py).
- * 🎉**August 8, 2024**: Support Flux sequence parallel version. The inference script is [examples/flux_example.py](examples/flux_example.py).
- * 🎉**August 2, 2024**: Support Stable Diffusion 3 hybrid parallel version. The inference script is [examples/sd3_example.py](examples/sd3_example.py).
- * 🎉**July 18, 2024**: Support PixArt-Sigma and PixArt-Alpha. The inference scripts are [examples/pixartsigma_example.py](examples/pixartsigma_example.py) and [examples/pixartalpha_example.py](examples/pixartalpha_example.py).
+ * 🎉**August 15, 2024**: Support Hunyuan-DiT hybrid parallel version. The inference scripts are [examples/hunyuandit_example.py](examples/hunyuandit_example.py).
+ * 🎉**August 9, 2024**: Support Latte sequence parallel version. The inference scripts are [examples/latte_example.py](examples/latte_example.py).
+ * 🎉**August 8, 2024**: Support Flux sequence parallel version. The inference scripts are [examples/flux_example.py](examples/flux_example.py).
+ * 🎉**August 2, 2024**: Support Stable Diffusion 3 hybrid parallel version. The inference scripts are [examples/sd3_example.py](examples/sd3_example.py).
+ * 🎉**July 18, 2024**: Support PixArt-Sigma and PixArt-Alpha. The inference scripts are [examples/pixartsigma_example.py](examples/pixartsigma_example.py), [examples/pixartalpha_example.py](examples/pixartalpha_example.py).
* 🎉**July 17, 2024**: Rename the project to xDiT. The project has evolved from a collection of parallel methods into a unified inference framework and supported the hybrid parallel for DiTs.
* 🎉**July 10, 2024**: Support HunyuanDiT. The inference script is [legacy/scripts/hunyuandit_example.py](./legacy/scripts/hunyuandit_example.py).
* 🎉**June 26, 2024**: Support Stable Diffusion 3. The inference script is [legacy/scripts/sd3_example.py](./legacy/scripts/sd3_example.py).
- * 🎉**May 24, 2024**: PipeFusion is public released. It supports PixArt-alpha [legacy/scripts/pixart_example.py](./legacy/scripts/pixart_example.py), DiT [legacy/scripts/ditxl_example.py](./legacy/scripts/ditxl_example.py) and SDXL [legacy/scripts/sdxl_example.py](./legacy/scripts/sdxl_example.py).
+ * 🎉**May 24, 2024**: PipeFusion is public released. It supports PixArt-alpha [scripts/pixart_example.py](./scripts/pixart_example.py), DiT [scripts/ditxl_example.py](./scripts/ditxl_example.py) and SDXL [scripts/sdxl_example.py](./scripts/sdxl_example.py). This version is currently in the `legacy` branch.


<h2 id="support-dits">🎯 Supported DiTs</h2>
@@ -360,9 +359,9 @@ We conducted a major upgrade of this project in August 2024.
The latest APIs is located in the [xfuser/](./xfuser/) directory, supports hybrid parallelism. It offers clearer and more structured code but currently supports fewer models.
- The legacy APIs is in the [legacy/](./legacy/) directory, limited to single parallelism. It supports a richer of parallel methods, including PipeFusion, Sequence Parallel, DistriFusion, and Tensor Parallel. CFG Parallel can be hybrid with PipeFusion but not with other parallel methods.
+ The legacy APIs is in the [legacy](https://github.com/xdit-project/xDiT/tree/legacy) branch, limited to single parallelism. It supports a richer of parallel methods, including PipeFusion, Sequence Parallel, DistriFusion, and Tensor Parallel. CFG Parallel can be hybrid with PipeFusion but not with other parallel methods.
- For models not yet supported by the latest APIs, you can run the examples in the [legacy/scripts/](./legacy/scripts/) directory. If you wish to develop new features on a model or require hybrid parallelism, stay tuned for further project updates.
+ For models not yet supported by the latest APIs, you can run the examples in the [scripts/](https://github.com/xdit-project/xDiT/tree/legacy/scripts) directory under branch `legacy`. If you wish to develop new features on a model or require hybrid parallelism, stay tuned for further project updates.
We also welcome developers to join and contribute more features and models to the project. Tell us which model you need in xDiT in [discussions](https://github.com/xdit-project/xDiT/discussions).
5 changes: 3 additions & 2 deletions examples/flux_example.py
@@ -18,6 +18,7 @@ def main():
args = xFuserArgs.add_cli_args(parser).parse_args()
engine_args = xFuserArgs.from_cli_args(args)
engine_config, input_config = engine_args.create_config()
+ engine_config.runtime_config.dtype = torch.bfloat16
local_rank = get_world_group().local_rank

pipe = xFuserFluxPipeline.from_pretrained(
@@ -32,7 +33,7 @@ def main():
else:
pipe = pipe.to(f"cuda:{local_rank}")

- pipe.prepare_run(input_config)
+ pipe.prepare_run(input_config, steps=1)

torch.cuda.reset_peak_memory_stats()
start_time = time.time()
@@ -60,7 +61,7 @@ def main():
dp_group_index = get_data_parallel_rank()
num_dp_groups = get_data_parallel_world_size()
dp_batch_size = (input_config.batch_size + num_dp_groups - 1) // num_dp_groups
- if is_dp_last_group():
+ if pipe.is_dp_last_group():
for i, image in enumerate(output.images):
image_rank = dp_group_index * dp_batch_size + i
image_name = f"flux_result_{parallel_info}_{image_rank}_tc_{engine_args.use_torch_compile}.png"
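The `pipe.prepare_run(input_config, steps=1)` change above performs a single warm-up step before the timed region, so one-time costs (kernel compilation, cache and allocator warm-up) do not inflate the measured latency. A minimal sketch of that warm-up-then-time pattern in plain Python; `run_pipeline` is a hypothetical stand-in for the actual pipeline call, not part of xDiT:

```python
import time

def run_pipeline(steps: int) -> int:
    # Hypothetical stand-in for a diffusion pipeline forward pass;
    # in the real script the first call pays one-time setup costs.
    return sum(i * i for i in range(steps * 10_000))

# Warm-up: one step, analogous to pipe.prepare_run(input_config, steps=1),
# so setup costs fall outside the timed region below.
run_pipeline(steps=1)

start_time = time.time()
result = run_pipeline(steps=20)
elapsed_time = time.time() - start_time
print(f"elapsed time: {elapsed_time:.3f} sec")
```

The example scripts bracket the timed region the same way, additionally calling `torch.cuda.reset_peak_memory_stats()` first so peak-memory numbers also exclude warm-up.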
2 changes: 1 addition & 1 deletion examples/hunyuandit_example.py
@@ -50,7 +50,7 @@ def main():
dp_group_index = get_data_parallel_rank()
num_dp_groups = get_data_parallel_world_size()
dp_batch_size = (input_config.batch_size + num_dp_groups - 1) // num_dp_groups
- if is_dp_last_group():
+ if pipe.is_dp_last_group():
if not os.path.exists("results"):
os.mkdir("results")
for i, image in enumerate(output.images):
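The `dp_batch_size` line that recurs in these diffs is a ceiling division: each data-parallel group handles up to ⌈batch_size / num_dp_groups⌉ images, and `image_rank = dp_group_index * dp_batch_size + i` is the global index used in the output filename. A self-contained sketch of that split; the helper name `dp_image_ranks` is ours, not an xDiT API:

```python
def dp_image_ranks(batch_size: int, num_dp_groups: int, dp_group_index: int) -> list[int]:
    # Ceiling division, exactly as in the example scripts:
    # dp_batch_size = (input_config.batch_size + num_dp_groups - 1) // num_dp_groups
    dp_batch_size = (batch_size + num_dp_groups - 1) // num_dp_groups
    start = dp_group_index * dp_batch_size          # first global image index
    end = min(start + dp_batch_size, batch_size)    # last group may be short
    return list(range(start, end))

# 7 images over 3 data-parallel groups: each group takes up to 3 images,
# and together the groups cover ranks 0..6 exactly once.
print([dp_image_ranks(7, 3, g) for g in range(3)])
# → [[0, 1, 2], [3, 4, 5], [6]]
```

This is why the scripts can name files by `image_rank` without collisions: the per-group index ranges are disjoint and jointly cover the whole batch.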
2 changes: 1 addition & 1 deletion examples/pixartalpha_example.py
@@ -50,7 +50,7 @@ def main():
dp_group_index = get_data_parallel_rank()
num_dp_groups = get_data_parallel_world_size()
dp_batch_size = (input_config.batch_size + num_dp_groups - 1) // num_dp_groups
- if is_dp_last_group():
+ if pipe.is_dp_last_group():
if not os.path.exists("results"):
os.mkdir("results")
for i, image in enumerate(output.images):
3 changes: 2 additions & 1 deletion examples/pixartsigma_example.py
@@ -36,6 +36,7 @@ def main():
output_type=input_config.output_type,
use_resolution_binning=input_config.use_resolution_binning,
generator=torch.Generator(device="cuda").manual_seed(input_config.seed),
+ clean_caption=False,
)
end_time = time.time()
elapsed_time = end_time - start_time
@@ -50,7 +51,7 @@ def main():
dp_group_index = get_data_parallel_rank()
num_dp_groups = get_data_parallel_world_size()
dp_batch_size = (input_config.batch_size + num_dp_groups - 1) // num_dp_groups
- if is_dp_last_group():
+ if pipe.is_dp_last_group():
if not os.path.exists("results"):
os.mkdir("results")
for i, image in enumerate(output.images):
2 changes: 1 addition & 1 deletion examples/sd3_example.py
@@ -49,7 +49,7 @@ def main():
dp_group_index = get_data_parallel_rank()
num_dp_groups = get_data_parallel_world_size()
dp_batch_size = (input_config.batch_size + num_dp_groups - 1) // num_dp_groups
- if is_dp_last_group():
+ if pipe.is_dp_last_group():
if not os.path.exists("results"):
os.mkdir("results")
for i, image in enumerate(output.images):
99 changes: 0 additions & 99 deletions legacy/pipefuser/logger.py

This file was deleted.

13 changes: 0 additions & 13 deletions legacy/pipefuser/models/__init__.py

This file was deleted.

52 changes: 0 additions & 52 deletions legacy/pipefuser/models/base_model.py

This file was deleted.

102 changes: 0 additions & 102 deletions legacy/pipefuser/models/distri_dit_hunyuan_pipefusion.py

This file was deleted.

