
About deepspeed and fsdp speed differences? #24

Closed
zyuh opened this issue Mar 31, 2023 · 13 comments

@zyuh

zyuh commented Mar 31, 2023

I have run into a problem that prevents DeepSpeed from working:
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Also, have you compared the speed of DeepSpeed against FSDP? Only ~16 h of finetuning (LLaMA 30B LoRA) is surprising.

@research4pan
Contributor

research4pan commented Mar 31, 2023

The AttributeError may be caused by a ninja compilation error, e.g. a missing ninja builder (which can be fixed by installing ninja: apt-get install ninja-build), or by a held compilation lock. Could you please provide the full error message so we can determine the issue type?
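For users without apt-get access, ninja can usually be installed into the active Python environment instead. This is a sketch of a non-root workaround, not guaranteed to resolve every build failure:

```shell
# Install ninja without root, into the current Python environment:
pip install ninja            # or: conda install -c conda-forge ninja
# Confirm torch's cpp_extension JIT builder can find it on PATH:
command -v ninja && ninja --version
```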

As for the speed difference, we haven't checked FSDP's speed yet. It is indeed surprising that finetuning on the medical dataset can be so efficient under our framework. In fact, instruction finetuning can be even faster: only ~1 h for LLaMA 30B LoRA + Alpaca on 8*A100 🤗

@zyuh
Author

zyuh commented Mar 31, 2023

The detailed error message is here:

Detected CUDA files, patching ldflags
Emitting ninja build file /nvme/zhouyuhang/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc -ccbin /nvme/zhouyuhang/miniconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/includes -I/usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/usr/local/cuda/bin/nvcc -ccbin /nvme/zhouyuhang/miniconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/includes -I/usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
x86_64-conda-linux-gnu-cc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
compilation terminated.
nvcc fatal : Failed to preprocess host compiler properties.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 82, in
main()
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 78, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/nvme/zhouyuhang/CODE/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1702, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 96, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Loading extension module cpu_adam...
Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 82, in
main()
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 78, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/nvme/zhouyuhang/CODE/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1702, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 96, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 565, in module_from_spec
File "", line 1173, in create_module
File "", line 228, in _call_with_frames_removed
ImportError: /nvme/zhouyuhang/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7fe3fd1bb040>
Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 110, in del
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7f27a9b20040>
Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 110, in del
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-03-31 13:22:04,404] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 1717
[2023-03-31 13:22:04,405] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 1718

I do not have root permission; after running apt-get install ninja-build, the terminal outputs:
bash: apt-get: command not found...

And the ds_report output is:

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch']
torch version .................... 2.0.0+cu117
deepspeed install path ........... ['/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed']
deepspeed info ................... 0.8.3+unknown, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7

@research4pan
Contributor

research4pan commented Mar 31, 2023

...
/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
x86_64-conda-linux-gnu-cc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
...

It seems the gcc/g++ compiler is missing here. You may try

conda install -c conda-forge cxx-compiler

(no root permission is needed for conda installs). Alternatively, you may

  1. ask the administrator to install gcc/g++ for you, or
  2. download a compatible gcc/g++ release directly from the official site (https://ftp.gnu.org/gnu/gcc/) and build it locally:
version=9.2.0      # May not be compatible with your system, check it before installation
tar -xzf gcc-${version}.tar.gz
cd gcc-${version}
./configure --prefix=${your_local_path}
make
make install

Then add the binary folder to your PATH environment variable (export PATH=${your_local_path}/bin:$PATH) so it can be detected. Then it should work.
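As a quick sanity check that the locally built compiler is the one nvcc will see (nvcc uses the first gcc/g++ on PATH unless overridden with -ccbin), something like this may help; ${your_local_path} is the placeholder prefix from the steps above:

```shell
# Prepend the locally built toolchain (placeholder path) to PATH:
export PATH="${your_local_path}/bin:$PATH"
# These should now resolve to the local install, not the conda shim:
command -v gcc g++
gcc --version | head -n 1
```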

@zyuh
Author

zyuh commented Mar 31, 2023

Thank you for your reply.
Unfortunately, after running conda install -c conda-forge cxx-compiler and installing it successfully, it seems the same error is reported...

Emitting ninja build file /nvme/zhouyuhang/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc -ccbin /nvme/zhouyuhang/miniconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/includes -I/usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/usr/local/cuda/bin/nvcc -ccbin /nvme/zhouyuhang/miniconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/includes -I/usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /nvme/zhouyuhang/miniconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
In file included from /usr/local/cuda/include/cuda_runtime.h:83,
from :
/usr/local/cuda/include/crt/host_config.h:139:2: error: #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
| ^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 82, in
main()
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 78, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/nvme/zhouyuhang/CODE/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1702, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 96, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Loading extension module cpu_adam...
Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 82, in
main()
File "/nvme/zhouyuhang/CODE/LMFlow/examples/finetune.py", line 78, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/nvme/zhouyuhang/CODE/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/trainer.py", line 1702, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/nvme/zhouyuhang/CODE/LMFlow/transformers/src/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 96, in init
self.ds_opt_adam = CPUAdamBuilder().load()
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/nvme/zhouyuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 565, in module_from_spec
File "", line 1173, in create_module
File "", line 228, in _call_with_frames_removed
ImportError: /nvme/zhouyuhang/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7f151d12df70>
Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 110, in del
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7f10111adf70>
Traceback (most recent call last):
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/deepspeed/ops/adam/cpu_adam.py", line 110, in del
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-03-31 14:58:42,429] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 48637
[2023-03-31 14:58:42,430] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 48638

=================
I also tried DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . (mentioned in ), but it does not work either. This command reports the following error:

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/setup.py", line 156, in
abort(f"Unable to pre-compile {op_name}")
File "/nvme/zhouyuhang/CODE/LMFlow/DeepSpeed-0.8.3/setup.py", line 48, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
DS_BUILD_OPS=0
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[ERROR] Unable to pre-compile async_io
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@research4pan
Contributor

...
/usr/local/cuda/include/crt/host_config.h:139:2: error: #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
...

The error message has changed; that's a step towards solving the problem. It seems the gcc compiler was installed successfully, but your local CUDA version doesn't support gcc versions later than 10. You may try installing an older version of gcc, e.g. 9.x. To specify a package version in conda, you may refer to https://stackoverflow.com/questions/43222407/how-to-list-package-versions-available-with-conda.
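For reference, a pinned install might look like the following. The gcc_linux-64 / gxx_linux-64 package names are an assumption based on conda-forge's Linux compiler packaging, so check what your channel actually provides first:

```shell
# List the gcc versions available on the channel:
conda search -c conda-forge gcc_linux-64
# Pin an older host compiler (<=10) into the active env
# (package names assumed from conda-forge's Linux toolchain):
conda install -c conda-forge 'gcc_linux-64=9.*' 'gxx_linux-64=9.*'
```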

@zyuh
Author

zyuh commented Apr 3, 2023

I have tried many different versions, but it doesn't work. It seems the package versions must match each other exactly. Could you please provide your conda list for reference?

@zyuh
Author

zyuh commented Apr 3, 2023

Hello, I have solved this problem! The root cause is on the PyTorch side: they accidentally shipped nvcc with their conda package, which breaks the toolchain. (The workaround in microsoft/DeepSpeed#2684 works!)

@shizhediao
Contributor

Hi,
It is really nice to hear that you have solved the problem!
Feel free to ask us if you encounter further issues.

Thank you!

@john20000625

Same problem here. How did you solve it in the end? By running export PATH=/usr/local/cuda/bin:$PATH, or some other solution? Thank you!

@zyuh
Author

zyuh commented Apr 3, 2023

@john20000625
Yes, use that command if 'ATen/cuda/CUDAContext.h' appears in the error message.
In addition, you need to make sure the versions of the other dependency packages match, until DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . runs without error.
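When retrying the pre-build route, two details from the logs earlier in this thread are worth handling: the failed JIT attempt leaves a stale build directory behind (hence the cpu_adam.so ImportError), and DS_BUILD_AIO=1 fails when the libaio dev headers are missing. A sketch, with the cache path copied from the logs above and dropping AIO as an assumption based on the earlier async_io warning:

```shell
# Remove the stale build dir left by the failed JIT compile:
rm -rf ~/.cache/torch_extensions/py39_cu117/cpu_adam
# Pre-build only the ops we need, so failures surface at install
# time rather than at training time (AIO skipped: no libaio headers):
DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install -e .
```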

@john20000625

@zyuh
Thank you! BTW, which version of cuda are you using?

@john20000625

@zyuh
Also, I found that I did not have 'ATen/cuda/CUDAContext.h' in my error message; instead, I had "ATen/Context.h:3". Is that the same thing? Thanks a lot!

@zyuh
Author

zyuh commented Apr 4, 2023

@john20000625
Hello! My CUDA version is 11.6. I'm not sure if it's the same thing, but if the PyTorch in your environment is a recent installation, it's probably the same problem. And reportedly it won't be fixed until the next PyTorch release.
