
Cambricon MLU: how do I build the MLU backend of PaddleCustomDevice from source? #1331

Open
wangzy0327 opened this issue Jul 2, 2024 · 6 comments

@wangzy0327

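My environment requires Python 3.8, so I can't directly install the wheel package built for Python 3.10:
paddle_custom_mlu.whl
Could you give the steps and commands for building PaddleCustomDevice from source? Thanks!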
@YanhuiDua
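Why the Python 3.10 wheel is rejected on Python 3.8 can be seen from the wheel filename itself: the third field from the end is the Python tag (`cp38`, `cp310`, ...), and pip only installs a CPython-specific wheel whose tag matches the running interpreter. The following is a minimal, simplified sketch of that check; it only handles CPython tags such as `cp38` and ignores `abi3` and pure-Python `py3` wheels. The helper names are hypothetical.

```python
import sys
from typing import Optional, Tuple


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (e.g. 'cp38') from a wheel filename.

    Wheel names follow {name}-{version}[-{build}]-{python}-{abi}-{platform}.whl,
    so the Python tag is always the third dash-separated field from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    return parts[-3]


def cpython_tag(version_info: Tuple[int, int]) -> str:
    """Build the CPython tag for an interpreter version, e.g. (3, 8) -> 'cp38'."""
    major, minor = version_info
    return f"cp{major}{minor}"


def wheel_matches(filename: str, version_info: Optional[Tuple[int, int]] = None) -> bool:
    """Simplified check: does the wheel's CPython tag match this interpreter?"""
    if version_info is None:
        version_info = (sys.version_info[0], sys.version_info[1])
    # A Python tag may list several tags separated by '.', e.g. 'cp38.cp39'.
    return cpython_tag(version_info) in wheel_python_tag(filename).split(".")
```

For example, `wheel_matches("paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl", (3, 8))` is true, while the same name with `cp310` tags (an illustrative filename) is false on Python 3.8, which is why a source build under Python 3.8 is needed.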

@YanhuiDua
Collaborator

@wangzy0327
Author

wangzy0327 commented Jul 3, 2024

@YanhuiDua I followed the steps and built the release/2.6 branch of PaddleCustomDevice from source with Python 3.8, and ran into some errors along the way.
Excerpts of the errors:

Submodule path 'Paddle': checked out '90138318312fbb60b0bdce8b0f4fb317879fe62e'
-- PADDLE_SOURCE_DIR=/home/wzy/PaddleCustomDevice/Paddle
-- Paddle version is 0.0.0
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
...
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Looking for C++ include inttypes.h - found
-- Looking for C++ include sys/types.h
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
...
-- Generating done
-- Build files have been written to: /home/wzy/PaddleCustomDevice/backends/mlu/build/third_party/mkldnn/src/extern_mkldnn-build
[ 15%] Performing build step for 'extern_mkldnn'
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
...
[ 24%] Building CXX object src/cpu/x64/CMakeFiles/dnnl_cpu_x64.dir/cpu_barrier.cpp.o
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
...
[ 25%] Building CXX object src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/passes/lower.cpp.o
-- Looking for snprintf - found
-- Looking for get_static_proc_name in unwind
-- Looking for get_static_proc_name in unwind - not found
-- Looking for UnDecorateSymbolName in dbghelp
-- Looking for UnDecorateSymbolName in dbghelp - not found
-- Performing Test HAVE___ATTRIBUTE__
-- Performing Test HAVE___ATTRIBUTE__ - Success
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_DEFAULT
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_DEFAULT - Success
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_HIDDEN
-- Performing Test HAVE___ATTRIBUTE__VISIBILITY_HIDDEN - Success
-- Performing Test HAVE___BUILTIN_EXPECT
-- Performing Test HAVE___BUILTIN_EXPECT - Success
-- Performing Test HAVE___SYNC_VAL_COMPARE_AND_SWAP
-- Performing Test HAVE___SYNC_VAL_COMPARE_AND_SWAP - Success
-- Performing Test HAVE_RWLOCK
-- Performing Test HAVE_RWLOCK - Failed
-- Performing Test HAVE___DECLSPEC
-- Performing Test HAVE___DECLSPEC - Failed
-- Performing Test STL_NO_NAMESPACE
-- Performing Test STL_NO_NAMESPACE - Failed


Even so, the wheel package still builds successfully.
After installing the wheel:

wzy@gxnzx119:~/PaddleCustomDevice/backends/mlu$ python3 -m pip install build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
Processing ./build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
paddle-custom-mlu is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Error parsing dependencies of distro-info: Invalid version: '0.23ubuntu1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.36ubuntu1'

Running a program that I had previously verified to work now crashes with a Segmentation fault.
The backtrace is as follows:

Segmentation fault (core dumped)
wzy@gxnzx119:~/paddle_tests/models$ lldb python3
(lldb) target create "python3"
Current executable set to 'python3' (x86_64).
(lldb) run benchmark_ano.py 
Process 3134839 launched: '/usr/bin/python3' (x86_64)
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
warning: (x86_64) /home/wzy/.local/lib/python3.8/site-packages/numpy.libs/libgfortran-040039e1.so.5.0.0 No LZMA support found for reading .gnu_debugdata section
Process 3134839 stopped and restarted: thread 1 received signal: SIGCHLD
warning: (x86_64) /home/wzy/.local/lib/python3.8/site-packages/pillow.libs/libXau-00ec42fe.so.6.0.0 No LZMA support found for reading .gnu_debugdata section
I0703 02:51:22.082170 3134839 init.cc:234] ENV [CUSTOM_DEVICE_ROOT]=/home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device
I0703 02:51:22.082192 3134839 init.cc:143] Try loading custom device libs from: [/home/wzy/.local/lib/python3.8/site-packages/paddle_custom_device]
Process 3134839 stopped
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0xf00000001)
    frame #0: 0x0000000f00000001
error: memory read failed for 0xf00000000
(lldb) bt
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0xf00000001)
  * frame #0: 0x0000000f00000001
    frame #1: 0x00007fffe3554c6e libphi.so`phi::CustomKernelMap::RegisterCustomKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, phi::KernelKey const&, phi::Kernel const&) + 622
    frame #2: 0x00007fffaebdc83e libpaddle-custom-mlu.so`phi::KernelRegistrar::ConstructKernel(phi::RegType, char const*, char const*, common::DataLayout, phi::DataType, void (*)(phi::KernelKey const&, phi::KernelArgsDef*), void (*)(phi::KernelKey const&, phi::Kernel*), std::function<void (phi::KernelContext*)>, void*) (.constprop.371) + 2222
    frame #3: 0x00007fffaebdcdfe libpaddle-custom-mlu.so`phi::KernelRegistrar::KernelRegistrar(phi::RegType, char const*, char const*, common::DataLayout, phi::DataType, void (*)(phi::KernelKey const&, phi::KernelArgsDef*), void (*)(phi::KernelKey const&, phi::Kernel*), std::function<void (phi::KernelContext*)>, void*) + 158
    frame #4: 0x00007fffaeb3430e libpaddle-custom-mlu.so`__static_initialization_and_destruction_0(int, int) (.constprop.355) + 4062

How can I fix this?
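The log above shows Paddle reading `CUSTOM_DEVICE_ROOT` and loading custom device libraries from that directory, and the backtrace places the crash inside `phi::CustomKernelMap::RegisterCustomKernel` (libphi.so), called from the static initializers of `libpaddle-custom-mlu.so`, i.e. while the plugin is being loaded. The pip output also says the new wheel was *not* installed ("already installed with the same version"). One plausible explanation, which is an assumption and not confirmed in this thread, is that an older plugin library is still the one being loaded. A minimal sketch for checking which `.so` files sit in that directory, with their sizes and modification times, might look like this; the function name is hypothetical.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def list_custom_device_libs(root: Optional[str] = None) -> List[Path]:
    """List shared libraries in the custom-device plugin directory.

    Falls back to the CUSTOM_DEVICE_ROOT environment variable (the variable
    Paddle logs at startup) when no directory is passed. Returns an empty list
    if the directory is unset or missing.
    """
    root = root or os.environ.get("CUSTOM_DEVICE_ROOT")
    if not root:
        return []
    directory = Path(root)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("*.so") if p.is_file())


if __name__ == "__main__":
    # Print name, size and modification time so a stale plugin stands out.
    for lib in list_custom_device_libs():
        st = lib.stat()
        mtime = datetime.fromtimestamp(st.st_mtime).isoformat(timespec="seconds")
        print(f"{lib.name}\t{st.st_size} bytes\tmodified {mtime}")
```

Run it with `CUSTOM_DEVICE_ROOT` set to the path shown in the log (or pass the directory explicitly); if `libpaddle-custom-mlu.so` is older than the fresh build, the new wheel never replaced it.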

@YanhuiDua
Collaborator

YanhuiDua commented Jul 3, 2024

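This looks like a problem with the third-party dependency library pthread. I suggest using the official image: docker pull registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310, and installing a py38 environment inside that image for the build.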

You can also build your own py38 image by referring to these Dockerfiles:

paddle-mlu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/mlu/tools/dockerfile/Dockerfile.mlu.kylinv10.gcc82.py310

paddle-cpu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/custom_cpu/tools/dockerfile/Dockerfile.ubuntu20.x86_64.gcc84

@wangzy0327
Author

wangzy0327 commented Jul 4, 2024

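I tried again inside the registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310 image with a py38 environment installed, and the build reports the same errors as on the host. Could the build failure be caused by the PaddleCustomDevice version? If so, which PaddleCustomDevice version is known to work?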
@YanhuiDua

@qili93
Collaborator

qili93 commented Jul 4, 2024

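> I tried again inside the registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310 image with a py38 environment installed, and the build reports the same errors as on the host. Could the build failure be caused by the PaddleCustomDevice version? If so, which PaddleCustomDevice version is known to work? @YanhuiDua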

Judging from this output, the package you built should be fine; you just need to reinstall it with the --force-reinstall option:

paddle-custom-mlu is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Error parsing dependencies of distro-info: Invalid version: '0.23ubuntu1'
WARNING: Error parsing dependencies of python-debian: Invalid version: '0.1.36ubuntu1'
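A minimal sketch of the reinstall step, assuming the wheel path from the earlier install command. It builds the pip command with `sys.executable` so the wheel lands in the same Python 3.8 interpreter that runs the script. `--force-reinstall` is pip's option for replacing an installed package of the same version; adding `--no-deps` (optional here, and an assumption about what you want) stops pip from also reinstalling the wheel's dependencies. The helper name is hypothetical.

```python
import subprocess
import sys
from typing import List


def force_reinstall_cmd(wheel_path: str, no_deps: bool = True) -> List[str]:
    """Build a pip command that reinstalls a wheel even if that version is present."""
    cmd = [sys.executable, "-m", "pip", "install", "--force-reinstall"]
    if no_deps:
        # Only reinstall the wheel itself, not everything it depends on.
        cmd.append("--no-deps")
    cmd.append(wheel_path)
    return cmd


if __name__ == "__main__":
    # Usage: python3 reinstall.py build/dist/paddle_custom_mlu-0.0.0-cp38-cp38-linux_x86_64.whl
    if len(sys.argv) > 1:
        subprocess.run(force_reinstall_cmd(sys.argv[1]), check=True)
```

Using the interpreter path instead of a bare `pip` avoids installing into a different Python when several versions coexist, as in a py310 image with an added py38 environment.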

@wangzy0327
Author

When I build a py3.8 container using the following Dockerfile:

paddle-cpu Dockerfile: https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/custom_cpu/tools/dockerfile/Dockerfile.ubuntu20.x86_64.gcc84

the build reaches these commands:

# install Paddle requirement
RUN wget --no-check-certificate https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/python/requirements.txt -O requirements.txt && \
    pip install -r requirements.txt -i https://pip.baidu-int.com/simple --trusted-host pip.baidu-int.com && rm -rf requirements.txt
RUN wget --no-check-certificate https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/python/unittest_py/requirements.txt -O requirements.txt && \
    pip install -r requirements.txt -i https://pip.baidu-int.com/simple --trusted-host pip.baidu-int.com && rm -rf requirements.txt

and fails with the following error:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442f4f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442f7f0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442f9a0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe36442fb50>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fe364502ee0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /simple/httpx/
ERROR: Could not find a version that satisfies the requirement httpx (from versions: none)
ERROR: No matching distribution found for httpx

Running ping pip.baidu-int.com reports "Name or service not known". How can I fix this?
@qili93 @YanhuiDua
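The error means DNS cannot resolve `pip.baidu-int.com`, so pip never reaches the index. Judging by the name, this appears to be an internal mirror (an assumption), in which case the usual workaround is to point `pip install -r requirements.txt -i ...` at an index you can reach and drop the matching `--trusted-host`. A minimal sketch for picking the first candidate index whose hostname resolves could look like this; the function name and candidate list are hypothetical, and the resolver is injectable so the logic can be checked offline.

```python
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


def first_resolvable_index(
    urls: Iterable[str],
    resolve: Callable[[str, int], object] = socket.getaddrinfo,
) -> Optional[str]:
    """Return the first index URL whose hostname resolves via DNS, else None."""
    for url in urls:
        parsed = urlparse(url)
        host = parsed.hostname
        if not host:
            # Not a usable URL (no hostname), skip it.
            continue
        try:
            resolve(host, parsed.port or 443)
        except OSError:
            # socket.gaierror ("Name or service not known") is an OSError subclass.
            continue
        return url
    return None
```

For example, `first_resolvable_index(["https://pip.baidu-int.com/simple", "https://pypi.org/simple"])` returns the public PyPI URL when the internal host does not resolve; that URL can then replace the `-i` argument in both `RUN pip install` lines of the Dockerfile.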
