Fix a segmentation fault due to unloaded libcufile #158
Do not unload the cufile (GDS) library, because libcufile registers a cleanup function with `atexit()` and unloading the library causes a segfault (the cleanup function is called after its code no longer exists).
- Leave the libcufile library loaded
Need to take some time to dig into libcufile in detail to see whether it is actually due to
@jakirkham More updates on the root cause: libcufile.so started using Related information:
GDS team suggested using
See #177.
Do not unload the cufile (GDS) library, because libcufile registers a cleanup function with `atexit()` and unloading the library causes a segfault (the process calls a cleanup function that no longer exists).

It turns out that CUDA 11.5 bundles the GDS library (libcufile), and it is available in GPUCI's build/test container images (such as `gpuci/rapidsai:21.12-cuda11.5-devel-ubuntu18.04-py3.7`).

cuCIM dynamically loads the `libcufile.so` shared library and unloads it when a global static variable in cuCIM is destroyed. Because libcufile's cleanup function (through an atexit_thread handler) is registered after libcufile is loaded, the process segfaults at exit if libcufile has been explicitly unloaded through `dlclose()` (see #153).
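A minimal sketch of the loading pattern described above, assuming a hypothetical `DynamicLibrary` wrapper (cuCIM's actual class and member names are not shown in this PR): the handle is opened with `dlopen()` in the constructor and closed with `dlclose()` in the destructor, so a global instance unloads the library during static destruction at exit.

```cpp
#include <dlfcn.h>

#include <cstdio>

// Hypothetical RAII wrapper sketching the problematic pattern: the library
// is opened in the constructor and closed in the destructor.
class DynamicLibrary {
public:
    explicit DynamicLibrary(const char* path)
        : handle_(dlopen(path, RTLD_LAZY | RTLD_LOCAL)) {
        if (handle_ == nullptr) {
            std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        }
    }

    ~DynamicLibrary() {
        // The problematic step: once the reference count drops to zero the
        // library may be unmapped, so any exit-time handler it registered
        // that runs after this point jumps into code that is gone.
        if (handle_ != nullptr) {
            dlclose(handle_);
        }
    }

    DynamicLibrary(const DynamicLibrary&) = delete;
    DynamicLibrary& operator=(const DynamicLibrary&) = delete;

    bool loaded() const { return handle_ != nullptr; }

private:
    void* handle_;
};

// A global static like this one is destroyed during exit(), which is what
// triggered dlclose() in the original code path (left commented out here
// because libcufile may not be installed):
// static DynamicLibrary g_cufile("libcufile.so");
```

Passing `nullptr` to `dlopen()` returns a handle for the main program, which is a convenient way to exercise such a wrapper without libcufile installed.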
You can find related discussions by searching for the keywords "atexit in dynamically loaded shared library" (the actual root cause is the use of a `thread_local` variable in `libcufile.so`). Using the destructor attribute might fix the issue on the GDS (cufile) side.
This patch leaves libcufile loaded, without calling `::dlclose(library_handle)` to unload `libcufile.so`.

Update (2021-11-20): I couldn't find `atexit()` used in libcufile (though I can see an `atexit()` call in an executable [fio]), but it seems that a function is registered and called at exit time, so we have no choice but to leave the dynamically loaded library loaded.

Update (2021-11-23): libcufile.so started using a `thread_local` variable in v1.1, which makes the shared library unsafe to unload. For this reason, this patch is the correct fix until libcufile is updated to make unloading possible.
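The fix can be sketched the same way, again with a hypothetical wrapper name (`PinnedLibrary`, not cuCIM's actual class): keep the handle for the lifetime of the process and simply omit the `dlclose()` call.

```cpp
#include <dlfcn.h>

// Hypothetical wrapper sketching the fix: the handle is opened once and the
// destructor intentionally does NOT call dlclose(). The library stays mapped
// until the process terminates, so any handler it registered for thread or
// process exit still points at valid code when it runs.
class PinnedLibrary {
public:
    explicit PinnedLibrary(const char* path)
        : handle_(dlopen(path, RTLD_LAZY | RTLD_LOCAL)) {}

    // No dlclose(handle_) here on purpose; the OS reclaims the mapping
    // when the process exits.
    ~PinnedLibrary() = default;

    PinnedLibrary(const PinnedLibrary&) = delete;
    PinnedLibrary& operator=(const PinnedLibrary&) = delete;

    bool loaded() const { return handle_ != nullptr; }

    // Looks up a symbol; returns nullptr if the library or symbol is missing.
    void* symbol(const char* name) const {
        return handle_ != nullptr ? dlsym(handle_, name) : nullptr;
    }

private:
    void* handle_;
};
```

A related glibc option, which is not what this patch does, is passing `RTLD_NODELETE` to `dlopen()`: it tells the loader not to unload the object during `dlclose()`, which protects against a `dlclose()` issued elsewhere.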
Related information: