Intermittent segfault in libdl test on linux64 buildbots #13719
Comments
Is this still happening, or can we close?
Still happening. I got this to a state where it happens repeatably on a local 64-bit Linux machine (using LLVM 3.7.0, but that may or may not be related, since the buildbots are all using 3.3), and I have been delta-debugging it for several days to try to reduce the number of tests that need to run to reproduce it.
Is this something where you would be helped by more eyes/hands?
If anyone else can reproduce this reliably, it will likely need someone who knows better than I do what to look for to fix it. It's most repeatable when running all tests in a single process via
I've reduced the list of tests a bit here locally, but it's still a pretty long list where removing any one of them causes the failure to go away. Anyone else see this locally, on a
Whoa, we had a whole string of these: http://buildbot.e.ip.saba.us:8010/builders/build_centos7.1-x64?numbuilds=100

Really, no one but me has seen this locally?

edit: this may be a consequence of "too many people use ubuntu": http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu14.04-x64?numbuilds=100
9 days and counting since successful nightlies: http://build.julialang.org:8010/builders/build_centos7.1-x64?numbuilds=250

Someone who understands how dlopen works should probably try a Docker container or VM of CentOS 7 to reproduce this.
Tentatively closing, as 5a66fba seems to have fixed the buildbot.
It looks like this does not occur on the problematic buildbot for release-0.4, but it might not hurt to backport anyway?
I've decided to reopen this, since the commit was more of a band-aid and doesn't address the underlying problem: in short, Julia has probably allocated too much memory. In particular, we are probably running into the behavior noted in the following thread, where fork (unlike mmap) does not allow any memory overcommit, and > 50% of physical memory is already allocated (swap is disabled): https://lkml.org/lkml/2009/2/11/319

@carnaval do you see any issue with Julia marking its memory pool as MADV_DONTFORK? That should also help fork performance.
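For reference, a minimal C sketch of what marking a pool as MADV_DONTFORK might look like (the `alloc_pool` helper is hypothetical, not Julia's actual GC code): pages advised this way are simply not copied into, or charged against, a child created with fork().

```c
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical pool allocator, used only for illustration. */
static void *alloc_pool(size_t len)
{
    void *pool = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool == MAP_FAILED)
        return NULL;
    /* A child created by a later fork() will not inherit this range,
     * so these pages are not charged against the child's commit. */
    if (madvise(pool, len, MADV_DONTFORK) != 0)
        perror("madvise(MADV_DONTFORK)");
    return pool;
}
```

The obvious trade-off, raised below, is that a forked child can no longer touch that memory at all.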
Will this fail if the argument to
And does
Isn't the whole point of
COW pages are still counted against the (new) process, and there isn't enough physical memory in the system to permit that without overcommitting (which the Linux kernel implementation of fork will refuse to do). This doesn't affect vfork, while the behavior of posix_spawn in this regard is implementation-defined, but probably follows that of fork.
Yes (although a stack copy would be trivial). More generally, it would also mean you can't simply fork Julia to run two copies (although I doubt that'll work very well anyway, due to the file descriptors all being shared).
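To make the accounting concrete, here is a hedged C sketch of the failure mode (assumptions: 64-bit Linux with the default heuristic overcommit, swap disabled, and an allocation larger than the memory left free; the 3 GiB figure is illustrative):

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t sz = 3UL << 30;          /* illustrative: more than half of RAM */
    char *pool = malloc(sz);
    if (pool == NULL) { perror("malloc"); return 1; }
    memset(pool, 1, sz);            /* touch the pages so they are committed */

    pid_t pid = fork();
    if (pid < 0) {
        /* The whole COW mapping must be charged to the child up front, so
         * with no swap and little free memory this fails with ENOMEM. */
        fprintf(stderr, "fork: %s\n", strerror(errno));
        return 1;
    }
    if (pid == 0)
        _exit(0);                   /* child exits immediately */
    waitpid(pid, NULL, 0);
    return 0;
}
```

Whether fork() actually fails depends on how much memory is free at that moment; the sketch just shows why a large committed pool makes spawning processes fragile.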
Is the issue here the virtual address space size or the memory we are actually using? And is it affected by I'm wondering if we can just specify
There appears to be no mechanism to reduce the commit size, and it is only affected by MADV_DONTFORK. It is only affected by the number of pages that have actually been committed; reserved pages don't count.
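A short C sketch of that reserve-versus-commit distinction (again an assumption about Linux's accounting for private anonymous mappings, not code from Julia): address space mapped PROT_NONE is only reserved and costs nothing at fork time; the charge appears once pages are made writable.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t reserve = 1UL << 34;     /* reserve 16 GiB of address space */
    size_t commit  = 1UL << 20;     /* later commit only 1 MiB of it   */

    /* Reserved but not committed: a PROT_NONE anonymous mapping is not
     * counted toward the commit charge, so fork() does not care about it. */
    char *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Committing: making pages writable (and touching them) is what gets
     * charged, and what fork() then has to duplicate the charge for. */
    if (mprotect(base, commit, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");
        return 1;
    }
    memset(base, 0, commit);

    printf("reserved %zu bytes, committed %zu bytes\n", reserve, commit);
    munmap(base, reserve);
    return 0;
}
```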
I am currently experiencing this problem as well. I thought maybe running the tests single-threaded would avoid the problem, but no luck; it fails either way. Details:
Here is the output I get from
Did this commit miss the backport?
Whoops, guess so. Wasn't sure if it was master only. Now we have an example on release. @afbarnard how much memory does your system have?
The above log was for a laptop with 4 GB. I just tested on my workstation (8 GB), and the increasing memory usage issue exists, but there is no crash. Workstation details:
Snipped output from
@afbarnard thanks. Can you try again on your laptop with a85c3a0? It's really valuable to have someone who can locally, reliably reproduce a bug like this, so thanks for testing things out!
So that seems to fix the crash. There still appears to be a memory leak issue, however. (Final maxrss 1986 MB!) Is something being done about that?
That's just how much memory the test suite uses.
@vtjnash do you consider this still not completely fixed?
Hello,
julia-0.6.0_binary_testall.txt

I'd be happy to provide any additional tests or info!
@brevans That's a different issue. It seems that
See recent failures at http://buildbot.e.ip.saba.us:8010/builders/build_ubuntu12.04-x64?numbuilds=250 and http://buildbot.e.ip.saba.us:8010/builders/build_centos7.1-x64?numbuilds=250