Partially address #11351 #11491

Merged
merged 1 commit into from
Jun 12, 2015

Conversation

quinnj
Member

@quinnj quinnj commented May 29, 2015

Allow Windows to catch segfaults when trying to write to read-only memory, for parity with Unix. Unfortunately, I'm away from my Windows box until later tonight or tomorrow, so I haven't had a chance to test this locally (to confirm from the REPL that we see OutOfMemoryError).

cc @vtjnash @ihnorton

Note this doesn't address the other current segfault in #11351 that occurs when writing to read-only memory from within an include script (hence the [ci skip]). I've tried poking around jl_load and jl_parse_eval_all, but I get a little lost in trying to figure out why it's not respecting the installed signal handlers.

Windows documentation on exception here: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363082(v=vs.85).aspx
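
For readers unfamiliar with the Unix behavior this PR aims to match, here is a minimal POSIX sketch of catching a write to read-only memory and recovering instead of crashing. It is not Julia's actual signal-handling code; the names try_write_readonly, on_fault, and recover_point are hypothetical, and a real runtime would do considerably more bookkeeping. Both SIGSEGV and SIGBUS are handled because the platform decides which one is raised (the gdb session later in this thread shows SIGBUS on OS X).

```c
#define _DEFAULT_SOURCE /* assumption: glibc needs this for MAP_ANONYMOUS in strict modes */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf recover_point;               /* where to resume after a fault */
static volatile sig_atomic_t caught_signal = 0;

/* Hypothetical handler: record the signal and jump back to the recovery point. */
static void on_fault(int sig) {
    caught_signal = sig;
    siglongjmp(recover_point, 1);
}

/* Returns the signal caught while writing to a read-only page,
   0 if no fault occurred, or -1 if setup failed. */
int try_write_readonly(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    /* Map one anonymous page with read permission only. */
    volatile char *page = mmap(NULL, 4096, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    caught_signal = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        page[0] = 'x'; /* faults: the page is not writable */
    }
    /* Execution continues here after the handler jumps back. */
    munmap((void *)page, 4096);
    return (int)caught_signal;
}
```

Passing 1 as the second argument to sigsetjmp saves the signal mask, so the faulting signal is unblocked again after the jump and a later fault can still be caught.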

@ihnorton
Member

Tested locally, LGTM. Unfortunately, OutOfMemoryError is blatantly misleading here (:pouting_cat: #10503). BoundsError isn't correct either, but seems marginally more helpful.

@quinnj
Member Author

quinnj commented May 30, 2015

Cool. I'm actually just building this now too. Yeah, perhaps we need a separate WriteMemoryError that we can throw.
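
As a rough illustration of how a handler could tell a write fault apart from other faults before choosing which error to throw, the sketch below classifies a Windows-style exception record. It is an assumption about the general approach, not the code in this PR. The constant values follow the documented EXCEPTION_RECORD convention (access violation code 0xC0000005; first ExceptionInformation element 0 for read, 1 for write, 8 for a DEP violation), but they are defined locally under hypothetical names so the sketch compiles without windows.h. Note that a write fault can also hit unmapped memory, so "write" alone does not prove the page was read-only.

```c
#include <stdint.h>

/* Locally defined stand-in for the Windows access-violation code (assumption:
   named differently from the real macro to avoid implying this is windows.h). */
#define SKETCH_EXCEPTION_ACCESS_VIOLATION 0xC0000005u

enum fault_kind { FAULT_OTHER, FAULT_READ, FAULT_WRITE, FAULT_EXECUTE };

/* exception_code: the record's exception code.
   access_type: the first element of the record's information array. */
enum fault_kind classify_fault(uint32_t exception_code, uintptr_t access_type) {
    if (exception_code != SKETCH_EXCEPTION_ACCESS_VIOLATION)
        return FAULT_OTHER;
    switch (access_type) {
    case 0:  return FAULT_READ;    /* attempted read of inaccessible data */
    case 1:  return FAULT_WRITE;   /* attempted write, e.g. to a read-only page */
    case 8:  return FAULT_EXECUTE; /* data execution prevention violation */
    default: return FAULT_OTHER;
    }
}
```

A handler built along these lines could map FAULT_WRITE to a dedicated error type rather than reusing OutOfMemoryError or BoundsError.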

Since you were poking around in init.c today, any pointers on the other segfault I'm seeing through include?

@simonster added the domain:error handling label May 30, 2015
@ihnorton
Member

I can reproduce, but haven't gotten much further. Maybe the JL_TRY block in jl_parse_eval_all is unsetting the top level handler.
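
To make the hypothesis above concrete, here is a toy model of nested setjmp-based handlers. It is not Julia's JL_TRY implementation, and nothing in this thread confirms that this is the actual cause; struct handler, enter_try, leave_try, and raise_fault are hypothetical names. The sketch only shows the failure mode being guessed at: if leaving an inner try block clears the current handler instead of restoring the previous one, a later fault finds no handler at all.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical handler record: a jump target plus a link to the outer handler. */
struct handler {
    jmp_buf env;
    struct handler *prev;
};

static struct handler *current_handler = NULL;

static void enter_try(struct handler *h) {
    h->prev = current_handler;
    current_handler = h;
}

/* Correct behavior restores the outer handler; the buggy variant drops it. */
static void leave_try(struct handler *h, int buggy) {
    current_handler = buggy ? NULL : h->prev;
}

/* Jump to the innermost handler, or return -1 if none is installed
   (the point where a real runtime would crash). */
static int raise_fault(void) {
    if (current_handler == NULL)
        return -1;
    longjmp(current_handler->env, 1);
}

/* Returns 1 if a fault raised after an inner try block reaches the
   top-level handler, 0 if no handler was left to catch it. */
int fault_after_inner_try(int buggy) {
    struct handler top, inner;

    enter_try(&top);
    if (setjmp(top.env) != 0) {
        current_handler = NULL; /* top-level handler ran; clean up */
        return 1;
    }

    enter_try(&inner);
    if (setjmp(inner.env) == 0) {
        /* body of the inner try block: completes without raising */
    }
    leave_try(&inner, buggy);

    if (raise_fault() == -1)
        return 0; /* handler chain was lost */
    return 1;     /* not reached: raise_fault jumps to the top handler */
}
```

With buggy set to 0 the fault lands in the top-level handler; with buggy set to 1 it is never delivered, which is the kind of behavior the include segfault might be exhibiting.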

@quinnj
Member Author

quinnj commented Jun 1, 2015

Ok, I went ahead and added a ReadOnlyMemoryError type as a clearer error here. Unfortunately, the include segfault is a killer because you can't run make test-all without segfaulting, unless we comment out the new error testing here (run from the REPL, it works fine). I'll try to find some time today or tomorrow to look into that more.

@quinnj
Member Author

quinnj commented Jun 1, 2015

Interesting that travis passes for 64-bit. @ihnorton, is the include segfault an OSX only thing?

@ihnorton
Member

ihnorton commented Jun 1, 2015

No, I saw it on Windows. I don't use OS X, but if you can reproduce it there it ought to be easier to debug (gdb doesn't work against julia-debug on Windows). I couldn't reproduce on Linux.


@quinnj
Member Author

quinnj commented Jun 2, 2015

I can reproduce on OSX, but the initial gdb doesn't reveal much. Any pointers on where/how to put breakpoints to get more insights?

Program received signal SIGBUS, Bus error.
0x0000000102d1bfda in ?? ()
(gdb) bt
#0  0x0000000102d1bfda in ?? ()
#1  0x00007fff5fbfe320 in ?? ()
#2  0xa500c4106f94ead5 in ?? ()
#3  0x00007fff5fbfe310 in ?? ()
#4  0x0000000102d1bf40 in ?? ()
#5  0x00007fff5fbfe350 in ?? ()
#6  0x0000000100029d80 in jl_apply (f=0x7fff5fbfe320, args=0xa500c4106f94ead5, nargs=1) at ./julia.h:1300
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@kmsquire
Member

kmsquire commented Jun 2, 2015

Is this a debug build?

@quinnj
Member Author

quinnj commented Jun 2, 2015

I'm no gdb expert by any means, but the above output was from gdb --args usr/bin/julia-debug /Users/jacobquinn/inner.jl

@kmsquire
Member

kmsquire commented Jun 2, 2015

Maybe try lldb. Though I don't have any expectations.

@ihnorton
Member

ihnorton commented Jun 2, 2015

I really don't know what to expect on OS X. I thought backtraces were decent there with LLVM 3.3. If you build with LLVM 3.7 I think they should also be nice now, per one of Keno's recent PRs. Or, instead of the backtrace, you could step through execution of that expression from jl_toplevel_eval_flex (set a breakpoint there conditional on jl_lineno).


@quinnj
Member Author

quinnj commented Jun 2, 2015

Thanks for the pointers @ihnorton. Any idea why running call jl_(obj) in gdb is killing the session for me?

Breakpoint 1, eval (e=0x1055a47b0, locals=0x7fff5fbfec20, nl=0, ngensym=1) at interpreter.c:109
109     if (jl_is_symbol(e)) {
(gdb) call jl_(e)
GenSym(0) = Expr(:call, :UInt8, Char(0x00000078))::Any
infrun.c:5921: internal-error: void insert_longjmp_resume_breakpoint(struct gdbarch *, CORE_ADDR): Assertion `inferior_thread ()->control.exception_resume_breakpoint == NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.

infrun.c:5921: internal-error: void insert_longjmp_resume_breakpoint(struct gdbarch *, CORE_ADDR): Assertion `inferior_thread ()->control.exception_resume_breakpoint == NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Command aborted.

@quinnj
Member Author

quinnj commented Jun 3, 2015

@vtjnash any objections to merging this? I can continue working on tracking down the other segfault, but it's really a separate issue from what's being solved here for windows. I also suspect this solves #11552 (with a backport).

@tkelman
Contributor

tkelman commented Jun 3, 2015

+1 from me

@tkelman
Contributor

tkelman commented Jun 3, 2015

Actually both the travis segfaults are happening on the worker that's running the file test so probably related. Backtrace points to sweep_big_list, any ideas @carnaval @yuyichao?

Could you also rebase out the merge commit and fixup?

…ndows and creating a new ReadOnlyMemoryError type
@quinnj
Member Author

quinnj commented Jun 3, 2015

Alright, cleaned up. Let's see what CI says this time.

@quinnj
Member Author

quinnj commented Jun 3, 2015

Windows 32-bit passed! Lol... The Windows 64-bit build timed out for some reason on the linalg tests, and the 4 Travis jobs all failed for different reasons: 1) ERROR (unhandled task failure): EOFError: read end of file, 2) exception on 9: ERROR: SystemError: shm_open() failed for /jl026600cGKm65iygJsjQbMwBnbf: No such file or directory in _shm_mmap_array at sharedarray.jl:383, and 3) for the two 32-bit builds, signal (11): Segmentation fault sweep_big_list at /home/travis/build/JuliaLang/julia/src/gc.c:914. I'm not sure I'll have much time to look into this any more today, so if anyone feels like playing around, feel free.

@vtjnash
Sponsor Member

vtjnash commented Jun 3, 2015

the change looks good to me, although of course, the segfault needs to be addressed in the test before it can be merged

@carnaval
Contributor

carnaval commented Jun 3, 2015

If the line number of the fault is correct, it means someone corrupted the big object free list: either a logic error in the gc itself or user code misbehaving. I'll try and see if this is easy to reproduce on a 32-bit VM.

@yuyichao
Contributor

yuyichao commented Jun 3, 2015

Is this the same fault?

@JeffBezanson
Sponsor Member

Bump. Should we merge this? The failures seem unrelated.

The fault in #11552 doesn't seem to be due to read-only memory.

@quinnj
Member Author

quinnj commented Jun 9, 2015

I think the remaining problem here is that, while we now catch read-only memory errors correctly across platforms, we previously had no test for it, and adding the test triggers the include segfault when run from Base.runtests() or make test-file. I.e. the read-only errors are only caught at the REPL. From a small sample of me and @ihnorton, I think the include fault may only be OSX/Windows, though 32-bit Windows was the only CI to pass. ¯\_(ツ)_/¯

@quinnj
Member Author

quinnj commented Jun 9, 2015

I can reproduce the include segfault reliably on OSX and could help debug, I've just been a little limited on time (and frankly, the necessary skills) to really dig into this.

@vtjnash vtjnash merged commit a2b6943 into master Jun 12, 2015
@quinnj quinnj mentioned this pull request Jun 12, 2015