Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap_array segfaults #11351

Closed
quinnj opened this issue May 19, 2015 · 22 comments
Closed

mmap_array segfaults #11351

quinnj opened this issue May 19, 2015 · 22 comments
Labels
system:windows Affects only Windows

Comments

@quinnj
Copy link
Member

quinnj commented May 19, 2015

While battle-testing #11280, I discovered the following for Windows:

s = open(file, "r") # open file as read-only with contents "Hello World\n"
m = Mmap.Array(s) # same as previous mmap_array()
m[5] = UInt8('x')
# segfault

On OSX, I get an OutOfMemoryError(), but I don't think that's very informative.

I'm not entirely sure what to do here. Restrict Mmap.Array to file's where we have read & write permissions? Is that overly restrictive? Put a warning in the docs to be careful with read-only files? Rethink the Mmap.Array interface entirely?

@ScottPJones
Copy link
Contributor

Does this also happen with the current mmap code?
Please don't restrict it to files with read & write permission!

@quinnj
Copy link
Member Author

quinnj commented May 19, 2015

Yes, this is current behavior on 0.3/0.4.

@quinnj
Copy link
Member Author

quinnj commented May 19, 2015

Here's another fun Windows mmap_array segfault:

f = open("C:/Users/karbarcca/test.jl", "r")
r = mmap_array(Uint8, (10,), f, -1)
# segfault

By trying to do a negative offset. (Yay for more testing!)

@quinnj quinnj changed the title Windows segfault mmap_array mmap_array segfaults May 19, 2015
@ScottPJones
Copy link
Contributor

Ugh... Is m[5] = x done by setindex!(m,5,x) ? Could you simply have a readonly flag that gives an error for that?

@quinnj
Copy link
Member Author

quinnj commented May 19, 2015

I think this is one of the design flaws of mmap_array and allowing the A = pointer_to_array(ptr, dims) where ptr is our pointer to the mmapped region. It's certainly easy to add a iswritable flag to my new Mmap.Stream type (which I'm finding is extremely similar to an IOBuffer), but for Arrays, I'm not sure what we can do other than yell at people in docs.

@ihnorton
Copy link
Member

(related:
https://groups.google.com/d/msg/julia-users/_OFCVa3HAZQ/sF5VSeC6rIsJ)

On Tue, May 19, 2015 at 1:51 PM, Jacob Quinn notifications@github.com
wrote:

I think this is one of the design flaws of mmap_array and allowing the A
= pointer_to_array(ptr, dims) where ptr is our pointer to the mmapped
region. It's certainly easy to add a iswritable flag to my new Mmap.Stream
type (which I'm finding is extremely similar to an IOBuffer), but for
Arrays, I'm not sure what we can do other than yell at people in docs.


Reply to this email directly or view it on GitHub
#11351 (comment).

@simonster
Copy link
Member

See also #3434 and #6703.

@timholy
Copy link
Sponsor Member

timholy commented May 19, 2015

#6877, see in particular the last sentences of my introductory post.

@simonster
Copy link
Member

Also, I suppose this is a case where #10503 made things less clear.

@quinnj
Copy link
Member Author

quinnj commented May 19, 2015

So I know this is bleeding into #11280, but what do people think about getting rid of mmap_array entirely and just having Mmap.Stream? It would basically function as currently outlined in my PR, yet we could easily add an iswriteable field to the type and define getindex/setindex! so that it becomes a pseudo-array/Stream type (which is what it pretty much is).

@simonster
Copy link
Member

My workflow often involves creating objects that have array-typed fields, saving them to disk, and then reading the objects back in using JLD's mmaparrays option, which uses mmap_array to read the underlying arrays. Because JLD constructs instances of the same types with mmapped arrays in place of heap-allocated arrays, they really do need to be Arrays. But I could always move the existing mmap_array code to JLD.

However, if we want a separate type, I'm not sure if it should be Mmap.Stream or something new. We don't have multiple inheritance, so we'd have to pick between Mmap.Stream <: IO and Mmap.Stream <: AbstractArray.

@quinnj
Copy link
Member Author

quinnj commented May 19, 2015

Yeah, I was just looking through rewriting without mmap_array and realized the multiple inheritance problem as well. My thought would be to have Mmap.Stream <: IO with a convert(::Type{Array}, x::Stream) method that essentially performs the last line of mmap_array currently. Mainly because I think it's easier to subtype IO than AbstractArray.

@timholy
Copy link
Sponsor Member

timholy commented May 19, 2015

From a certain perspective it's just a block of memory; it's a detail that it happens to be backed by a disk file. Seems much more array-like than IO-like to me.

@quinnj
Copy link
Member Author

quinnj commented May 19, 2015

Yes, the more I think about it, the more it makes sense to think about it like an Array, but a special Array that has to deal with file permissions 😨 Alright, I think I'll try to make Mmap.Stream <: AbstractArray; who can point me to what it takes to be a 1st class AbstractArray around here?

@timholy
Copy link
Sponsor Member

timholy commented May 19, 2015

If we can merge #10525, almost nothing is needed other than scalar indexing and support for size. As it is, you have to support quite a lot of stuff (indexing with ranges, indexing with Vector{Int}, indexing a 3d array with 2 indexes, etc).

Personally, until #10525 is merged I think your best strategy is to keep mmap_array and have it return an Array.

@ScottPJones
Copy link
Contributor

It's not just file permissions... for example, what if you are using shared memory (does your mmap allow for that?), and you only want one process to be able to modify the data, and many to be able to read it...
[think database buffers]...

@timholy
Copy link
Sponsor Member

timholy commented May 19, 2015

Yes, SharedArrays use this to share memory. I don't think we have anything implemented to give different workers different permissions, aside from having the workers open the "file" with different permissions. We don't currently have a concept of a "read-only" array in base. (Such a thing could easily be implemented in a package, however.)

@quinnj
Copy link
Member Author

quinnj commented May 20, 2015

Yet another fun one:

f = open("/Users/jacobquinn/test")
m = mmap_array(UInt8, (11,), f, 0)
# returns successfully
Libc.munmap(pointer(m), 11) # run munmap() manually
m[1]
# segfaults

@simonster
Copy link
Member

@quinnj If you call munmap explicitly, I think you're playing with fire. That doesn't seem too different from:

x = zeros(10)
Libc.free(pointer(x))
# segfaults

@quinnj
Copy link
Member Author

quinnj commented May 20, 2015

Yeah, that's a fair point. I'd also argue that this is something that could be guarded easily against by having an actual Mmap.Array type that checked for readability/writability on getindex/setindex operations (PR soon to be updated).

@quinnj
Copy link
Member Author

quinnj commented May 29, 2015

So in doing some more testing, I've found another weird case of segfaulting: if I run

a = mmap_array(...) # read-only array
a[5] = 2 # try to setindex!

from the command line, I correctly get an OutOfMemory error on unix (still segfaults on windows). If I run the same commands, however, from include(script.jl), it segfaults. Is there something in include that disables/ignores the signal-handling that throws the OutOfMemory error?

I just reverted #11280 back to returning a regular Array, but all the segfaults are worrisome. I could take a stab at trying to fix, but I imagine it's loads more efficient for others.

quinnj added a commit that referenced this issue Jun 3, 2015
…ndows and creating a new ReadOnlyMemoryError type
@JeffBezanson JeffBezanson added the system:windows Affects only Windows label Jun 9, 2015
@quinnj
Copy link
Member Author

quinnj commented Jun 12, 2015

Fixed via #11491.

@quinnj quinnj closed this as completed Jun 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:windows Affects only Windows
Projects
None yet
Development

No branches or pull requests

6 participants