Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrays with Missing Hunks #180

Open
r-barnes opened this issue Oct 1, 2018 · 11 comments
Open

Arrays with Missing Hunks #180

r-barnes opened this issue Oct 1, 2018 · 11 comments

Comments

@r-barnes
Copy link
Contributor

r-barnes commented Oct 1, 2018

Is it feasible that DArray could support distributed arrays in which pieces are missing?

For instance, say I have the following:

X X    X
X X X X
X X

where each X represents a dense portion of an array and the blanks represent a portion which can be modeled as representing a single value throughout.

In my mind, DArray has some sort of address table and, when it does a lookup, it could note the gap and return an appropriately formatted response using the constant value.

@vchuravy
Copy link
Member

vchuravy commented Oct 1, 2018

Hm yes, that seems feasible. Sounds like you would want to use something like a https://github.com/JuliaArrays/FillArrays.jl, the only problem is that we don't support heterogeneous chunk-types. #170 as an example is actually about ensuring that the chunk-types by construction are consistent.

It might be feasible to do promote to a joint super-type, but that might break other things
like similar. So yes feasible in theory, but not yet in practice.

@r-barnes
Copy link
Contributor Author

r-barnes commented Oct 1, 2018

Just for reference: for the data I'm currently working with it's the difference between 700 GB and 2 TB of RAM.

FillArrays looks pretty perfect. I wonder if you have suggestions of places I could look in the codebase if I wanted to try hacking on this?

@vchuravy
Copy link
Member

vchuravy commented Oct 1, 2018

#170 modifies exactly the core constructor that you would want to modify. Once you have a DArray that is constructed in the way you want it comes with the challenge of making sure that all the operations you are interested in work.

@r-barnes
Copy link
Contributor Author

r-barnes commented Oct 1, 2018

I'll take a dig.

There might be a cheap-hack approach, though: having multiple sections of the DArray reference the same underlying memory:

@everywhere using DistributedArrays
r1  = @spawnat 2 zeros(4,4) 
r2  = @spawnat 3 zeros(4,4) 
r3  = @spawnat 4 rand(4,4)
ras = [r1 r2; r3 r3]
D   = DArray(ras)

Unfortunately, my experiments in #183 make me nervous about this.

@vchuravy
Copy link
Member

vchuravy commented Oct 1, 2018

Hm yes I suspect that there might be quite some functions written with the assumption that each chunk will be on a different processor, even though it clearly doesn't need to be assumed.

@r-barnes
Copy link
Contributor Author

r-barnes commented Oct 1, 2018

The following seems to work for me:

@everywhere using DistributedArrays
@everywhere using FillArrays
r1  = @spawnat 2 FillArrays.Zeros(4,4) 
r2  = @spawnat 3 FillArrays.Zeros(4,4) 
r3  = @spawnat 4 rand(4,4)
r5  = @spawnat 5 rand(4,4)
ras = [r1 r3; r5 r2]
D   = DArray(ras)

[@fetchfrom p typeof(D[:L]) for p in workers()]
[@fetchfrom p eltype(D[:L]) for p in workers()]

And this can be done even with #170 in place.

@vchuravy
Copy link
Member

vchuravy commented Oct 1, 2018

Hm that fails for me on current master:

D   = DArray(ras)
ERROR: MethodError: no method matching Zeros{Float64,2,Tuple{Int64,Int64}}(::Array{Float64,2})
Stacktrace:
 [1] empty_localpart(::Type, ::Int64, ::Type) at /home/vchuravy/.julia/dev/DistributedArrays/src/darray.jl:66
 [2] macro expansion at ./task.jl:264 [inlined]
 [3] macro expansion at /home/vchuravy/.julia/dev/DistributedArrays/src/darray.jl:84 [inlined]
 [4] macro expansion at ./task.jl:244 [inlined]
 [5] DArray(::Tuple{Int64,Int64}, ::Array{Future,2}, ::Tuple{Int64,Int64}, ::Array{Int64,2}, ::Array{Tuple{UnitRange{Int64},UnitRange{Int64}},2}, ::Array{Array{Int64,1},1}) at /home/vchuravy/.julia/dev/DistributedArrays/src/darray.jl:82
 [6] macro expansion at ./task.jl:266 [inlined]
 [7] macro expansion at /home/vchuravy/.julia/dev/DistributedArrays/src/darray.jl:194 [inlined]
 [8] macro expansion at ./task.jl:244 [inlined]
 [9] DArray(::Array{Future,2}) at /home/vchuravy/.julia/dev/DistributedArrays/src/darray.jl:192
 [10] top-level scope at none:0

@vchuravy
Copy link
Member

vchuravy commented Oct 1, 2018

Ah it works on #175..., but we need a method to select the "right" array-type. Since right now it could happen that it chooses a different T as a primary eltype and I think the process local version of this have diverged...

@r-barnes
Copy link
Contributor Author

r-barnes commented Oct 1, 2018

I'm running [aaf54ef3] DistributedArrays v0.5.1 right now.

(Is there a good development cycle for working from the master branch of a local repo?)

Isn't eltype the type of the data held by an array? It seems as though that could be enforced constant across the entire DArray, even if the containers holding the elements differ.

@r-barnes
Copy link
Contributor Author

r-barnes commented Oct 1, 2018

Is it possible to make the type more general to encompass the possibility of submatrices of different types?

@vchuravy
Copy link
Member

vchuravy commented Oct 1, 2018

(Is there a good development cycle for working from the master branch of a local repo?)

I generally use ]dev for that which can also take a local path

Isn't eltype the type of the data held by an array? It seems as though that could be enforced constant across the entire DArray, even if the containers holding the elements differ.

Yes the eltype being consitent is even more important and that is an invariant we need to uphold.

Is it possible to make the type more general to encompass the possibility of submatrices of different types?

There are two alternatives:

a = Fill(1.0, 2, 2)
b = zeros(2, 2)

AT = Union{typeof(a), typeof(b)}

and then teach DArray to reason about this union (immediate question would be what is similar(AT))

Or promote_type(typeof(a), typeof(b)) which is an AbstractArray{Float64, 2} so that could work quite well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants