
Size constraints? #10

Open
cscherrer opened this issue Feb 1, 2021 · 1 comment

Comments

@cscherrer

Hi @chriselrod, thanks for making this available, and for your nice summary here:
JuliaMath/MeasureTheory.jl#49 (reply in thread)

With StaticArrays, there's compiler overhead that grows with the size of the array. From some command line checks this doesn't seem to be the case for StrideArrays. Is that right? Would it be accurate that StrideArray performance will always be at least as good as vanilla Arrays, as long as you're careful about the gc concerns?

@chriselrod
Member

A major contributor to StaticArrays' overhead is that NTuples aren't actually treated specially in Julia, which should eventually be fixed: JuliaLang/julia#31681

From some command line checks this doesn't seem to be the case for StrideArrays. Is that right?

It shouldn't. A StrideArray's memory is often backed by an NTuple, but it avoids interacting with the tuple directly.
MArrays also tend to do a lot better than SArrays for that reason.

Would it be accurate that StrideArray performance will always be at least as good as vanilla Arrays, as long as you're careful about the gc concerns?

It shouldn't degrade with size.
But the stack-allocation trick has its limits -- namely, the stack size.

On 64-bit Julia builds, Octavian.jl's matrix multiplication (C = A * B) uses this trick to allocate memory to copy blocks of A into.
However, tests were failing on 32-bit builds because of stack overflow errors.
These weren't caused by a function accidentally recursing until the stack ran out, but by these allocated matrices using it all!
It was creating arrays of around 1 MiB in size (131_072 Float64s at 8 bytes each).

The stack is much smaller than the amount of RAM, and apparently it is smaller still in 32-bit builds of Julia.

So on those builds it does a simpler version of what ProbabilityModels did: it allocates a chunk of memory up front and hands out pieces of it when needed.
Stacks are extremely simple, which is also why they're so fast: allocating and freeing memory is just changing the value of an integer. Octavian's version is simpler than ProbabilityModels', because Octavian only creates one of these arrays at a time per thread.

But it's nicer to let LLVM handle incrementing the pointer for you.
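The "changing the value of an integer" idea can be sketched as a per-thread bump allocator. This is a hypothetical illustration (the names `BumpAllocator`, `alloc!`, and `free!` are made up here), not Octavian's actual implementation:

```julia
# Minimal bump-allocator sketch (hypothetical; Octavian's real buffer
# management differs). One fixed buffer plus an offset integer:
# allocating bumps the offset, freeing rolls it back.
mutable struct BumpAllocator
    buffer::Vector{UInt8}
    offset::Int
end
BumpAllocator(n::Int) = BumpAllocator(Vector{UInt8}(undef, n), 0)

function alloc!(a::BumpAllocator, nbytes::Int)
    a.offset + nbytes <= length(a.buffer) || error("out of scratch space")
    start = a.offset + 1         # 1-based index into the buffer
    a.offset += nbytes           # "allocating" is just incrementing an integer
    return start
end

# Freeing rolls the offset back, releasing everything allocated since `start`.
free!(a::BumpAllocator, start::Int) = (a.offset = start - 1)

a = BumpAllocator(1 << 20)       # 1 MiB of scratch space
p = alloc!(a, 8 * 64)            # room for 64 Float64s
free!(a, p)                      # release it again
```

Because allocation order is nested (last allocated, first freed), this matches how a real call stack behaves, which is exactly why letting LLVM manage the stack pointer for you is simpler still.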

As for performance, the goal is to be faster, in part thanks to providing more efficient methods, e.g.:

julia> using StrideArrays, StaticArrays, BenchmarkTools

julia> A = @MMatrix rand(8,8); B = @MMatrix rand(8,8); C = similar(A);

julia> @benchmark mul!($C, $A, $B)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     39.008 ns (0.00% GC)
  median time:      39.088 ns (0.00% GC)
  mean time:        40.785 ns (0.00% GC)
  maximum time:     82.652 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     991

julia> @benchmark mul!($(StrideArray(C)), $(StrideArray(A)), $(StrideArray(B)))
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     14.254 ns (0.00% GC)
  median time:      15.284 ns (0.00% GC)
  mean time:        15.255 ns (0.00% GC)
  maximum time:     37.802 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> Abase = Array(A);

julia> @btime sum($Abase)
  17.074 ns (0 allocations: 0 bytes)
32.746376987098714

julia> @btime sum($A)
  19.874 ns (0 allocations: 0 bytes)
32.746376987098714

julia> @btime sum($(StrideArray(A)))
  3.284 ns (0 allocations: 0 bytes)
32.746376987098714

julia> @btime mapreduce(log, +, $Abase)
  256.875 ns (0 allocations: 0 bytes)
-60.07821360899319

julia> @btime mapreduce(log, +, $A)
  244.497 ns (0 allocations: 0 bytes)
-60.07821360899319

julia> @btime mapreduce(log, +, $(StrideArray(A)))
  45.248 ns (0 allocations: 0 bytes)
-60.07821360899318

julia> @btime mapreduce(log, +, $(StrideArray(Abase)))
  50.536 ns (0 allocations: 0 bytes)
-60.07821360899318

julia> Float64(mapreduce(log ∘ big, +, Abase)) # just checking which version rounded correctly ;)
-60.07821360899318

Of course, there are probably bugs and missing features, including functions that might not add the @gc_preserve needed to allow stack allocation, so issues are welcome if/when you run into any problems!
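For context, a hedged sketch of the stack-allocation pattern being described, based on my reading of StrideArrays (exact API details may differ between versions, so treat the constructor and macro usage as assumptions):

```julia
using StrideArrays

# Sketch (assumed API): allocate a small fixed-size array inside a
# function and wrap its use in @gc_preserve so the array never needs
# to escape to the heap.
function stack_sum()
    A = StrideArray{Float64}(undef, (8, 8))
    A .= 1.0
    @gc_preserve sum(A)  # keep A rooted without a heap allocation
end
```

If a function in the call chain is missing that @gc_preserve handling, the array would fall back to a heap allocation, which is the kind of issue being invited above.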
