
Missing TBAA information on view #34847

Open

vchuravy opened this issue Feb 23, 2020 · 3 comments
Labels: kind:upstream (the issue is with an upstream dependency, e.g. LLVM), performance (must go faster)

Comments

@vchuravy (Member) commented Feb 23, 2020

In https://discourse.julialang.org/t/rfc-ann-restacker-jl-a-workaround-for-the-heap-allocated-immutable-problem/35037, @tkf noted that manually copying a view at the start of the function causes a sizable performance improvement. It seems different from the stack allocation of immutables that contain heap references (x-ref: #34126). From my brief investigation, the performance difference appears to be due to a vectorization failure. The example I looked at is:

@noinline function f!(ys, xs)
    @inbounds for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2x
        end
    end
end

"Restacking" ys causes this function to be vectorizable.
Looking at the optimized LLVM IR, I noticed:

   %40 = add i64 %39, %28, !dbg !67
   %41 = getelementptr inbounds double, double addrspace(13)* %31, i64 %40, !dbg !67
   %42 = load double, double addrspace(13)* %41, align 8, !dbg !67, !tbaa !70
...
   %47 = add i64 %39, %36, !dbg !81
   %48 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %38, align 8, !dbg !81, !tbaa !59, !nonnull !4
   %49 = getelementptr inbounds double, double addrspace(13)* %48, i64 %47, !dbg !81
   store double %46, double addrspace(13)* %49, align 8, !dbg !81, !tbaa !70

Here %42 = load is the getindex from xs and the store double is the setindex! into ys. Notice how %48 reloads the base pointer of the array on each iteration.
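In Julia-level terms, the compiled loop behaves roughly like the sketch below; the names are hypothetical, and the two unsafe_load calls stand in for %42 and %48 above:

# xs's base pointer is hoisted out of the loop; ys's base pointer is not.
function loop_picture!(ys_base_field::Ptr{Ptr{Float64}}, xs_base::Ptr{Float64}, n)
    for i in 1:n
        x = unsafe_load(xs_base, i)              # %42: base pointer already in a register
        if -0.5 < x < 0.5
            ys_base = unsafe_load(ys_base_field) # %48: reloaded on every iteration
            unsafe_store!(ys_base, 2x, i)        # the store into ys
        end
    end
end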

So how do we derive %38 and %48, and why don't we have to reload the pointer we are loading from?

The series of operations we perform to obtain the base pointer for the getelementptr is:

  %24 = bitcast %jl_value_t addrspace(11)* %11 to %jl_value_t addrspace(10)* addrspace(11)*
  %25 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %24, align 8, !tbaa !41, !nonnull !4, !dereferenceable !57, !align !58
  %26 = getelementptr inbounds i8, i8 addrspace(11)* %12, i64 16
  %27 = bitcast i8 addrspace(11)* %26 to i64 addrspace(11)*
  %28 = load i64, i64 addrspace(11)* %27, align 8, !tbaa !41
  %29 = addrspacecast %jl_value_t addrspace(10)* %25 to %jl_value_t addrspace(11)*
  %30 = bitcast %jl_value_t addrspace(11)* %29 to double addrspace(13)* addrspace(11)*
  %31 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %30, align 8, !tbaa !59, !nonnull !4

And these are the operations we perform to obtain the pointer from which the base pointer is loaded:

  %32 = bitcast %jl_value_t addrspace(11)* %8 to %jl_value_t addrspace(10)* addrspace(11)*
  %33 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %32, align 8
  %34 = getelementptr inbounds i8, i8 addrspace(11)* %9, i64 16
  %35 = bitcast i8 addrspace(11)* %34 to i64 addrspace(11)*
  %36 = load i64, i64 addrspace(11)* %35, align 8
  %37 = addrspacecast %jl_value_t addrspace(10)* %33 to %jl_value_t addrspace(11)*
  %38 = bitcast %jl_value_t addrspace(11)* %37 to double addrspace(13)* addrspace(11)*
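In Julia terms, both sequences roughly correspond to field loads on the SubArray and Array structs, assuming xs and ys are the SubArray arguments. The mapping of SSA values to fields below, including reading the offset-16 getelementptr as SubArray's offset1 field, is my interpretation:

parent_xs = xs.parent           # %25: load the parent Array out of the SubArray
off_xs    = xs.offset1          # %28: load the view's linear offset
base_xs   = pointer(parent_xs)  # %31: load the Array's data pointer (jtbaa_arrayptr)

parent_ys = ys.parent           # %33: the same loads for ys, but without TBAA
off_ys    = ys.offset1          # %36
# %38 is only the address of parent_ys's data-pointer field; the actual load
# (%48) then happens inside the loop because LLVM cannot prove it invariant.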

Notice how in the latter case we are missing all TBAA information. The two interesting TBAA trees are below.

!41 = !{!42, !42, i64 0}
!42 = !{!"jtbaa_immut", !43, i64 0}
!43 = !{!"jtbaa_value", !44, i64 0}
!44 = !{!"jtbaa_data", !45, i64 0}
!45 = !{!"jtbaa", !46, i64 0}
!46 = !{!"jtbaa"}
!59 = !{!60, !60, i64 0}
!60 = !{!"jtbaa_arrayptr", !61, i64 0}
!61 = !{!"jtbaa_array", !45, i64 0}
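For reference, these dumps can be reproduced with standard reflection; the element types and the exact view construction here are assumptions, not taken from the Discourse post:

using InteractiveUtils
ys = view(zeros(100), 1:100); xs = view(rand(100), 1:100)
@code_llvm raw=true optimize=true f!(ys, xs)  # raw=true keeps the !tbaa metadata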

I don't understand our TBAA annotations well enough to conclude that this is a bug, but shouldn't the view be marked immutable either way?
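The wrapper itself is indeed an immutable struct, which supports that expectation; a quick check (ismutable requires Julia >= 1.5):

v = view(zeros(4), 1:2)
ismutable(v)          # false: the SubArray wrapper is immutable
ismutable(parent(v))  # true: the underlying Array is mutable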

@longemen3000 (Contributor)

Bump

@yuyichao (Contributor) commented Aug 23, 2020

This is the same issue as #36970. Also, AFAICT, the issue is not the absence of TBAA but the absence of dereferenceable. All the memory operations in the loop have all the metadata on them. However, LICM stripped the metadata from the load of the array from the subarray, and therefore lost the information needed to hoist the load of the array pointer in ys out of the branch. Adding the dereferenceable metadata back to the post-LICM IR makes vectorization happen.

https://reviews.llvm.org/D85598 helps with #36970 but not this one since the load is not constant.
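One quick way to check whether the loop vectorizes after such a change is to scan the optimized IR for a vector.body block; treating its presence as evidence that the vectorizer fired is an assumption of this sketch:

using InteractiveUtils
buf = IOBuffer()
VT = typeof(view(zeros(4), 1:4))               # the SubArray type used for ys
code_llvm(buf, f!, Tuple{VT, Vector{Float64}})
occursin("vector.body", String(take!(buf)))    # currently false for this example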

yuyichao added the kind:upstream label on Aug 23, 2020
@pdeffebach (Contributor)

Bumping this as I think it's relevant. @chriselrod pointed me to this issue. Currently @view isn't taking full advantage of a transpose or a PermutedDimsArray; see below:

julia> function summatrows(x)
           N = size(x, 1)
           z = Vector{Float64}(undef, N)
           @inbounds for i in 1:N
               z[i] = sum(@view x[i, :])
           end
           z
       end;

julia> function summatcols(x)
           N = size(x, 2)
           z = Vector{Float64}(undef, N)
           @inbounds for i in 1:N
               z[i] = sum(@view x[:, i])
           end
           z
       end;

julia> x = rand(1000, 1000);

julia> y = transpose(permutedims(x));

julia> using BenchmarkTools;

julia> @btime summatcols($x);
  354.489 μs (1 allocation: 7.94 KiB)

julia> @btime summatrows($x);
  2.249 ms (1 allocation: 7.94 KiB)

julia> @btime summatcols($y);
  2.345 ms (1 allocation: 7.94 KiB)

julia> @btime summatrows($y);
  1.269 ms (1 allocation: 7.94 KiB)

The very last timing should be around 350 μs, the same as summatcols($x).
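The reason: y = transpose(permutedims(x)) stores the data so that a row view of y is exactly a contiguous column view of its parent. A minimal illustration, using the x and y defined above:

row = @view y[5, :]
col = @view parent(y)[:, 5]
row == col  # true: same memory, but only the column view currently vectorizes well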
