Store to GC frame preventing vectorization #15717

yuyichao · 2016-03-31T18:03:29Z

Using patched LLVM 3.7.1 and OrcJIT.

Bisect log

yuyichao% git bisect bad 
c3039d49bc53d17b31510f788eac6bc7c9ee7fff is the first bad commit
commit c3039d49bc53d17b31510f788eac6bc7c9ee7fff
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Thu Mar 10 15:20:10 2016 -0500

    ensure that debug info is always available for -O0 mode

    this should ensure that the jlcall args array is always visible in the debugger
    and that all variables can be inspected from -O0 mode
    and that julia-debug is built at the -O0 optimization level (so all variables are visible)

:100644 100644 1ff6f833e6350d8a36372566c86a2124ff457d47 52cd40a56cd21b6b1472cd7aa6d5ca5f5741a338 M      Makefile
:040000 040000 7f2de288213c8055e7b122fb6d49889b5d8fd91f f0503a5bd854f1ae07c870991d8f7ba67e3c62e1 M      src

yuyichao% git bisect log 
git bisect start
# bad: [a5f2c7a7f2c0786a0fd4d0eebd5690bb905a349b] add a gc root and some write barriers needed from jb/linear3 merge
git bisect bad a5f2c7a7f2c0786a0fd4d0eebd5690bb905a349b
# good: [9bfd27bd380124174a5f37c342e5c048874d71a4] Merge pull request #13412 from JuliaLang/jb/functions
git bisect good 9bfd27bd380124174a5f37c342e5c048874d71a4
# good: [b252d6103bcd93878fa029674687bf4c94dd7f99] Merge pull request #15200 from JuliaLang/anj/cholp
git bisect good b252d6103bcd93878fa029674687bf4c94dd7f99
# good: [892b9406e3c68a56e43587de740520c16dd0f6d8] support multiple arguments in `Generator` by zipping
git bisect good 892b9406e3c68a56e43587de740520c16dd0f6d8
# bad: [a0135328997c0aaea0b0a8f38848aed903aa196f] Merge pull request #15567 from JuliaLang/yyc/llvm39
git bisect bad a0135328997c0aaea0b0a8f38848aed903aa196f
# bad: [f3374b9e381994dd845381cf5212779780d9377f] Merge pull request #15347 from JuliaLang/kf/refactorobjlookup
git bisect bad f3374b9e381994dd845381cf5212779780d9377f
# bad: [a75f4c31d808e10c8c4247d1f5797e17159450db] Merge pull request #15334 from omus/dateformat-docs
git bisect bad a75f4c31d808e10c8c4247d1f5797e17159450db
# good: [fc469b68316061d95003cb0768c43dbc4b1efd0f] Merge pull request #15462 from justbur/fix-15461
git bisect good fc469b68316061d95003cb0768c43dbc4b1efd0f
# bad: [1745a5fc4d4c377a52f19a752dbec306769408c8] Merge pull request #15443 from tlnagy/master
git bisect bad 1745a5fc4d4c377a52f19a752dbec306769408c8
# bad: [64dbecf6ca76a46be34c3addd39cd516f383b70a] Merge pull request #15444 from JuliaLang/jn/dwarfbug
git bisect bad 64dbecf6ca76a46be34c3addd39cd516f383b70a
# good: [ebf6c64a6ad566a42e4c8b5408f59e46b030e5f4] fix debug info in compiler
git bisect good ebf6c64a6ad566a42e4c8b5408f59e46b030e5f4
# bad: [c3039d49bc53d17b31510f788eac6bc7c9ee7fff] ensure that debug info is always available for -O0 mode
git bisect bad c3039d49bc53d17b31510f788eac6bc7c9ee7fff
# first bad commit: [c3039d49bc53d17b31510f788eac6bc7c9ee7fff] ensure that debug info is always available for -O0 mode

Possibly similar to #13301, but that was "fixed" after codegen_rewrite2, and now SIMD doesn't even work for the cases that used to work before.

@vtjnash

yuyichao · 2016-03-31T18:06:37Z

Simple code and the corresponding IR on the first bad commit.

julia> function f(a)
           @inbounds @simd for i in eachindex(a)
               a[i] += 1
           end
           nothing
       end
f (generic function with 1 method)

full llvm-ir

if13:                                             ; preds = %if13.preheader, %if13
  %"##i#7433.01" = phi i64 [ %41, %if13 ], [ 0, %if13.preheader ]
  store %jl_value_t* %0, %jl_value_t** %3, align 8, !dbg !49, !llvm.mem.parallel_loop_access !57
  %35 = load float*, float** %24, align 8, !dbg !49, !tbaa !59, !llvm.mem.parallel_loop_access !57
  %36 = getelementptr float, float* %35, i64 %"##i#7433.01", !dbg !49
  %37 = load float, float* %36, align 4, !dbg !49, !tbaa !60, !llvm.mem.parallel_loop_access !57
  %38 = fadd float %37, 1.000000e+00, !dbg !49
  store %jl_value_t* %0, %jl_value_t** %4, align 8, !dbg !49, !llvm.mem.parallel_loop_access !57
  %39 = load float*, float** %24, align 8, !dbg !49, !tbaa !59, !llvm.mem.parallel_loop_access !57
  %40 = getelementptr float, float* %39, i64 %"##i#7433.01", !dbg !49
  store float %38, float* %40, align 4, !dbg !49, !tbaa !60, !llvm.mem.parallel_loop_access !57
  %41 = add nuw nsw i64 %"##i#7433.01", 1, !dbg !61, !simd_loop !13
  call void @llvm.dbg.value(metadata i64 %41, i64 0, metadata !22, metadata !31), !dbg !32
  %exitcond = icmp eq i64 %41, %22, !dbg !50
  br i1 %exitcond, label %L.backedge.loopexit, label %if13, !dbg !50, !llvm.loop !58

KristofferC · 2016-03-31T18:12:54Z

Ref #13777

yuyichao · 2016-03-31T18:12:56Z

Actually, it looks like the issue is similar to #15402, but it gets even worse after that.

yuyichao · 2016-04-02T17:52:07Z

With #15735 and #13463, the example above vectorizes at the normal optimization level. However, if the array is allocated in the same function or the variable is otherwise assigned to (similar to #13301), the redundant stores to the GC frame in the loop still prevent vectorization at the normal optimization level.

function f_simd(n::Integer)
    a = zeros(Float32, n)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

IR of the inner loop:

if15:                                             ; preds = %if15, %if15.lr.ph
  %"i#256.017" = phi i64 [ 0, %if15.lr.ph ], [ %45, %if15 ]
  store %jl_value_t* %24, %jl_value_t** %9, align 8, !dbg !53, !tbaa !32, !llvm.mem.parallel_loop_access !59
  %42 = getelementptr float, float* %41, i64 %"i#256.017", !dbg !53
  %43 = load float, float* %42, align 4, !dbg !53, !tbaa !61, !llvm.mem.parallel_loop_access !59
  %44 = fadd float %43, 1.000000e+00, !dbg !53
  store %jl_value_t* %24, %jl_value_t** %10, align 8, !dbg !53, !tbaa !32, !llvm.mem.parallel_loop_access !59
  store float %44, float* %42, align 4, !dbg !53, !tbaa !61, !llvm.mem.parallel_loop_access !59
  %45 = add nuw nsw i64 %"i#256.017", 1, !dbg !62, !simd_loop !4
  call void @llvm.dbg.value(metadata i64 %45, i64 0, metadata !26, metadata !30), !dbg !31
  %exitcond = icmp eq i64 %45, %35, !dbg !58
  br i1 %exitcond, label %L11.loopexit, label %if15, !dbg !58, !llvm.loop !60
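
To check whether a loop like this vectorizes without reading the whole IR dump, one option is to capture the output of `code_llvm` and search it for vector types. The following is a minimal sketch, assuming a recent Julia 1.x; the helper names `add_one!`, `llvm_ir`, and `looks_vectorized` are illustrative and not from this issue, and the regex match is only a heuristic (it does not prove the hot loop itself was vectorized).

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same loop shape as in the report; the name add_one! is hypothetical.
function add_one!(a)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

# Capture the optimized LLVM IR for the given argument types as a String.
function llvm_ir(f, argtypes)
    io = IOBuffer()
    code_llvm(io, f, argtypes)
    String(take!(io))
end

# Heuristic: vectorized Float32 code usually mentions types such as
# `<4 x float>` or `<8 x float>` somewhere in the IR.
looks_vectorized(ir) = occursin(r"<\d+ x float>", ir)

ir = llvm_ir(add_one!, (Vector{Float32},))
println(looks_vectorized(ir) ? "vector types found" : "no vector types found")
```

Whether vector types appear depends on the Julia and LLVM versions and on the target CPU, so the printed result is not guaranteed either way. A store to the GC frame inside the loop body, as in the IR above, is the kind of pattern to look for when the check reports no vector types.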

yuyichao · 2016-06-09T16:29:46Z

Vectorization works now (on LLVM 3.8 at least). Moving to #15369.

yuyichao added the performance, kind:regression, and compiler:codegen labels Mar 31, 2016

This was referenced Mar 31, 2016

Unnecessary GC root for getfield of SSA immutable object #15402

Closed

Unnecessary GC root preceding arrayset #15719

Closed

tbaa_gcframe #13463

Merged

tkelman mentioned this issue Apr 2, 2016

Warning and slowdown for multithreaded Julia code #15740

Closed

yuyichao changed the title ~~SIMD does not work anymore~~ Store to GC frame preventing vectorization Apr 2, 2016

yuyichao mentioned this issue Jun 9, 2016

Very inefficient GC frame generation #15369

Closed

yuyichao closed this as completed Jun 9, 2016
