Store to GC frame preventing vectorization #15717

Closed
yuyichao opened this issue Mar 31, 2016 · 5 comments
Labels
compiler:codegen (Generation of LLVM IR and native code), kind:regression (Regression in behavior compared to a previous version), performance (Must go faster)

Comments

@yuyichao
Contributor

Using patched LLVM 3.7.1 and OrcJIT.

Bisect log

yuyichao% git bisect bad 
c3039d49bc53d17b31510f788eac6bc7c9ee7fff is the first bad commit
commit c3039d49bc53d17b31510f788eac6bc7c9ee7fff
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Thu Mar 10 15:20:10 2016 -0500

    ensure that debug info is always available for -O0 mode

    this should ensure that the jlcall args array is always visible in the debugger
    and that all variables can be inspected from -O0 mode
    and that julia-debug is built at the -O0 optimization level (so all variables are visible)

:100644 100644 1ff6f833e6350d8a36372566c86a2124ff457d47 52cd40a56cd21b6b1472cd7aa6d5ca5f5741a338 M      Makefile
:040000 040000 7f2de288213c8055e7b122fb6d49889b5d8fd91f f0503a5bd854f1ae07c870991d8f7ba67e3c62e1 M      src

yuyichao% git bisect log 
git bisect start
# bad: [a5f2c7a7f2c0786a0fd4d0eebd5690bb905a349b] add a gc root and some write barriers needed from jb/linear3 merge
git bisect bad a5f2c7a7f2c0786a0fd4d0eebd5690bb905a349b
# good: [9bfd27bd380124174a5f37c342e5c048874d71a4] Merge pull request #13412 from JuliaLang/jb/functions
git bisect good 9bfd27bd380124174a5f37c342e5c048874d71a4
# good: [b252d6103bcd93878fa029674687bf4c94dd7f99] Merge pull request #15200 from JuliaLang/anj/cholp
git bisect good b252d6103bcd93878fa029674687bf4c94dd7f99
# good: [892b9406e3c68a56e43587de740520c16dd0f6d8] support multiple arguments in `Generator` by zipping
git bisect good 892b9406e3c68a56e43587de740520c16dd0f6d8
# bad: [a0135328997c0aaea0b0a8f38848aed903aa196f] Merge pull request #15567 from JuliaLang/yyc/llvm39
git bisect bad a0135328997c0aaea0b0a8f38848aed903aa196f
# bad: [f3374b9e381994dd845381cf5212779780d9377f] Merge pull request #15347 from JuliaLang/kf/refactorobjlookup
git bisect bad f3374b9e381994dd845381cf5212779780d9377f
# bad: [a75f4c31d808e10c8c4247d1f5797e17159450db] Merge pull request #15334 from omus/dateformat-docs
git bisect bad a75f4c31d808e10c8c4247d1f5797e17159450db
# good: [fc469b68316061d95003cb0768c43dbc4b1efd0f] Merge pull request #15462 from justbur/fix-15461
git bisect good fc469b68316061d95003cb0768c43dbc4b1efd0f
# bad: [1745a5fc4d4c377a52f19a752dbec306769408c8] Merge pull request #15443 from tlnagy/master
git bisect bad 1745a5fc4d4c377a52f19a752dbec306769408c8
# bad: [64dbecf6ca76a46be34c3addd39cd516f383b70a] Merge pull request #15444 from JuliaLang/jn/dwarfbug
git bisect bad 64dbecf6ca76a46be34c3addd39cd516f383b70a
# good: [ebf6c64a6ad566a42e4c8b5408f59e46b030e5f4] fix debug info in compiler
git bisect good ebf6c64a6ad566a42e4c8b5408f59e46b030e5f4
# bad: [c3039d49bc53d17b31510f788eac6bc7c9ee7fff] ensure that debug info is always available for -O0 mode
git bisect bad c3039d49bc53d17b31510f788eac6bc7c9ee7fff
# first bad commit: [c3039d49bc53d17b31510f788eac6bc7c9ee7fff] ensure that debug info is always available for -O0 mode

Possibly similar to #13301, but that was "fixed" after codegen_rewrite2, and SIMD no longer works even for the cases that used to work before...

@vtjnash

@yuyichao added the performance, kind:regression, and compiler:codegen labels Mar 31, 2016
@yuyichao
Contributor Author

Simple code and the corresponding IR on the first bad commit.

julia> function f(a)
           @inbounds @simd for i in eachindex(a)
               a[i] += 1
           end
           nothing
       end
f (generic function with 1 method)

full llvm-ir

if13:                                             ; preds = %if13.preheader, %if13
  %"##i#7433.01" = phi i64 [ %41, %if13 ], [ 0, %if13.preheader ]
  store %jl_value_t* %0, %jl_value_t** %3, align 8, !dbg !49, !llvm.mem.parallel_loop_access !57
  %35 = load float*, float** %24, align 8, !dbg !49, !tbaa !59, !llvm.mem.parallel_loop_access !57
  %36 = getelementptr float, float* %35, i64 %"##i#7433.01", !dbg !49
  %37 = load float, float* %36, align 4, !dbg !49, !tbaa !60, !llvm.mem.parallel_loop_access !57
  %38 = fadd float %37, 1.000000e+00, !dbg !49
  store %jl_value_t* %0, %jl_value_t** %4, align 8, !dbg !49, !llvm.mem.parallel_loop_access !57
  %39 = load float*, float** %24, align 8, !dbg !49, !tbaa !59, !llvm.mem.parallel_loop_access !57
  %40 = getelementptr float, float* %39, i64 %"##i#7433.01", !dbg !49
  store float %38, float* %40, align 4, !dbg !49, !tbaa !60, !llvm.mem.parallel_loop_access !57
  %41 = add nuw nsw i64 %"##i#7433.01", 1, !dbg !61, !simd_loop !13
  call void @llvm.dbg.value(metadata i64 %41, i64 0, metadata !22, metadata !31), !dbg !32
  %exitcond = icmp eq i64 %41, %22, !dbg !50
  br i1 %exitcond, label %L.backedge.loopexit, label %if13, !dbg !50, !llvm.loop !58
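
To check whether a loop like the one above was vectorized without reading the whole dump, one rough approach is to capture the optimized IR as a string and search it for vector types such as `<8 x float>`. The sketch below assumes a recent Julia where `code_llvm` lives in `InteractiveUtils`; the helper names `llvm_ir` and `has_float_vectors` are hypothetical and not part of Julia or of this issue. The regex is only a heuristic, and the vector width depends on the host CPU.

```julia
using InteractiveUtils  # provides code_llvm

# Same loop as in the report above.
function f(a)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

# Hypothetical helper: optimized LLVM IR of `fn` for `argtypes`, as a String.
llvm_ir(fn, argtypes) = sprint(code_llvm, fn, argtypes)

# Hypothetical heuristic: a vectorized Float32 loop uses types like `<8 x float>`.
has_float_vectors(ir) = occursin(r"<\d+ x float>", ir)

ir = llvm_ir(f, Tuple{Vector{Float32}})
println(has_float_vectors(ir) ? "vector float ops found" : "no vector float ops found")
```

If no vector types appear, looking for stores of `%jl_value_t*` inside the loop body, as in the IR above, is a reasonable next step.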

@KristofferC
Sponsor Member

Ref #13777

@yuyichao
Contributor Author

Actually, it looks like the issue is similar to #15402, but it gets even worse after that.

@yuyichao
Contributor Author

yuyichao commented Apr 2, 2016

With #15735 and #13463, the example above vectorizes at the normal optimization level. However, if the array is allocated in the same function or the variable is otherwise assigned to (similar to #13301), the redundant stores in the loop still prevent vectorization at the normal optimization level.

function f_simd(n::Integer)
    a = zeros(Float32, n)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

IR of the inner loop:

if15:                                             ; preds = %if15, %if15.lr.ph
  %"i#256.017" = phi i64 [ 0, %if15.lr.ph ], [ %45, %if15 ]
  store %jl_value_t* %24, %jl_value_t** %9, align 8, !dbg !53, !tbaa !32, !llvm.mem.parallel_loop_access !59
  %42 = getelementptr float, float* %41, i64 %"i#256.017", !dbg !53
  %43 = load float, float* %42, align 4, !dbg !53, !tbaa !61, !llvm.mem.parallel_loop_access !59
  %44 = fadd float %43, 1.000000e+00, !dbg !53
  store %jl_value_t* %24, %jl_value_t** %10, align 8, !dbg !53, !tbaa !32, !llvm.mem.parallel_loop_access !59
  store float %44, float* %42, align 4, !dbg !53, !tbaa !61, !llvm.mem.parallel_loop_access !59
  %45 = add nuw nsw i64 %"i#256.017", 1, !dbg !62, !simd_loop !4
  call void @llvm.dbg.value(metadata i64 %45, i64 0, metadata !26, metadata !30), !dbg !31
  %exitcond = icmp eq i64 %45, %35, !dbg !58
  br i1 %exitcond, label %L11.loopexit, label %if15, !dbg !58, !llvm.loop !60
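
Since the comment above says the loop vectorizes when the array arrives as an argument but not when it is allocated in the same function, one possible workaround is to move the loop behind a function barrier. This is only a sketch under that assumption and was not proposed in this thread; the names `add_one!` and `f_simd_split` are hypothetical, and whether it actually restores vectorization depends on the Julia and LLVM versions in use. Unlike `f_simd`, this variant returns the array so the result can be checked.

```julia
# Kernel takes the array as an argument, like `f` in the first example.
# @noinline keeps the loop from being inlined back into the allocating caller.
@noinline function add_one!(a)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    return a
end

# Hypothetical variant of f_simd: allocate here, run the loop in the kernel.
function f_simd_split(n::Integer)
    a = zeros(Float32, n)
    return add_one!(a)
end
```

The cost is one extra non-inlined call per invocation, which is usually negligible next to the loop itself for arrays of any real size.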

@yuyichao changed the title from "SIMD does not work anymore" to "Store to GC frame preventing vectorization" Apr 2, 2016
@yuyichao
Contributor Author

yuyichao commented Jun 9, 2016

Vectorization works now (on LLVM 3.8 at least). Move to #15369.

@yuyichao yuyichao closed this as completed Jun 9, 2016