Add rules for BLAS.dot, BLAS.dotc, and BLAS.dotu #739

Open
wants to merge 34 commits into
base: main
from

sethaxen
Collaborator

@sethaxen sethaxen commented Apr 18, 2023

In an attempt to learn Enzyme's rule system and speed up AD of BLAS calls, this PR adds a rule for BLAS.dot (and BLAS.dotc and BLAS.dotu). On an input of length 10,000, this is 6x faster than the fallback in forward mode and 60x faster than the fallback in reverse mode.
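For reference, the derivative of a strided dot product s = Σ x[i]·y[i] can itself be written with Level 1 BLAS calls. Forward mode needs ṡ = dot(ẋ, y) + dot(x, ẏ), and reverse mode needs x̄ += s̄·y and ȳ += s̄·x, i.e. two dot or two axpy! calls rather than differentiating the fallback loop element by element. Below is a minimal plain-Julia sketch of that math for real element types, using only LinearAlgebra.BLAS. The helper names dot_forward and dot_reverse! are hypothetical; they are not the rule code in this PR and not part of Enzyme's API.

```julia
using LinearAlgebra

# Forward mode: s = dot(x, y) and its tangent ds = dot(dx, y) + dot(x, dy).
# Real element types only; dx shares x's layout (incx), dy shares y's (incy).
function dot_forward(n, x, dx, incx, y, dy, incy)
    s = BLAS.dot(n, x, incx, y, incy)
    ds = BLAS.dot(n, dx, incx, y, incy) + BLAS.dot(n, x, incx, dy, incy)
    return s, ds
end

# Reverse mode: accumulate xbar += sbar * y and ybar += sbar * x.
# sbar should have the arrays' element type for the strided BLAS.axpy! method.
function dot_reverse!(n, sbar, x, xbar, incx, y, ybar, incy)
    BLAS.axpy!(n, sbar, y, incy, xbar, incx)
    BLAS.axpy!(n, sbar, x, incx, ybar, incy)
    return xbar, ybar
end

x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
# Tangent only in x[1]; y held fixed.
s, ds = dot_forward(4, x, [1.0, 0.0, 0.0, 0.0], 1, y, zeros(4), 1)
```

Because the tangents and adjoints share the primal arrays' strides, the same n/incx/incy arguments are reused for every call.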

@codecov-commenter

codecov-commenter commented Apr 18, 2023

Codecov Report

Patch coverage: 2.98% and project coverage change: -0.38 ⚠️

Comparison is base (e96c0c5) 78.88% compared to head (cf9ae74) 78.50%.

@@            Coverage Diff             @@
##             main     #739      +/-   ##
==========================================
- Coverage   78.88%   78.50%   -0.38%     
==========================================
  Files          18       19       +1     
  Lines        8118     8185      +67     
==========================================
+ Hits         6404     6426      +22     
- Misses       1714     1759      +45     
Impacted Files                     Coverage Δ
src/Enzyme.jl                      87.06% <ø> (ø)
src/rules/LinearAlgebra/blas.jl    2.98% <2.98%> (ø)

... and 1 file with indirect coverage changes

end
_, _, Xow, _, Yow = EnzymeRules.overwritten(config)
# copy only the elements we need
Xtape = Xow ? BLAS.blascopy!(n.val, X.val, incx.val, similar(X.val, n.val), 1) : nothing
Collaborator Author

Maybe it makes more sense to use the fallback instead of copying? Is that even possible?
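For context on the copy being discussed: when X may be overwritten before the reverse pass, the rule tapes only the n strided elements of X it will need later, packed contiguously with unit stride, rather than the whole array. A small stand-alone illustration of what that BLAS.blascopy! call gathers (the array and stride values are arbitrary):

```julia
using LinearAlgebra

# A strided slice of X: n elements starting at X[1], spaced incx apart.
X = collect(1.0:10.0)
n, incx = 3, 3

# Gather X[1], X[1+incx], ..., X[1+(n-1)*incx] into a new length-n buffer.
# blascopy! writes into and returns its destination argument.
Xtape = BLAS.blascopy!(n, X, incx, similar(X, n), 1)

# The tape is an independent copy, so later writes to X do not affect it.
X .= 0.0
```

The tape has length n regardless of incx, so a large stride does not inflate the memory saved for the reverse pass.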

@sethaxen
Collaborator Author

How does Enzyme treat complex numbers? e.g. if I wanted to also support BLAS.dotc, would the Duplicateds all store complex derivative vectors, and would the Active store a complex derivative?

@wsmoses
Member

wsmoses commented Apr 19, 2023

How does Enzyme treat complex numbers? e.g. if I wanted to also support BLAS.dotc, would the Duplicateds all store complex derivative vectors, and would the Active store a complex derivative?

yup
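Given that answer, a forward-mode rule for BLAS.dotc can again be assembled from calls to the primal. dotc computes Σ conj(x[i])·y[i], which is real-linear in each argument, so its tangent is dotc(ẋ, y) + dotc(x, ẏ); dotu is the same without the conjugation. A plain-Julia sketch that checks this against a central difference. The helpers dotc5, dotu5 and dot_tangent are hypothetical, and reverse-mode conventions for complex numbers are deliberately not covered:

```julia
using LinearAlgebra

# Unit-stride wrappers around the 5-argument BLAS calls.
# dotc computes sum(conj.(x) .* y); dotu computes sum(x .* y).
dotc5(x, y) = BLAS.dotc(length(x), x, 1, y, 1)
dotu5(x, y) = BLAS.dotu(length(x), x, 1, y, 1)

# Tangent of a real-bilinear product f along (dx, dy):
# two calls to the primal f itself.
dot_tangent(f, x, dx, y, dy) = f(dx, y) + f(x, dy)

x  = ComplexF64[1 + 2im, 3 - 1im]
y  = ComplexF64[2 - 1im, -1 + 4im]
dx = ComplexF64[0.5im, 1]
dy = ComplexF64[1, -2im]

# Central difference along the real path t -> (x + t*dx, y + t*dy).
# Because f is quadratic in t, this is exact up to rounding.
t = 1e-6
central_diff(f) = (f(x .+ t .* dx, y .+ t .* dy) - f(x .- t .* dx, y .- t .* dy)) / (2t)
```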


const ConstOrDuplicated{T} = Union{Const{T},Duplicated{T}}

_safe_similar(x::AbstractArray, n::Integer) = similar(x, n)
Collaborator Author

These utilities should be reusable for, and greatly simplify the rules for, all other Level 1 BLAS functions.

@ZuseZ4
Member

ZuseZ4 commented Apr 20, 2023

Hi @sethaxen,
great to see some work on EnzymeBlas from another approach!
I worked on BlasEnzyme last year, but didn't get around to merging it back then.
Namely I do have tablegen implementations for

handling: asum
handling: axpy
handling: copy
handling: dot
handling: scal

but I am lacking the tests for them.
Do you have a couple of tests for dot that I could use? Then I can finish my refactoring work here and reuse your tests once that's done.
It would be great if we end up having support at all three levels (bitcode, Enzyme proper, and the Julia level), so we can compare them.

@sethaxen
Collaborator Author

Do you have a couple of tests for dot that I could use?

Sure, I just added some tests for dot, dotu, and dotc. Currently the dot tests pass on my machine, but the complex ones segfault, and it's not clear why (I'm still a bit confused about how rules with complex numbers are supposed to be defined; see #744). All tests pass with the fallbacks.

It would be great if we end up having support at all three levels, bitcode, enzyme proper and julia level,
so we can compare those.

Yeah, it'd be nice if we could make that comparison once this PR is finished, before I tackle BLAS rules for the other functions you listed.

@ZuseZ4
Member

ZuseZ4 commented Apr 21, 2023

I just built Enzyme.jl on top of Enzyme with Blas-Tablegen, removing the Julia code that calls the BLAS fallback.
It seems like the tablegen dot implementation passes the existing Julia tests and also the Enzyme LLVM-IR tests.
I need to clean it up a bit because I have multiple BLAS functions in my PR and not all of them are well tested, but I hope to merge it soon.

@sethaxen
Collaborator Author

It seems like tablegen dot implementation passes the existing Julia tests and also the Enzyme LLVM-IR tests.
I need to clean it up a bit because I have multiple blas functions in my PR and not all of them are that well tested, but I hope to merge it soon.

@ZuseZ4 I'm interested to hear more about this tablegen approach. It seems to produce extremely terse expressions of the rules, which would in some ways be preferable to the approach here. But do you have to define both forward- and reverse-mode rules? And I'm a little confused by the existing rules. For example, I believe axpy should have the forward-mode rule

axpy!(∂a, X, ∂Y)
axpy!(a, ∂X, ∂Y)

and the reverse-mode rule

∂a = dot(X, ∂Y)
axpy!(conj(a), ∂Y, ∂X)

But the tablegen rule looks like neither of these and even uses asum: https://github.com/EnzymeAD/Enzyme/blob/d679b813efb21bf038d240b8780eaaef3b08b3d8/enzyme/Enzyme/targets/BlasDerivatives.td#L80-L87
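The axpy rules discussed above can be sanity-checked numerically for real inputs with a dot-product (adjoint) test: for the linear map J from input tangents (ȧ, Ẋ, Ẏ) to the output tangent, ⟨Ȳ, J v⟩ must equal ⟨Jᵀ Ȳ, v⟩. Below is a plain-Julia sketch with hypothetical helper names (axpy_forward!, axpy_reverse!). In this sketch the reverse pass leaves ∂Y unchanged, since the old value of Y enters Y ← aX + Y with coefficient 1 and so its adjoint is the incoming ∂Y; zeroing ∂Y would make the check fail whenever the tangent of Y is nonzero.

```julia
using LinearAlgebra

# Forward mode for Y <- a*X + Y (real element types):
# dY <- da*X + a*dX + dY, then the primal update itself.
function axpy_forward!(a, da, X, dX, Y, dY)
    BLAS.axpy!(da, X, dY)
    BLAS.axpy!(a, dX, dY)
    BLAS.axpy!(a, X, Y)
    return Y, dY
end

# Reverse mode: given the output adjoint dY, return da = dot(X, dY) and
# accumulate dX += a*dY. dY is left as is: it is also the adjoint of the old Y.
function axpy_reverse!(a, X, dX, dY)
    da = dot(X, dY)
    BLAS.axpy!(a, dY, dX)
    return da, dX, dY
end

a, da = 2.0, 0.5
X = [1.0, -2.0, 3.0]
dX = [0.1, 0.2, -0.3]
Y = [4.0, 5.0, 6.0]
dY_in = [1.0, 0.0, -1.0]
Y_out, dY_out = axpy_forward!(a, da, X, dX, copy(Y), copy(dY_in))

# Adjoint test: <ybar, J v> == <J' ybar, v>.
ybar = [0.3, -0.7, 1.1]
abar, Xbar, Yoldbar = axpy_reverse!(a, X, zeros(3), copy(ybar))
lhs = dot(ybar, dY_out)
rhs = abar * da + dot(Xbar, dX) + dot(Yoldbar, dY_in)
```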

@ZuseZ4
Member

ZuseZ4 commented Apr 22, 2023

Hi @sethaxen.
So the general approach is to declare reverse-mode rules; forward mode should be handleable with the primal function itself. I agree that tablegen is "nicer" than the Julia level for now, but that is just because I wrote both sides: the BLAS rules and the new tablegen.cpp code that parses these BLAS rules. The Julia rules are more generic and not fine-tuned for BLAS.

The incorrect reverse BLAS rules are mostly due to a design decision in my last approach. If, say, dot uses axpy for the reverse pass, tablegen checks the number and types of axpy's arguments based on the axpy rules, which in turn required me to declare and implement all BLAS functions used in the reverse axpy pass. That recursively led to a typical "big bang" approach where I needed to handle multiple rules at once. Since I didn't have the tests back then, I just ended up using placeholder (incorrect) reverse rules to get it to compile, and thus never merged it. I do have a better solution for it now; I'll push more of it on Monday.

@sethaxen sethaxen marked this pull request as ready for review April 24, 2023 21:38
@sethaxen
Collaborator Author

Since I didn't have the tests back then I just ended up using placeholder (/incorrect) reverse rules to compile it and thus never merged it. I do have a better solution for it now, I'll push more of it on Monday

Ah, that makes sense. It'll be interesting to see how you handle asum, whose reverse-mode rule isn't computable just with BLAS functions.
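For the real case the difficulty is easy to see: asum(x) = Σ|x[i]|, so the reverse pass needs x̄[i] += s̄·sign(x[i]) over the strided entries, and no Level 1 BLAS routine computes an elementwise sign. A minimal plain-Julia sketch, assuming real element types, positive incx, and the subgradient 0 at x[i] = 0 (which is what Julia's sign returns); asum_reverse! is a hypothetical name. For complex inputs BLAS asum sums |re| + |im| instead, which this sketch does not handle.

```julia
using LinearAlgebra

# Reverse-mode update for s = BLAS.asum(n, x, incx) with real elements and incx > 0:
# xbar[i] += sbar * sign(x[i]) for each of the n strided entries.
# This elementwise sign has no Level 1 BLAS equivalent, hence the plain loop.
function asum_reverse!(n, sbar, x, xbar, incx)
    for k in 0:n-1
        i = 1 + k * incx
        xbar[i] += sbar * sign(x[i])
    end
    return xbar
end

x = [1.5, -2.0, 0.0, 3.0]
s = BLAS.asum(4, x, 1)
xbar = asum_reverse!(4, 2.0, x, zeros(4), 1)
```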

@sethaxen
Collaborator Author

@wsmoses the only failing tests are due to #778. Otherwise this should be ready for review.

@sethaxen sethaxen changed the title "Add rule for BLAS.dot" to "Add rules for BLAS.dot, BLAS.dotc, and BLAS.dotu" Apr 25, 2023
@sethaxen
Collaborator Author

It seems the BLAS fallback warnings are raised even if the BLAS fallbacks are not being hit?

@sethaxen sethaxen requested a review from wsmoses April 25, 2023 14:23
@wsmoses
Member

wsmoses commented Apr 25, 2023

Yeah, the BLAS fallback injection is done before any custom rules are applied, so it always occurs. However, if a custom rule is hit, Enzyme will use that implementation rather than the injected fallback.

@ZuseZ4
Member

ZuseZ4 commented Apr 25, 2023

Ah, that makes sense. It'll be interesting to see how you handle asum, whose reverse-mode rule isn't computable just with BLAS functions.

That is indeed a bit annoying. We do have Blas-Tablegen and Instruction-Tablegen. The second one could handle this, but we will probably keep developing it, while Blas-Tablegen can hopefully stay unchanged. So I might actually copy over a bit of the logic, so that we don't introduce a dependency there.
dot is ready to be merged, btw, so I'm now trying to get some more Level 1 functions ready.

Y::ConstOrBatchDuplicated{<:Union{Ptr{T},AbstractArray{T}}},
incy::Const{<:Integer},
) where {T<:BLAS.$Ttype}
RT <: Const && return func.val(n.val, X.val, incx.val, Y.val, incy.val)
Collaborator Author

@sethaxen sethaxen Apr 25, 2023

For some reason, calling the 2-arg dot, which forwards to the 5-arg dot, now errors with a Const return type:

using Enzyme, LinearAlgebra
x, y, ∂x, ∂y = ntuple(_ -> randn(5), 4);
autodiff(Forward, BLAS.dot, Duplicated, Duplicated(x, ∂x), Duplicated(y, ∂y))  # fine
autodiff(Forward, BLAS.dot, Duplicated, Const(x), Duplicated(y, ∂y))  # fine
autodiff(Forward, BLAS.dot, Const, Duplicated(x, ∂x), Duplicated(y, ∂y))  # errors, see below
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/BxfIW/src/utils.jl:56
mod:; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-linux-gnu"

@_j_str1 = private unnamed_addr constant [11 x i8] c"typeassert\00"

; Function Attrs: noinline nosync readonly
define dso_local fastcc double @julia_dot_2279(i64 signext %0, i64 zeroext %1, i64 signext %2, i64 zeroext %3, i64 signext %4) unnamed_addr #0 !dbg !11 {
top:
  %5 = call {}*** @julia.get_pgcstack()
  %6 = inttoptr i64 %1 to double*, !dbg !14
  %7 = inttoptr i64 %3 to double*, !dbg !14
  %8 = sub i64 1, %0, !dbg !14
  %9 = icmp sgt i64 %0, 0, !dbg !14
  br i1 %9, label %10, label %cblas_ddot64_.exit, !dbg !14

10:                                               ; preds = %top
  %11 = icmp sgt i64 %4, 0, !dbg !14
  %12 = mul i64 %8, %4, !dbg !14
  %13 = select i1 %11, i64 0, i64 %12, !dbg !14
  %14 = icmp sgt i64 %2, 0, !dbg !14
  %15 = mul i64 %8, %2, !dbg !14
  %16 = select i1 %14, i64 0, i64 %15, !dbg !14
  br label %17, !dbg !14

17:                                               ; preds = %17, %10
  %18 = phi i64 [ 0, %10 ], [ %33, %17 ], !dbg !14
  %19 = phi i64 [ %13, %10 ], [ %32, %17 ], !dbg !14
  %20 = phi i64 [ %16, %10 ], [ %31, %17 ], !dbg !14
  %21 = phi double [ 0.000000e+00, %10 ], [ %30, %17 ], !dbg !14
  %22 = shl i64 %20, 32, !dbg !14
  %23 = ashr exact i64 %22, 32, !dbg !14
  %24 = getelementptr inbounds double, double* %6, i64 %23, !dbg !14
  %25 = load double, double* %24, align 8, !dbg !14, !tbaa !15
  %26 = shl i64 %19, 32, !dbg !14
  %27 = ashr exact i64 %26, 32, !dbg !14
  %28 = getelementptr inbounds double, double* %7, i64 %27, !dbg !14
  %29 = load double, double* %28, align 8, !dbg !14, !tbaa !15
  %30 = call double @llvm.fmuladd.f64(double %25, double %29, double %21) #17, !dbg !14
  %31 = add nsw i64 %23, %2, !dbg !14
  %32 = add nsw i64 %27, %4, !dbg !14
  %33 = add nuw nsw i64 %18, 1, !dbg !14
  %34 = icmp eq i64 %33, %0, !dbg !14
  br i1 %34, label %cblas_ddot64_.exit, label %17, !dbg !14, !llvm.loop !19

cblas_ddot64_.exit:                               ; preds = %17, %top
  %35 = phi double [ 0.000000e+00, %top ], [ %30, %17 ], !dbg !14
  ret double %35, !dbg !14
}

; Function Attrs: nofree readnone
declare {}*** @julia.get_pgcstack() #1

; Function Attrs: inaccessiblememonly allocsize(1)
declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}**, i64, {} addrspace(10)*) #2

; Function Attrs: inaccessiblememonly nofree
declare token @llvm.julia.gc_preserve_begin(...) #3

; Function Attrs: nofree nounwind readnone
declare nonnull {}* @julia.pointer_from_objref({} addrspace(11)*) local_unnamed_addr #4

; Function Attrs: inaccessiblememonly nofree
declare void @llvm.julia.gc_preserve_end(token) #3

; Function Attrs: inaccessiblememonly nofree norecurse nounwind
declare void @julia.write_barrier({} addrspace(10)* readonly, ...) local_unnamed_addr #5

; Function Attrs: nofree
declare nonnull {} addrspace(10)* @ijl_invoke({} addrspace(10)*, {} addrspace(10)** nocapture readonly, i32, {} addrspace(10)*) #6

declare nonnull {} addrspace(10)* @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) local_unnamed_addr #7

; Function Attrs: noreturn
declare void @ijl_throw({} addrspace(12)*) local_unnamed_addr #8

; Function Attrs: nofree norecurse nounwind readnone
declare nonnull {} addrspace(10)* @julia.typeof({} addrspace(10)*) local_unnamed_addr #9

; Function Attrs: noreturn
declare void @ijl_type_error(i8*, {} addrspace(10)*, {} addrspace(12)*) local_unnamed_addr #8

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare double @llvm.fmuladd.f64(double, double, double) #10

define double @julia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1) local_unnamed_addr #11 !dbg !22 {
entry:
  %2 = call {}*** @julia.get_pgcstack()
  %3 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !23
  %4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !23
  %5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !23
  %6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !23, !range !28, !alias.scope !29, !noalias !32
  %7 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !23
  %8 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %7 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !23
  %9 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !23
  %10 = load i64, i64 addrspace(11)* %9, align 8, !dbg !23, !range !28, !alias.scope !29, !noalias !32
  %.not.i = icmp eq i64 %6, %10, !dbg !37
  br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !41

L12.i:                                            ; preds = %entry
  %current_task15.i = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !42
  %current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !42
  %11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #18, !dbg !42
  %12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !42
  %13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !42
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %13, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
  %14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 1, !dbg !42
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %14, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
  %15 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #18, !dbg !42
  %16 = bitcast {} addrspace(10)* %15 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !42
  %.repack.i = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !42
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
  %.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 1, !dbg !42
  store i64 %6, i64 addrspace(10)* %.repack7.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
  %.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 2, !dbg !42
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
  %.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 3, !dbg !42
  store i64 %10, i64 addrspace(10)* %.repack11.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
  store atomic {} addrspace(10)* %15, {} addrspace(10)* addrspace(11)* %13 release, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %15) #17, !dbg !42
  %17 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !42
  %18 = addrspacecast i8 addrspace(10)* %17 to i8 addrspace(11)*, !dbg !42
  %19 = getelementptr inbounds i8, i8 addrspace(11)* %18, i64 8, !dbg !42
  %20 = bitcast i8 addrspace(11)* %19 to {} addrspace(10)* addrspace(11)*, !dbg !42
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %20 release, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
  %21 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %20 acquire, align 8, !dbg !57, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
  %22 = addrspacecast {} addrspace(10)* %21 to {} addrspace(11)*, !dbg !69
  %.not13.i = icmp eq {} addrspace(11)* %22, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !69
  br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !69

L17.i:                                            ; preds = %L12.i
  %23 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #18, !dbg !70
  %24 = bitcast {} addrspace(10)* %23 to {} addrspace(10)* addrspace(10)*, !dbg !70
  store {} addrspace(10)* %11, {} addrspace(10)* addrspace(10)* %24, align 8, !dbg !70, !tbaa !55, !alias.scope !51, !noalias !52
  %25 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %23) #19, !dbg !70
  %26 = cmpxchg {} addrspace(10)* addrspace(11)* %20, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %25 acq_rel acquire, align 8, !dbg !74, !tbaa !45, !alias.scope !51, !noalias !68
  %27 = extractvalue { {} addrspace(10)*, i1 } %26, 0, !dbg !74
  %28 = extractvalue { {} addrspace(10)*, i1 } %26, 1, !dbg !74
  br i1 %28, label %xchg_wb.i, label %L27.i, !dbg !74

L27.i:                                            ; preds = %L17.i
  %29 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %27) #20, !dbg !77
  %30 = icmp eq {} addrspace(10)* %29, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !77
  br i1 %30, label %L32.i, label %fail.i, !dbg !77

L32.i:                                            ; preds = %xchg_wb.i, %L27.i, %L12.i
  %value_phi.i = phi {} addrspace(10)* [ %25, %xchg_wb.i ], [ %21, %L12.i ], [ %27, %L27.i ]
  %31 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #18, !dbg !41
  %32 = bitcast {} addrspace(10)* %31 to {} addrspace(10)* addrspace(10)*, !dbg !41
  store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %32, align 8, !dbg !41, !tbaa !55, !alias.scope !51, !noalias !52
  %33 = addrspacecast {} addrspace(10)* %31 to {} addrspace(12)*, !dbg !41
  call void @ijl_throw({} addrspace(12)* %33) #21, !dbg !41
  unreachable, !dbg !41

xchg_wb.i:                                        ; preds = %L17.i
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %25) #17, !dbg !74
  br label %L32.i, !dbg !77

fail.i:                                           ; preds = %L27.i
  %34 = addrspacecast {} addrspace(10)* %27 to {} addrspace(12)*, !dbg !77
  call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %34) #21, !dbg !77
  unreachable, !dbg !77

julia_dot_2276_inner.exit:                        ; preds = %entry
  %35 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %1), !dbg !78
  %36 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !79
  %37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %36) #20, !dbg !79
  %38 = bitcast {}* %37 to i8**, !dbg !79
  %39 = load i8*, i8** %38, align 8, !dbg !79, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
  %40 = ptrtoint i8* %39 to i64, !dbg !79
  %41 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !79
  %42 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %41) #20, !dbg !79
  %43 = bitcast {}* %42 to i8**, !dbg !79
  %44 = load i8*, i8** %43, align 8, !dbg !79, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
  %45 = ptrtoint i8* %44 to i64, !dbg !79
  %46 = call fastcc double @julia_dot_2279(i64 signext %6, i64 zeroext %40, i64 noundef signext 1, i64 zeroext %45, i64 noundef signext 1) #16, !dbg !78
  call void @llvm.julia.gc_preserve_end(token %35), !dbg !78
  ret double %46, !dbg !92
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #12

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #12

; Function Attrs: readnone
declare void @llvm.enzymefakeuse(...) #13

; Function Attrs: mustprogress willreturn
define double @preprocess_julia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1) local_unnamed_addr #14 !dbg !93 {
entry:
  %2 = call {}*** @julia.get_pgcstack() #22
  %3 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
  %4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
  %5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !94
  %6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
  %7 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
  %8 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %7 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
  %9 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !94
  %10 = load i64, i64 addrspace(11)* %9, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
  %.not.i = icmp eq i64 %6, %10, !dbg !97
  br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !99

L12.i:                                            ; preds = %entry
  %current_task15.i = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !100
  %current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !100
  %11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #23, !dbg !100
  %12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !100
  %13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !100
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %13, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  %14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 1, !dbg !100
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %14, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  %15 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #23, !dbg !100
  %16 = bitcast {} addrspace(10)* %15 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !100
  %.repack.i = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !100
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  %.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 1, !dbg !100
  store i64 %6, i64 addrspace(10)* %.repack7.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  %.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 2, !dbg !100
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  %.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 3, !dbg !100
  store i64 %10, i64 addrspace(10)* %.repack11.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  store atomic {} addrspace(10)* %15, {} addrspace(10)* addrspace(11)* %13 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %15) #24, !dbg !100
  %17 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !100
  %18 = addrspacecast i8 addrspace(10)* %17 to i8 addrspace(11)*, !dbg !100
  %19 = getelementptr inbounds i8, i8 addrspace(11)* %18, i64 8, !dbg !100
  %20 = bitcast i8 addrspace(11)* %19 to {} addrspace(10)* addrspace(11)*, !dbg !100
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %20 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  %21 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %20 acquire, align 8, !dbg !104, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
  %22 = addrspacecast {} addrspace(10)* %21 to {} addrspace(11)*, !dbg !108
  %.not13.i = icmp eq {} addrspace(11)* %22, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !108
  br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !108

L17.i:                                            ; preds = %L12.i
  %23 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #23, !dbg !109
  %24 = bitcast {} addrspace(10)* %23 to {} addrspace(10)* addrspace(10)*, !dbg !109
  store {} addrspace(10)* %11, {} addrspace(10)* addrspace(10)* %24, align 8, !dbg !109, !tbaa !55, !alias.scope !51, !noalias !101
  %25 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %23) #25, !dbg !109
  %26 = cmpxchg {} addrspace(10)* addrspace(11)* %20, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %25 acq_rel acquire, align 8, !dbg !111, !tbaa !45, !alias.scope !51, !noalias !68
  %27 = extractvalue { {} addrspace(10)*, i1 } %26, 0, !dbg !111
  %28 = extractvalue { {} addrspace(10)*, i1 } %26, 1, !dbg !111
  br i1 %28, label %xchg_wb.i, label %L27.i, !dbg !111

L27.i:                                            ; preds = %L17.i
  %29 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %27) #26, !dbg !113
  %30 = icmp eq {} addrspace(10)* %29, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !113
  br i1 %30, label %L32.i, label %fail.i, !dbg !113

L32.i:                                            ; preds = %xchg_wb.i, %L27.i, %L12.i
  %value_phi.i = phi {} addrspace(10)* [ %25, %xchg_wb.i ], [ %21, %L12.i ], [ %27, %L27.i ]
  %31 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #23, !dbg !99
  %32 = bitcast {} addrspace(10)* %31 to {} addrspace(10)* addrspace(10)*, !dbg !99
  store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %32, align 8, !dbg !99, !tbaa !55, !alias.scope !51, !noalias !101
  %33 = addrspacecast {} addrspace(10)* %31 to {} addrspace(12)*, !dbg !99
  call void @ijl_throw({} addrspace(12)* %33) #27, !dbg !99
  unreachable, !dbg !99

xchg_wb.i:                                        ; preds = %L17.i
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %25) #24, !dbg !111
  br label %L32.i, !dbg !113

fail.i:                                           ; preds = %L27.i
  %34 = addrspacecast {} addrspace(10)* %27 to {} addrspace(12)*, !dbg !113
  call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %34) #27, !dbg !113
  unreachable, !dbg !113

julia_dot_2276_inner.exit:                        ; preds = %entry
  %35 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %1) #22, !dbg !114
  %36 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !115
  %37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %36) #26, !dbg !115
  %38 = bitcast {}* %37 to i8**, !dbg !115
  %39 = load i8*, i8** %38, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
  %40 = ptrtoint i8* %39 to i64, !dbg !115
  %41 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !115
  %42 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %41) #26, !dbg !115
  %43 = bitcast {}* %42 to i8**, !dbg !115
  %44 = load i8*, i8** %43, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
  %45 = ptrtoint i8* %44 to i64, !dbg !115
  %46 = call fastcc double @julia_dot_2279(i64 signext %6, i64 zeroext %40, i64 noundef signext 1, i64 zeroext %45, i64 noundef signext 1) #28, !dbg !114
  call void @llvm.julia.gc_preserve_end(token %35) #22, !dbg !114
  ret double %46, !dbg !120
}

; Function Attrs: mustprogress willreturn
define internal void @fwddiffejulia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* %"'", {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1, {} addrspace(10)* %"'1") local_unnamed_addr #14 !dbg !121 {
entry:
  %2 = call {}*** @julia.get_pgcstack()
  %3 = call {}*** @julia.get_pgcstack()
  %4 = call {}*** @julia.get_pgcstack()
  %5 = call {}*** @julia.get_pgcstack()
  %6 = call {}*** @julia.get_pgcstack()
  %7 = call {}*** @julia.get_pgcstack() #22
  %8 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
  %9 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %8 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
  %10 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %9, i64 0, i32 1, !dbg !122
  %11 = load i64, i64 addrspace(11)* %10, align 8, !dbg !122, !range !28, !alias.scope !125, !noalias !128
  %12 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
  %13 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %12 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
  %14 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %13, i64 0, i32 1, !dbg !122
  %15 = load i64, i64 addrspace(11)* %14, align 8, !dbg !122, !range !28, !alias.scope !130, !noalias !133
  %.not.i = icmp eq i64 %11, %15, !dbg !135
  br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !137

L12.i:                                            ; preds = %entry
  %current_task15.i = getelementptr inbounds {}**, {}*** %7, i64 -13, !dbg !138
  %current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !138
  %16 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #23, !dbg !138
  %17 = bitcast {} addrspace(10)* %16 to {} addrspace(10)* addrspace(10)*, !dbg !138
  %18 = addrspacecast {} addrspace(10)* addrspace(10)* %17 to {} addrspace(10)* addrspace(11)*, !dbg !138
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %18, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  %19 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 1, !dbg !138
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %19, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  %20 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #23, !dbg !138
  %21 = bitcast {} addrspace(10)* %20 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !138
  %.repack.i = bitcast {} addrspace(10)* %20 to {} addrspace(10)* addrspace(10)*, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  %.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 1, !dbg !138
  store i64 %11, i64 addrspace(10)* %.repack7.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  %.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 2, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  %.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 3, !dbg !138
  store i64 %15, i64 addrspace(10)* %.repack11.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  store atomic {} addrspace(10)* %20, {} addrspace(10)* addrspace(11)* %18 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %20) #24, !dbg !138
  %22 = bitcast {} addrspace(10)* %16 to i8 addrspace(10)*, !dbg !138
  %23 = addrspacecast i8 addrspace(10)* %22 to i8 addrspace(11)*, !dbg !138
  %24 = getelementptr inbounds i8, i8 addrspace(11)* %23, i64 8, !dbg !138
  %25 = bitcast i8 addrspace(11)* %24 to {} addrspace(10)* addrspace(11)*, !dbg !138
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %25 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  %26 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %25 acquire, align 8, !dbg !142, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
  %27 = addrspacecast {} addrspace(10)* %26 to {} addrspace(11)*, !dbg !146
  %.not13.i = icmp eq {} addrspace(11)* %27, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !146
  br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !146

L17.i:                                            ; preds = %L12.i
  %28 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #23, !dbg !147
  %29 = bitcast {} addrspace(10)* %28 to {} addrspace(10)* addrspace(10)*, !dbg !147
  store {} addrspace(10)* %16, {} addrspace(10)* addrspace(10)* %29, align 8, !dbg !147, !tbaa !55, !alias.scope !51, !noalias !139
  %30 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %28) #25, !dbg !147
  %31 = cmpxchg {} addrspace(10)* addrspace(11)* %25, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %30 acq_rel acquire, align 8, !dbg !149, !tbaa !45, !alias.scope !51, !noalias !68
  %32 = extractvalue { {} addrspace(10)*, i1 } %31, 0, !dbg !149
  %33 = extractvalue { {} addrspace(10)*, i1 } %31, 1, !dbg !149
  br i1 %33, label %xchg_wb.i, label %L27.i, !dbg !149

L27.i:                                            ; preds = %L17.i
  %34 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %32) #26, !dbg !151
  %35 = icmp eq {} addrspace(10)* %34, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !151
  br i1 %35, label %L32.i, label %fail.i, !dbg !151

L32.i:                                            ; preds = %xchg_wb.i, %L27.i, %L12.i
  %value_phi.i = phi {} addrspace(10)* [ %30, %xchg_wb.i ], [ %26, %L12.i ], [ %32, %L27.i ]
  %36 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #23, !dbg !137
  %37 = bitcast {} addrspace(10)* %36 to {} addrspace(10)* addrspace(10)*, !dbg !137
  store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %37, align 8, !dbg !137, !tbaa !55, !alias.scope !51, !noalias !139
  %38 = addrspacecast {} addrspace(10)* %36 to {} addrspace(12)*, !dbg !137
  call void @ijl_throw({} addrspace(12)* %38) #27, !dbg !137
  unreachable, !dbg !137

xchg_wb.i:                                        ; preds = %L17.i
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %30) #24, !dbg !149
  br label %L32.i, !dbg !151

fail.i:                                           ; preds = %L27.i
  %39 = addrspacecast {} addrspace(10)* %32 to {} addrspace(12)*, !dbg !151
  call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %39) #27, !dbg !151
  unreachable, !dbg !151

julia_dot_2276_inner.exit:                        ; preds = %entry
  %40 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %1, {} addrspace(10)* %"'1"), !dbg !152
  %"'ipc" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !153
  %41 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !153
  %42 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc"), !dbg !153
  %_replacementA = phi {}* , !dbg !153
  %"'ipc25" = bitcast {}* %42 to i8**, !dbg !153
  %_replacementA17 = phi i8** , !dbg !153
  %"'ipl" = load i8*, i8** %"'ipc25", align 8, !dbg !153, !tbaa !89, !alias.scope !158, !noalias !159, !nonnull !13
  %"'ipc26" = ptrtoint i8* %"'ipl" to i64, !dbg !153
  %_replacementA19 = phi i64 , !dbg !153
  %"'ipc20" = addrspacecast {} addrspace(10)* %"'1" to {} addrspace(11)*, !dbg !153
  %43 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !153
  %44 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc20"), !dbg !153
  %_replacementA21 = phi {}* , !dbg !153
  %"'ipc27" = bitcast {}* %44 to i8**, !dbg !153
  %_replacementA22 = phi i8** , !dbg !153
  %"'ipl28" = load i8*, i8** %"'ipc27", align 8, !dbg !153, !tbaa !89, !alias.scope !160, !noalias !161, !nonnull !13
  %_replacementA23 = phi i8* , !dbg !153
  %"'ipc29" = ptrtoint i8* %"'ipl28" to i64, !dbg !153
  %_replacementA24 = phi i64 , !dbg !153
  %45 = bitcast {}*** %6 to {}**, !dbg !152
  %46 = getelementptr inbounds {}*, {}** %45, i64 -13, !dbg !152
  %47 = getelementptr inbounds {}*, {}** %46, i64 15, !dbg !152
  %48 = bitcast {}** %47 to i8**, !dbg !152
  %49 = load i8*, i8** %48, align 8, !dbg !152
  %50 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %46, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
  %51 = bitcast {} addrspace(10)* %50 to [1 x i64] addrspace(10)*, !dbg !152
  %52 = addrspacecast [1 x i64] addrspace(10)* %51 to [1 x i64] addrspace(11)*, !dbg !152
  %53 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %52, i64 0, i32 0, !dbg !152
  store i64 %11, i64 addrspace(11)* %53, align 8, !dbg !152
  %54 = bitcast {}*** %5 to {}**, !dbg !152
  %55 = getelementptr inbounds {}*, {}** %54, i64 -13, !dbg !152
  %56 = getelementptr inbounds {}*, {}** %55, i64 15, !dbg !152
  %57 = bitcast {}** %56 to i8**, !dbg !152
  %58 = load i8*, i8** %57, align 8, !dbg !152
  %59 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %55, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
  %60 = bitcast {} addrspace(10)* %59 to [2 x i64] addrspace(10)*, !dbg !152
  %61 = addrspacecast [2 x i64] addrspace(10)* %60 to [2 x i64] addrspace(11)*, !dbg !152
  %62 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 0, !dbg !152
  store i64 %_replacementA19, i64 addrspace(11)* %62, align 8, !dbg !152
  %63 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 1, !dbg !152
  store i64 %"'ipc26", i64 addrspace(11)* %63, align 8, !dbg !152
  %64 = bitcast {}*** %4 to {}**, !dbg !152
  %65 = getelementptr inbounds {}*, {}** %64, i64 -13, !dbg !152
  %66 = getelementptr inbounds {}*, {}** %65, i64 15, !dbg !152
  %67 = bitcast {}** %66 to i8**, !dbg !152
  %68 = load i8*, i8** %67, align 8, !dbg !152
  %69 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %65, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
  %70 = bitcast {} addrspace(10)* %69 to [1 x i64] addrspace(10)*, !dbg !152
  %71 = addrspacecast [1 x i64] addrspace(10)* %70 to [1 x i64] addrspace(11)*, !dbg !152
  %72 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %71, i64 0, i32 0, !dbg !152
  store i64 1, i64 addrspace(11)* %72, align 8, !dbg !152
  %73 = bitcast {}*** %3 to {}**, !dbg !152
  %74 = getelementptr inbounds {}*, {}** %73, i64 -13, !dbg !152
  %75 = getelementptr inbounds {}*, {}** %74, i64 15, !dbg !152
  %76 = bitcast {}** %75 to i8**, !dbg !152
  %77 = load i8*, i8** %76, align 8, !dbg !152
  %78 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %74, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
  %79 = bitcast {} addrspace(10)* %78 to [2 x i64] addrspace(10)*, !dbg !152
  %80 = addrspacecast [2 x i64] addrspace(10)* %79 to [2 x i64] addrspace(11)*, !dbg !152
  %81 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 0, !dbg !152
  store i64 %_replacementA24, i64 addrspace(11)* %81, align 8, !dbg !152
  %82 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 1, !dbg !152
  store i64 %"'ipc29", i64 addrspace(11)* %82, align 8, !dbg !152
  %83 = bitcast {}*** %2 to {}**, !dbg !152
  %84 = getelementptr inbounds {}*, {}** %83, i64 -13, !dbg !152
  %85 = getelementptr inbounds {}*, {}** %84, i64 15, !dbg !152
  %86 = bitcast {}** %85 to i8**, !dbg !152
  %87 = load i8*, i8** %86, align 8, !dbg !152
  %88 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %84, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
  %89 = bitcast {} addrspace(10)* %88 to [1 x i64] addrspace(10)*, !dbg !152
  %90 = addrspacecast [1 x i64] addrspace(10)* %89 to [1 x i64] addrspace(11)*, !dbg !152
  %91 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %90, i64 0, i32 0, !dbg !152
  store i64 1, i64 addrspace(11)* %91, align 8, !dbg !152
  %92 = call fast double @julia_forward_2281([1 x i64] addrspace(11)* %52, [2 x i64] addrspace(11)* %61, [1 x i64] addrspace(11)* %71, [2 x i64] addrspace(11)* %80, [1 x i64] addrspace(11)* %90), !dbg !152
  call void @llvm.julia.gc_preserve_end(token %40) #22, !dbg !152
  ret void

allocsForInversion:                               ; No predecessors!
}

; Function Attrs: alwaysinline
define double @julia_forward_2281([1 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %0, [2 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(16) %1, [1 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %2, [2 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(16) %3, [1 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %4) #15 !dbg !162 {
top:
  %5 = call {}*** @julia.get_pgcstack()
  %6 = getelementptr inbounds [1 x i64], [1 x i64] addrspace(11)* %0, i64 0, i64 0, !dbg !163
  %7 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(11)* %1, i64 0, i64 0, !dbg !163
  %8 = getelementptr inbounds [1 x i64], [1 x i64] addrspace(11)* %2, i64 0, i64 0, !dbg !163
  %9 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(11)* %3, i64 0, i64 0, !dbg !163
  %10 = getelementptr inbounds [1 x i64], [1 x i64] addrspace(11)* %4, i64 0, i64 0, !dbg !163
  %11 = load i64, i64 addrspace(11)* %6, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
  %12 = load i64, i64 addrspace(11)* %7, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
  %13 = load i64, i64 addrspace(11)* %8, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
  %14 = load i64, i64 addrspace(11)* %9, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
  %15 = load i64, i64 addrspace(11)* %10, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
  %16 = call double @julia_dot_2284(i64 signext %11, i64 zeroext %12, i64 signext %13, i64 zeroext %14, i64 signext %15) #16, !dbg !165
  ret double %16, !dbg !165
}

define internal double @julia_dot_2284(i64 signext %0, i64 zeroext %1, i64 signext %2, i64 zeroext %3, i64 signext %4) #16 !dbg !170 {
top:
  %5 = call {}*** @julia.get_pgcstack()
  %6 = call double inttoptr (i64 140194943493221 to double (i64, i64, i64, i64, i64)*)(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4), !dbg !171
  ret double %6, !dbg !171
}

attributes #0 = { noinline nosync readonly "enzyme_math"="enzyme_custom" "enzyme_preserve_primal"="*" "enzymejl_job"="140194381763984" "enzymejl_mi"="140193926607856" "enzymejl_world"="33467" "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #1 = { nofree readnone "enzyme_inactive" "enzyme_shouldrecompute" "enzymejl_world"="33467" }
attributes #2 = { inaccessiblememonly allocsize(1) "enzymejl_world"="33467" }
attributes #3 = { inaccessiblememonly nofree "enzyme_inactive" "enzymejl_world"="33467" }
attributes #4 = { nofree nounwind readnone "enzymejl_world"="33467" }
attributes #5 = { inaccessiblememonly nofree norecurse nounwind "enzyme_inactive" "enzymejl_world"="33467" }
attributes #6 = { nofree "enzymejl_world"="33467" }
attributes #7 = { "enzymejl_world"="33467" }
attributes #8 = { noreturn "enzymejl_world"="33467" }
attributes #9 = { nofree norecurse nounwind readnone "enzyme_inactive" "enzyme_shouldrecompute" "enzymejl_world"="33467" }
attributes #10 = { nofree nosync nounwind readnone speculatable willreturn "enzymejl_world"="33467" }
attributes #11 = { "enzymejl_world"="33467" "probe-stack"="inline-asm" }
attributes #12 = { argmemonly nofree nosync nounwind willreturn "enzymejl_world"="33467" }
attributes #13 = { readnone "enzymejl_world"="33467" }
attributes #14 = { mustprogress willreturn "enzymejl_world"="33467" "probe-stack"="inline-asm" }
attributes #15 = { alwaysinline "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #16 = { "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #17 = { nounwind }
attributes #18 = { allocsize(1) }
attributes #19 = { nofree }
attributes #20 = { nounwind readnone }
attributes #21 = { noreturn }
attributes #22 = { mustprogress willreturn }
attributes #23 = { mustprogress willreturn allocsize(1) }
attributes #24 = { mustprogress nounwind willreturn }
attributes #25 = { mustprogress nofree willreturn }
attributes #26 = { mustprogress nounwind readnone willreturn }
attributes #27 = { mustprogress noreturn willreturn }
attributes #28 = { mustprogress willreturn "frame-pointer"="all" "probe-stack"="inline-asm" }

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.dbg.cu = !{!4, !6, !7, !9}
!llvm.ident = !{!10}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 4}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !5, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!5 = !DIFile(filename: "/cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/blas.jl", directory: ".")
!6 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !5, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!7 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !8, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!8 = !DIFile(filename: "/home/sethaxen/projects/Enzyme.jl/src/rules/LinearAlgebra/blas.jl", directory: ".")
!9 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !5, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!10 = !{!"clang version 14.0.3 (/depot/downloads/clones/llvm-project.git-5a9787eb535c2edc5dea030cc221c1d60f38c9f42344f410e425ea2139e233aa 465c166c5422079185c3289cdc2613420d8d6c51)"}
!11 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2279", scope: null, file: !5, line: 344, type: !12, scopeLine: 344, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !13)
!12 = !DISubroutineType(types: !13)
!13 = !{}
!14 = !DILocation(line: 345, scope: !11)
!15 = !{!16, !16, i64 0}
!16 = !{!"double", !17, i64 0}
!17 = !{!"omnipotent char", !18, i64 0}
!18 = !{!"Simple C/C++ TBAA"}
!19 = distinct !{!19, !20, !21}
!20 = !{!"llvm.loop.mustprogress"}
!21 = !{!"llvm.loop.unroll.disable"}
!22 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2276", scope: null, file: !5, line: 392, type: !12, scopeLine: 392, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!23 = !DILocation(line: 10, scope: !24, inlinedAt: !26)
!24 = distinct !DISubprogram(name: "length;", linkageName: "length", scope: !25, file: !25, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!25 = !DIFile(filename: "essentials.jl", directory: ".")
!26 = distinct !DILocation(line: 393, scope: !22, inlinedAt: !27)
!27 = distinct !DILocation(line: 0, scope: !22)
!28 = !{i64 0, i64 9223372036854775807}
!29 = !{!30}
!30 = !{!"jnoalias_typemd", !31}
!31 = !{!"jnoalias"}
!32 = !{!33, !34, !35, !36}
!33 = !{!"jnoalias_gcframe", !31}
!34 = !{!"jnoalias_stack", !31}
!35 = !{!"jnoalias_data", !31}
!36 = !{!"jnoalias_const", !31}
!37 = !DILocation(line: 499, scope: !38, inlinedAt: !40)
!38 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !39, file: !39, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!39 = !DIFile(filename: "promotion.jl", directory: ".")
!40 = distinct !DILocation(line: 394, scope: !22, inlinedAt: !27)
!41 = !DILocation(line: 394, scope: !22, inlinedAt: !27)
!42 = !DILocation(line: 41, scope: !43, inlinedAt: !40)
!43 = distinct !DISubprogram(name: "LazyString;", linkageName: "LazyString", scope: !44, file: !44, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!44 = !DIFile(filename: "strings/lazy.jl", directory: ".")
!45 = !{!46, !46, i64 0}
!46 = !{!"jtbaa_mutab", !47, i64 0}
!47 = !{!"jtbaa_value", !48, i64 0}
!48 = !{!"jtbaa_data", !49, i64 0}
!49 = !{!"jtbaa", !50, i64 0}
!50 = !{!"jtbaa"}
!51 = !{!35}
!52 = !{!53, !33, !34, !30, !36}
!53 = distinct !{!53, !54, !"na_addr13"}
!54 = distinct !{!54, !"addr13"}
!55 = !{!56, !56, i64 0}
!56 = !{!"jtbaa_immut", !47, i64 0}
!57 = !DILocation(line: 53, scope: !58, inlinedAt: !60)
!58 = distinct !DISubprogram(name: "getproperty;", linkageName: "getproperty", scope: !59, file: !59, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!59 = !DIFile(filename: "Base.jl", directory: ".")
!60 = distinct !DILocation(line: 81, scope: !61, inlinedAt: !62)
!61 = distinct !DISubprogram(name: "String;", linkageName: "String", scope: !44, file: !44, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!62 = distinct !DILocation(line: 232, scope: !63, inlinedAt: !65)
!63 = distinct !DISubprogram(name: "convert;", linkageName: "convert", scope: !64, file: !64, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!64 = !DIFile(filename: "strings/basic.jl", directory: ".")
!65 = distinct !DILocation(line: 12, scope: !66, inlinedAt: !40)
!66 = distinct !DISubprogram(name: "DimensionMismatch;", linkageName: "DimensionMismatch", scope: !67, file: !67, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!67 = !DIFile(filename: "array.jl", directory: ".")
!68 = !{!33, !34, !30, !36}
!69 = !DILocation(line: 82, scope: !61, inlinedAt: !62)
!70 = !DILocation(line: 107, scope: !71, inlinedAt: !73)
!71 = distinct !DISubprogram(name: "sprint;", linkageName: "sprint", scope: !72, file: !72, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!72 = !DIFile(filename: "strings/io.jl", directory: ".")
!73 = distinct !DILocation(line: 83, scope: !61, inlinedAt: !62)
!74 = !DILocation(line: 61, scope: !75, inlinedAt: !76)
!75 = distinct !DISubprogram(name: "replaceproperty!;", linkageName: "replaceproperty!", scope: !59, file: !59, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!76 = distinct !DILocation(line: 88, scope: !61, inlinedAt: !62)
!77 = !DILocation(line: 89, scope: !61, inlinedAt: !62)
!78 = !DILocation(line: 395, scope: !22, inlinedAt: !27)
!79 = !DILocation(line: 65, scope: !80, inlinedAt: !82)
!80 = distinct !DISubprogram(name: "unsafe_convert;", linkageName: "unsafe_convert", scope: !81, file: !81, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!81 = !DIFile(filename: "pointer.jl", directory: ".")
!82 = distinct !DILocation(line: 1240, scope: !83, inlinedAt: !85)
!83 = distinct !DISubprogram(name: "pointer;", linkageName: "pointer", scope: !84, file: !84, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!84 = !DIFile(filename: "abstractarray.jl", directory: ".")
!85 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !87)
!86 = distinct !DISubprogram(name: "vec_pointer_stride;", linkageName: "vec_pointer_stride", scope: !5, file: !5, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!87 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !88)
!88 = distinct !DILocation(line: 395, scope: !22, inlinedAt: !27)
!89 = !{!90, !90, i64 0}
!90 = !{!"jtbaa_arrayptr", !91, i64 0}
!91 = !{!"jtbaa_array", !49, i64 0}
!92 = !DILocation(line: 0, scope: !22)
!93 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2276", scope: null, file: !5, line: 392, type: !12, scopeLine: 392, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!94 = !DILocation(line: 10, scope: !24, inlinedAt: !95)
!95 = distinct !DILocation(line: 393, scope: !93, inlinedAt: !96)
!96 = distinct !DILocation(line: 0, scope: !93)
!97 = !DILocation(line: 499, scope: !38, inlinedAt: !98)
!98 = distinct !DILocation(line: 394, scope: !93, inlinedAt: !96)
!99 = !DILocation(line: 394, scope: !93, inlinedAt: !96)
!100 = !DILocation(line: 41, scope: !43, inlinedAt: !98)
!101 = !{!102, !33, !34, !30, !36}
!102 = distinct !{!102, !103, !"na_addr13"}
!103 = distinct !{!103, !"addr13"}
!104 = !DILocation(line: 53, scope: !58, inlinedAt: !105)
!105 = distinct !DILocation(line: 81, scope: !61, inlinedAt: !106)
!106 = distinct !DILocation(line: 232, scope: !63, inlinedAt: !107)
!107 = distinct !DILocation(line: 12, scope: !66, inlinedAt: !98)
!108 = !DILocation(line: 82, scope: !61, inlinedAt: !106)
!109 = !DILocation(line: 107, scope: !71, inlinedAt: !110)
!110 = distinct !DILocation(line: 83, scope: !61, inlinedAt: !106)
!111 = !DILocation(line: 61, scope: !75, inlinedAt: !112)
!112 = distinct !DILocation(line: 88, scope: !61, inlinedAt: !106)
!113 = !DILocation(line: 89, scope: !61, inlinedAt: !106)
!114 = !DILocation(line: 395, scope: !93, inlinedAt: !96)
!115 = !DILocation(line: 65, scope: !80, inlinedAt: !116)
!116 = distinct !DILocation(line: 1240, scope: !83, inlinedAt: !117)
!117 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !118)
!118 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !119)
!119 = distinct !DILocation(line: 395, scope: !93, inlinedAt: !96)
!120 = !DILocation(line: 0, scope: !93)
!121 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2276", scope: null, file: !5, line: 392, type: !12, scopeLine: 392, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!122 = !DILocation(line: 10, scope: !24, inlinedAt: !123)
!123 = distinct !DILocation(line: 393, scope: !121, inlinedAt: !124)
!124 = distinct !DILocation(line: 0, scope: !121)
!125 = !{!126, !30}
!126 = distinct !{!126, !127, !"primal"}
!127 = distinct !{!127, !" diff: %"}
!128 = !{!129, !33, !34, !35, !36}
!129 = distinct !{!129, !127, !"shadow_0"}
!130 = !{!131, !30}
!131 = distinct !{!131, !132, !"primal"}
!132 = distinct !{!132, !" diff: %"}
!133 = !{!134, !33, !34, !35, !36}
!134 = distinct !{!134, !132, !"shadow_0"}
!135 = !DILocation(line: 499, scope: !38, inlinedAt: !136)
!136 = distinct !DILocation(line: 394, scope: !121, inlinedAt: !124)
!137 = !DILocation(line: 394, scope: !121, inlinedAt: !124)
!138 = !DILocation(line: 41, scope: !43, inlinedAt: !136)
!139 = !{!140, !33, !34, !30, !36}
!140 = distinct !{!140, !141, !"na_addr13"}
!141 = distinct !{!141, !"addr13"}
!142 = !DILocation(line: 53, scope: !58, inlinedAt: !143)
!143 = distinct !DILocation(line: 81, scope: !61, inlinedAt: !144)
!144 = distinct !DILocation(line: 232, scope: !63, inlinedAt: !145)
!145 = distinct !DILocation(line: 12, scope: !66, inlinedAt: !136)
!146 = !DILocation(line: 82, scope: !61, inlinedAt: !144)
!147 = !DILocation(line: 107, scope: !71, inlinedAt: !148)
!148 = distinct !DILocation(line: 83, scope: !61, inlinedAt: !144)
!149 = !DILocation(line: 61, scope: !75, inlinedAt: !150)
!150 = distinct !DILocation(line: 88, scope: !61, inlinedAt: !144)
!151 = !DILocation(line: 89, scope: !61, inlinedAt: !144)
!152 = !DILocation(line: 395, scope: !121, inlinedAt: !124)
!153 = !DILocation(line: 65, scope: !80, inlinedAt: !154)
!154 = distinct !DILocation(line: 1240, scope: !83, inlinedAt: !155)
!155 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !156)
!156 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !157)
!157 = distinct !DILocation(line: 395, scope: !121, inlinedAt: !124)
!158 = !{!129, !30}
!159 = !{!126, !33, !34, !35, !36}
!160 = !{!134, !30}
!161 = !{!131, !33, !34, !35, !36}
!162 = distinct !DISubprogram(name: "forward", linkageName: "julia_forward_2281", scope: null, file: !8, line: 62, type: !12, scopeLine: 62, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !13)
!163 = !DILocation(line: 37, scope: !164, inlinedAt: !165)
!164 = distinct !DISubprogram(name: "getproperty;", linkageName: "getproperty", scope: !59, file: !59, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !13)
!165 = !DILocation(line: 75, scope: !162)
!166 = !{!167, !167, i64 0, i64 0}
!167 = !{!"jtbaa_const", !49, i64 0}
!168 = !{!36}
!169 = !{!33, !34, !35, !30}
!170 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2284", scope: null, file: !5, line: 344, type: !12, scopeLine: 344, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !9, retainedNodes: !13)
!171 = !DILocation(line: 345, scope: !170)

oldFunc:; Function Attrs: mustprogress willreturn
define double @preprocess_julia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1) local_unnamed_addr #14 !dbg !93 {
entry:
  %2 = call {}*** @julia.get_pgcstack() #17
  %3 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
  %4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
  %5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !94
  %6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
  %7 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
  %8 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %7 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
  %9 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !94
  %10 = load i64, i64 addrspace(11)* %9, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
  %.not.i = icmp eq i64 %6, %10, !dbg !97
  br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !99

L12.i:                                            ; preds = %entry
  %current_task15.i = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !100
  %current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !100
  %11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #18, !dbg !100
  %12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !100
  %13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !100
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %13, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  %14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 1, !dbg !100
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %14, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  %15 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #18, !dbg !100
  %16 = bitcast {} addrspace(10)* %15 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !100
  %.repack.i = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !100
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  %.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 1, !dbg !100
  store i64 %6, i64 addrspace(10)* %.repack7.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  %.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 2, !dbg !100
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  %.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 3, !dbg !100
  store i64 %10, i64 addrspace(10)* %.repack11.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
  store atomic {} addrspace(10)* %15, {} addrspace(10)* addrspace(11)* %13 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %15) #19, !dbg !100
  %17 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !100
  %18 = addrspacecast i8 addrspace(10)* %17 to i8 addrspace(11)*, !dbg !100
  %19 = getelementptr inbounds i8, i8 addrspace(11)* %18, i64 8, !dbg !100
  %20 = bitcast i8 addrspace(11)* %19 to {} addrspace(10)* addrspace(11)*, !dbg !100
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %20 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
  %21 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %20 acquire, align 8, !dbg !104, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
  %22 = addrspacecast {} addrspace(10)* %21 to {} addrspace(11)*, !dbg !108
  %.not13.i = icmp eq {} addrspace(11)* %22, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !108
  br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !108

L17.i:                                            ; preds = %L12.i
  %23 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #18, !dbg !109
  %24 = bitcast {} addrspace(10)* %23 to {} addrspace(10)* addrspace(10)*, !dbg !109
  store {} addrspace(10)* %11, {} addrspace(10)* addrspace(10)* %24, align 8, !dbg !109, !tbaa !55, !alias.scope !51, !noalias !101
  %25 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %23) #20, !dbg !109
  %26 = cmpxchg {} addrspace(10)* addrspace(11)* %20, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %25 acq_rel acquire, align 8, !dbg !111, !tbaa !45, !alias.scope !51, !noalias !68
  %27 = extractvalue { {} addrspace(10)*, i1 } %26, 0, !dbg !111
  %28 = extractvalue { {} addrspace(10)*, i1 } %26, 1, !dbg !111
  br i1 %28, label %xchg_wb.i, label %L27.i, !dbg !111

L27.i:                                            ; preds = %L17.i
  %29 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %27) #21, !dbg !113
  %30 = icmp eq {} addrspace(10)* %29, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !113
  br i1 %30, label %L32.i, label %fail.i, !dbg !113

L32.i:                                            ; preds = %xchg_wb.i, %L27.i, %L12.i
  %value_phi.i = phi {} addrspace(10)* [ %25, %xchg_wb.i ], [ %21, %L12.i ], [ %27, %L27.i ]
  %31 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #18, !dbg !99
  %32 = bitcast {} addrspace(10)* %31 to {} addrspace(10)* addrspace(10)*, !dbg !99
  store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %32, align 8, !dbg !99, !tbaa !55, !alias.scope !51, !noalias !101
  %33 = addrspacecast {} addrspace(10)* %31 to {} addrspace(12)*, !dbg !99
  call void @ijl_throw({} addrspace(12)* %33) #22, !dbg !99
  unreachable, !dbg !99

xchg_wb.i:                                        ; preds = %L17.i
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %25) #19, !dbg !111
  br label %L32.i, !dbg !113

fail.i:                                           ; preds = %L27.i
  %34 = addrspacecast {} addrspace(10)* %27 to {} addrspace(12)*, !dbg !113
  call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %34) #22, !dbg !113
  unreachable, !dbg !113

julia_dot_2276_inner.exit:                        ; preds = %entry
  %35 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %1) #17, !dbg !114
  %36 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !115
  %37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %36) #21, !dbg !115
  %38 = bitcast {}* %37 to i8**, !dbg !115
  %39 = load i8*, i8** %38, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
  %40 = ptrtoint i8* %39 to i64, !dbg !115
  %41 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !115
  %42 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %41) #21, !dbg !115
  %43 = bitcast {}* %42 to i8**, !dbg !115
  %44 = load i8*, i8** %43, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
  %45 = ptrtoint i8* %44 to i64, !dbg !115
  %46 = call fastcc double @julia_dot_2279(i64 signext %6, i64 zeroext %40, i64 noundef signext 1, i64 zeroext %45, i64 noundef signext 1) #23, !dbg !114
  call void @llvm.julia.gc_preserve_end(token %35) #17, !dbg !114
  ret double %46, !dbg !120
}

newFunc:; Function Attrs: mustprogress willreturn
define internal void @fwddiffejulia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* %"'", {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1, {} addrspace(10)* %"'1") local_unnamed_addr #14 !dbg !121 {
entry:
  %2 = call {}*** @julia.get_pgcstack()
  %3 = call {}*** @julia.get_pgcstack()
  %4 = call {}*** @julia.get_pgcstack()
  %5 = call {}*** @julia.get_pgcstack()
  %6 = call {}*** @julia.get_pgcstack()
  %7 = call {}*** @julia.get_pgcstack() #17
  %8 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
  %9 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %8 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
  %10 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %9, i64 0, i32 1, !dbg !122
  %11 = load i64, i64 addrspace(11)* %10, align 8, !dbg !122, !range !28, !alias.scope !125, !noalias !128
  %12 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
  %13 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %12 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
  %14 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %13, i64 0, i32 1, !dbg !122
  %15 = load i64, i64 addrspace(11)* %14, align 8, !dbg !122, !range !28, !alias.scope !130, !noalias !133
  %.not.i = icmp eq i64 %11, %15, !dbg !135
  br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !137

L12.i:                                            ; preds = %entry
  %current_task15.i = getelementptr inbounds {}**, {}*** %7, i64 -13, !dbg !138
  %current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !138
  %16 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #18, !dbg !138
  %17 = bitcast {} addrspace(10)* %16 to {} addrspace(10)* addrspace(10)*, !dbg !138
  %18 = addrspacecast {} addrspace(10)* addrspace(10)* %17 to {} addrspace(10)* addrspace(11)*, !dbg !138
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %18, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  %19 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 1, !dbg !138
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %19, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  %20 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #18, !dbg !138
  %21 = bitcast {} addrspace(10)* %20 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !138
  %.repack.i = bitcast {} addrspace(10)* %20 to {} addrspace(10)* addrspace(10)*, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  %.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 1, !dbg !138
  store i64 %11, i64 addrspace(10)* %.repack7.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  %.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 2, !dbg !138
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  %.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 3, !dbg !138
  store i64 %15, i64 addrspace(10)* %.repack11.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
  store atomic {} addrspace(10)* %20, {} addrspace(10)* addrspace(11)* %18 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %20) #19, !dbg !138
  %22 = bitcast {} addrspace(10)* %16 to i8 addrspace(10)*, !dbg !138
  %23 = addrspacecast i8 addrspace(10)* %22 to i8 addrspace(11)*, !dbg !138
  %24 = getelementptr inbounds i8, i8 addrspace(11)* %23, i64 8, !dbg !138
  %25 = bitcast i8 addrspace(11)* %24 to {} addrspace(10)* addrspace(11)*, !dbg !138
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %25 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
  %26 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %25 acquire, align 8, !dbg !142, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
  %27 = addrspacecast {} addrspace(10)* %26 to {} addrspace(11)*, !dbg !146
  %.not13.i = icmp eq {} addrspace(11)* %27, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !146
  br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !146

L17.i:                                            ; preds = %L12.i
  %28 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #18, !dbg !147
  %29 = bitcast {} addrspace(10)* %28 to {} addrspace(10)* addrspace(10)*, !dbg !147
  store {} addrspace(10)* %16, {} addrspace(10)* addrspace(10)* %29, align 8, !dbg !147, !tbaa !55, !alias.scope !51, !noalias !139
  %30 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %28) #20, !dbg !147
  %31 = cmpxchg {} addrspace(10)* addrspace(11)* %25, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %30 acq_rel acquire, align 8, !dbg !149, !tbaa !45, !alias.scope !51, !noalias !68
  %32 = extractvalue { {} addrspace(10)*, i1 } %31, 0, !dbg !149
  %33 = extractvalue { {} addrspace(10)*, i1 } %31, 1, !dbg !149
  br i1 %33, label %xchg_wb.i, label %L27.i, !dbg !149

L27.i:                                            ; preds = %L17.i
  %34 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %32) #21, !dbg !151
  %35 = icmp eq {} addrspace(10)* %34, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !151
  br i1 %35, label %L32.i, label %fail.i, !dbg !151

L32.i:                                            ; preds = %xchg_wb.i, %L27.i, %L12.i
  %value_phi.i = phi {} addrspace(10)* [ %30, %xchg_wb.i ], [ %26, %L12.i ], [ %32, %L27.i ]
  %36 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #18, !dbg !137
  %37 = bitcast {} addrspace(10)* %36 to {} addrspace(10)* addrspace(10)*, !dbg !137
  store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %37, align 8, !dbg !137, !tbaa !55, !alias.scope !51, !noalias !139
  %38 = addrspacecast {} addrspace(10)* %36 to {} addrspace(12)*, !dbg !137
  call void @ijl_throw({} addrspace(12)* %38) #22, !dbg !137
  unreachable, !dbg !137

xchg_wb.i:                                        ; preds = %L17.i
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %30) #19, !dbg !149
  br label %L32.i, !dbg !151

fail.i:                                           ; preds = %L27.i
  %39 = addrspacecast {} addrspace(10)* %32 to {} addrspace(12)*, !dbg !151
  call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %39) #22, !dbg !151
  unreachable, !dbg !151

julia_dot_2276_inner.exit:                        ; preds = %entry
  %40 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %1, {} addrspace(10)* %"'1"), !dbg !152
  %"'ipc" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !153
  %41 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !153
  %42 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc"), !dbg !153
  %_replacementA = phi {}* , !dbg !153
  %"'ipc25" = bitcast {}* %42 to i8**, !dbg !153
  %_replacementA17 = phi i8** , !dbg !153
  %"'ipl" = load i8*, i8** %"'ipc25", align 8, !dbg !153, !tbaa !89, !alias.scope !158, !noalias !159, !nonnull !13
  %"'ipc26" = ptrtoint i8* %"'ipl" to i64, !dbg !153
  %_replacementA19 = phi i64 , !dbg !153
  %"'ipc20" = addrspacecast {} addrspace(10)* %"'1" to {} addrspace(11)*, !dbg !153
  %43 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !153
  %44 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc20"), !dbg !153
  %_replacementA21 = phi {}* , !dbg !153
  %"'ipc27" = bitcast {}* %44 to i8**, !dbg !153
  %_replacementA22 = phi i8** , !dbg !153
  %"'ipl28" = load i8*, i8** %"'ipc27", align 8, !dbg !153, !tbaa !89, !alias.scope !160, !noalias !161, !nonnull !13
  %_replacementA23 = phi i8* , !dbg !153
  %"'ipc29" = ptrtoint i8* %"'ipl28" to i64, !dbg !153
  %_replacementA24 = phi i64 , !dbg !153
  %45 = bitcast {}*** %6 to {}**, !dbg !152
  %46 = getelementptr inbounds {}*, {}** %45, i64 -13, !dbg !152
  %47 = getelementptr inbounds {}*, {}** %46, i64 15, !dbg !152
  %48 = bitcast {}** %47 to i8**, !dbg !152
  %49 = load i8*, i8** %48, align 8, !dbg !152
  %50 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %46, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
  %51 = bitcast {} addrspace(10)* %50 to [1 x i64] addrspace(10)*, !dbg !152
  %52 = addrspacecast [1 x i64] addrspace(10)* %51 to [1 x i64] addrspace(11)*, !dbg !152
  %53 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %52, i64 0, i32 0, !dbg !152
  store i64 %11, i64 addrspace(11)* %53, align 8, !dbg !152
  %54 = bitcast {}*** %5 to {}**, !dbg !152
  %55 = getelementptr inbounds {}*, {}** %54, i64 -13, !dbg !152
  %56 = getelementptr inbounds {}*, {}** %55, i64 15, !dbg !152
  %57 = bitcast {}** %56 to i8**, !dbg !152
  %58 = load i8*, i8** %57, align 8, !dbg !152
  %59 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %55, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
  %60 = bitcast {} addrspace(10)* %59 to [2 x i64] addrspace(10)*, !dbg !152
  %61 = addrspacecast [2 x i64] addrspace(10)* %60 to [2 x i64] addrspace(11)*, !dbg !152
  %62 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 0, !dbg !152
  store i64 %_replacementA19, i64 addrspace(11)* %62, align 8, !dbg !152
  %63 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 1, !dbg !152
  store i64 %"'ipc26", i64 addrspace(11)* %63, align 8, !dbg !152
  %64 = bitcast {}*** %4 to {}**, !dbg !152
  %65 = getelementptr inbounds {}*, {}** %64, i64 -13, !dbg !152
  %66 = getelementptr inbounds {}*, {}** %65, i64 15, !dbg !152
  %67 = bitcast {}** %66 to i8**, !dbg !152
  %68 = load i8*, i8** %67, align 8, !dbg !152
  %69 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %65, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
  %70 = bitcast {} addrspace(10)* %69 to [1 x i64] addrspace(10)*, !dbg !152
  %71 = addrspacecast [1 x i64] addrspace(10)* %70 to [1 x i64] addrspace(11)*, !dbg !152
  %72 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %71, i64 0, i32 0, !dbg !152
  store i64 1, i64 addrspace(11)* %72, align 8, !dbg !152
  %73 = bitcast {}*** %3 to {}**, !dbg !152
  %74 = getelementptr inbounds {}*, {}** %73, i64 -13, !dbg !152
  %75 = getelementptr inbounds {}*, {}** %74, i64 15, !dbg !152
  %76 = bitcast {}** %75 to i8**, !dbg !152
  %77 = load i8*, i8** %76, align 8, !dbg !152
  %78 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %74, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
  %79 = bitcast {} addrspace(10)* %78 to [2 x i64] addrspace(10)*, !dbg !152
  %80 = addrspacecast [2 x i64] addrspace(10)* %79 to [2 x i64] addrspace(11)*, !dbg !152
  %81 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 0, !dbg !152
  store i64 %_replacementA24, i64 addrspace(11)* %81, align 8, !dbg !152
  %82 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 1, !dbg !152
  store i64 %"'ipc29", i64 addrspace(11)* %82, align 8, !dbg !152
  %83 = bitcast {}*** %2 to {}**, !dbg !152
  %84 = getelementptr inbounds {}*, {}** %83, i64 -13, !dbg !152
  %85 = getelementptr inbounds {}*, {}** %84, i64 15, !dbg !152
  %86 = bitcast {}** %85 to i8**, !dbg !152
  %87 = load i8*, i8** %86, align 8, !dbg !152
  %88 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %84, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
  %89 = bitcast {} addrspace(10)* %88 to [1 x i64] addrspace(10)*, !dbg !152
  %90 = addrspacecast [1 x i64] addrspace(10)* %89 to [1 x i64] addrspace(11)*, !dbg !152
  %91 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %90, i64 0, i32 0, !dbg !152
  store i64 1, i64 addrspace(11)* %91, align 8, !dbg !152
  %92 = call fast double @julia_forward_2281([1 x i64] addrspace(11)* %52, [2 x i64] addrspace(11)* %61, [1 x i64] addrspace(11)* %71, [2 x i64] addrspace(11)* %80, [1 x i64] addrspace(11)* %90), !dbg !152
  call void @llvm.julia.gc_preserve_end(token %40) #17, !dbg !152
  ret void

allocsForInversion:                               ; No predecessors!
}

 pp:   %_replacementA19 = phi i64 , !dbg !78 of   %40 = ptrtoint i8* %39 to i64, !dbg !70
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:7903: void GradientUtils::eraseFictiousPHIs(): Assertion `pp->getNumUses() == 0' failed.

[107420] signal (6.-6): Aborted
in expression starting at REPL[13]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f81ede2871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
eraseFictiousPHIs at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:7903
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4648
EnzymeCreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:502
EnzymeCreateForwardDiff at /home/sethaxen/projects/Enzyme.jl/src/api.jl:138
enzyme! at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:6956
unknown function (ip: 0x7f81c93d23b9)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
#codegen#162 at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8194
codegen at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:7820
unknown function (ip: 0x7f81c93aa5fd)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_thunk at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8707
_thunk at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8704 [inlined]
cached_compilation at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8742 [inlined]
#s287#191 at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8800 [inlined]
#s287#191 at ./none:0
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
GeneratedFunctionStub at ./boot.jl:602
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_call_staged at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/method.c:530
ijl_code_for_staged at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/method.c:581
get_staged at ./compiler/utilities.jl:115
retrieve_code_info at ./compiler/utilities.jl:127 [inlined]
InferenceState at ./compiler/inferencestate.jl:354
typeinf_edge at ./compiler/typeinfer.jl:922
abstract_call_method at ./compiler/abstractinterpretation.jl:611
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:152
abstract_call_known at ./compiler/abstractinterpretation.jl:1949
jfptr_abstract_call_known_12792.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
tojlinvoke21381.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
j_abstract_call_known_12333.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
abstract_call at ./compiler/abstractinterpretation.jl:2020
abstract_call at ./compiler/abstractinterpretation.jl:1999
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2183
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2396
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2684
typeinf_local at ./compiler/abstractinterpretation.jl:2869
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2957
_typeinf at ./compiler/typeinfer.jl:244
typeinf at ./compiler/typeinfer.jl:215
typeinf_ext at ./compiler/typeinfer.jl:1056
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1089
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1085
jfptr_typeinf_ext_toplevel_16333.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_type_infer at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:320
jl_generate_fptr_impl at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:444
jl_compile_method_internal at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2348 [inlined]
jl_compile_method_internal at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2237
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2750 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
autodiff at /home/sethaxen/projects/Enzyme.jl/src/Enzyme.jl:321
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/builtins.c:730
autodiff at /home/sethaxen/projects/Enzyme.jl/src/Enzyme.jl:215
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
eval_user_input at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:153
repl_backend_loop at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:249
#start_repl_backend#46 at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:234
start_repl_backend at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:231
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
#run_repl#59 at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:377
run_repl at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:363
jfptr_run_repl_61794.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
#1019 at ./client.jl:421
jfptr_YY.1019_49540.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_f__call_latest at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:816 [inlined]
invokelatest at ./essentials.jl:813 [inlined]
run_main_repl at ./client.jl:405
exec_options at ./client.jl:322
_start at ./client.jl:522
jfptr__start_37296.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
true_main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
unknown function (ip: 0x7f81ede29d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 20156377 (Pool: 20133683; Big: 22694); GC: 28
Aborted (core dumped)

This is strange because only this line of code should be hit in this case, and all it does is call the primal function.

Collaborator Author

@sethaxen sethaxen Apr 27, 2023

The 2-arg methods also error in reverse mode. These are the only remaining failures in the test suite.

Edit: this only happens with dot on real inputs, not with dotc or dotu.
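For readers comparing the two call forms: the 2-arg methods take whole vectors, while the 5-arg methods used in the benchmarks below take an explicit length n and strides incx and incy. Here is a small sanity check on random data (not from this thread) showing that the two forms agree for contiguous vectors, with a note on what each function computes:

```julia
using LinearAlgebra

x, y = randn(8), randn(8)                            # real, unit-stride vectors
xc, yc = randn(ComplexF64, 8), randn(ComplexF64, 8)  # complex, unit-stride vectors
n = length(x)

# 2-arg form vs 5-arg form (n, X, incx, Y, incy) with unit strides
d2, d5 = BLAS.dot(x, y), BLAS.dot(n, x, 1, y, 1)        # Σ xᵢyᵢ (real element types)
c2, c5 = BLAS.dotc(xc, yc), BLAS.dotc(n, xc, 1, yc, 1)  # Σ conj(xᵢ)yᵢ
u2, u5 = BLAS.dotu(xc, yc), BLAS.dotu(n, xc, 1, yc, 1)  # Σ xᵢyᵢ, no conjugation

println(d2 ≈ d5, " ", c2 ≈ c5, " ", u2 ≈ u5)  # true true true
```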

Member

Open an issue?

@sethaxen
Collaborator Author

sethaxen commented May 8, 2023

All tests seem to pass for me. Some of the CI jobs don't seem to have been picked up, but as far as I can tell, the same tests pass here as on main.

Here's an updated benchmark. The main takeaways are that the forward-mode rules give a 2.4-7.3x speed-up over the code paths hit on main, and the reverse-mode rules give a 13-58x speed-up when no tape is needed and a 5.7-8.7x speed-up when a tape is needed. In the latter case, the trade-off is that these rules allocate a tape for all entries that contribute to the output, which is wasteful in a case like the one benchmarked here, where later operations mutate only a single entry of each array.
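To put that tape cost in numbers: assuming the rule caches full copies of both x and y whenever both are overwritten, the tape holds 2n elements. Here is a quick estimate for the n = 1,000 inputs benchmarked below. The helper tape_bytes is hypothetical and only does the arithmetic; it does not inspect Enzyme's actual tape.

```julia
# Hypothetical estimate: bytes needed to cache copies of both x and y (n entries each)
tape_bytes(T, n) = 2 * n * sizeof(T)

n = 1_000
for T in (Float64, ComplexF64)
    b = tape_bytes(T, n)
    println(T, ": ", b, " bytes = ", b / 1024, " KiB")
end
```

This gives 16,000 bytes (15.625 KiB) for Float64 and 32,000 bytes (31.25 KiB) for ComplexF64. Both are close to the 16.08 KiB and 31.75 KiB reported for the tape-needed runs, which suggests that the copies account for almost all of that memory.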

using BenchmarkTools, Enzyme, LinearAlgebra, Random

Random.seed!(42)

n = 1_000
x = randn(n)
y = randn(n)
∂x = randn(eltype(x), size(x))
∂y = randn(eltype(y), size(y))
incx = incy = 1

# Overwrites x[1] and y[1] after the dot product, so reverse mode must cache (tape) x and y
function f_overwite!(f, n, x, incx, y, incy)
    s = f(n, x, incx, y, incy)
    x[1] = 0
    y[1] = 0
    return s
end

## BLAS.dot

@btime autodiff(
    Forward, $(BLAS.dot), $n, $(Duplicated(x, ∂x)), $incx, $(Duplicated(y, ∂y)), $incy,
)
# main: 731.130 ns (0 allocations: 0 bytes)
# here: 100.365 ns (2 allocations: 64 bytes)

# no tape needed
@btime autodiff(
    ReverseWithPrimal, $(BLAS.dot), Active, $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(x, copy(∂x)); Dy=Duplicated(y, copy(∂y)))
# main: 11.139 μs (2 allocations: 32 bytes)
# here: 192.827 ns (7 allocations: 208 bytes)

# tape needed
@btime autodiff(
    ReverseWithPrimal, $(f_overwite!), Active, $(BLAS.dot), $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(copy(x), copy(∂x)); Dy=Duplicated(copy(y), copy(∂y)))
# main: 11.624 μs (2 allocations: 32 bytes)
# here: 1.342 μs (9 allocations: 16.08 KiB)

T = ComplexF64
x = randn(T, n)
y = randn(T, n)
∂x = randn(eltype(x), size(x))
∂y = randn(eltype(y), size(y))
incx = incy = 1

## BLAS.dotu

@btime autodiff(
    Forward, $(BLAS.dotu), $n, $(Duplicated(x, ∂x)), $incx, $(Duplicated(y, ∂y)), $incy,
)
# main: 1.599 μs (0 allocations: 0 bytes)
# here: 650.117 ns (2 allocations: 64 bytes)

# no tape needed
@btime autodiff(
    ReverseWithPrimal, $(BLAS.dotu), Active, $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(x, copy(∂x)); Dy=Duplicated(y, copy(∂y)))
# main: 21.493 μs (2 allocations: 48 bytes)
# here: 1.706 μs (8 allocations: 256 bytes)

# tape needed
@btime autodiff(
    ReverseWithPrimal, $(f_overwite!), Active, $(BLAS.dotu), $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(copy(x), copy(∂x)); Dy=Duplicated(copy(y), copy(∂y)))
# main: 22.231 μs (2 allocations: 48 bytes)
# here: 3.885 μs (10 allocations: 31.75 KiB)

## BLAS.dotc

@btime autodiff(
    Forward, $(BLAS.dotc), $n, $(Duplicated(x, ∂x)), $incx, $(Duplicated(y, ∂y)), $incy,
)
# main: 1.584 μs (0 allocations: 0 bytes)
# here: 647.920 ns (2 allocations: 64 bytes)

# no tape needed
@btime autodiff(
    ReverseWithPrimal, $(BLAS.dotc), Active, $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(x, copy(∂x)); Dy=Duplicated(y, copy(∂y)))
# main: 21.220 μs (2 allocations: 48 bytes)
# here: 758.145 ns (8 allocations: 256 bytes)

# tape needed
@btime autodiff(
    ReverseWithPrimal, $(f_overwite!), Active, $(BLAS.dotc), $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(copy(x), copy(∂x)); Dy=Duplicated(copy(y), copy(∂y)))
# main: 22.251 μs (2 allocations: 48 bytes)
# here: 2.977 μs (10 allocations: 31.75 KiB)
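The rule implementations themselves are not shown in this thread. As a hedged reference for what they need to compute, here is a plain-Julia sketch of the forward-mode tangents and reverse-mode adjoints for unit-stride inputs. The helper names are hypothetical, not the PR's. The complex adjoints assume the convention that, for a real loss L, the adjoint of a complex value z is ∂L/∂Re(z) + i·∂L/∂Im(z). Every reverse-mode update reads the primal x and y, which is why they have to be cached when a later operation overwrites them.

```julia
using LinearAlgebra

# Hypothetical helpers (not the PR's rule code), assuming unit strides.

# Forward mode: tangent of the output given input tangents dx and dy.
dot_tangent(x, dx, y, dy)  = sum(dx .* y .+ x .* dy)                # real dot: s = Σ xᵢyᵢ
dotu_tangent(x, dx, y, dy) = sum(dx .* y .+ x .* dy)                # dotu:     s = Σ xᵢyᵢ
dotc_tangent(x, dx, y, dy) = sum(conj.(dx) .* y .+ conj.(x) .* dy)  # dotc:     s = Σ conj(xᵢ)yᵢ

# Reverse mode: accumulate into the shadows xbar and ybar given the output adjoint sbar.
# Each update reads the primal x and y, so their original values must still be
# available (taped) if a later operation has overwritten them.
function dot_pullback!(xbar, ybar, x, y, sbar)
    xbar .+= sbar .* y
    ybar .+= sbar .* x
    return nothing
end

function dotu_pullback!(xbar, ybar, x, y, sbar)
    xbar .+= sbar .* conj.(y)
    ybar .+= sbar .* conj.(x)
    return nothing
end

function dotc_pullback!(xbar, ybar, x, y, sbar)
    xbar .+= conj(sbar) .* y
    ybar .+= sbar .* x
    return nothing
end
```

These formulas can be checked against central finite differences of BLAS.dot, BLAS.dotu, and BLAS.dotc. For example, the real loss L = real(conj(c) * s) has output adjoint sbar = c.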

@ZuseZ4, it would be interesting to see how the TableGen versions compare.
