Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to write Jacobian and Hessian functions stripped of ForwardDiff #865

Closed
tholdem opened this issue Dec 30, 2020 · 7 comments
Closed

How to write Jacobian and Hessian functions stripped of ForwardDiff #865

tholdem opened this issue Dec 30, 2020 · 7 comments

Comments

@tholdem
Copy link

tholdem commented Dec 30, 2020

ForwardDiff doesn't seem to support FFT or complex-type, but Zygote.gradient does. Per this discussion,
https://discourse.julialang.org/t/forwarddiff-and-zygote-cannot-automatically-differentiate-ad-function-from-c-n-to-r-that-uses-fft/52440/4?u=tholdem
I was made aware of that Zygote.hessian uses ForwardDiff. This prevents Zygote to AD the Hessian of my objective function involving FFT and complex-numbers. I don't have much experience with these things and I'm new to Julia. So I don't really know if I can pull off writing a Hessian function myself using Zygote only. I was wondering 1. why would Zygote call ForwardDiff instead of its own functions? 2. if there is more guidance or resource on how to write a Hessian AD function that calls Zygote only that supports complex-numbers and FFT? Thank you so much for your help.

For example, this code doesn't work because Zygote doesn't support mutating arrays, and it's unclear whether this code handles complex-numbers.

function jacobian(f,x)
    y,back  = Zygote.pullback(f,x)
    k  = length(y)
    n  = length(x)
    J  = Matrix{eltype(y)}(undef,k,n)
    e_i = fill!(similar(y), 0)
    @inbounds for i = 1:k
        e_i[i] = oneunit(eltype(x))
        J[i,:] = back(e_i)[1]
        e_i[i] = zero(eltype(x))
    end
    (J,)
end

hessian(f, x) = jacobian(x -> gradient(f, x)[1], x)

There seems to be a workaround for mutating arrays, https://github.com/rakeshvar/Zygote-Mutating-Arrays-WorkAround.jl, but it is much harder to find the gradient of the mutating steps in the code above so it would be very daunting to find a workaround myself.

@mcabbott
Copy link
Member

mcabbott commented Jan 3, 2021

It would be better if the gradient in #747 used reduce(hcat, ...) instead of writing into a matrix, as then it might behave well inside second derivatives. (Although its re-use of e_i might also need some trickery.)

However, if re-writing the hessian function to use that not the ForwardDiff version, the jacobian would be on the outside, so should not itself be differentiated. Maybe something else is going wrong?

The answer to 1. is that reverse-mode AD works well on (say) 100 -> 1 dim functions, i.e. (large) vector -> scalar. But the gradient being differentiated to make the Hessian is a 100 -> 100 dim function, its output is the same size as its input. Then the ideal complexity of forward & reverse-mode Jacobians should be similar (I think), but in practice forward is easier & more efficient. Zygote actually has its own forward-mode, but relatively young & much less well tested, so it still uses ForwardDiff, which is the opposite. (Although complex numbers in ForwardDiff also only started working recently, I think.) But if neither of these support FFT, then using the reverse-mode Jacobian is probably the shortest path to getting this working at all.

@tholdem
Copy link
Author

tholdem commented Jan 3, 2021

It would be better if the gradient in #747 used reduce(hcat, ...) instead of writing into a matrix, as then it might behave well inside second derivatives. (Although its re-use of e_i might also need some trickery.)

Thank you for your help Michael. I'm not sure how to replace the matrix with reduce(hcat, ...). Maybe I don't understand this function well, but it seems like you need a matrix first to perform reduce. Do you mean something like this?

function jacobian(f,x)
    y,back  = Zygote.pullback(f,x)
    k  = length(y)
    n  = length(x)
    J  = Matrix{eltype(y)}(undef,k,n)
    e_array = Matrix(I,k,k)
    @inbounds for i = 1:k
	J[i,:] = back(e_array[:,i])[1]
    end
    (reduce(hcat, J),)
end

hessian(f,x) = jacobian(x -> gradient(f, x)[1],x)

I also precomputed e_i in a matrix, to avoid mutating e_i.
Now if I use a simple function:

f(x) = x[1]*x[2]^2
x0=rand(Float64,(2,1))
hessian(f,x0)

I'm still getting the same error:

Mutating arrays is not supported

    error(::String)@error.jl:33
    (::Zygote.var"#364#365")(::Nothing)@array.jl:58
    (::Zygote.var"#2245#back#366"{Zygote.var"#364#365"})(::Nothing)@adjoint.jl:59
    (::Zygote.var"#150#151"{Zygote.var"#2245#back#366"{Zygote.var"#364#365"},Tuple{Tuple{Nothing,Nothing},Tuple{Nothing}}})(::Nothing)@lib.jl:191
    (::Zygote.var"#1693#back#152"{Zygote.var"#150#151"{Zygote.var"#2245#back#366"{Zygote.var"#364#365"},Tuple{Tuple{Nothing,Nothing},Tuple{Nothing}}}})(::Nothing)@adjoint.jl:59
    #356@array.jl:38[inlined]
    (::typeof(∂(λ)))(::Tuple{Array{Bool,1},Nothing})@interface2.jl:0
    #2209#back@adjoint.jl:59[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,1},Nothing})@interface2.jl:0
    literal_getindex@builtins.jl:15[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,1},Nothing})@interface2.jl:0
    f@Other: 1[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,1}})@interface2.jl:0
    #41@interface.jl:40[inlined]
    (::typeof(∂(λ)))(::Tuple{Array{Bool,1}})@interface2.jl:0
    gradient@interface.jl:49[inlined]
    (::typeof(∂(gradient)))(::Tuple{Array{Bool,1}})@interface2.jl:0
    #1@Other: 1[inlined]
    (::typeof(∂(#1)))(::Array{Bool,1})@interface2.jl:0
    (::Zygote.var"#41#42"{typeof(∂(#1))})(::Array{Bool,1})@interface.jl:40
    jacobian(::Function, ::Array{Int64,1})@Other: 8
    hessian@Other: 1[inlined]
    top-level scope@Local: 1[inlined]

@mcabbott
Copy link
Member

mcabbott commented Jan 3, 2021

I had in mind something like this, but I don't know how much this change matters:

function my_jacobian(f, x::AbstractArray)
    y, back  = Zygote.pullback(vec∘f, x)
    k, n = length(y), length(x)
    e_array = Matrix(I,k,k)
    slices = map(1:k) do i
        b = back(e_array[:,i])[1]
        b===nothing ? falses(n) : vec(b)
    end
    (reduce(hcat, slices),)
end
my_jacobian(identity, rand(3))[1]

my_jacobian(x -> my_jacobian(x -> x.^3, x)[1], 1:3)[1] # now this works

my_hessian(f, x::AbstractArray) = my_jacobian(x -> Zygote.gradient(f, x)[1], x)[1]
my_hessian(sum, rand(3))
my_hessian(x -> sum(x.^3), 1:3)
Zygote.hessian(x -> sum(x.^3), 1:3)

my_hessian(x -> sum(x.^3) + 100(x[1]*x[3]), rand(3)) # fails because gradient of x[1] is not itself Zygote-differentiable
Zygote.hessian(x -> sum(x.^3) + 100(x[1]*x[3]), rand(3))

I think this is the same error you have above. The gradient of indexing works by filling in an array, and attempting to differentiate that doesn't work at all. It might not be super-hard to fix by adding some @adjoint ∇getindex but nobody has yet, and it may never be quick.

But I'm not sure your original function should involve indexing at all. It may involve other things which don't work, but FFT itself has a gradient which is other FFT functions, so perhaps that can work?

my_jacobian(fft, rand(4) .+ im)[1] # ok
my_hessian(x -> abs(sum(fft(x).^2)), rand(4) .+ im) # ok
my_hessian(x -> sum(abs2, fft(x)), rand(4) .+ im) # some BoundsError?

@tholdem
Copy link
Author

tholdem commented Jan 4, 2021

Thank you so much for the detailed help! You are right that I shouldn't use indexing for the toy objective function. Your function removed the mutating array error. However, when I tried on a more fitting toy function

using LinearAlgebra,Random,ForwardDiff,FFTW,Zygote
n = 10
x0 = randn(n)

pfft = plan_fft(x0) 
pifft = plan_ifft(x0)
f(x) = real(norm(pifft.scale .* (pifft.p * fftshift(pfft*x))));

I'm getting a new error:

Can't differentiate foreigncall expression

    error(::String)@error.jl:33
    get@iddict.jl:87[inlined]
    (::typeof(∂(get)))(::Nothing)@interface2.jl:0
    accum_global@lib.jl:56[inlined]
    (::typeof(∂(accum_global)))(::Nothing)@interface2.jl:0
    #89@lib.jl:67[inlined]
    (::typeof(∂(λ)))(::Nothing)@interface2.jl:0
    #1550#back@adjoint.jl:59[inlined]
    (::typeof(∂(λ)))(::Nothing)@interface2.jl:0
    gradtuple1@adjoint.jl:22[inlined]
    #1573#back@adjoint.jl:59[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,1}})@interface2.jl:0
    #41@interface.jl:40[inlined]
    (::typeof(∂(λ)))(::Tuple{Array{Bool,1}})@interface2.jl:0
    gradient@interface.jl:49[inlined]
    (::typeof(∂(gradient)))(::Tuple{Array{Bool,1}})@interface2.jl:0
    #1@Other: 1[inlined]
    (::typeof(∂(#1)))(::Array{Bool,1})@interface2.jl:0
    (::Zygote.var"#150#151"{typeof(∂(#1)),Tuple{Tuple{Nothing}}})(::Array{Bool,1})@lib.jl:191
    #1693#back@adjoint.jl:59[inlined]
    #62@operators.jl:875[inlined]
    (::typeof(∂(#62)))(::Array{Bool,1})@interface2.jl:0
    (::Zygote.var"#41#42"{typeof(∂(#62))})(::Array{Bool,1})@interface.jl:40
    (::Main.workspace312.var"#1#2"{Zygote.var"#41#42"{typeof(∂(#62))},Int64,Array{Bool,2}})(::Int64)@Other: 6
    iterate@generator.jl:47[inlined]
    _collect(::UnitRange{Int64}, ::Base.Generator{UnitRange{Int64},Main.workspace312.var"#1#2"{Zygote.var"#41#42"{typeof(∂(#62))},Int64,Array{Bool,2}}}, ::Base.EltypeUnknown, ::Base.HasShape{1})@array.jl:699
    collect_similar(::UnitRange{Int64}, ::Base.Generator{UnitRange{Int64},Main.workspace312.var"#1#2"{Zygote.var"#41#42"{typeof(∂(#62))},Int64,Array{Bool,2}}})@array.jl:628
    map(::Function, ::UnitRange{Int64})@abstractarray.jl:2162
    my_jacobian(::Function, ::Array{Float64,1})@Other: 5
    my_hessian@Other: 1[inlined]
    top-level scope@Local: 1[inlined]

I got the same error on my original objective function so this toy function is probably representative of my problem. I couldn't find anything related to this error. Do you happen to know what might cause this? Thank you!

@mcabbott
Copy link
Member

mcabbott commented Jan 4, 2021

I don't have great suggestions, except trying variations to narrow things down. Some FFT things work, weirdly when I add broadcasting (with just a scalar, but not with a literal) then it breaks. Also, norm seems to have problems with the 2nd derivative:

julia> my_hessian(x -> abs(sum(fftshift(pfft * x))), x0 .+ im)  # this runs
10×10 Matrix{ComplexF64}:
 4.56447+3.78283im  -8.39958e-17-6.9612e-17im  0.0+0.0im  …  0.0+0.0im  8.39958e-17+6.9612e-17im
     0.0+0.0im               0.0+0.0im         0.0+0.0im     0.0+0.0im          0.0+0.0im
     0.0+0.0im               0.0+0.0im         0.0+0.0im     0.0+0.0im          0.0+0.0im
...

julia> sc = pifft.scale
0.1

julia> my_hessian(x -> abs(sum(sc .* fftshift(pfft * x))), x0 .+ im)  # not the same stack trace as yours, but the same as my_hessian(f, x0) for me.
ERROR: Can't differentiate foreigncall expression
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] Pullback
    @ ./iddict.jl:102 [inlined]
  [3] (::typeof(∂(get)))(Δ::Nothing)
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/compiler/interface2.jl:0
  [4] Pullback
    @ ~/.julia/packages/Zygote/ywhiG/src/lib/lib.jl:56 [inlined]
  [5] (::typeof(∂(accum_global)))(Δ::Nothing)
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/compiler/interface2.jl:0
...

julia> my_hessian(x -> abs(sum(0.1 .* fftshift(pfft * x))), x0 .+ im)  # with literal 0.1 it works again
10×10 Matrix{ComplexF64}:
 0.456447+0.378283im  0.0+0.0im  …  -2.21707e-17-1.83741e-17im  0.0+0.0im
      0.0+0.0im       0.0+0.0im              0.0+0.0im          0.0+0.0im
      0.0+0.0im       0.0+0.0im              0.0+0.0im          0.0+0.0im
...

julia> my_hessian(x -> abs(norm(fftshift(pfft * x))), x0 .+ im)  # norm also seems broken
ERROR: Mutating arrays is not supported
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] (::Zygote.var"#375#376")(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/lib/array.jl:61
  [3] (::Zygote.var"#2258#back#377"{Zygote.var"#375#376"})(Δ::Nothing)
    @ Zygote ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
  [4] Pullback
    @ ./broadcast.jl:893 [inlined]
  [5] Pullback
    @ ./broadcast.jl:890 [inlined]
  [6] Pullback
    @ ./broadcast.jl:886 [inlined]
  [7] Pullback
    @ ~/.julia/packages/ChainRules/sGOE6/src/rulesets/LinearAlgebra/norm.jl:216 [inlined]
  [8] (::typeof(∂(_norm2_back)))(Δ::Vector{ComplexF64})
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/compiler/interface2.jl:0

Would be curious if you get the same error with norm, I may have a fix for that bit.

@tholdem
Copy link
Author

tholdem commented Jan 4, 2021

That makes sense. It looks like because there are tons of elementwise matrix multiplication in my function, the Can't differentiate foreigncall expression just would not go away.
I tested my_hessian(x -> abs(norm(fftshift(pfft * x))), x0 .+ im) and got no error. I can't test it on my original function since the elementwise multiplication error gets in the way first.

Do you think this route is worth pursuing? Or should I just stick to writing methods for Dual Numbers instead?

@mcabbott
Copy link
Member

mcabbott commented Jan 5, 2021

At this point I think you should try the dual numbers! That's certainly solvable, whereas making 2nd derivatives work here (reverse mode) may be a bottomless pit of hard problems.

@tholdem tholdem closed this as completed Jan 5, 2021
bors bot added a commit that referenced this issue Feb 2, 2021
890: Add `jacobian`, at last? r=CarloLucibello a=mcabbott

This adds a Jacobian function. 

Compared to #747 this one:
* has tests
* accepts multiple arguments
* shouldn't fail if `back` returns `nothing`
* inserts `vec` a few more places
* has a method which accepts implicit `Params`, and returns `Grads`. (This was the only part I actually needed in real life.)
* now works on the GPU too!

Compared to #414 this one:
* always inserts `vec`, never makes higher-dimensional arrays
* runs on current Zygote
* has tests.

Compared to #235 this one:
* doesn't try to provide numerical jacobian
* doesn't alter testing infrastructure
* doesn't provide `jacobian!`, nor any code for structured matrices.

This does not address #564's concerns about functions which return a tuple of arrays. Only functions returning an array (or a scalar) are permitted. Similar considerations might give sensible jacobians when the argument of a function is a tuple, or some other struct, but for now these are handled by putting up a giant warning sign.

Nothing in the file `utils.jl` seems to have any tests at all. So I also added tests for `hessian`. 

And, while I was there, made `hessian` actually accept a real number like its docstring promises. (Hence this closes #891.) And, made a version that is reverse-over-reverse, using this `jacobian`, which works less well of course but may as well exist to test things. (See for example #865.) Ideally there would be a pure-Zygote version using its own forward mode, but I didn't write that.

Fixes #51, fixes #98, fixes #413. Closes #747.

Co-authored-by: Michael Abbott <me@escbook>
Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants