Fix up Euclidean distance at the origin #787
Conversation
Yes, it will fix the issue above:

julia> using KernelFunctions, Zygote

julia> x = rand(1, 10);

julia> k_se(θ) = TransformedKernel(SEKernel(), ScaleTransform(θ));

julia> k_exp(θ) = TransformedKernel(ExponentialKernel(), ScaleTransform(θ));

julia> θ = 2.0;

julia> for k in [k_se, k_exp]
           obj(θ) = sum(kernelmatrix(k(θ), x, x[:, 1:1]; obsdim=2))
           println((obj(θ), Zygote.gradient(obj, θ)))
       end
(8.579178644842253, (-1.1450487011851709,))
(7.600532275422401, (-0.9108241610417317,))
@DhairyaLGandhi I think this looks good now. It's already gone through a round of review with @molet, who found the problem in the first place and agrees this fixes it. I'm going to be unavailable for the next week or so, so I would appreciate it if you could handle this from here -- feel free to push to this branch directly if you feel that more changes are necessary.
test/runtests.jl
Outdated
-    @testset "Interface" begin
-        include("interface.jl")
-    end
+    # @testset "Interface" begin
I guess these need to be uncommented?
@@ -59,26 +59,30 @@ end
 @adjoint function colwise(s::Euclidean, x::AbstractMatrix, y::AbstractMatrix)
     d = colwise(s, x, y)
     return d, function (Δ::AbstractVector)
-        x̄ = (Δ ./ d)' .* (x .- y)
+        x̄ = (Δ ./ max.(d, eps(eltype(d))))' .* (x .- y)
I wonder if we can do a similar trick here
Something like this would work:
(Δ .* map(di->ifelse(iszero(di), di, inv(di)), d))' .* (x .- y)
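For context, a minimal standalone sketch of that zero-safe reciprocal trick (the helper name safe_inv and the sample values are illustrative, not from the PR):

# Zero-safe elementwise reciprocal: where the distance is zero, keep the zero
# instead of computing 1/0, so the pullback yields a zero (sub)gradient
# rather than NaN or Inf. Note that ifelse evaluates both branches, but
# inv(0.0) == Inf is simply discarded, so no error is thrown.
safe_inv(di) = ifelse(iszero(di), di, inv(di))

d = [0.0, 2.0, 0.5]   # pairwise distances, including a zero at x == y
Δ = [1.0, 1.0, 1.0]   # incoming cotangent
println(Δ .* map(safe_inv, d))  # [0.0, 0.5, 2.0]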
If you can generalize https://github.com/FluxML/Zygote.jl/pull/787/files#r487137860
then that would be cool.
If not, then I think it should still be done there.
Don't forget to uncomment the tests before merging.
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
bors r+
Build succeeded.
     δ = sqrt(sum(abs2, D))
     function euclidean(Δ::Real)
-        x̄ = (Δ / δ) .* D
+        x̄ = ifelse(iszero(δ), D, (Δ / δ) .* D)
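As editorial context, a minimal sketch of what this change does in the x == y case (the sample values are illustrative):

# When x == y, D = x .- y is all zeros, so δ == 0 and the old expression
# (Δ / δ) .* D would produce NaNs. The ifelse instead returns D itself,
# i.e. a zero gradient. Both branches are evaluated, but the NaN branch
# is discarded.
D = zeros(3)
δ = sqrt(sum(abs2, D))   # 0.0
Δ = 1.0
x̄ = ifelse(iszero(δ), D, (Δ / δ) .* D)
println(x̄)  # [0.0, 0.0, 0.0]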
Isn't this mathematically incorrect if x == y? E.g., in the 1D case the derivative of the Euclidean distance x -> |x - a| with respect to x is not defined at x = a, since the limits from the left and from the right do not agree (they are -1 and 1, so also not close to 0). I understand that this might be unsatisfying for practical applications, also due to floating-point issues, but mathematically it seems the NaN value resulting from the previous implementation actually makes sense if x == y.
We tend to use a subgradient-based convention for cases where the gradient limit from the left has the opposite sign to the gradient approaching from the right. Or more generally: if 0 is an element of the subgradient, then we say the gradient is zero. This has the beneficial effect of ensuring that the gradient is considered to be zero at local minima, which, from the perspective of doing gradient-based optimisation, is what you want.
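To illustrate the convention (a minimal sketch; abs stands in for the 1D Euclidean distance, and the value at the kink follows from the sign-based adjoint):

using Zygote

# The subgradient of x -> |x| at 0 is the interval [-1, 1], which contains 0,
# so under this convention the reported gradient at the kink is 0.
println(Zygote.gradient(abs, -2.0))  # (-1.0,)
println(Zygote.gradient(abs, 3.0))   # (1.0,)
println(Zygote.gradient(abs, 0.0))   # expect (0.0,), since sign(0.0) == 0.0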
I see 👍 Thanks for the explanation!
It seems there is no new release yet? Would it be possible to make one?
Will is on holidays, I guess he forgot to make one.
Ah, Dhairya is ahead of me.
(Hopefully) fixes some Euclidean distance issues when the inputs are close together.
Fixes JuliaGaussianProcesses/KernelFunctions.jl#166
@molet could you try this out and see whether or not it fixes your issue?
I've introduced FiniteDifferences as a test dependency because Zygote's own finite differencing isn't sufficiently accurate in this case. I'm not completely sure why.
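For reference, a minimal sketch of the kind of check FiniteDifferences enables (the test function f and the tolerance are illustrative, not taken from the PR's test suite):

using FiniteDifferences, Zygote

# Compare Zygote's gradient of a Euclidean-norm function against a
# fifth-order central finite-difference estimate.
f(x) = sqrt(sum(abs2, x))  # differentiable away from the origin

x = randn(5)
fd_grad, = grad(central_fdm(5, 1), f, x)
zy_grad, = Zygote.gradient(f, x)

println(isapprox(fd_grad, zy_grad; rtol=1e-8))  # expect true away from the origin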