Skip to content

Commit

Permalink
Fix #71: Add Haversine distance (#77)
Browse files Browse the repository at this point in the history
* Fix #71: Add Haversine distance

* Add Haversine to README.md

* Add Haversine to benchmarks

* Add benchmark results for Haversine to README.md

* Add helper function haversine() with default Earth radius

* Add tests for Haversine distance

* Use 4 spaces for indentation

* Use 4 spaces for indentation on test suite

* Remove default radius for Haversine distance

* Fix typo in Haversine tests

* fixups
  • Loading branch information
juliohm authored and KristofferC committed Oct 5, 2017
1 parent a6de2f8 commit 2b0ab92
Show file tree
Hide file tree
Showing 6 changed files with 72 additions and 5 deletions.
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ This package also provides optimized functions to compute column-wise and pairwi
* Squared Mahalanobis distance
* Bhattacharyya distance
* Hellinger distance
* Haversine distance
* Mean absolute deviation
* Mean squared deviation
* Root mean squared deviation
Expand Down Expand Up @@ -148,6 +149,7 @@ Each distance corresponds to a distance type. The type name and the correspondin
| SpanNormDist | `spannorm_dist(x, y)` | `max(x - y) - min(x - y )` |
| BhattacharyyaDist | `bhattacharyya(x, y)` | `-log(sum(sqrt(x .* y) / sqrt(sum(x) * sum(y)))` |
| HellingerDist | `hellinger(x, y) ` | `sqrt(1 - sum(sqrt(x .* y) / sqrt(sum(x) * sum(y))))` |
| Haversine | `haversine(x, y, r)` | [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula) |
| Mahalanobis | `mahalanobis(x, y, Q)` | `sqrt((x - y)' * Q * (x - y))` |
| SqMahalanobis | `sqmahalanobis(x, y, Q)` | ` (x - y)' * Q * (x - y)` |
| MeanAbsDeviation | `meanad(x, y)` | `mean(abs.(x - y))` |
Expand Down Expand Up @@ -226,6 +228,7 @@ The table below compares the performance (measured in terms of average elapsed t
| WeightedHamming | 0.009042s | 0.003182s | 2.8417 |
| SqMahalanobis | 0.070869s | 0.019199s | 3.6913 |
| Mahalanobis | 0.070980s | 0.019305s | 3.6768 |
| Haversine | 0.006549s | 0.000809s | 8.0967 |

We can see that using ``colwise`` instead of a simple loop yields considerable gain (2x - 4x), especially when the internal computation of each distance is simple. Nonetheless, when the computation of a single distance is heavy enough (e.g. *KLDivergence*, *RenyiDivergence*), the gain is not as significant.

Expand Down Expand Up @@ -257,5 +260,6 @@ The table below compares the performance (measured in terms of average elapsed t
| WeightedHamming | 0.024456s | 0.007403s | 3.3033 |
| SqMahalanobis | 0.113107s | 0.000366s | **309.3621** |
| Mahalanobis | 0.114646s | 0.000686s | **167.0595** |
| Haversine | 0.015267s | 0.003656s | 4.1763 |

For distances of which a major part of the computation is a quadratic form (e.g. *Euclidean*, *CosineDist*, *Mahalanobis*), the performance can be drastically improved by restructuring the computation and delegating the core part to ``GEMM`` in *BLAS*. The use of this strategy can easily lead to 100x performance gain over simple loops (see the highlighted part of the table above).
4 changes: 3 additions & 1 deletion benchmark/benchmarks.jl
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@ function create_distances(w, Q)
BhattacharyyaDist(),
HellingerDist(),

Haversine(),

WeightedSqEuclidean(w),
WeightedEuclidean(w),
WeightedCityblock(w),
Expand Down Expand Up @@ -57,7 +59,7 @@ function evaluate_colwise(dist, x, y)
end

function add_colwise_benchmarks!(SUITE)

m = 200
n = 10000

Expand Down
3 changes: 2 additions & 1 deletion benchmark/print_table.jl
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ order = [
:WeightedHamming,
:SqMahalanobis,
:Mahalanobis,
:Haversine,
]

BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.0 # Long enough
Expand Down Expand Up @@ -67,4 +68,4 @@ function print_table(judgement)
end
end

print_table(judgement)
print_table(judgement)
5 changes: 5 additions & 0 deletions src/Distances.jl
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,8 @@ export
BhattacharyyaDist,
HellingerDist,

Haversine,

MeanAbsDeviation,
MeanSqDeviation,
RMSDeviation,
Expand Down Expand Up @@ -79,6 +81,8 @@ export
bhattacharyya,
hellinger,

haversine,

meanad,
msd,
rmsd,
Expand All @@ -88,6 +92,7 @@ include("common.jl")
include("generic.jl")
include("metrics.jl")
include("wmetrics.jl")
include("haversine.jl")
include("mahalanobis.jl")
include("bhattacharyya.jl")

Expand Down
39 changes: 39 additions & 0 deletions src/haversine.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
"""
Haversine(radius)
The haversine distance between two locations on a sphere of given `radius`.
Locations are described with longitude and latitude in degrees.
The computed distance has the same units as that of the radius.
"""
struct Haversine{T<:Real} <: Metric
radius::T
end

const VecOrLengthTwoTuple{T} = Union{AbstractVector{T}, NTuple{2, T}}

function evaluate(dist::Haversine, x::VecOrLengthTwoTuple, y::VecOrLengthTwoTuple)
length(x) == length(y) == 2 || haversine_error()

@inbounds begin
# longitudes
Δλ = deg2rad(y[1] - x[1])

# latitudes
φ₁ = deg2rad(x[2])
φ₂ = deg2rad(y[2])
end

Δφ = φ₂ - φ₁

# haversine formula
a = sin(Δφ/2)^2 + cos(φ₁)*cos(φ₂)*sin(Δλ/2)^2
c = 2atan2(a, (1-a))

# distance on the sphere
c*dist.radius
end

haversine(x::VecOrLengthTwoTuple, y::VecOrLengthTwoTuple, radius::Real) = evaluate(Haversine(radius), x, y)

@noinline haversine_error() = throw(ArgumentError("expected both inputs to have length 2 in Haversine distance"))
22 changes: 19 additions & 3 deletions test/test_dists.jl
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,13 @@ end

test_metricity(BhattacharyyaDist(), x, y, z)
test_metricity(HellingerDist(), x, y, z)


x₁ = rand(T, 2)
x₂ = rand(T, 2)
x₃ = rand(T, 2)

test_metricity(Haversine(6371.), x₁, x₂, x₃)

k = rand(1:3, n)
l = rand(1:3, n)
m = rand(1:3, n)
Expand Down Expand Up @@ -154,7 +160,7 @@ end
@test wcityblock(x, y, w) dot(abs.(x - vec(y)), w)
@test wminkowski(x, y, w, 2) weuclidean(x, y, w)
end


# Test weighted Hamming distances with even weights
a = T.([1.0, 2.0, 1.0, 3.0, 2.0, 1.0])
Expand Down Expand Up @@ -187,7 +193,7 @@ end
end
@test kl_divergence(p, q) klv
@test typeof(kl_divergence(p, q)) == T


@test renyi_divergence(p, r, 0) -log(scale)
@test renyi_divergence(p, r, 1) -log(scale)
Expand Down Expand Up @@ -272,6 +278,16 @@ end # testset
end
end #testset

@testset "haversine" begin
for T in (Float64, F64)
@test haversine([-180.,0.], [180.,0.], 1.) 0 atol=1e-10
@test haversine([0.,-90.], [0.,90.], 1.) π atol=1e-10
@test haversine((-180.,0.), (180.,0.), 1.) 0 atol=1e-10
@test haversine((0.,-90.), (0.,90.), 1.) π atol=1e-10
@test_throws ArgumentError haversine([0.,-90., 0.25], [0.,90.], 1.)
end
end

@testset "bhattacharyya / hellinger" begin
for T in (Float64, F64)
x, y = T.([4.0, 5.0, 6.0, 7.0]), T.([3.0, 9.0, 8.0, 1.0])
Expand Down

0 comments on commit 2b0ab92

Please sign in to comment.