colwise function #138

Open
musm opened this issue Jun 18, 2019 · 6 comments

@musm

musm commented Jun 18, 2019

Why is there only a colwise function and not also a matching rowwise function?

@dkarrasch
Member

But colwise and pairwise have very different meanings here, independent of whether data points are viewed as columns or rows. colwise computes distances only between corresponding data points (columns). pairwise computes all pairwise distances, so it can handle matrices with different numbers of data points (say, columns). That is impossible for colwise, unless one of the two data sets is a single data point, i.e., a vector.

@musm
Author

musm commented Jun 19, 2019

Good point regarding pairwise (comment edited). I think I was getting at a rowwise that computes the distances between corresponding rows of the matrices, i.e., rowwise(dist, X, Y) = colwise(dist, X', Y')
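A minimal sketch of what such a rowwise could look like, using only Base and the LinearAlgebra standard library. The names `rowwise` and `euclid` are hypothetical and not part of Distances.jl; `euclid` merely stands in for a distance object such as Euclidean().

```julia
using LinearAlgebra  # stdlib, provides norm

# Stand-in for a distance such as Euclidean() from Distances.jl
# (assumption: any callable taking two vectors works here).
euclid(a, b) = norm(a - b)

# Hypothetical helper, not part of Distances.jl: distance between
# corresponding rows of X and Y.
rowwise(dist, X::AbstractMatrix, Y::AbstractMatrix) =
    map(dist, eachrow(X), eachrow(Y))

X = [0.0 0.0; 1.0 1.0]
Y = [3.0 4.0; 1.0 1.0]
r = rowwise(euclid, X, Y)   # one distance per row
```

This version iterates over row views, so it makes no claim about performance; it only pins down the semantics being asked for.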

@johnnychen94
Contributor

johnnychen94 commented Jun 19, 2019

I suppose one reason is that Julia matrices are stored in column-major order, so a row-wise loop might cause performance issues.

julia> x = rand(100, 100);

julia> y = rand(100, 100);

julia> @benchmark colwise(Euclidean(), x, y)
BenchmarkTools.Trial: 
  memory estimate:  896 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.742 μs (0.00% GC)
  median time:      1.804 μs (0.00% GC)
  mean time:        2.268 μs (19.13% GC)
  maximum time:     3.196 ms (99.90% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark colwise(Euclidean(), x', y')
BenchmarkTools.Trial: 
  memory estimate:  928 bytes
  allocs estimate:  3
  --------------
  minimum time:     8.752 μs (0.00% GC)
  median time:      8.893 μs (0.00% GC)
  mean time:        9.459 μs (3.82% GC)
  maximum time:     3.631 ms (99.62% GC)
  --------------
  samples:          10000
  evals/sample:     3
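To illustrate the layout point behind these timings, here is a small sketch using only Base and LinearAlgebra. It shows that `x'` is a lazy wrapper over the same column-major memory, whereas `permutedims(x)` allocates a new matrix in which the former rows are contiguous columns. Whether paying for that copy is worthwhile before calling colwise is an assumption that would need its own benchmark.

```julia
using LinearAlgebra  # stdlib, provides the Adjoint wrapper type

x = rand(3, 4)

xt = x'               # lazy Adjoint wrapper, shares memory with x
xm = permutedims(x)   # new 4×3 Matrix, rows of x become contiguous columns

# In x, moving along a row jumps by size(x, 1) elements in memory,
# while moving along a column moves by 1.
sx = strides(x)       # (1, 3)
sm = strides(xm)      # (1, 4)
```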

@nalimilan
Member

With JuliaLang/julia#32310 we should probably drop colwise(dist, x, y) in favor of map(d, eachcol(x), eachcol(y)). Cc: @simonbyrne

@dkarrasch
Member

I guess we should keep colwise(dist, x, y), make map(d, eachcol(x), eachcol(y)) the default, but allow specialized methods to optimize for performance.

using Distances, BenchmarkTools
d = Euclidean(); a = rand(5, 100); b = rand(5, 100);
@btime map($d, $(eachcol(a)), $(eachcol(b))); # 3.090 μs (304 allocations: 13.45 KiB)
@btime colwise($d, $a, $b); # 380.907 ns (1 allocation: 896 bytes)

@simonbyrne
Member

@dkarrasch the point of JuliaLang/julia#32310 is that you could write specialized versions that can leverage the memory layout: in this case, you could do:

Base.map(d::Distance, a::EachCol, b::EachCol) = colwise(d, parent(a), parent(b))
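The dispatch pattern in that one-liner can be sketched in a self-contained way. Everything below is hypothetical: `Cols` stands in for the EachCol type discussed in JuliaLang/julia#32310, `ToyEuclidean` stands in for a Distances.jl distance, and `toy_colwise` plays the role of colwise. None of these names exist in Base or Distances.jl.

```julia
# Hypothetical stand-in for the EachCol type from JuliaLang/julia#32310:
# it remembers the parent matrix and iterates over column views.
struct Cols{M<:AbstractMatrix}
    parent::M
end
Base.parent(c::Cols) = c.parent
Base.length(c::Cols) = size(c.parent, 2)
Base.iterate(c::Cols, i::Int = 1) =
    i > length(c) ? nothing : (view(c.parent, :, i), i + 1)

# Toy distance type (assumption: stands in for Euclidean()).
struct ToyEuclidean end
(::ToyEuclidean)(a, b) = sqrt(sum(abs2, a .- b))

# Whole-matrix kernel playing the role of colwise.
toy_colwise(::ToyEuclidean, A, B) = vec(sqrt.(sum(abs2, A .- B; dims = 1)))

# Specialized method: unwrap the parents and call the matrix kernel.
Base.map(d::ToyEuclidean, a::Cols, b::Cols) =
    toy_colwise(d, parent(a), parent(b))

A = [0.0 1.0; 0.0 1.0]
B = [3.0 1.0; 4.0 1.0]
fast = map(ToyEuclidean(), Cols(A), Cols(B))   # hits the specialized method
slow = [ToyEuclidean()(a, b) for (a, b) in zip(Cols(A), Cols(B))]  # generic iteration
```

Both paths give the same distances; the point is only that a typed column iterator lets `map` recover the parent matrices and hand them to an optimized kernel.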
