Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pairwise distance computation in condensed form for symmetric metrics. #79

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lucabrivio
Copy link

This patch adds support for computing pairwise distances (for semimetrics) in a condensed form, similar to how pdist works in several packages available for MATLAB/Octave, R, Python, etc. (although on columns).
At the moment I have added no tests for the new methods, nor have I written functions to convert between the condensed form and the redundant one.

@lucabrivio lucabrivio changed the title Add pairwise distance computation in condensed form for symmetrical metrics. Add pairwise distance computation in condensed form for symmetric metrics. Oct 20, 2017
@nalimilan
Copy link
Member

Thanks, but I'm not sure we want to provide this kind of function. Julia supports very powerful AbstractMatrix representations which can behave like a standard Matrix while storing only half of the entries. Base provides Symmetric for this, though currently it is represented as a standard Matrix whose upper/lower triangle is ignored because it generally allows more efficient computations and "only" uses twice the needed memory. See this Discourse thread and this one.

So instead of returning a vector, which is arguably a hack due to the limitations of other languages, we should use Symmetric or another similar type. We could also allow passing the expected output type as the first argument to pairwise.

What the most interesting feature for you in this PR? Saving RAM? Making subsequent computations faster?

@lucabrivio
Copy link
Author

lucabrivio commented Oct 20, 2017 via email

@nalimilan
Copy link
Member

If you mainly care about the computation speed, then returning a Symmetric matrix should be enough, as comparisons will only check one half of the tables.

I think you could just add a pairwise(metric::PreMetric, a::AbstractMatrix) method which would create a Symmetric matrix rather than a standard Matrix before passing it to pairwise!. Then you would need to add a pairwise!(r::Symmetric, metric::SemiMetric, a::AbstractMatrix) method which would call pairwise(r.data, metric, a). That would be slightly wasteful since it would fill the whole matrix, so you could imagine also passing r.uplo as an additional argument to decide which triangle needs to be filled (but that would probably not make a big difference compared with the time required to compute distances).

@yakir12
Copy link

yakir12 commented Nov 9, 2017

I too agree that pairwise(metric::PreMetric, a::AbstractMatrix) and pairwise(metric::SemiMetric, a::AbstractMatrix, b::AbstractMatrix) should return a Symmetric (came here looking for it).

@juliohm
Copy link
Contributor

juliohm commented Nov 12, 2017

What is the status on this? I also think that the output for semi-metrics should be symmetric. Also, besides saving computation, we could have a method to flatten a symmetric matrix (the upper or lower part of it as a vector). Many times that is what is needed in distance calculations on a huge point cloud.

@juliohm
Copy link
Contributor

juliohm commented Jan 2, 2018

Re-stating what was already stated in the previous comments. The main advantage of storing half of the matrix entries in a flattened vector is that we can perform subsequent calculations more efficiently. Simple statistics like histograms, means, variances are easy to apply. Even if pairwise returns a matrix type that is symmetric, its zero entries would compromise the summaries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants