Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Medoid linkage #12

Open
kno10 opened this issue May 2, 2024 · 1 comment
Open

Add Medoid linkage #12

kno10 opened this issue May 2, 2024 · 1 comment

Comments

@kno10
Copy link

kno10 commented May 2, 2024

Medoid linkage (there are actually multiple with this name, unfortunately) is inspired by the well-known k-meodids clustering concept (Partitioning Around Medoids, PAM), but in a hierarchical way instead of fixing the number of clusters beforehand.

Every cluster is represented by a medoid, which minimizes the distance sum.
$$\min_{c\in A\cup B} \sum_{o\in A\cup B} d(c,o)$$

Two versions are proposed in the article below, the second uses the change, similar to how Ward linkage uses the change in sum of squares.
$$\min_{c\in A\cup B} \sum_{o\in A\cup B} d(c,o) - \min_{c\in A} \sum_{o\in A} d(c,o) - \min_{c\in B} \sum_{o\in B} d(c,o)$$

Schubert, Erich (2021).
HACAM: Hierarchical Agglomerative Clustering Around Medoids – and its Limitations.
LWDA’21: Lernen, Wissen, Daten, Analysen September 01–03, 2021, Munich, Germany.

Another "medoid" linkage was proposed earlier based on median/centroid linkage, as the distance of the medoids of the clusters, $d(m_A, m_B)$. In my opinion this is not well defined, because the medoids $m_A$, $m_B$ are not uniquely determined. Simply consider clusters of two points, then either point can be the medoid. Given the integers 1,2,3,4, clustered into (1,2) and (3,4), the distance of the medoids can be 1,2, or 3. A more stable, but also more costly, computation would be the minimum over the pairwise distances of all medoid candidates. Above version does not have this issue, because it does not need the medoid $c$ to be unique.

Miyamoto, Sadaaki; Kaizu, Yousuke; Endo, Yasunori (2016).
Hierarchical and Non-Hierarchical Medoid Clustering Using Asymmetric Similarity Measures.
2016 Joint 8th International Conference on Soft Computing and Intelligent Systems (SCIS) and 17th International Symposium on Advanced Intelligent Systems (ISIS). pp. 400–403

Herr, Dominik; Han, Qi; Lohmann, Steffen; Ertl, Thomas (2016).
Visual Clutter Reduction through Hierarchy-based Projection of High-dimensional Labeled Data.
Graphics Interface. Graphics Interface.

@BurntSushi
Copy link
Contributor

Hiya! I'm not sure this library will be receiving new features. If you'd like to see new linkages added (including the others you've requested), I'd suggest forking the library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants