Clustering with KMedoids and Common-nearest-neighbors
=====================================================
.. _k_medoids:

.. currentmodule:: sklearn_extra.cluster

K-Medoids
=========

:class:`KMedoids` is related to the :class:`KMeans <sklearn.cluster.KMeans>` algorithm. While
:class:`KMeans <sklearn.cluster.KMeans>` tries to minimize the within-cluster sum-of-squares,
:class:`KMedoids` tries to minimize the sum of distances between each point and
the medoid of its cluster. The medoid is a data point (unlike the centroid)
which has the least total distance to the other members of its cluster. The use of
a data point to represent each cluster's center allows the use of any distance
metric for clustering. This can also be a practical advantage: for instance,
K-Medoids algorithms have been used for facial recognition, where the medoid is a
typical photo of the person to recognize, while K-Means would have produced a
blurry image mixing several pictures of that person.

:class:`KMedoids` can be more robust to noise and outliers than :class:`KMeans <sklearn.cluster.KMeans>`,
as it chooses one of the cluster members as the medoid, while
:class:`KMeans <sklearn.cluster.KMeans>` will move the center of the cluster towards the outlier, which
might in turn move other points away from the cluster center.
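This robustness can be seen on a tiny one-dimensional example (a minimal NumPy sketch for illustration, not library code): with an outlier present, the centroid is dragged towards it, while the medoid, which must be an actual data point, stays inside the main group.

```python
import numpy as np

# Five points near 0 plus one outlier at 100
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

# Centroid (K-Means-style center): the mean, pulled towards the outlier
centroid = x.mean()

# Medoid: the data point with the least total distance to all others
dist = np.abs(x - x.T)               # 6x6 pairwise absolute distances
medoid = x[dist.sum(axis=1).argmin(), 0]

print(centroid)  # far from the main group, dragged by the outlier
print(medoid)    # an actual data point inside the main group
```

Here the centroid lands well outside the tight group of small values, whereas the medoid is one of them.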

:class:`KMedoids` is also different from K-Medians, which is analogous to :class:`KMeans <sklearn.cluster.KMeans>`
except that the Manhattan median is used for each cluster center instead of
the centroid. K-Medians is robust to outliers, but it is limited to the
Manhattan distance metric and, similar to :class:`KMeans <sklearn.cluster.KMeans>`, it does not guarantee
that the center of each cluster will be a member of the original dataset.

The complexity of K-Medoids is :math:`O(N^2 K T)` where :math:`N` is the number
of samples, :math:`T` is the number of iterations and :math:`K` is the number of
clusters. This makes it more suitable for smaller datasets in comparison to
:class:`KMeans <sklearn.cluster.KMeans>`, which is :math:`O(N K T)`.
.. topic:: Examples:

* PAM method works as follows:

  * Initialize: greedy initialization of ``n_clusters``. First, select the point
    in the dataset that minimizes the sum of distances to all other points. Then,
    repeatedly add the point that minimizes the cost until ``n_clusters`` points
    are selected. This is the ``init`` parameter called ``build``.
  * Swap Step: for each medoid already selected, compute the cost of swapping it
    with any non-medoid point. Then make the swap that decreases the cost the
    most. Loop, and stop when the medoids no longer change.

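The two steps above can be sketched in plain NumPy (an illustrative re-implementation under simplifying assumptions — Euclidean distances, best-improvement swaps — not the library's actual code):

```python
import numpy as np

def pam(X, n_clusters, max_iter=100):
    """Tiny PAM sketch: greedy ``build`` initialization, then swap steps."""
    # Pairwise Euclidean distance matrix; this O(N^2) table is why
    # K-Medoids is costlier than K-Means.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)

    # Build step: start from the point with least total distance to all
    # others, then greedily add the point that most reduces the cost.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < n_clusters:
        best, best_cost = None, np.inf
        for j in range(n):
            if j in medoids:
                continue
            cost = D[:, medoids + [j]].min(axis=1).sum()
            if cost < best_cost:
                best, best_cost = j, cost
        medoids.append(best)

    # Swap step: try every (medoid, non-medoid) swap, keep the one that
    # decreases the cost the most, and stop when no swap improves.
    cost = D[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        best_swap, best_cost = None, cost
        for i in range(len(medoids)):
            for j in range(n):
                if j in medoids:
                    continue
                cand = medoids[:i] + [j] + medoids[i + 1:]
                c = D[:, cand].min(axis=1).sum()
                if c < best_cost:
                    best_swap, best_cost = (i, j), c
        if best_swap is None:
            break
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost

    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels

# Usage on two well-separated groups: one medoid per group is expected.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
medoids, labels = pam(X, n_clusters=2)
```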
.. topic:: References:
7173