All Classes Namespaces Files Functions Variables Typedefs Enumerations Friends Macros Pages
k-means clustering

Problem definition.

In the k-means clustering problem we are given a set of observation \((x_1,x_2, ...x_n)\), where each observation is d-dimensional real vector. k-means clustering aims to partition the n observations into k sets \((k<=n)\). \(S ={S_1,S_2,...,S_k}\) so as to minimize the within-cluster sum of squares(WCSS):

\begin{eqnarray*} \mbox{argmin}_{S}\sum\limits_{i=1}^k\sum_{x_j\in S_i} ||x_j-\mu_i ||^2 & \\ \end{eqnarray*}

where \(\mu\) is the mean of points in \(S_i\)

Solution

TODO

Example:

#include <vector>
#include <iostream>
int main() {
using Point = std::vector<double>;
// sample data
unsigned const int NUMBER_OF_CLUSTER = 2;
std::vector<Point> points = { { 0, 0 },
{ 0, 3 },
{ 4, 0 } };
std::vector<std::pair<Point , int>> point_cluster_pair;
std::vector<Point> centers(NUMBER_OF_CLUSTER);
// solution
paal::get_random_centers(points, NUMBER_OF_CLUSTER, centers.begin());
paal::k_means(points , centers,
back_inserter(point_cluster_pair));
for (auto i : point_cluster_pair) {
for(auto && j : i.first) {
std::cout << j << ",";
}
std::cout<< " " << i.second << std::endl;
}

complete example is k_means_clustering_example.cpp

Parameters

IN: ObservationRange observations

IN: GetCoordinates get_coordinates

IN: const unsigned int number_of_cluster

OUT: OutputIterator result

The Complexity

TODO

References

TODO