k-means clustering

# Problem definition.

In the k-means clustering problem we are given a set of observation $$(x_1,x_2, ...x_n)$$, where each observation is d-dimensional real vector. k-means clustering aims to partition the n observations into k sets $$(k<=n)$$. $$S ={S_1,S_2,...,S_k}$$ so as to minimize the within-cluster sum of squares(WCSS):

\begin{eqnarray*} \mbox{argmin}_{S}\sum\limits_{i=1}^k\sum_{x_j\in S_i} ||x_j-\mu_i ||^2 & \\ \end{eqnarray*}

where $$\mu$$ is the mean of points in $$S_i$$

# Solution

TODO

Example:

#include <vector>
#include <iostream>
int main() {
using Point = std::vector<double>;
// sample data
unsigned const int NUMBER_OF_CLUSTER = 2;
std::vector<Point> points = { { 0, 0 },
{ 0, 3 },
{ 4, 0 } };
std::vector<std::pair<Point , int>> point_cluster_pair;
std::vector<Point> centers(NUMBER_OF_CLUSTER);
// solution
paal::get_random_centers(points, NUMBER_OF_CLUSTER, centers.begin());
paal::k_means(points , centers,
back_inserter(point_cluster_pair));
for (auto i : point_cluster_pair) {
for(auto && j : i.first) {
std::cout << j << ",";
}
std::cout<< " " << i.second << std::endl;
}

complete example is k_means_clustering_example.cpp

## Parameters

IN: ObservationRange observations

IN: GetCoordinates get_coordinates

IN: const unsigned int number_of_cluster

OUT: OutputIterator result

TODO