Problem definition.

In the k-means clustering problem we are given a set of observation \((x_1,x_2, ...x_n)\), where each observation is d-dimensional real vector. k-means clustering aims to partition the n observations into k sets \((k<=n)\). \(S ={S_1,S_2,...,S_k}\) so as to minimize the within-cluster sum of squares(WCSS):

\begin{eqnarray*} \mbox{argmin}_{S}\sum\limits_{i=1}^k\sum_{x_j\in S_i} ||x_j-\mu_i ||^2 & \\ \end{eqnarray*}

where \(\mu\) is the mean of points in \(S_i\)

Solution

TODO

Example:

#include <vector>
#include <iostream>
#include "paal/clustering/k_means_clustering.hpp"
#include "paal/utils/functors.hpp"
int main() {
    using Point = std::vector<double>;
    // sample data
    unsigned const int NUMBER_OF_CLUSTER = 2;
    std::vector<Point> points = { { 0, 0 },
                                  { 0, 3 },
                                  { 4, 0 } };
    std::vector<std::pair<Point , int>> point_cluster_pair;
    std::vector<Point> centers(NUMBER_OF_CLUSTER);
    // solution
    paal::get_random_centers(points, NUMBER_OF_CLUSTER, centers.begin());
    paal::k_means(points , centers,
                  back_inserter(point_cluster_pair));
    for (auto i : point_cluster_pair) {
        for(auto && j : i.first) {
            std::cout << j << ",";
        }
        std::cout<< " " << i.second << std::endl;
    }

complete example is k_means_clustering_example.cpp

Parameters

IN: ObservationRange observations

IN: GetCoordinates get_coordinates

IN: const unsigned int number_of_cluster

OUT: OutputIterator result

The Complexity

TODO

References

TODO