Problem definition.
In the k-means clustering problem we are given:
- a set of observation \((x_1,x_2, ...x_n)\)
- function given distances between observation and center \(d\).
- function given center for cluster \(c\).
function given start position of cluster center \(c\). k-means clustering aims to partition the n observations into k sets \((k<=n)\). \(S ={S_1,S_2,...,S_k}\) and give for each set center so as to minimize the within-cluster sum of distances from center to all observation:
\begin{eqnarray*} \mbox{argmin}_{S}\sum\limits_{i=1}^k\sum_{x_j\in S_i} d(x_j-c(S_i)) & \\ \end{eqnarray*}
Solution
TODO
Example:
#include <vector>
#include <limits>
#include <cmath>
int main() {
using Point = std::vector<double>;
const int NUMBER_OF_CLUSTER = 2;
std::vector<Point> points = { { 0, 0 },
{ 0, 3 },
{ 4, 0 } };
std::vector<Point> centers(NUMBER_OF_CLUSTER);
std::vector<std::pair<Point, int>> point_cluster_pairs(points.size());
auto centroid = [&](const std::vector<std::vector<double>> &points) {
double best_dist = std::numeric_limits<double>::max();
std::vector<double> result;
for (auto && center : points) {
double dist = 1e99;
for (auto && point : points) {
}
if (dist < best_dist) {
best_dist = dist;
result = center;
}
}
assert(!result.empty());
return result;
};
};
point_cluster_pairs.begin());
complete example is k_means_clustering_engine_example.cpp
Parameters
IN: ObservationRange observation_range
IN: const unsigned int number_of_cluster
IN: GetCoordinates distance
IN: Center center functor return center of set of samples
IN: StarterCenter get_starter_center
OUT: OutputIterator result
The Complexity
TODO
References
TODO