K-center

Problem definition.

In the k-center problem we are given a set of items $$I$$, metric $$d$$ and number $$k$$. We want to select $$k$$ clusters' centers (items), in such a way that the maximum distance of items to clusters' centers is minimized.

In geometric terms, our goal is to find the centers of $$k$$ different balls of same radius that cover all points so that their radius is as small as possible

More formally: Given set of items $$I$$ and metric $$d(v_i,v_j)$$, find a subset $$S \subseteq I$$ with $$|S|=k$$ that minimizes $$max_{v\in I}min_{s\in S}d(v,s)$$

Solution

Here we implement the following algorithm:

1. We select first item arbitrary, and make it a cluster by putting it into $$S$$;
2. We find next cluster center by finding an item that is as fa as possible from all other cluster centers and add it to $$S$$.

We repeat step 2 until $$|S|=k$$. When the algorithm finishes we return clusters in $$S$$.

Pseudocode:

\begin{eqnarray*} & & \\ & & \textbf{Pick arbitrary: } {\color{red}i\in \color{blue}I}\\ & & \color{blue}S \leftarrow \{\color{red}i\}\\ & & \textbf{while } |\color{blue}S| < \color{red}k \textbf{ do}\\ & & \hspace{1cm}\color{red}j\leftarrow \mbox{argmax}_{\color{red}j \in \color{blue}I } d(\color{red}j,\color{blue}S)\\ & & \hspace{1cm}\color{blue}S\leftarrow \color{blue}S \cup \{\color{red}j\}\\ \end{eqnarray*}

Example

#include <iostream>
#include <vector>
int main() {
// sample data
const int parts = 2;
m(0, 1) = 3;
m(0, 2) = 4;
m(1, 2) = 5;
m(1, 0) = 3;
m(2, 0) = 4;
m(2, 1) = 5;
auto vertices = paal::irange(3);
std::vector<int> centers;
// solution
std::cout << paal::greedy::kCenter(m, parts, vertices.begin(),
vertices.end(), back_inserter(centers))
<< std::endl;
}

example file is k_center_example.cpp

Parameters

IN: const Metric& metric

IN: unsigned int numberOfClusters

OUT: OutputIterator result

IN: const ItemIterator iBegin

IN: const ItemIterator iEnd

The Items that are selected to be centers will be output to the output iterator result.

The iterator type must be a model of Output Iterator

Approximation Ratio

Approximation ratio of this algorithm is equal to 2.

Complexity

The time complexity of the algorithm is $$O(|I|*k)$$, where $$I$$ is the number of items and $$k$$ is the number of clusters' centers.

The memory complexity of the algorithm is $$O(|I|)$$, where $$I$$ is the number of items

References

The algorithm analysis is described in [23]