In the sketch problem we are given matrix \(A\). Our goal is to find matrix \(B\) which is significantly smaller than \(A\) and approximates it well.
We solve the problem using frequent-directions algorithm. We start with zero matrix \(B\). Rows from matrix \(A\) replace all-zero valued rows of sketch \(B\). If there is no zero valued row we nullify some of them in two steps:
Default \(compress\_size\) is half of number of rows in sketch.
Pseudocode:
\begin{eqnarray*} & & \\ & & \color{blue}B \leftarrow all\ zeros\ matrix\ R^{\ l \times m}\\ & & \textbf{for } \color{red}i \in [\color{red}n] \textbf{ do}\\ & & \hspace{1cm}Insert\ \color{blue}{A_i}\ into\ a\ zero\ valued\ row\ of\ \color{blue}B\\ & & \hspace{1cm}\textbf{if }\color{blue}B\ has\ no\ zero\ valued\ rows \textbf{ then}\\ & & \hspace{2cm}[\color{blue}U, \color{blue}\Sigma, \color{blue}V] \leftarrow SVD(\color{blue}B)\\ & & \hspace{2cm}\color{red}\delta \leftarrow \color{red}\sigma^2_{compress\_size}\\ & & \hspace{2cm}\color{blue}{\hat{\Sigma}} \leftarrow \sqrt{max(\color{blue}\Sigma^2 - I_l\color{red}\delta,0)}\\ & & \hspace{2cm}\color{blue}B \leftarrow \color{blue}{\hat{\Sigma}V}^T\\ \end{eqnarray*}
example file is frequent_directions_example.cpp
There are four make functions constructing the frequent_directions model. First two with user matrix as sketch:
auto make_frequent_directions(Matrix matrix); auto make_frequent_directions(Matrix matrix, size_t const compress_size);
And other which creates matrix of given sizes:
auto make_frequent_directions(size_t rows_count, size_t columns_count); auto make_frequent_directions(size_t rows_count, size_t columns_count, size_t const compress_size);
The model can be updated with additional data by using following methods:
void update(MatrixData&& matrix); void update_row(InputRow&& input_row); void update_range(RowRange&& row_range);
IN: Matrix matrix - starting sketch matrix
IN: size_t compress_size
IN: size_t rows_count - numer of rows in a sketch
IN: size_t columns_count - number of columns in a sketch
IN: MatrixData &&matrix - data to update sketch in matrix form
IN: InputRow &&input_row - single row to update
IN: RowRange &&row_range - range of rows to update
The solution can be used as a binary program frequent-directions which supports:
For more details on usage, please run the binary program with --help option.
After adding all data and running compress phase at the end, we get:
\begin{eqnarray*} &0\preceq B^TB \preceq A^TA\\ and&\\ &\Vert A^TA - B^TB \Vert \leq \Vert A \Vert ^2_f / compress\_size\\ \end{eqnarray*}
Complexity of the algorithm is \(O(n*m*l / (1-\frac{compress\_size}{l}))\) where \(n\) is number of rows in matrix \(A\), \(m\) is number of columns in matrix \(A\), \(l\) is number of rows in sketch \(B\) and \(compress\_size\) is number of nonzero rows after compress phase.
The algorithm analysis is described in [14]