Discovery of factors in binary data triangular decomposition of matrices

Radim Belohlavek.

We present new methods of decomposition of an n x m binary matrix I into a product A * B of an n x k binary matrix A and a k x m binary matrix B. These decompositions are alternative to the usual one which is sought in Boolean factor analysis (BFA), where * is a Boolean product of matrices. In the new decompositions,  are the left and the right triangular products of Boolean matrices. In BFA, I is interpreted as object x attribute matrix, A and B are interpreted as object x factor and factor x attribute matrices, and the aim is to find a decomposition with the number k of factors as small as possible. The new decompositions have different semantics from the one with Boolean matrix product. The presented methods are optimal in that they provide the smallest number k possible. We provide a geometric insight showing that the factors correspond to I-beam- and H-beam-shaped patterns in the input matrix. We present an approximation algorithm for computing the optimal decomposition. The algorithm is based on a greedy approximation algorithm for set covering optimization problem. We demonstrate our results by illustrative examples.

PDF full paper