Journal Home > Volume 3 , issue 1

Nonnegative Matrix Factorization (NMF) is a powerful technique to perform dimension reduction and pattern recognition through single-layer data representation learning. However, deep learning networks, with their carefully designed hierarchical structure, can combine hidden features to form more representative features for pattern recognition. In this paper, we proposed sparse deep NMF models to analyze complex data for more accurate classification and better feature interpretation. Such models are designed to learn localized features or generate more discriminative representations for samples in distinct classes by imposing L 1-norm penalty on the columns of certain factors. By extending a one-layer model into a multilayer model with sparsity, we provided a hierarchical way to analyze big data and intuitively extract hidden features due to nonnegativity. We adopted the Nesterov's accelerated gradient algorithm to accelerate the computing process. We also analyzed the computing complexity of our frameworks to demonstrate their efficiency. To improve the performance of dealing with linearly inseparable data, we also considered to incorporate popular nonlinear functions into these frameworks and explored their performance. We applied our models using two benchmarking image datasets, and the results showed that our models can achieve competitive or better classification performance and produce intuitive interpretations compared with the typical NMF and competing multilayer models.


menu
Abstract
Full text
Outline
About this article

Sparse Deep Nonnegative Matrix Factorization

Show Author's information Zhenxing GuoShihua Zhang( )
Academy of Mathematics and Systems Science, Chinese Academy of Sciences (CAS), Beijing 100190, China.
NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China, and also with the School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Abstract

Nonnegative Matrix Factorization (NMF) is a powerful technique to perform dimension reduction and pattern recognition through single-layer data representation learning. However, deep learning networks, with their carefully designed hierarchical structure, can combine hidden features to form more representative features for pattern recognition. In this paper, we proposed sparse deep NMF models to analyze complex data for more accurate classification and better feature interpretation. Such models are designed to learn localized features or generate more discriminative representations for samples in distinct classes by imposing L 1-norm penalty on the columns of certain factors. By extending a one-layer model into a multilayer model with sparsity, we provided a hierarchical way to analyze big data and intuitively extract hidden features due to nonnegativity. We adopted the Nesterov's accelerated gradient algorithm to accelerate the computing process. We also analyzed the computing complexity of our frameworks to demonstrate their efficiency. To improve the performance of dealing with linearly inseparable data, we also considered to incorporate popular nonlinear functions into these frameworks and explored their performance. We applied our models using two benchmarking image datasets, and the results showed that our models can achieve competitive or better classification performance and produce intuitive interpretations compared with the typical NMF and competing multilayer models.

Keywords:

sparse Nonnegative Matrix Factorization (NMF), deep learning, Nesterov’s accelerated gradient algorithm
Received: 27 April 2019 Revised: 15 July 2019 Accepted: 10 October 2019 Published: 19 December 2019 Issue date: March 2020
References(40)
[1]
D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, vol. 401, no. 6755, pp. 788-791, 1999.
[2]
D. D. Lee and and H. S. Seung, Algorithms for non-negative matrix factorization, in Proceedings of 14th Neural Information Processing Systems, Denver, CO, USA, 2000, pp. 556-562.
[3]
W. Xu, X. Liu, and Y. Gong, Document clustering based on non-negative matrix factorization, in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, 2003, pp. 267-273.
[4]
V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, Text mining using non-negative matrix factorizations, in Proceedings of the 4th SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, Lake Buena Vista, FL, USA, 2004, pp. 452-456.
[5]
F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, Document clustering using nonnegative matrix factorization, Information Processing and Management, vol. 42, no. 2, pp. 373-386, 2006.
[6]
Y. Wang, Y. Jia, C. Hu, and M. Turk, Non-negative matrix factorization framework for face recognition, International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495-511, 2005.
[7]
D. Guillamet and J. Vitria, Classifying faces with nonnegative matrix factorization, in Proceedings of 5th Catalan Conference for Artificial Intelligence, Castelln, Spain, 2002, pp. 24-31.
[8]
J. P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov, Metagenes and molecular pattern discovery using matrix factorization, Proceedings of the National Academy of Sciences USA, vol. 101, no. 12, pp. 4164-4169, 2004.
[9]
Q. Qi, Y. Zhao, M. Li, and R. Simon, Non-negative matrix factorization of gene expression profiles: A plug-in for BRB-ArrayTools, Bioinformatics, vol. 25, no. 4, pp. 545-547, 2009.
[10]
P. O. Hoyer, Non-negative matrix factorization with sparseness constraints, Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.
[11]
H. Kim and H. Park, Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis, Bioinformatics, vol. 23, no. 12, pp. 1495-1502, 2007.
[12]
R. Peharz and F. Pernkopf, Sparse nonnegative matrix factorization with L0-constraints, Neurocomputing, vol. 80, pp. 38-46, 2012
[13]
D. Cai, X. He, X. Wu, and J. Han, Non-negative matrix factorization on manifold, in Proceedings of the 8th IEEE International Conference on Data Ming, Pisa, Italy, 2008, pp. 63-72.
[14]
D. Cai, X. He, J. Han, and T. S. Huang, Graph regularized nonnegative matrix factorization for data representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548-1560, 2011.
[15]
S. Zhang, C. C. Liu, W. Li, H. Shen, P. Laird, and X. J. Zhou, Discovery of multi-dimensional modules by integrative analysis of cancer genomic data, Nucleic Acids Research, vol. 40, no. 19, pp. 9379-9391, 2012.
[16]
S. Zhang, Q. Li, J. Liu, and X. J. Zhou, Integrating multiple functional genomic data to define microRNA-genes regulatory modules by a sparse network-regularized multiple matrix factorization method, Bioinformatics, vol. 27, no. 13, pp. i401-i409, 2011.
[17]
Z. Yang and G. Michailidis, A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data, Bioinformatics, vol. 32, no. 1, pp. 1-8, 2016.
[18]
M. Žitnik and B. Zupan, Data fusion by matrix factorization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 41-53, 2015.
[19]
G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science, vol. 313, no. 5786, pp. 504-507, 2006.
[20]
G. E. Hinton, S. Osindero, and Y. W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[21]
A. R. Mohamed, G. Dahl, and G. E. Hinton, Deep belief networks for phone recognition, in Proceedings of the NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, Vancouver, Canada, 2009, vol. 1, no. 9, p. 39.
[22]
A. R. Mohamed, G. Dahl, and G. E. Hinton, Acoustic modeling using deep belief networks, IEEE Transactions on Audio Speech Language Processing, vol. 20, no. 1, pp. 14-22, 2012.
[23]
G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[24]
V. E. Liong, J. Lu, and G. Wang, Face recognition using deep PCA, in Proc. of 2013 9th IEEE International Conference on Information, Communications and Signal Processing (ICICS), Tainan, China, 2013, pp. 1-5.
[25]
T. H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, PCANet: A simple deep learning baseline for image classification? IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5017-5032, 2015.
[26]
Z. H. Zhou and J. Feng, Deep forest: Towards an alternative to deep neural networks, arXiv preprint arXiv:1702.08835v1, 2017.
[27]
J. H. Ahn, S. Choi, and J. Oh, A multiplicative up-propagation algorithm, in Proceedings of the 21st International Conference on Machine Learning (ACM), Banff, Canada, 2004, p. 3.
[28]
H. A. Song, B. K. Kim, T. L. Xuan, and S. Y. Lee, Hierarchical feature extraction by multi-layer non-negative matrix factorization network for classification task, Neurocomputing, vol. 165, pp. 63-74, 2015.
[29]
G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. W. Schuller, A deep matrix factorization method for learning attribute representations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 417-429, 2017.
[30]
C. H. Ding, T. Li, and M. T. Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 45-55, 2010.
[31]
Y. Nesterov, Smooth minimization of non-smooth functions, Mathematical Programming, vol. 103, no. 1, pp. 127-152, 2005.
[32]
A. Pascual-Montano, J. M. Carazo, K. Kochi, D. Lehmann, and R. D. Pascual-Marqui, Nonsmooth nonnegative matrix factorization (nsNMF), IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 403-415, 2006.
[33]
C. Boutsidis and E. Gallopoulos, SVD-based initialization: A head start for nonnegative matrix factorization, Pattern Recognition, vol. 41, no. 4, pp. 1350-1362, 2008.
[34]
N. Guan, D. Tao, Z. Luo, and B. Yuan, NeNMF: An optimal gradient method for nonnegative matrix factorization, IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882-2898, 2012.
[35]
D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 1999.
[36]
C. J. Lin, Projected gradient methods for nonnegative matrix factorization, Neural Computation, vol. 19, no. 10, pp. 2756-2779, 2007.
[37]
M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons, Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics and Data Analysis, vol. 52, no. 1, pp.155-173, 2007.
[38]
L. Grippo and M. Sciandrone, On the convergence of the block nonlinear Gauss-Seidel method under convex constraints, Operations Research Letters, vol. 26, no. 3, pp. 127-136, 2000.
[39]
L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, Comparing community structure identification, Journal of Statistical Mechanics: Theory and Experiment, vol. 2005, no. 9, p. P09008, 2005.
[40]
Y. R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng, Analyzing communities and their evolutions in dynamic social networks, ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 3, no. 2, p. 8, 2009.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 27 April 2019
Revised: 15 July 2019
Accepted: 10 October 2019
Published: 19 December 2019
Issue date: March 2020

Copyright

© The author(s) 2020

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 11661141019 and 61621003), the National Ten Thousand Talent Program for Young Top-notch Talents, Chinese Academy Science (CAS) Frontier Science Research Key Project for Top Young Scientist (No. QYZDB-SSW-SYS008), and the Key Laboratory of Random Complex Structures and Data Science, CAS (No. 2008DP173182).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Reprints and Permission requests may be sought directly from editorial office.

Return