Volume 27, Issue 2

Nonnegative Matrix Tri-Factorization Based Clustering in a Heterogeneous Information Network with Star Network Schema

Juncheng Hu, Yongheng Xing, Mo Han, Feng Wang, Kuo Zhao, Xilong Che
College of Computer Science and Technology, Jilin University, Changchun 130012, China
School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China

Abstract

Heterogeneous Information Networks (HINs) contain multiple types of nodes and edges; therefore, they can preserve both semantic and structural information. Performing cluster analysis directly on an HIN has clear advantages over first transforming it into a homogeneous information network, and it can improve the clustering results for different types of nodes. In this study, we apply Nonnegative Matrix Tri-Factorization (NMTF) to cluster analysis over multiple metapaths in an HIN. Unlike previous studies, which estimate the parameters of a probability distribution, NMTF obtains several dependent latent variables simultaneously, and each latent variable is associated with the cluster of the corresponding node in the HIN. Because NMTF is used to carry out multiple nonnegative matrix factorizations simultaneously, the method is well suited to co-clustering that leverages multiple metapaths in an HIN. Experimental results on a real dataset demonstrate the validity and correctness of our method, and its clustering results are better than those of existing similar clustering algorithms.

Keywords: clustering, data mining, heterogeneous information network, nonnegative matrix tri-factorization
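
As a rough illustration of the tri-factorization idea described in the abstract, the sketch below factorizes a single nonnegative relation matrix X into F S G^T with standard multiplicative updates for the squared Frobenius error, then reads row and column clusters from the largest entries of F and G. It is not the authors' algorithm: the function name nmtf, the toy matrix, the update rules, and all parameter values are assumptions chosen for illustration, and the multi-metapath objective of the paper (several matrices factorized simultaneously) is not reproduced here.

```python
import numpy as np


def nmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (n x m, nonnegative) as F @ S @ G.T.

    F (n x k_rows) holds row-cluster memberships, G (m x k_cols) holds
    column-cluster memberships, and S (k_rows x k_cols) links the two.
    Hypothetical helper: plain multiplicative updates, no orthogonality
    constraints, single relation matrix only.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps

    errors = []
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part) / (positive part),
        # which keeps every factor nonnegative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy relation matrix (assumed data): 6 row nodes x 4 column nodes
# with two visible blocks.
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 4, 5],
    [0, 0, 5, 4],
    [0, 0, 5, 5],
], dtype=float)

F, S, G, errors = nmtf(X, k_rows=2, k_cols=2)

# Hard cluster assignment: index of the largest membership value.
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)
print("row clusters:", row_labels)
print("column clusters:", col_labels)
print("final reconstruction error:", errors[-1])
```

In a star-schema setting, one such matrix could be formed per metapath between the center node type and an attribute node type, with the center-type factor shared across the factorizations; how the paper couples them is described in the full text, not in this sketch.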


Publication history

Received: 20 September 2020
Accepted: 09 October 2020
Published: 29 September 2021
Issue date: April 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 61701190), the Youth Science Foundation of Jilin Province of China (No. 20180520021JH), the National Key Research and Development Plan of China (No. 2017YFA0604500), the Key Scientific and Technological Research and Development Plan of Jilin Province of China (No. 20180201103GX), the China Postdoctoral Science Foundation (No. 2018M631873), the Project of Jilin Province Development and Reform Commission (No. 2019FGWTZC001), and the Key Technology Innovation Cooperation Project of Government and University for the Whole Industry Demonstration (No. SXGJSF2017-4).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
