Volume 18, Issue 4

Meta-Path-Based Search and Mining in Heterogeneous Information Networks

Yizhou Sun, Jiawei Han
College of Computer and Information Science, Northeastern University, Boston, MA 02115, USA
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Abstract

Information networks, which can be extracted from many domains, have been widely studied in recent years. Various functions for mining these networks have been proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies address homogeneous networks, where nodes and links are assumed to be of a single type. In reality, however, heterogeneous information networks can better model real-world systems, which are typically semi-structured and typed, following a network schema. To mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concept of a meta-path, defined as a path over the graph of the network schema, is proposed to systematically capture the numerous semantic relationships across multiple types of objects. Meta-paths can guide search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks, such as relationship prediction and clustering, can be addressed by systematic exploration of the network meta structure. Moreover, with user guidance or feedback, we can select the best meta-path, or a weighted combination of meta-paths, for a specific mining task.
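To make the notion of a meta-path concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy bibliographic network with three node types (Author, Paper, Venue) and counts the path instances that follow a given sequence of node types. All node names, the edge list, and the helper `count_path_instances` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy network: every node has a type (all names are illustrative).
node_type = {
    "alice": "Author", "bob": "Author", "carol": "Author",
    "p1": "Paper", "p2": "Paper", "p3": "Paper",
    "KDD": "Venue", "VLDB": "Venue",
}

# Typed links, treated as undirected: Author-Paper and Paper-Venue.
edges = [
    ("alice", "p1"), ("bob", "p1"),      # alice and bob co-wrote p1
    ("bob", "p2"), ("carol", "p3"),
    ("p1", "KDD"), ("p2", "KDD"), ("p3", "VLDB"),
]

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def count_path_instances(source, meta_path):
    """Count path instances that start at `source` and follow `meta_path`.

    `meta_path` is a list of node types, e.g. ["Author", "Paper", "Author"].
    Returns a dict mapping each reachable end node to its number of path
    instances. Instances may revisit nodes (so alice reaches herself).
    """
    if node_type[source] != meta_path[0]:
        raise ValueError("source type does not match the start of the meta-path")
    counts = {source: 1}
    for next_type in meta_path[1:]:
        next_counts = defaultdict(int)
        for node, c in counts.items():
            for nb in neighbors[node]:
                if node_type[nb] == next_type:
                    next_counts[nb] += c
        counts = dict(next_counts)
    return counts


# Author-Paper-Author: co-authorship.
apa = count_path_instances("alice", ["Author", "Paper", "Author"])
# Author-Paper-Venue-Paper-Author: publishing in the same venue.
apvpa = count_path_instances(
    "alice", ["Author", "Paper", "Venue", "Paper", "Author"]
)
```

In this toy data, the two meta-paths give different views of who is related to alice: the co-authorship path reaches only her co-authors, while the shared-venue path also weights authors by how many papers they have in the same venue. Neither path reaches carol, who shares no paper or venue with alice.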

Keywords: heterogeneous information network, meta-path, similarity search, relationship prediction, user-guided clustering

Publication history

Received: 17 July 2013
Accepted: 17 July 2013
Published: 05 August 2013
Issue date: August 2013

Copyright

© The author(s) 2013

Acknowledgements

The work was supported in part by the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (NS-CTA), NSF IIS-0905215, CNS-09-31975, and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
