BPGM: A Big Graph Mining Tool

Yang Liu; Bin Wu; Hongxu Wang; Pengjiang Ma

doi:10.1109/TST.2014.6733206

Tsinghua Science and Technology 2014, 19(1): 33-38 https://doi.org/10.1109/TST.2014.6733206

Open Access | Issue | Published: 07 February 2014

BPGM: A Big Graph Mining Tool

Show Author's Information Hide Author's Information Yang Liu, Bin Wu(

), Hongxu Wang, Pengjiang Ma

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Keywords:

cloud computing, data mining, parallel algorithms, graph data analysis, social network analysis

Cite this article:

Liu Y, Wu B, Wang H, et al. BPGM: A Big Graph Mining Tool. Tsinghua Science and Technology, 2014, 19(1): 33-38. https://doi.org/10.1109/TST.2014.6733206

Download citation

EndNote(RIS)

BibTeX

435

Views

Downloads

Citations

Crossref

N/A

WoS

Scopus

CSCD

Abstract Full text About this article

Abstract

The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data generated on a Bulk Synchronous Parallel (BSP) computing model named BSP-based Parallel Graph Mining (BPGM). This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and have better performance than big cloud-based parallel data miner and BC-BSP.

Full text

Abstract

Full text

Outline

About this article

BPGM: A Big Graph Mining Tool

Show Author's information Hide Author's Information Yang Liu, Bin Wu(

), Hongxu Wang, Pengjiang Ma

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

Keywords: cloud computing, data mining, parallel algorithms, graph data analysis, social network analysis

References(21)

[1]

J. Dean and G. Sanjay, MapReduce: Simplified data processing on large clusters, Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.

DOI Google Scholar

[2]

S. Marc, S. W. Otto, D. W. Walker, J. Dongarra, and S. Huss-Lederman, MPI: The Complete Reference. MIT press, 1995.

[3]

L. G. Valiant, A bridging model for parallel computation, Communications of the ACM, vol. 33, no. 8, pp. 103-111, 1990.

DOI Google Scholar

[4]

S. Owen, A. Robin, T. Dunning, and E. Friedman, Mahout in Action. Manning, 2011.

[5]

Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, Graphlab: A new framework for parallel machine learning, arXiv preprint., arXiv:1006.4990, 2010.

[6]

U. Kang, C. E. Tsourakakis, and C. Faloutsos, Pegasus: A peta-scale graph mining system implementation and observations, in Data Mining, 2009 ICDM’09 Ninth IEEE International Conference on, 2009.

DOI

[7]

M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, Dryad: Distributed data-parallel programs from sequential building blocks, ACM SIGOPS Operating Systems Review, vol. 41, no. 3, pp. 59-72, 2007.

DOI Google Scholar

[8]

L. Yu, J. Zheng, W. Shen, B. Wu, B. Wang, L. Qian, and B. Zhang, BC-PDM: Data mining, social network analysis and text mining system based on cloud computing, presented at the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.

DOI

[9]

G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, Pregel: A system for large-scale graph processing, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 2010.

DOI

[10]

Y. Bao, Z. Wang, Y. Gu, G. Yu, F. Leng, H. Zhang, B. Chen, C. Deng, and L. Guo, BC-BSP: A BSP-based parallel iterative processing system for big data on cloud architecture, in Proc. Database Systems for Advanced Applications, Springer Berlin Heidelberg, 2013.

DOI

[11]

S. Seo, E. J. Yoon, J. Kim, S. Jin, J. Kim, and S. Maeng, Hama: An efficient matrix computation with the mapreduce framework, in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, 2010.

DOI

[12]

L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical report, SIDL-WP-1999-0120, Stanford University, 1998.

[13]

J. Pan, H. J. Yang, C. Faloutsos, and P. Duygulu, Automatic multimedia cross-modal correlation discovery, in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, 2004, pp. 653-658.

DOI

[14]

J. M. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.

DOI Google Scholar

[15]

M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821-7826, 2002.

DOI Google Scholar

[16]

A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large networks, Physical Review E, vol. 70, no. 6, pp. 1-6, 2004.

DOI Google Scholar

[17]

M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E, vol. 69, no. 2, pp. 1-15, 2004.

DOI Google Scholar

[18]

U. N. Raghavan, R. Albert, and S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E, vol. 76, no. 3, pp. 1-11, 2007.

DOI Google Scholar

[19]

Z. Zeng, B. Wu, and H. Wang, A parallel graph partitioning algorithm to speed up the largescale distributed graph mining, in The 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, 2012.

DOI

[20]

G. Karypis and V. Kumar, Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0, http://dm.kaist.ac.kr/kse625/resources/metis.pdf, 1995.

[21]

J. Yang and D. Zhang, Lightweight workflow engine based on Hadoop and OSGI, presented at the 5th IEEE International Conference on Broadband Network & Multimedia Technology, Beijing, China, 2013.

About this article

Publication history

Acknowledgements

Rights and permissions

Publication history

Received: 13 December 2013

Revised: 23 December 2013

Accepted: 24 December 2013

Published: 07 February 2014

Issue date: February 2014

Copyright

Acknowledgements

This work was supported by the National Key Basic Research and Department (973) Program of China (No. 2013CB329603) and the National Natural Science Foundation of China (Nos. 61074128, 61375058, and 71231002).