Journal Home > Volume 19 , Issue 1

The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data generated on a Bulk Synchronous Parallel (BSP) computing model named BSP-based Parallel Graph Mining (BPGM). This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and have better performance than big cloud-based parallel data miner and BC-BSP.


menu
Abstract
Full text
Outline
About this article

BPGM: A Big Graph Mining Tool

Show Author's information Yang LiuBin Wu( )Hongxu WangPengjiang Ma
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data generated on a Bulk Synchronous Parallel (BSP) computing model named BSP-based Parallel Graph Mining (BPGM). This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and have better performance than big cloud-based parallel data miner and BC-BSP.

Keywords: cloud computing, data mining, parallel algorithms, graph data analysis, social network analysis

References(21)

[1]
J. Dean and G. Sanjay, MapReduce: Simplified data processing on large clusters, Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[2]
S. Marc, S. W. Otto, D. W. Walker, J. Dongarra, and S. Huss-Lederman, MPI: The Complete Reference. MIT press, 1995.
[3]
L. G. Valiant, A bridging model for parallel computation, Communications of the ACM, vol. 33, no. 8, pp. 103-111, 1990.
[4]
S. Owen, A. Robin, T. Dunning, and E. Friedman, Mahout in Action. Manning, 2011.
[5]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, Graphlab: A new framework for parallel machine learning, arXiv preprint., arXiv:1006.4990, 2010.
[6]
U. Kang, C. E. Tsourakakis, and C. Faloutsos, Pegasus: A peta-scale graph mining system implementation and observations, in Data Mining, 2009 ICDM’09 Ninth IEEE International Conference on, 2009.
DOI
[7]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, Dryad: Distributed data-parallel programs from sequential building blocks, ACM SIGOPS Operating Systems Review, vol. 41, no. 3, pp. 59-72, 2007.
[8]
L. Yu, J. Zheng, W. Shen, B. Wu, B. Wang, L. Qian, and B. Zhang, BC-PDM: Data mining, social network analysis and text mining system based on cloud computing, presented at the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.
DOI
[9]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, Pregel: A system for large-scale graph processing, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 2010.
DOI
[10]
Y. Bao, Z. Wang, Y. Gu, G. Yu, F. Leng, H. Zhang, B. Chen, C. Deng, and L. Guo, BC-BSP: A BSP-based parallel iterative processing system for big data on cloud architecture, in Proc. Database Systems for Advanced Applications, Springer Berlin Heidelberg, 2013.
DOI
[11]
S. Seo, E. J. Yoon, J. Kim, S. Jin, J. Kim, and S. Maeng, Hama: An efficient matrix computation with the mapreduce framework, in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, 2010.
DOI
[12]
L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical report, SIDL-WP-1999-0120, Stanford University, 1998.
[13]
J. Pan, H. J. Yang, C. Faloutsos, and P. Duygulu, Automatic multimedia cross-modal correlation discovery, in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, 2004, pp. 653-658.
DOI
[14]
J. M. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.
[15]
M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821-7826, 2002.
[16]
A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large networks, Physical Review E, vol. 70, no. 6, pp. 1-6, 2004.
[17]
M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E, vol. 69, no. 2, pp. 1-15, 2004.
[18]
U. N. Raghavan, R. Albert, and S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E, vol. 76, no. 3, pp. 1-11, 2007.
[19]
Z. Zeng, B. Wu, and H. Wang, A parallel graph partitioning algorithm to speed up the largescale distributed graph mining, in The 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, 2012.
DOI
[20]
G. Karypis and V. Kumar, Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0, http://dm.kaist.ac.kr/kse625/resources/metis.pdf, 1995.
[21]
J. Yang and D. Zhang, Lightweight workflow engine based on Hadoop and OSGI, presented at the 5th IEEE International Conference on Broadband Network & Multimedia Technology, Beijing, China, 2013.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 13 December 2013
Revised: 23 December 2013
Accepted: 24 December 2013
Published: 07 February 2014
Issue date: February 2014

Copyright

© The author(s) 2014

Acknowledgements

This work was supported by the National Key Basic Research and Department (973) Program of China (No. 2013CB329603) and the National Natural Science Foundation of China (Nos. 61074128, 61375058, and 71231002).

Rights and permissions

Return