
Distributed and Weighted Extreme Learning Machine for Imbalanced Big Data Learning

Authors: Zhiqiong Wang, Junchang Xin (corresponding author), Hongxu Yang, Shuo Tian, Ge Yu, Chenren Xu, Yudong Yao
Sino-Dutch Biomedical & Information Engineering School, Northeastern University, Shenyang 110169, China.
School of Computer Science & Engineering, Northeastern University, Shenyang 110169, China.
School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China.
Department of Electrical and Computer Engineering, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030, USA.

Abstract

The Extreme Learning Machine (ELM) and its variants are effective in many machine learning applications, such as Imbalanced Learning (IL) and Big Data (BD) learning. However, no existing variant can handle data that are both imbalanced and large in volume. This study addresses the IL problem in BD applications by proposing the Distributed and Weighted ELM (DW-ELM) algorithm, which is built on the MapReduce framework. To confirm that the computation can be parallelized, we first show that the matrix multiplications at the core of weighted ELM are decomposable. Then, to further improve computational efficiency, an Improved DW-ELM algorithm (IDW-ELM) is developed that requires only a single MapReduce job. Finally, the effectiveness of the proposed DW-ELM and IDW-ELM algorithms is validated through experiments.

Keywords: weighted Extreme Learning Machine (ELM), imbalanced big data, MapReduce framework, user-defined counter
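
As a sketch of why the training computation parallelizes: in weighted ELM the output weights take the form beta = (I/C + H^T W H)^(-1) H^T W T, and both H^T W H and H^T W T are sums of per-partition contributions. Each data split can therefore compute its partial matrices independently (a map phase), and a single summation (a reduce phase) recovers the full products. The Python snippet below is a minimal, hypothetical illustration of this decomposition, not the authors' MapReduce implementation; the sigmoid activation, the per-class weighting scheme, and the partition layout are assumptions made only for the example.

```python
# Minimal sketch (assumed details, not the authors' code): the weighted ELM
# solution beta = (I/C + H^T W H)^(-1) H^T W T is computed from per-partition
# partial sums, which is what makes a MapReduce-style version possible.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def partial_sums(X_part, T_part, w_part, A, b):
    """'Map' step for one data partition: local H^T W H and H^T W T."""
    H = sigmoid(X_part @ A + b)          # hidden-layer output for this partition
    HW = H * w_part[:, None]             # apply per-sample class weights
    return HW.T @ H, HW.T @ T_part       # decomposable matrix products

def combine_and_solve(partials, C, L):
    """'Reduce' step: sum the partial matrices and solve for beta."""
    U = sum(p[0] for p in partials)      # = H^T W H over the full data set
    V = sum(p[1] for p in partials)      # = H^T W T over the full data set
    return np.linalg.solve(np.eye(L) / C + U, V)

# Toy usage: an imbalanced two-class problem split into two "partitions".
rng = np.random.default_rng(0)
d, L = 5, 20                              # input dimension, hidden nodes
A, b = rng.normal(size=(d, L)), rng.normal(size=L)
X = rng.normal(size=(1000, d))
y = (rng.random(1000) < 0.1).astype(int)  # roughly 10% minority class
T = np.eye(2)[y]                          # one-hot targets
w = np.where(y == 1, 1.0 / (y == 1).sum(), 1.0 / (y == 0).sum())  # class weights
parts = [partial_sums(X[i:i+500], T[i:i+500], w[i:i+500], A, b) for i in (0, 500)]
beta = combine_and_solve(parts, C=1.0, L=L)
print(beta.shape)                         # (20, 2) output weights
```

Because only the small L x L and L x m partial matrices need to be combined across partitions, the communication cost of such a decomposition is independent of the number of training samples, which is what makes the approach attractive for large-volume data.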


Publication history

Received: 27 August 2016
Revised: 14 January 2017
Accepted: 18 January 2017
Published: 06 April 2017
Issue date: April 2017

Copyright

© The author(s) 2017

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (Nos. 61402089, 61472069, and 61501101), the Fundamental Research Funds for the Central Universities (Nos. N161904001, N161602003, and N150408001), the Natural Science Foundation of Liaoning Province (No. 2015020553), the China Postdoctoral Science Foundation (No. 2016M591447), and the Postdoctoral Science Foundation of Northeastern University (No. 20160203).
