Open Access

A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem

Directorate of Livestock Farms, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana 141001, India.
Department of Computer Science, Punjabi University, Punjab 147002, India.

Abstract

Big data analytics and data mining are techniques used to analyze data and extract hidden information. Traditional approaches to analysis and extraction do not work well for big data because the data are complex and very high in volume. Data clustering, a major data mining technique, groups data into clusters so that information is easier to extract from them. However, existing clustering algorithms, such as k-means and hierarchical clustering, are not efficient because the quality of the clusters they produce is compromised. There is therefore a need for an efficient and highly scalable clustering algorithm. In this paper, we propose a new clustering algorithm, called hybrid clustering, to overcome the disadvantages of existing clustering algorithms. We compare the new hybrid algorithm with existing algorithms on the basis of precision, recall, F-measure, execution time, and accuracy of results. The experimental results show that the proposed hybrid clustering algorithm is more accurate and has better precision, recall, and F-measure values.
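
The abstract does not state how precision, recall, and F-measure are computed for a clustering, so the following is only a minimal sketch under one common assumption: pair counting, where every pair of points placed in the same cluster counts as a positive prediction. The function name `pairwise_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_scores(true_labels, cluster_ids):
    """Pair-counting precision, recall, and F-measure for a clustering.

    Assumption (not from the paper): a pair of points is a true positive
    when it shares both a cluster and a ground-truth class.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_ids[i] == cluster_ids[j]
        if same_cluster and same_class:
            tp += 1  # correctly grouped together
        elif same_cluster:
            fp += 1  # grouped together but from different classes
        elif same_class:
            fn += 1  # same class but split across clusters

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example with made-up labels: five points, two classes, two clusters.
truth = ["a", "a", "a", "b", "b"]
clusters = [0, 0, 1, 1, 1]
precision, recall, f_measure = pairwise_scores(truth, clusters)
```

Pair counting compares every pair of points, so its cost grows quadratically with the number of points; on data of the scale discussed here it would normally be computed from a contingency table or on a sample instead.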

Big Data Mining and Analytics
Pages 240-247
Cite this article:
Kumar S, Singh M. A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem. Big Data Mining and Analytics, 2019, 2(4): 240-247. https://doi.org/10.26599/BDMA.2018.9020037

Received: 08 November 2018
Revised: 09 January 2019
Accepted: 12 January 2019
Published: 05 August 2019
© The author(s) 2019

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
