Open Access

Effective Density-Based Clustering Algorithms for Incomplete Data

USC Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90007, USA
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Abstract

Density-based clustering is an important category of clustering algorithms. In real applications, many datasets are incomplete. Traditional imputation and other techniques for handling missing values are not well suited to density-based clustering and degrade the quality of the clustering results. To avoid these problems, we develop a novel density-based clustering approach for incomplete data based on Bayesian theory, which performs imputation and clustering concurrently and makes use of intermediate clustering results. To limit the impact of low-density areas inside non-convex clusters, we introduce a local imputation clustering algorithm, which aims to impute points into high-density local areas. The performance of the proposed algorithms is evaluated on ten synthetic datasets and five real-world datasets with induced missing values. The experimental results show the effectiveness of the proposed algorithms.
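
The abstract's claim that traditional imputation fits density-based clustering poorly can be illustrated with a toy example. The sketch below is not the paper's Bayesian algorithm. It is a minimal illustration on made-up data, and the names `neighbors_within`, `mean_imputed`, and `local_imputed`, as well as the simple "local" fill rule, are assumptions introduced here. It counts how many points fall within a radius `eps` of an imputed point, which is the kind of density measure that density-based methods rely on.

```python
import math

# Two tight, well-separated groups of 2-D points (toy data, not from the paper).
cluster_a = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2)]
cluster_b = [(10.0, 10.0), (10.2, 9.9), (9.8, 10.1), (10.1, 10.3)]
complete = cluster_a + cluster_b

# One incomplete point: x is missing (None); its y value lies near cluster_b.
incomplete = (None, 10.0)


def neighbors_within(point, data, eps):
    """Count points of `data` whose Euclidean distance to `point` is at most eps."""
    return sum(1 for q in data if math.dist(point, q) <= eps)


# Global mean imputation: fill x with the mean of all observed x values.
mean_x = sum(p[0] for p in complete) / len(complete)
mean_imputed = (mean_x, incomplete[1])

# Hypothetical local fill (an assumption for contrast, not the authors' method):
# average x only over points whose y is close to the observed y.
local = [p for p in complete if abs(p[1] - incomplete[1]) <= 1.0]
local_x = sum(p[0] for p in local) / len(local)
local_imputed = (local_x, incomplete[1])

eps = 1.0
dense_mean = neighbors_within(mean_imputed, complete, eps)
dense_local = neighbors_within(local_imputed, complete, eps)

print("neighbors after global mean imputation:", dense_mean)
print("neighbors after local imputation:", dense_local)
```

In this toy setting, the globally imputed point lands between the two groups, where a density-based method would likely treat it as noise, while the locally imputed point lands inside a dense area. This is only meant to motivate why imputing toward high-density local areas can matter.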

Big Data Mining and Analytics
Pages 183-194
Cite this article:
Xue Z, Wang H. Effective Density-Based Clustering Algorithms for Incomplete Data. Big Data Mining and Analytics, 2021, 4(3): 183-194. https://doi.org/10.26599/BDMA.2021.9020001

Received: 13 December 2020
Accepted: 13 January 2021
Published: 12 May 2021
© The author(s) 2021

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
