Journal Home > Volume 28 , Issue 6

Convex clustering, turning clustering into a convex optimization problem, has drawn wide attention. It overcomes the shortcomings of traditional clustering methods such as K-means, Density-Based Spatial Clustring of Applications with Noise (DBSCAN) and hierarchical clustering that can easily fall into the local optimal solution. However, convex clustering is vulnerable to the occurrence of outlier features, as it uses the Frobenius norm to measure the distance between data points and their corresponding cluster centers and evaluate clusters. To accurately identify outlier features, this paper decomposes data into a clustering structure component and a normalized component that captures outlier features. Different from existing convex clustering evaluating features with the exact measurement, the proposed model can overcome the vast difference in the magnitude of different features and the outlier features can be efficiently identified and removed. To solve the proposed model, we design an efficient algorithm and prove the global convergence of the algorithm. Experiments on both synthetic datasets and UCI datasets demonstrate that the proposed method outperforms the compared approaches in convex clustering.


menu
Abstract
Full text
Outline
About this article

An Improved Robust Sparse Convex Clustering

Show Author's information Jinyao Ma1Haibin Zhang1( )Shanshan Yang1Jiaojiao Jiang2Gaidi Li1
Beijing Institute for Scientific and Engineering Computing, Faculty of Science, Beijing University of Technology, Beijing 100124, China
School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia

Abstract

Convex clustering, turning clustering into a convex optimization problem, has drawn wide attention. It overcomes the shortcomings of traditional clustering methods such as K-means, Density-Based Spatial Clustring of Applications with Noise (DBSCAN) and hierarchical clustering that can easily fall into the local optimal solution. However, convex clustering is vulnerable to the occurrence of outlier features, as it uses the Frobenius norm to measure the distance between data points and their corresponding cluster centers and evaluate clusters. To accurately identify outlier features, this paper decomposes data into a clustering structure component and a normalized component that captures outlier features. Different from existing convex clustering evaluating features with the exact measurement, the proposed model can overcome the vast difference in the magnitude of different features and the outlier features can be efficiently identified and removed. To solve the proposed model, we design an efficient algorithm and prove the global convergence of the algorithm. Experiments on both synthetic datasets and UCI datasets demonstrate that the proposed method outperforms the compared approaches in convex clustering.

Keywords: convex clustering, outlier features, block coordinate descent, Newton’s method

References(23)

[1]
T. Maekaku, X. Chang, Y. Fujita, L. W. Chen, S. Watanabe, and A. Rudnicky, Speech representation learning combining conformer CPC with deep cluster for the ZeroSpeech challenge 2021, arXiv preprint arXiv: 2107.05899, 2022.
[2]
J. Sierra-Pérez, M. A. Torres-Arredondo, and J. Alvarez-Montoya, Damage detection methodology under variable load conditions based on strain field pattern recognition using FBGs, nonlinear principal component analysis, and clustering techniques, Smart Mater. Struct., vol. 27, no. 1, p. 015002, 2018.
[3]
L. Khrissi, N. El Akkad, H. Satori, and K. Satori, Clustering method and sine cosine algorithm for image segmentation, Evol. Intell., vol. 15, no. 1, pp. 669–682, 2022.
[4]
M. Schiffman, P. E. Castle, J. Jeronimo, A. C. Rodriguez, and S. Wacholder, Human papillomavirus and cervical cancer, Lancet, vol. 370, no. 9590, pp. 890–907, 2007.
[5]
T. Tango, Statistical Methods for Disease Clustering. New York, NY, USA: Springer, 2010.
DOI
[6]
E. S. Malinverni and G. Fangi, Comparative cluster analysis to localize emergencies in archaeology, J. Cult. Herit., vol. 10, no. S1, pp. e10–e19, 2009.
[7]
C. A. Emery, Considering cluster analysis in sport medicine and injury prevention research, Clin. J. Sport Med., vol. 17, no. 3, pp. 211–214, 2007.
[8]
P. Arabie and J. H. Lawrence, Cluster analysis in marketing research, in Advanced Methods of Marketing Research, R. P. Bagozzi, ed. Cambridge, MA: Blackwell, 1994, pp. 160–189.
[9]
T. D. Hocking, A. Joulin, F. Bach, and J. P. Vert, Clusterpath: An algorithm for clustering using convex fusion penalties, in Proc. 28th Int. Conf. Machine Learning, Bellevue, WA, USA, 2011, pp. 745–752.
[10]
F. Lindsten, H. Ohlsson, and L. Ljung, Clustering using sum-of-norms regularization: With application to particle filter output computation, in Proc. of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 2011, pp. 201–204.
[11]
K. M. Tan and D. Witten, Statistical properties of convex clustering, Electronic journal of statistics, vol. 9, no. 2, p. 2324, 2015.
[12]
C. Zhu, H. Xu, C. Leng, and S. Yan, Convex optimization procedure for clustering: Theoretical revisit, in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 1619–1627.
[13]
F. Lindsten, H. Ohlsson, and L. Ljung, Clustering using sum-of-norms regularization: With application to particle filter output computation, in Prof. IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 2011, pp. 201–204.
[14]
B. Wang, Y. Zhang, W. W. Sun, and Y. Fang, Sparse convex clustering, J. Comput. Graph. Stat., vol. 27, no. 2, pp. 393–403, 2018.
[15]
X. Zhou, C. Du, and X. Cai, An efficient smoothing proximal gradient algorithm for convex clustering, arXiv preprint arXiv: 2006.12592, 2020.
[16]
Q. Wang, P. Gong, S. Chang, T. S. Huang, and J. Zhou, Robust convex clustering analysis, in Proc. of the 2016 IEEE Int. Conf. Data Mining (ICDM), Barcelona, Spain, 2016, pp. 1263–1268.
[17]
H. B. Zhang, J. J. Jiang, and Y. B. Zhao, On the proximal Landweber Newton method for a class of nonsmooth convex problems, Comput. Optim. Appl., vol. 61, no. 1, pp. 79–99, 2015.
[18]
G. N. Grapiglia and Y. Nesterov, Tensor methods for minimizing functions with Hölder continuous higher-order derivatives, arXiv preprint arXiv: 1904.12559, 2021.
[19]
J. R. Sharma, R. K. Guha, and R. Sharma, An efficient fourth order weighted-Newton method for systems of nonlinear equations, Numer. Algor., vol. 62, no. 2, pp. 307–323, 2013.
[20]
A. Beck, First-Order Methods in Optimization. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2017.
DOI
[21]
P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., vol. 109, no. 3, pp. 475–494, 2001.
[22]
D. Sun, K. C. Toh, and Y. Yuan, Convex clustering: Model, theoretical guarantee and efficient algorithm, J. Mach. Learn. Res., vol. 22, no. 1, pp. 427–458, 2021.
[23]
C. Carpineto and G. Romano, Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval, IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 12, pp. 2315–2326, 2012.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 30 June 2022
Revised: 02 August 2022
Accepted: 04 October 2022
Published: 28 July 2023
Issue date: December 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 11771003).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return