An Improved Robust Sparse Convex Clustering

Jinyao Ma; Haibin Zhang; Shanshan Yang; Jiaojiao Jiang; Gaidi Li

doi:10.26599/TST.2022.9010046

Tsinghua Science and Technology 2023, 28(6): 989-998 https://doi.org/10.26599/TST.2022.9010046

Open Access | Issue | Published: 28 July 2023

An Improved Robust Sparse Convex Clustering

Show Author's Information Hide Author's Information Jinyao Ma^¹, Haibin Zhang^¹(

), Shanshan Yang^¹, Jiaojiao Jiang^², Gaidi Li^¹

1Beijing Institute for Scientific and Engineering Computing, Faculty of Science, Beijing University of Technology, Beijing 100124, China

2School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia

Keywords:

convex clustering, outlier features, block coordinate descent, Newton’s method

Cite this article:

Ma J, Zhang H, Yang S, et al. An Improved Robust Sparse Convex Clustering. Tsinghua Science and Technology, 2023, 28(6): 989-998. https://doi.org/10.26599/TST.2022.9010046

Download citation

EndNote(RIS)

BibTeX

604

Views

133

Downloads

Citations

Crossref

WoS

Scopus

CSCD

Abstract Full text About this article

Abstract

Convex clustering, turning clustering into a convex optimization problem, has drawn wide attention. It overcomes the shortcomings of traditional clustering methods such as K-means, Density-Based Spatial Clustring of Applications with Noise (DBSCAN) and hierarchical clustering that can easily fall into the local optimal solution. However, convex clustering is vulnerable to the occurrence of outlier features, as it uses the Frobenius norm to measure the distance between data points and their corresponding cluster centers and evaluate clusters. To accurately identify outlier features, this paper decomposes data into a clustering structure component and a normalized component that captures outlier features. Different from existing convex clustering evaluating features with the exact measurement, the proposed model can overcome the vast difference in the magnitude of different features and the outlier features can be efficiently identified and removed. To solve the proposed model, we design an efficient algorithm and prove the global convergence of the algorithm. Experiments on both synthetic datasets and UCI datasets demonstrate that the proposed method outperforms the compared approaches in convex clustering.

Full text

Abstract

Full text

Outline

About this article

An Improved Robust Sparse Convex Clustering

Show Author's information Hide Author's Information Jinyao Ma^¹, Haibin Zhang^¹(

), Shanshan Yang^¹, Jiaojiao Jiang^², Gaidi Li^¹

1Beijing Institute for Scientific and Engineering Computing, Faculty of Science, Beijing University of Technology, Beijing 100124, China

2School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia

Abstract

Keywords: convex clustering, outlier features, block coordinate descent, Newton’s method

References(23)

[1]

T. Maekaku, X. Chang, Y. Fujita, L. W. Chen, S. Watanabe, and A. Rudnicky, Speech representation learning combining conformer CPC with deep cluster for the ZeroSpeech challenge 2021, arXiv preprint arXiv: 2107.05899, 2022.

DOI Google Scholar

[2]

J. Sierra-Pérez, M. A. Torres-Arredondo, and J. Alvarez-Montoya, Damage detection methodology under variable load conditions based on strain field pattern recognition using FBGs, nonlinear principal component analysis, and clustering techniques, Smart Mater. Struct., vol. 27, no. 1, p. 015002, 2018.

DOI Google Scholar

[3]

L. Khrissi, N. El Akkad, H. Satori, and K. Satori, Clustering method and sine cosine algorithm for image segmentation, Evol. Intell., vol. 15, no. 1, pp. 669–682, 2022.

DOI Google Scholar

[4]

M. Schiffman, P. E. Castle, J. Jeronimo, A. C. Rodriguez, and S. Wacholder, Human papillomavirus and cervical cancer, Lancet, vol. 370, no. 9590, pp. 890–907, 2007.

DOI Google Scholar

[5]

T. Tango, Statistical Methods for Disease Clustering. New York, NY, USA: Springer, 2010.

DOI

[6]

E. S. Malinverni and G. Fangi, Comparative cluster analysis to localize emergencies in archaeology, J. Cult. Herit., vol. 10, no. S1, pp. e10–e19, 2009.

DOI Google Scholar

[7]

C. A. Emery, Considering cluster analysis in sport medicine and injury prevention research, Clin. J. Sport Med., vol. 17, no. 3, pp. 211–214, 2007.

DOI Google Scholar

[8]

P. Arabie and J. H. Lawrence, Cluster analysis in marketing research, in Advanced Methods of Marketing Research, R. P. Bagozzi, ed. Cambridge, MA: Blackwell, 1994, pp. 160–189.

[9]

T. D. Hocking, A. Joulin, F. Bach, and J. P. Vert, Clusterpath: An algorithm for clustering using convex fusion penalties, in Proc. 28^th Int. Conf. Machine Learning, Bellevue, WA, USA, 2011, pp. 745–752.

Google Scholar

[10]

F. Lindsten, H. Ohlsson, and L. Ljung, Clustering using sum-of-norms regularization: With application to particle filter output computation, in Proc. of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 2011, pp. 201–204.

DOI Google Scholar

[11]

K. M. Tan and D. Witten, Statistical properties of convex clustering, Electronic journal of statistics, vol. 9, no. 2, p. 2324, 2015.

DOI Google Scholar

[12]

C. Zhu, H. Xu, C. Leng, and S. Yan, Convex optimization procedure for clustering: Theoretical revisit, in Proc. 27^th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 1619–1627.

Google Scholar

[13]

F. Lindsten, H. Ohlsson, and L. Ljung, Clustering using sum-of-norms regularization: With application to particle filter output computation, in Prof. IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 2011, pp. 201–204.

DOI Google Scholar

[14]

B. Wang, Y. Zhang, W. W. Sun, and Y. Fang, Sparse convex clustering, J. Comput. Graph. Stat., vol. 27, no. 2, pp. 393–403, 2018.

DOI Google Scholar

[15]

X. Zhou, C. Du, and X. Cai, An efficient smoothing proximal gradient algorithm for convex clustering, arXiv preprint arXiv: 2006.12592, 2020.

Google Scholar

[16]

Q. Wang, P. Gong, S. Chang, T. S. Huang, and J. Zhou, Robust convex clustering analysis, in Proc. of the 2016 IEEE Int. Conf. Data Mining (ICDM), Barcelona, Spain, 2016, pp. 1263–1268.

DOI Google Scholar

[17]

H. B. Zhang, J. J. Jiang, and Y. B. Zhao, On the proximal Landweber Newton method for a class of nonsmooth convex problems, Comput. Optim. Appl., vol. 61, no. 1, pp. 79–99, 2015.

DOI Google Scholar

[18]

G. N. Grapiglia and Y. Nesterov, Tensor methods for minimizing functions with Hölder continuous higher-order derivatives, arXiv preprint arXiv: 1904.12559, 2021.

Google Scholar

[19]

J. R. Sharma, R. K. Guha, and R. Sharma, An efficient fourth order weighted-Newton method for systems of nonlinear equations, Numer. Algor., vol. 62, no. 2, pp. 307–323, 2013.

DOI Google Scholar

[20]

A. Beck, First-Order Methods in Optimization. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2017.

DOI

[21]

P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., vol. 109, no. 3, pp. 475–494, 2001.

DOI Google Scholar

[22]

D. Sun, K. C. Toh, and Y. Yuan, Convex clustering: Model, theoretical guarantee and efficient algorithm, J. Mach. Learn. Res., vol. 22, no. 1, pp. 427–458, 2021.

Google Scholar

[23]

C. Carpineto and G. Romano, Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval, IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 12, pp. 2315–2326, 2012.

DOI Google Scholar

About this article

Publication history

Acknowledgements

Rights and permissions

Publication history

Received: 30 June 2022

Revised: 02 August 2022

Accepted: 04 October 2022

Published: 28 July 2023

Issue date: December 2023

Copyright

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 11771003).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).