Journal Home > Volume 27 , Issue 5

The Correlation Clustering Problem (CorCP) is a significant clustering problem based on the similarity of data. It has significant applications in different fields, such as machine learning, biology, and data mining, and many different problems in other areas. In this paper, the Balanced 2-CorCP (B 2-CorCP) is introduced and examined, and a new interesting variant of the CorCP is described. The goal of this clustering problem is to partition the vertex set into two clusters with equal size, such that the number of disagreements is minimized. We first present a polynomial time algorithm for the B 2-CorCP on M-positive edge dominant graphs (M3). Then, we provide a series of numerical experiments, and the results show the effectiveness of our algorithm.


menu
Abstract
Full text
Outline
About this article

Approximation Algorithm for the Balanced 2-Correlation Clustering Problem

Show Author's information Sai JiDachuan XuDonglei Du( )Ling Gai( )Zhongrui Zhao
Department of Operations Research and Information Engineering, Beijing University of Technology, Beijing 100124, China
Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China
Faculty of Business Administration, University of New Brunswick, Fredericton, NB E3B 5A3, Canada
Glorious Sun School of Business and Management, Donghua University, Shanghai 200051, China

Abstract

The Correlation Clustering Problem (CorCP) is a significant clustering problem based on the similarity of data. It has significant applications in different fields, such as machine learning, biology, and data mining, and many different problems in other areas. In this paper, the Balanced 2-CorCP (B 2-CorCP) is introduced and examined, and a new interesting variant of the CorCP is described. The goal of this clustering problem is to partition the vertex set into two clusters with equal size, such that the number of disagreements is minimized. We first present a polynomial time algorithm for the B 2-CorCP on M-positive edge dominant graphs (M3). Then, we provide a series of numerical experiments, and the results show the effectiveness of our algorithm.

Keywords:

balanced clustering, k-correlation clustering, positive edge dominant graphs, approximation algorithm
Received: 13 February 2021 Revised: 27 May 2021 Accepted: 28 June 2021 Published: 17 March 2022 Issue date: October 2022
References(26)
[1]
D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, in Proc. 18th Annu. ACM-SIAM Symp. Discrete Algorithms, Philadelphia, PA, USA, 2007, pp. 1027–1035.
[2]
S. Ahmadian, A. Norouzi-Fard, O. Svensson, and J. Ward, Better guarantees for k-means and euclidean k-median by primal-dual algorithms, in Proc. 58th Annu. IEEE Symp. Foundations of Computer Science, Berkeley, CA, USA, 2017, pp. 61–72.
[3]
J. Castro, S. Nasini, and F. Saldanha-Da-Gama, A cutting-plane approach for large-scale capacitated multi-period facility location using a specialized interior-point method, Mathemat. Programm., vol. 163, nos. 1&2, pp. 411–444, 2017.
DOI
[4]
S. Li and O. Svensson, Approximating k-median via pseudo-approximation, SIAM J. Comput., vol. 45, no. 2, pp. 530–547, 2016.
[5]
Y. Tian, R. Q. Zheng, Z. L. Liang, S. N. Li, F. X. Wu, and M. Li, A datadriven clustering recommendation method for single-cell RNA-sequencing data, Tsinghua Science and Technology, vol. 26, no. 5, pp. 772–789, 2021.
[6]
Y. C. Xu, D. C. Xu, D. L. Du, and D. M. Zhang, Approximation algorithm for squared metric facility location problem with nonuniform capacities, J. Supercomput., .
[7]
X. Zhao, Z. D. Wang, L. Gao, Y. L. Li, and S. J. Wang, Incremental face clustering with optimal summary learning via graph convolutional network, Tsinghua Science and Technology, vol. 26, no. 4, pp. 536–547, 2021.
[8]
N. Bansal, A. Blum, and S. Chawla, Correlation clustering, Mach. Learn., vol. 56, nos. 1–3, pp. 89–113, 2004.
[9]
F. Bonchi, A. Gionis, and A. Ukkonen, Overlapping correlation clustering, Knowl. Inform. Syst., vol. 35, no. 1, pp. 1–32, 2013.
[10]
E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica, Correlation clustering in general weighted graphs, Theoret. Computer Sci., vol. 361, nos. 2&3, pp. 172–187, 2006.
DOI
[11]
I. Giotis and V. Guruswami, Correlation clustering with a fixed number of clusters, in Proc. 17th Annu. ACM-SIAM Symp. Discrete Algorithms, Miami, FL, USA, 2006, pp. 1167–1176.
[12]
N. Ailon, N. Avigdor-Elgrabli, E. Liberty, and A. Van Zuylen, Improved approximation algorithms for bipartite correlation clustering, SIAM J. Comput., vol. 41, no. 5, pp. 1110–1121, 2012.
[13]
C. Mathieu and W. Schudy, Correlation clustering with noisy input, in Proc. 21th Annu. ACM-SIAM Symp. Discrete Algorithms, Austin, TX, USA, 2010, pp. 712–728.
[14]
E. Achtert, C. Bohm, J. David, P. Kröger, and A. Zimek, Global correlation clustering based on the Hough transform, Stat. Anal. Data Min., vol. 1, no. 3, pp. 111–127, 2008.
[15]
M. Charikar, V. Guruswami, and A. Wirth, Clustering with qualitative information, J. Comput. Syst. Sci., vol. 71, no. 3, pp. 360–383, 2005.
[16]
S. Chawla, K. Makarychev, T. Schramm, and G. Yaroslavtsev, Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs, in Proc. 47th Annu. ACM Symp. Theory of Computing, New York, NY, USA, 2015, pp. 219–228.
[17]
S. Ahmadi, S. Khuller, and B. Saha, Min-max correlation clustering via multiCut, in Proc. Int. Conf. Integer Programming and Combinatorial Optimization, Ann Arbor, MI, USA, 2019, pp. 13–26.
[18]
G. J. Puleo and O. Milenkovic, Correlation clustering and biclustering with locally bounded errors, IEEE Trans. Inform. Theory, vol. 64, no. 6, pp. 4105–4119, 2018.
[19]
K. J. Ahn, G. Cormode, S. Guha, A. McGregor, and A. Wirth, Correlation clustering in data streams, Algorithmica, vol. 83, no. 7, pp. 1980–2017, 2021.
[20]
S. Ahmadian, A. Epasto, R. Kumar, and M. Mahdian, Fair correlation clustering, in Proc. 23rd Int. Conf. on Artificial Intelligence and Statistics, Palermo, Italy, 2020, pp. 4195–4205.
[21]
G. J. Puleo and O. Milenkovic, Correlation clustering with constrained cluster sizes and extended weights bounds, SIAM J. Optimizat., vol. 25, no. 3, pp. 1857–1872, 2015.
[22]
K. Makarychev, Y. Makarychev, and A. Vijayaraghavan, Correlation clustering with noisy partial information, in Proc. 28th Annu. Conf. Computational Learning Theory, Paris, France, 2015, pp. 1321–1342.
[23]
P. Kuila and P. K. Jana, Approximation schemes for load balanced clustering in wireless sensor networks, J. Supercomput., vol. 68, no. 1, pp. 87–105, 2014.
[24]
B. Behsaz, Z. Friggstad, M. R. Salavatipour, and R. Sivakumar, Approximation algorithms for min-sum k-clustering and balanced k-median, Algorithmica, vol. 81, no. 3, pp. 1006–1030, 2019.
[25]
J. M. Hendrickx and J. N. Tsitsiklis, Convergence of type-symmetric and cut-balanced consensus seeking systems, IEEE Trans. Automat. Control, vol. 58, no. 1, pp. 214–218, 2013.
[26]
M. Zhao, Y. Y. Yang, and C. Wang, Mobile data gathering with load balanced clustering and dual data uploading in wireless sensor networks, IEEE Trans. Mobile Comput., vol. 14, no. 4, pp. 770–785, 2015.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 13 February 2021
Revised: 27 May 2021
Accepted: 28 June 2021
Published: 17 March 2022
Issue date: October 2022

Copyright

© The author(s) 2022.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Nos. 12131003, 12101594, 11771386, 11728104, and 11201333), the Beijing Natural Science Foundation Project (No. Z200002), the China Postdoctoral Science Foundation (No. 2021M693337), and the Natural Sciences and Engineering Research Council of Canada (NSERC) (No. 06446).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return