
Fair k-Center Problem with Outliers on Massive Data

Fan Yuan1, Luhong Diao1, Donglei Du2, Lei Liu1 (corresponding author)
1 Beijing Institute for Scientific and Engineering Computing, Faculty of Science, Beijing University of Technology, Beijing 100124, China
2 Faculty of Management, University of New Brunswick, Fredericton E3B 5A3, Canada

Abstract

Clustering of big data in the era of artificial intelligence has been widely studied. Because of the huge amount of data, distributed algorithms are often used for big data problems; an attractive feature of the distributed computing model is that it can handle massive datasets that do not fit into main memory. At the same time, since many decisions in today's society are made automatically by machines, algorithmic fairness has become an important research area in machine learning. In this paper, we study two fair clustering problems: the centralized fair k-center problem with outliers and the distributed fair k-center problem with outliers. For both problems, we design algorithms with constant approximation ratios, and we give theoretical proofs and analyses of the approximation ratios and of the space used by the algorithms.
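
For context, the underlying k-center objective asks for k centers that minimize the maximum distance from any point to its nearest center. The sketch below shows the classical farthest-point greedy heuristic for the plain (non-fair, outlier-free) version of the problem, which is well known to be a 2-approximation; it is only an illustration of the objective, not the fair or distributed algorithms proposed in the paper, and the function name is chosen for illustration.

import math

def greedy_k_center(points, k):
    # Farthest-point greedy heuristic for the classical k-center problem
    # (Gonzalez, 1985): the maximum point-to-center distance of the
    # returned solution is at most twice the optimum.
    centers = [points[0]]  # start from an arbitrary point
    # distance from each point to its nearest chosen center so far
    d = [math.dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        i = max(range(len(points)), key=lambda j: d[j])  # farthest point
        centers.append(points[i])
        d = [min(d[j], math.dist(points[j], points[i])) for j in range(len(points))]
    return centers

# Toy usage: two natural clusters, k = 2
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
print(greedy_k_center(pts, 2))  # e.g., [(0.0, 0.0), (10.0, 1.0)]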

Keywords: machine learning, distributed algorithm, fairness constraints, outlier constraints, k-center problem


Publication history

Received: 04 October 2022
Revised: 02 December 2022
Accepted: 13 March 2023
Published: 28 July 2023
Issue date: December 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 12131003, 11771386, and 11728104), the Beijing Natural Science Foundation Project (No. Z200002), the General Research Projects of Beijing Education Committee in China (No. KM201910005013), the Natural Sciences and Engineering Research Council of Canada (NSERC) (No. 06446), and the General Program of Science and Technology Development Project of Beijing Municipal Education Commission (No. KM201810005005).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
