Scholar - SciOpen

Convex clustering, which casts clustering as a convex optimization problem, has drawn wide attention. It overcomes a shortcoming of traditional clustering methods such as K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and hierarchical clustering, which can easily become trapped in local optima. However, convex clustering is vulnerable to outlier features, because it uses the Frobenius norm to measure the distance between data points and their corresponding cluster centers and to evaluate clusters. To accurately identify outlier features, this paper decomposes the data into a clustering structure component and a normalized component that captures outlier features. Unlike existing convex clustering methods, which evaluate features with exact measurements, the proposed model can overcome vast differences in magnitude among features, so that outlier features can be efficiently identified and removed. To solve the proposed model, we design an efficient algorithm and prove its global convergence. Experiments on both synthetic and UCI datasets demonstrate that the proposed method outperforms the compared convex clustering approaches.
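
To make the Frobenius-norm criterion concrete, here is a minimal sketch of the standard convex clustering objective, 0.5 * ||X - U||_F^2 + lam * sum over i<j of w_ij * ||u_i - u_j||_2, as it is commonly written in the literature. It is not the decomposition model proposed in this paper, whose details the abstract does not give. The function name, `lam`, and `weights` are illustrative assumptions.

```python
import math

def convex_clustering_objective(X, U, lam, weights=None):
    """Standard convex clustering objective (illustrative, not the paper's model).

    X: data points (list of rows), U: one center per data point,
    lam: fusion penalty strength, weights: optional pairwise weights w[i][j].
    """
    n = len(X)
    # Fidelity term: half the squared Frobenius norm of X - U.
    fidelity = 0.5 * sum(
        (x - u) ** 2 for x_row, u_row in zip(X, U) for x, u in zip(x_row, u_row)
    )
    # Fusion term: weighted Euclidean distances between all pairs of centers.
    fusion = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = 1.0 if weights is None else weights[i][j]
            fusion += w * math.dist(U[i], U[j])
    return fidelity + lam * fusion

X = [[0.0, 0.0], [2.0, 0.0]]
separate = convex_clustering_objective(X, X, lam=0.5)                       # each point is its own center
merged = convex_clustering_objective(X, [[1.0, 0.0], [1.0, 0.0]], lam=0.5)  # both centers fused at the mean
```

Because both terms sum over all features with equal weight, a single feature with a very large magnitude can dominate the objective, which is the vulnerability the abstract describes.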

Open Access Issue

Approximating $(m_{B}, m_{P})$-Monotone BP Maximization and Extensions

Ruiqi Yang, Suixiang Gao, Lu Han, Gaidi Li, Zhongrui Zhao

Tsinghua Science and Technology 2023, 28 (5): 906-915

Published: 19 May 2023

Abstract

This paper studies the problem of maximizing the sum of a suBmodular and a suPermodular (BP) function with partial monotonicity in the streaming setting. In this model, elements are randomly released from the stream, and the utility is encoded by the sum of partially monotone suBmodular and suPermodular functions. The goal is to select a subset of the stream, of size at most $k$, whose summed utility is as large as possible. In this work, a threshold-based streaming algorithm is presented for BP maximization that attains a $((1 - \kappa) / (2 - \kappa) - \mathcal{O}(\varepsilon))$-approximation with $\mathcal{O}(1 / \varepsilon^{4} \log^{3}(1 / \varepsilon) \log((2 - \kappa) k / (1 - \kappa)^{2}))$ memory complexity, where $\kappa$ denotes the supermodularity ratio parameter. We further consider a more general model with fairness constraints and present a greedy-based algorithm that obtains the same approximation ratio. Finally, we investigate this fair model in the streaming setting and provide a $((1 - \kappa)^{4} / (2 - 2\kappa + \kappa^{2})^{2} - \mathcal{O}(\varepsilon))$-approximation algorithm.
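
The general idea of threshold-based streaming selection can be sketched as follows. This is a generic single-threshold pass under assumed inputs, not the algorithm analyzed in the paper: the threshold `tau` is supplied by the caller rather than derived from $\kappa$ and $\varepsilon$, and the toy utility (a coverage function plus a pairwise bonus) is a hypothetical example of a submodular plus supermodular sum.

```python
from itertools import combinations

def threshold_stream(stream, utility, k, tau):
    """One pass over the stream: keep an element if its marginal gain is >= tau."""
    selected = []
    current = utility(selected)
    for e in stream:
        if len(selected) >= k:  # cardinality budget exhausted
            break
        gain = utility(selected + [e]) - current
        if gain >= tau:
            selected.append(e)
            current += gain
    return selected

# Hypothetical toy data: items covered by each element, and a complementarity bonus.
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1}}
pair_bonus = {frozenset(("a", "c")): 1.0}

def utility(S):
    # Submodular part: number of distinct items covered by S.
    f = len(set().union(*(coverage[e] for e in S)))
    # Supermodular part: total bonus over complementary pairs inside S.
    g = sum(pair_bonus.get(frozenset(p), 0.0) for p in combinations(S, 2))
    return f + g

picked = threshold_stream(["d", "a", "b", "c"], utility, k=2, tau=2)
print(picked)  # ['a', 'c']
```

In this toy run, "d" and "b" are skipped because each adds only 1 to the utility, while "c" passes the threshold only because of its bonus with the already selected "a", which shows how the supermodular part can raise marginal gains as the set grows.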
