Open Access Issue
An Improved Robust Sparse Convex Clustering
Tsinghua Science and Technology 2023, 28 (6): 989-998
Published: 28 July 2023

Convex clustering, which recasts clustering as a convex optimization problem, has drawn wide attention. It overcomes a shortcoming of traditional clustering methods such as K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and hierarchical clustering, which can easily fall into local optima. However, convex clustering is vulnerable to outlier features, as it uses the Frobenius norm both to measure the distance between data points and their corresponding cluster centers and to evaluate clusters. To accurately identify outlier features, this paper decomposes the data into a clustering-structure component and a normalized component that captures the outlier features. Unlike existing convex clustering methods, which evaluate features by their exact measurements, the proposed model overcomes the vast differences in magnitude among features, so outlier features can be efficiently identified and removed. To solve the proposed model, we design an efficient algorithm and prove its global convergence. Experiments on both synthetic datasets and UCI datasets demonstrate that the proposed method outperforms the compared convex clustering approaches.
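To make the setting concrete, the standard convex clustering objective penalizes a Frobenius-norm data-fidelity term plus pairwise fusion penalties on the cluster-center matrix. The sketch below minimizes that objective with plain subgradient descent on a toy dataset; it illustrates the baseline model the abstract refers to, not the paper's decomposition or its provably convergent algorithm, and the step size, penalty weight, and iteration count are illustrative choices.

```python
import numpy as np

def convex_clustering(X, lam=0.5, step=0.01, iters=300):
    """Minimize 0.5 * ||U - X||_F^2 + lam * sum_{i<j} ||u_i - u_j||_2
    over the center matrix U (one row per point) by subgradient descent.
    Rows of U that (nearly) coincide belong to the same cluster."""
    n, _ = X.shape
    U = X.copy()
    for _ in range(iters):
        grad = U - X  # gradient of the Frobenius fidelity term
        for i in range(n):
            for j in range(i + 1, n):
                diff = U[i] - U[j]
                nrm = np.linalg.norm(diff)
                if nrm > 1e-12:  # subgradient of ||u_i - u_j||_2
                    g = lam * diff / nrm
                    grad[i] += g
                    grad[j] -= g
        U -= step * grad
    return U

# Two well-separated groups: the fusion penalty pulls centers
# within a group together while the groups stay apart.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
U = convex_clustering(X)
```

Because the objective is convex, any such descent scheme approaches the same global optimum, which is the property the abstract contrasts with K-means-style local minima.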

Approximating (mB,mP)-Monotone BP Maximization and Extensions
Tsinghua Science and Technology 2023, 28 (5): 906-915
Published: 19 May 2023

This paper studies the optimization problem of maximizing the sum of suBmodular and suPermodular (BP) functions with partial monotonicity in a streaming fashion. In this model, elements arrive one by one from the stream, and the utility is encoded by the sum of a partially monotone suBmodular function and a suPermodular function. The goal is to select a subset of the stream, of size bounded by a parameter k, whose combined utility is as large as possible. In this work, a threshold-based streaming algorithm is presented for BP maximization that attains a ((1-κ)/(2-κ)-𝒪(ε))-approximation with 𝒪((1/ε⁴)log³(1/ε)log((2-κ)k/(1-κ)²)) memory complexity, where κ denotes the supermodularity ratio. We further consider a more general model with fairness constraints and present a greedy-based algorithm that obtains the same approximation guarantee. We finally investigate this fair model in the streaming fashion and provide a ((1-κ)⁴/(2-2κ+κ²)²-𝒪(ε))-approximation algorithm.
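The core of a threshold-based streaming algorithm is a single pass that accepts an element only when its marginal gain clears a threshold τ and the budget k is not exhausted. The sketch below shows that skeleton on a simple coverage utility; it is a simplified illustration, not the paper's algorithm, which additionally guesses τ over a geometric grid (the source of the log factors in the memory bound) and accounts for the supermodularity ratio κ.

```python
def threshold_streaming(stream, f, k, tau):
    """One-pass thresholding: keep element e if its marginal gain
    f(S + [e]) - f(S) is at least tau and the solution has room."""
    S = []
    for e in stream:
        if len(S) >= k:
            break
        if f(S + [e]) - f(S) >= tau:
            S.append(e)
    return S

# Example utility: set coverage (monotone submodular).
def coverage(S):
    return len(set().union(*S)) if S else 0

# Stream of candidate sets; budget k = 2, threshold tau = 2.
stream = [{1, 2}, {2, 3}, {4, 5}, {1}]
picked = threshold_streaming(stream, coverage, k=2, tau=2)
```

Here {2, 3} is rejected because it adds only one new item, so the pass keeps {1, 2} and {4, 5}. Choosing τ well is the crux: too high and the pass fills too slowly, too low and low-value elements crowd out the budget, which is why the full algorithm maintains several thresholds in parallel.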
