Open Access Research Article Issue
Group Collaborative Unsupervised Deep Metric Learning for Feature Embedding
Tsinghua Science and Technology 2026, 31(4): 2092-2103
Published: 24 December 2025

Learning a compact feature embedding is crucial for effective image representation. Current feature embedding methods, including both supervised and unsupervised approaches, rely on deep metric learning techniques that aim to pull positive samples of the same class closer and push negative samples from different classes farther apart. However, supervised metric learning methods may exhibit bias towards the ground truth labels, leading to overfitting on the training set. On the other hand, unsupervised metric learning methods could suffer from degraded performance due to the long-tailed distribution of the clusters. To address these challenges, we propose a group collaborative unsupervised deep metric learning method for feature embedding. Specifically, we train the deep feature embedding model based on the teacher-student framework. The student network produces the final compact embedding, while the teacher network generates pseudo-labels for group collaborative learning and knowledge distillation. Both networks share a similar network structure, and the parameters of the teacher network are updated using the momentum-based moving average of the parameters of the student network. Experimental results on benchmark image retrieval datasets demonstrate the effectiveness and efficiency of the proposed method, achieving an improvement in Recall@1 of up to 1.8%.
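The momentum-based moving average described in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `ema_update`, the momentum value, and the representation of parameters as a flat list of floats are all assumptions made here.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher parameter toward the matching student parameter.

    Hypothetical helper: parameters are flat lists of floats, and
    `momentum` is an assumed value, not one taken from the paper.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    # new_teacher = momentum * old_teacher + (1 - momentum) * student
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy example: the teacher drifts slowly toward the student.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, momentum=0.9)
```

With a momentum close to 1, the teacher changes slowly, which is why such a network is commonly used as a stable source of pseudo-labels.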

Open Access Issue
A Flexible Data-Driven Framework for Correcting Coarsely Annotated scRNA-seq Data
Big Data Mining and Analytics 2025, 8(5): 997-1010
Published: 14 July 2025

Cells are the fundamental units of life and exhibit significant diversity in structure, behavior, and function, known as cell heterogeneity. The advent and development of single-cell RNA sequencing (scRNA-seq) technology have provided a crucial data foundation for studying cellular heterogeneity. Currently, most computational methods based on scRNA-seq involve a sequential process of clustering followed by annotation. However, these clustering-based methods are sensitive to the choice of genes and clustering parameters, which leads to inaccurate cell annotation. To address this issue, we develop a flexible data-driven cell correction framework based on partially annotated scRNA-seq data. This framework employs a neighborhood purity strategy and global selection strategies to select the anchor cells. Then, it optimizes a prediction neural network model using a classification loss with a contrastive regularization term to correct the labels of the remaining cells. The validity of this correction framework is demonstrated through various assessments on real scRNA-seq datasets. Based on the corrected labels of the scRNA-seq data, we further assess the latest unsupervised clustering methods, thereby establishing a more objective benchmark to compare their performance.
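One plausible reading of a neighborhood purity strategy is sketched below: a cell is kept as an anchor when its nearest neighbors carry the same label. The abstract does not define purity, so the definition used here (fraction of the k nearest neighbors that share the cell's label), the Euclidean distance, the threshold, and the function names `neighborhood_purity` and `select_anchors` are all assumptions for illustration.

```python
import math


def neighborhood_purity(points, labels, k):
    """Fraction of each point's k nearest neighbors (Euclidean) sharing its label."""
    if not 1 <= k < len(points):
        raise ValueError("k must be at least 1 and smaller than the number of points")
    purities = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors = others[:k]
        same = sum(1 for j in neighbors if labels[j] == labels[i])
        purities.append(same / k)
    return purities


def select_anchors(points, labels, k=2, threshold=1.0):
    """Indices of points whose purity reaches the (assumed) threshold."""
    purities = neighborhood_purity(points, labels, k)
    return [i for i, purity in enumerate(purities) if purity >= threshold]


# Toy data: two tight groups plus one cell labeled "B" that sits near group "A".
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (3, 3)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
anchors = select_anchors(points, labels, k=2)
```

In this toy data the last cell has only "A" neighbors, so it is not selected as an anchor and would be a candidate for label correction by a downstream model.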

Open Access Issue
SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning
Big Data Mining and Analytics 2024, 7(3): 765-780
Published: 28 August 2024

Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore sequence order information because they rely on k-mer frequency features to encode lncRNA sequences. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn the node representations of the graph. Then, SGCL-LncLoc applies graph convolutional networks to learn the comprehensive graph representation. Additionally, we propose a computational method to map the attention weights of the graph nodes to the weights of nucleotides in the lncRNA sequence, allowing SGCL-LncLoc to serve as an interpretable deep learning model. Furthermore, SGCL-LncLoc employs a supervised contrastive learning strategy, which leverages the relationships between different samples and label information, guiding the model to enhance representation learning for lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, showing its capability for accurate lncRNA subcellular localization prediction. Finally, we conduct a motif analysis, revealing that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc. The source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc.
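To illustrate how a de Bruijn graph retains the order information that plain k-mer counts discard, here is a minimal sketch in Python. It assumes one common construction, where nodes are k-mers and a directed edge links each k-mer to the one that follows it in the sequence; the exact construction used by SGCL-LncLoc may differ, and the function name `de_bruijn_edges` and the choice of k are hypothetical.

```python
from collections import Counter


def de_bruijn_edges(sequence, k=3):
    """Count directed edges between consecutive overlapping k-mers.

    Assumed construction: nodes are k-mers, and an edge (u, v) is added
    each time k-mer v immediately follows k-mer u in the sequence.
    """
    if k < 1 or len(sequence) < k:
        raise ValueError("sequence must contain at least one k-mer")
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return Counter(zip(kmers, kmers[1:]))


# Toy sequence: the repeated k-mers collapse into shared nodes,
# and the repeated transition AUG -> UGA gets weight 2.
edges = de_bruijn_edges("AUGAUGA", k=3)
nodes = {kmer for edge in edges for kmer in edge}
```

Here the edge weights record which k-mers follow which, so two sequences with identical k-mer frequencies but different orderings can yield different graphs.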
