Open Access Issue
A Flexible Data-Driven Framework for Correcting Coarsely Annotated scRNA-seq Data
Big Data Mining and Analytics 2025, 8(5): 997-1010
Published: 14 July 2025

Cells are the fundamental units of life and differ widely in structure, behavior, and function, a property known as cell heterogeneity. Single-cell RNA sequencing (scRNA-seq) technology provides a crucial data foundation for studying this heterogeneity. Most current computational methods for scRNA-seq data follow a sequential process of clustering followed by annotation. However, these clustering-based methods are sensitive to the choice of genes and clustering parameters, which leads to inaccurate cell annotations. To address this issue, we develop a flexible data-driven framework for correcting cell labels in partially annotated scRNA-seq data. The framework first selects anchor cells using a neighborhood purity strategy together with global selection strategies. It then trains a neural network predictor with a classification loss and a contrastive regularization term to correct the labels of the remaining cells. The validity of the correction framework is demonstrated through various assessments on real scRNA-seq datasets. Using the corrected labels, we further assess the latest unsupervised clustering methods, thereby establishing a more objective benchmark for comparing their performance.
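
The abstract does not define how neighborhood purity is computed. As a minimal sketch of one plausible reading, the purity of a cell can be taken as the fraction of its k nearest neighbors (in expression or embedding space) that share its coarse label, with cells above a cutoff kept as anchors. The function names, the Euclidean metric, and the parameters `k` and `threshold` are illustrative assumptions, not the paper's implementation, and the global selection strategies are not covered here.

```python
import numpy as np

def neighborhood_purity(X, labels, k=5):
    """Fraction of each cell's k nearest neighbors that share its label.

    Assumption: Euclidean distance on the rows of X (cells x features).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances (fine for small inputs).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest cells
    same = labels[nn] == labels[:, None]  # neighbor label matches own label
    return same.mean(axis=1)

def select_anchors(X, labels, k=5, threshold=0.8):
    """Indices of cells whose neighborhood purity reaches the threshold."""
    purity = neighborhood_purity(X, labels, k=k)
    return np.flatnonzero(purity >= threshold)
```

Under this reading, a cell whose label disagrees with most of its neighbors gets low purity and is left for the downstream predictor to relabel, while high-purity cells serve as trusted training examples.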
