A Flexible Data-Driven Framework for Correcting Coarsely Annotated scRNA-seq Data

Ruiqing Zheng; Yongxin He; Jiawen Huang; Shichao Kan; Hui Wang; Edwin Wang; Min Li

doi:10.26599/BDMA.2025.9020009

Open Access

Ruiqing Zheng¹, Yongxin He¹, Jiawen Huang¹, Shichao Kan¹, Hui Wang², Edwin Wang³, Min Li¹

¹ School of Computer Science and Engineering, Central South University, Changsha 410083, China

² Key Laboratory of Molecular Biophysics, Hebei Province, Institute of Biophysics, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, Tianjin 300401, China

³ Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, T2N 4N1, Canada

Abstract

Cells are the fundamental units of life and exhibit significant diversity in structure, behavior, and function, a property known as cell heterogeneity. The advent and development of single-cell RNA sequencing (scRNA-seq) technology have provided a crucial data foundation for studying cellular heterogeneity. Currently, most scRNA-seq-based computational methods follow a sequential process of clustering followed by annotation. However, these clustering-based methods are sensitive to the choice of genes and clustering parameters, which leads to inaccuracies in cell annotation. To address this issue, we develop a flexible data-driven cell label correction framework based on partially annotated scRNA-seq data. The framework uses a neighborhood purity strategy together with global selection strategies to select anchor cells. It then optimizes a prediction neural network using a classification loss with a contrastive regularization term to correct the labels of the remaining cells. The validity of this correction framework is demonstrated through various assessments on real scRNA-seq datasets. Based on the corrected labels of the scRNA-seq data, we further assess the latest unsupervised clustering methods, thereby establishing a more objective benchmark for comparing their performance.
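The abstract does not define neighborhood purity precisely, so the following is only a minimal sketch under one common reading: a cell's purity is the fraction of its k nearest neighbors (Euclidean distance in some embedding) that share its coarse label, and cells whose purity reaches a threshold are kept as anchors. The function names, the toy 2-D coordinates, `k`, and `threshold` are all assumptions made for illustration and are not taken from the paper; the global selection strategies and the contrastively regularized network are not covered here.

```python
import math


def neighborhood_purity(points, labels, k):
    """Return, for each cell, the fraction of its k nearest neighbors
    (Euclidean distance) that carry the same label as the cell itself."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of cells")
    purities = []
    for i in range(n):
        # Distance from cell i to every other cell, paired with its index.
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()  # ties are broken by the smaller index
        neighbors = [j for _, j in dists[:k]]
        same = sum(1 for j in neighbors if labels[j] == labels[i])
        purities.append(same / k)
    return purities


def select_anchors(points, labels, k=3, threshold=0.9):
    """Indices of cells whose neighborhood purity is at least `threshold`
    (hypothetical anchor rule; the paper's actual criterion may differ)."""
    purities = neighborhood_purity(points, labels, k)
    return [i for i, p in enumerate(purities) if p >= threshold]


# Toy data: two well-separated groups, plus one cell (index 8) that sits
# inside group "A" but carries the coarse label "B".
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0.5, 0.5)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

purities = neighborhood_purity(points, labels, k=3)
anchors = select_anchors(points, labels, k=3, threshold=0.6)
print(purities)
print(anchors)
```

In this toy setting the suspicious cell has purity 0 and is excluded from the anchor set, while its presence also lowers the purity of the surrounding "A" cells. This is one reason a purity threshold needs tuning, and plausibly why a purely local criterion would be combined with other selection strategies.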

Keywords

single-cell RNA sequencing (scRNA-seq); cell heterogeneity; cell annotation; supervised contrastive learning

Big Data Mining and Analytics

Volume 8, Issue 5, October 2025

Pages 997-1010

DOI: 10.26599/BDMA.2025.9020009

Cite this article:

Zheng R, He Y, Huang J, et al. A Flexible Data-Driven Framework for Correcting Coarsely Annotated scRNA-seq Data. Big Data Mining and Analytics, 2025, 8(5): 997-1010. https://doi.org/10.26599/BDMA.2025.9020009

Received: 28 September 2024

Revised: 08 January 2025

Accepted: 18 January 2025

Published: 14 July 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).