Scholar - SciOpen

The development of the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) has significantly advanced the study of cell heterogeneity in the epigenetic landscape. Numerous studies have leveraged scATAC-seq data to explore deeper gene regulatory relationships. However, scATAC-seq data usually suffer from dropout events, which can result in data sparsity and noise. In this work, we propose scMCG, a method for analyzing scATAC-seq data that combines contrastive learning with a generative adversarial network (GAN). First, scMCG uses two distinct encoders for contrastive learning to address feature redundancy and data sparsity in scATAC-seq data. Next, a generator reconstructs the latent embedding. Finally, a decoder generates binary accessibility profiles. We conduct experiments on multiple scATAC-seq datasets, and the results demonstrate that scMCG achieves excellent performance on multiple tasks, such as cell clustering and transcription factor activity influence.
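The abstract does not state which contrastive objective scMCG uses. As a generic illustration only, the sketch below computes an InfoNCE-style loss, a common contrastive objective and an assumption here rather than necessarily the one scMCG uses. The two embeddings of the same cell produced by two encoders form a positive pair, and the other cells in the batch act as negatives. The function names and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(z1, z2, temperature=0.5):
    """One-directional InfoNCE loss (hypothetical stand-in, not scMCG's own loss).

    z1[i] and z2[i] are embeddings of cell i from two different encoders
    (a positive pair); z2[j] for j != i serve as negatives for z1[i].
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max before exponentiating for stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive pair
    return total / n


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    views_b_aligned = [[1.0, 0.0], [0.0, 1.0]]
    views_b_shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(views_a, views_b_aligned))   # small: positives agree
    print(info_nce(views_a, views_b_shuffled))  # large: positives disagree
```

The loss is low when each cell's two views agree and high when they are mismatched, which is the property a contrastive encoder pair is trained to exploit. The temperature of 0.5 is an arbitrary illustrative value.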

Open Access Issue

A Flexible Data-Driven Framework for Correcting Coarsely Annotated scRNA-seq Data

Ruiqing Zheng, Yongxin He, Jiawen Huang, Shichao Kan, Hui Wang, Edwin Wang, Min Li

Big Data Mining and Analytics 2025, 8(5): 997-1010

Published: 14 July 2025

Abstract

Cells are the fundamental units of life and exhibit significant diversity in structure, behavior, and function, a property known as cell heterogeneity. The advent and development of single-cell RNA sequencing (scRNA-seq) technology have provided a crucial data foundation for studying cellular heterogeneity. Currently, most computational methods based on scRNA-seq follow a two-step process: clustering, then annotation. However, these clustering-based methods are sensitive to the choice of genes and clustering parameters, which leads to inaccurate cell annotation. To address this issue, we develop a flexible data-driven cell correction framework based on partially annotated scRNA-seq data. This framework employs a neighborhood purity strategy and global selection strategies to select anchor cells. It then optimizes a neural network prediction model using a classification loss with a contrastive regularization term to correct the labels of the remaining cells. The validity of this correction framework is demonstrated through various assessments on real scRNA-seq datasets. Based on the corrected labels of the scRNA-seq data, we further assess recent unsupervised clustering methods, thereby establishing a more objective benchmark for comparing their performance.
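The abstract names a neighborhood purity strategy for choosing anchor cells but does not define it. A minimal sketch, assuming purity means the fraction of a cell's k nearest neighbours (Euclidean distance in some expression or embedding space) that carry the same coarse label as the cell itself. `knn_indices`, `select_anchors`, `k`, and `min_purity` are hypothetical names and values; the global selection strategies and the neural network correction step are not shown.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]]


def select_anchors(points, labels, k=3, min_purity=0.8):
    """Hypothetical anchor selection by neighbourhood purity.

    labels[i] is the (possibly wrong) coarse label of cell i. A cell becomes an
    anchor when at least `min_purity` of its k nearest neighbours share its label.
    Requires at least two cells.
    """
    anchors = []
    for i, lab in enumerate(labels):
        neighbours = knn_indices(points, i, k)
        purity = sum(labels[j] == lab for j in neighbours) / len(neighbours)
        if purity >= min_purity:
            anchors.append(i)
    return anchors


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    # The last cell sits in the "B" group but carries a wrong coarse label "A".
    labs = ["A", "A", "A", "A", "B", "B", "B", "A"]
    print(select_anchors(pts, labs, k=3, min_purity=0.6))
```

In this toy example the mislabeled cell (index 7) is excluded from the anchor set, so it would be among the "remaining cells" whose labels a trained model then corrects.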

Open Access Issue

A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data

Yu Tian, Ruiqing Zheng, Zhenlan Liang, Suning Li, Fang-Xiang Wu, Min Li

Tsinghua Science and Technology 2021, 26(5): 772-789

Published: 20 April 2021

Abstract

Recently, the emergence of single-cell RNA-sequencing (scRNA-seq) technology has made it possible to address biological problems at single-cell resolution. One of the critical steps in cellular heterogeneity analysis is cell type identification. Diverse scRNA-seq clustering methods have been proposed to partition cells into clusters. Among them, hierarchical clustering and spectral clustering are the most popular approaches for downstream clustering analysis, combined with different preprocessing strategies such as similarity learning, dropout imputation, and dimensionality reduction. In this study, we carry out a comprehensive analysis by combining different strategies with these two categories of clustering methods on scRNA-seq datasets under different biological conditions. The results show that methods using spectral clustering tend to perform better on datasets with continuous shapes in two dimensions, while those using hierarchical clustering achieve better results on datasets with obvious boundaries between clusters in two dimensions. Motivated by this finding, we develop a new strategy, called QRS, to quantitatively evaluate the latent representative shape of a dataset and determine whether it has clear boundaries. Finally, we propose a data-driven clustering recommendation method, called DDCR, to recommend either hierarchical clustering or spectral clustering for scRNA-seq data. We apply DDCR to two typical single-cell clustering methods, SC3 and RAFSIL, and the results show that DDCR recommends a more suitable downstream clustering method for different scRNA-seq datasets and obtains more robust and accurate results.
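The abstract does not give the QRS formula, so the sketch below only illustrates the shape of the DDCR decision: score how clearly a dataset's clusters are separated, then recommend hierarchical clustering when boundaries are clear and spectral clustering otherwise. As a plainly swapped-in stand-in for QRS it uses the mean silhouette coefficient of a preliminary partition. `recommend_clustering` and the 0.7 threshold are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient (needs >= 2 clusters); near 1 means well separated."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)  # mean distance within the cell's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def recommend_clustering(points, labels, threshold=0.7):
    """Hypothetical DDCR-style rule: clear boundaries -> hierarchical, else spectral."""
    return "hierarchical" if silhouette(points, labels) >= threshold else "spectral"


if __name__ == "__main__":
    separated = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    continuous = [(x, 0) for x in range(6)]  # one elongated, trajectory-like shape
    print(recommend_clustering(separated, ["A"] * 4 + ["B"] * 4))
    print(recommend_clustering(continuous, ["A"] * 3 + ["B"] * 3))
```

Two compact, distant groups score high and are routed to hierarchical clustering, while a continuous line split into two halves scores lower and is routed to spectral clustering, mirroring the trend the study reports.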