Volume 8, Issue 3

Single-cell RNA sequencing (scRNA-seq) is a revolutionary tool for exploring cells. As the number of scRNA-seq data analysis tools keeps growing, it is increasingly difficult for users to choose among them and to compare their performance. Here, we present an overview of the workflow for the computational analysis of scRNA-seq data. We detail the steps of a typical scRNA-seq analysis: experimental design, pre-processing and quality control, feature selection, dimensionality reduction, cell clustering and annotation, and downstream analyses such as batch correction, trajectory inference and cell–cell communication. We provide guidelines based on our own best practices. This review will help experimentalists who want to analyze their own data, as well as users seeking to update their analysis pipelines.


Practical bioinformatics pipelines for single-cell RNA-seq data analysis

Jiangping He (1), Lihui Lin (2), Jiekai Chen (1,2)
1. Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510320, China
2. Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China



Keywords: Single-cell RNA sequencing (scRNA-seq), scRNA-seq analysis, Practical bioinformatics pipeline
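
To make the early steps of the workflow named in the abstract more concrete (quality control, normalization as part of pre-processing, and feature selection), the following is a minimal sketch in plain Python on a toy count matrix. It is not the authors' pipeline: the gene names, the count values, the thresholds (`min_counts`, `min_genes`, `target_sum`, `n_top`) and all function names are assumptions chosen for illustration, and a real analysis would rely on a dedicated single-cell toolkit rather than hand-written loops.

```python
import math
import statistics

# Toy count matrix: rows are cells, columns are genes (hypothetical values).
GENES = ["GeneA", "GeneB", "GeneC", "GeneD"]
COUNTS = [
    [10, 0, 5, 1],
    [8, 1, 6, 0],
    [0, 0, 1, 0],   # very few counts: treated here as a low-quality cell
    [1, 9, 5, 12],
    [0, 11, 4, 10],
]

def quality_control(counts, min_counts=5, min_genes=2):
    """Keep cells with enough total counts and enough detected genes.

    min_counts must be positive so that no all-zero cell survives.
    """
    return [
        cell for cell in counts
        if sum(cell) >= min_counts
        and sum(1 for c in cell if c > 0) >= min_genes
    ]

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell to the same total count, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        scale = target_sum / sum(cell)
        normalized.append([math.log1p(c * scale) for c in cell])
    return normalized

def select_variable_genes(matrix, genes, n_top=2):
    """Rank genes by variance across cells and keep the n_top most variable."""
    variances = [statistics.pvariance(column) for column in zip(*matrix)]
    ranked = sorted(range(len(genes)), key=lambda i: variances[i], reverse=True)
    return [genes[i] for i in sorted(ranked[:n_top])]

filtered = quality_control(COUNTS)
normalized = normalize_log1p(filtered)
variable_genes = select_variable_genes(normalized, GENES)
print(len(filtered), variable_genes)
```

Ranking genes by raw variance is only one simple way to select features; it is used here because it fits in a few lines, not because the review recommends it. The selected genes would then feed the later steps of the workflow (dimensionality reduction, clustering and annotation).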




Publication history

Received: 19 August 2021
Accepted: 01 March 2022
Published: 25 July 2022
Issue date: June 2022


© The Author(s) 2022



This work was supported by grants from the National Key R&D Program of China (2021YFA1102200), the Frontier Science Research Program of the CAS (ZDBS-LY-SM007), and the Science and Technology Planning Project of Guangdong Province, China (2020B1212060052).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit