Open Access | Just Accepted

Enhancing Distance Entropy Preservation via L2 Normalization and Geodesic Distances in High-dimensional Single-Cell Data Visualization

Ziqi Rong¹,†, Jinpu Cai¹,†, Jiahao Qiu², Pengcheng Xu³, Lana X. Garmire⁴, Qiuyu Lian⁵,⁶, Hongyi Xin⁷

¹ UM-SJTU Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China

² Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA

³ Department of Computer Science, University of California, Irvine, CA 92697, USA

⁴ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

⁵ Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK

⁶ Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, CB3 0WA, UK

⁷ Global Institute of Future Technology and Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China

† Ziqi Rong and Jinpu Cai contributed equally to this paper.


Abstract

In high-dimensional single-cell sequencing data analysis, accurately measuring the similarity between cells is pivotal. However, conventional metrics such as Euclidean distance after L1-normalization may lose distinguishing information on high-dimensional data, where the distances between different observations gradually converge to a shrinking interval. In this article, we use distance entropy to quantify the amount of information contained in the distances, and discuss the influence of normalization by different p-norms and the defects of Euclidean distance. We find that differences between observations are better preserved when the data are normalized by a higher p-norm and geodesic distance, rather than Euclidean distance, is used as the similarity measure. We further find that L2-normalization onto the hypersphere is often sufficient to preserve delicate differences even in relatively high-dimensional data while maintaining computational efficiency. We then present hypersphere t-distributed stochastic neighbor embedding (HS-SNE), an augmentation of t-SNE based on a hypersphere representation system, which addresses the intricacy of high-dimensional data visualization and similarity measurement. Our results on multiple single-cell sequencing datasets show that this hypersphere representation system has improved resolution for identifying subtle differences between high-dimensional data points, while balancing distance entropy preservation and computational efficiency.
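The two representations contrasted in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the formula for distance entropy, so `histogram_entropy` below is an assumed stand-in (Shannon entropy of a fixed-range histogram of pairwise distances), and all function names, the bin count, and the synthetic data are hypothetical.

```python
import math
import random


def p_normalize(x, p):
    """Scale a vector so that its p-norm equals 1."""
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / norm for v in x]


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def geodesic(a, b):
    """Great-circle distance between two vectors of unit L2 norm."""
    cos = sum(u * v for u, v in zip(a, b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos)))


def pairwise(points, dist):
    """All distances between distinct pairs of points."""
    n = len(points)
    return [dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]


def histogram_entropy(values, bins=20, lo=0.0, hi=1.0):
    """Shannon entropy (in nats) of values binned on the fixed range [lo, hi].

    ASSUMPTION: an illustrative proxy for "distance entropy"; the paper's
    own definition may differ.
    """
    counts = [0] * bins
    for v in values:
        k = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        counts[k] += 1
    total = len(values)
    return -sum(c / total * math.log(c / total) for c in counts if c)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic non-negative data standing in for expression profiles.
    cells = [[random.random() for _ in range(200)] for _ in range(50)]

    l1_cells = [p_normalize(c, 1) for c in cells]
    l2_cells = [p_normalize(c, 2) for c in cells]

    d_l1 = pairwise(l1_cells, euclidean)  # L1-normalization + Euclidean
    d_geo = pairwise(l2_cells, geodesic)  # L2-normalization + geodesic

    # Upper bounds for non-negative data: sqrt(2) on the simplex, pi/2 on the sphere.
    print("L1 + Euclidean:", histogram_entropy(d_l1, hi=math.sqrt(2)))
    print("L2 + geodesic: ", histogram_entropy(d_geo, hi=math.pi / 2))
```

The entropy values depend heavily on the chosen bin count and range, so a comparison of this kind is only meaningful when both distance sets are binned under a rule fixed in advance; for real data the definition given in the paper should be used instead of this proxy.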

Keywords

dimension reduction, normalization, geodesic distance, manifold learning, bioinformatics

Big Data Mining and Analytics

Cite this article:

Rong Z, Cai J, Qiu J, et al. Enhancing Distance Entropy Preservation via L2 Normalization and Geodesic Distances in High-dimensional Single-Cell Data Visualization. Big Data Mining and Analytics, 2025, https://doi.org/10.26599/BDMA.2025.9020085

Received: 15 February 2025

Revised: 16 June 2025

Accepted: 21 July 2025

Available online: 21 November 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).