Open Access | Just Accepted

Enhancing Distance Entropy Preservation via L2 Normalization and Geodesic Distances in High-dimensional Single-Cell Data Visualization

Ziqi Rong1, Jinpu Cai1, Jiahao Qiu2, Pengcheng Xu3, Lana X. Garmire4, Qiuyu Lian5,6 (corresponding author), Hongyi Xin7 (corresponding author)

1 UM-SJTU Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China

2 Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA

3 Department of Computer Science, University of California, Irvine, CA 92697, USA

4 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

5 Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK

6 Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, CB3 0WA, UK

7 Global Institute of Future Technology and Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China

Ziqi Rong and Jinpu Cai contributed equally to this paper.


Abstract

In high-dimensional single-cell sequencing data analysis, accurately measuring the similarity between cells is pivotal. However, conventional metrics such as Euclidean distance after L1 normalization can lose discriminative information on high-dimensional data, because the pairwise distances between observations gradually concentrate in a shrinking interval. In this article, we use distance entropy to quantify the amount of information contained in the distances, and we examine how normalization by different p-norms affects it, as well as the shortcomings of Euclidean distance. We find that differences between observations are better preserved when the data are normalized by a higher p-norm and when geodesic distance, rather than Euclidean distance, is used as the similarity measure. We further find that L2 normalization onto the hypersphere is often sufficient to preserve subtle differences, even in relatively high-dimensional data, while remaining computationally efficient. Building on this, we present hypersphere t-distributed stochastic neighbor embedding (HS-SNE), an extension of t-SNE based on the hypersphere representation system, which addresses the difficulty of visualizing and measuring similarity in high-dimensional data. Our results on multiple single-cell sequencing datasets show that this hypersphere representation system resolves subtler differences between high-dimensional data points while balancing distance entropy preservation and computational efficiency.
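The core argument of the abstract (pairwise distances concentrate as dimensionality grows, and L2 normalization onto the hypersphere combined with geodesic distance keeps more of their spread) can be explored numerically. The following is a minimal NumPy sketch under stated assumptions: the data are synthetic exponential values rather than a real single-cell dataset, and `distance_entropy` is a hypothetical histogram-based proxy (Shannon entropy of the pairwise distances scaled by their maximum), not necessarily the definition used in the paper. The sketch compares the L1-normalized Euclidean baseline with chord (straight-line) and geodesic (arc-length) distances after L2 normalization. It also reports the relative contrast (max - min) / min of the pairwise distances, a common indicator of distance concentration.

```python
import numpy as np


def lp_normalize(X: np.ndarray, p: float) -> np.ndarray:
    """Scale each row (cell) of X to unit L_p norm; all-zero rows stay zero."""
    norms = np.linalg.norm(X, ord=p, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)


def pairwise_euclidean(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives caused by round-off


def pairwise_geodesic(X: np.ndarray) -> np.ndarray:
    """Great-circle (arc-length) distances between the rows of an L2-normalized X."""
    # Clip guards arccos against round-off just outside [-1, 1]; arccos is
    # imprecise for nearly identical rows, which is acceptable for this demo.
    cos = np.clip(X @ X.T, -1.0, 1.0)
    return np.arccos(cos)


def upper_triangle(D: np.ndarray) -> np.ndarray:
    """Off-diagonal pairwise distances, each unordered pair counted once."""
    return D[np.triu_indices_from(D, k=1)]


def distance_entropy(D: np.ndarray, bins: int = 50) -> float:
    """HYPOTHETICAL proxy: Shannon entropy (nats) of the histogram of pairwise
    distances divided by their maximum. The paper's exact definition may differ."""
    d = upper_triangle(D)
    counts, _ = np.histogram(d / d.max(), bins=bins, range=(0.0, 1.0))
    prob = counts[counts > 0] / counts.sum()
    return float(-np.sum(prob * np.log(prob)))


def relative_contrast(D: np.ndarray) -> float:
    """(max - min) / min over pairwise distances; shrinks as distances concentrate."""
    d = upper_triangle(D)
    return float((d.max() - d.min()) / d.min())


# Synthetic nonnegative "expression" matrix: 200 cells x n_genes (an assumption).
rng = np.random.default_rng(0)
results = {}
for n_genes in (10, 100, 1000):
    X = rng.exponential(scale=1.0, size=(200, n_genes))
    D_l1_euc = pairwise_euclidean(lp_normalize(X, 1))  # baseline: L1 + Euclidean
    X2 = lp_normalize(X, 2)                             # points on the unit hypersphere
    D_l2_chord = pairwise_euclidean(X2)                 # straight-line (chord) distance
    D_l2_geo = pairwise_geodesic(X2)                    # arc length along the sphere
    results[n_genes] = {
        "contrast_l1_euc": relative_contrast(D_l1_euc),
        "H_l1_euc": distance_entropy(D_l1_euc),
        "H_l2_chord": distance_entropy(D_l2_chord),
        "H_l2_geo": distance_entropy(D_l2_geo),
    }
    print(n_genes, {k: round(v, 3) for k, v in results[n_genes].items()})
```

On the unit hypersphere the two distances are linked by chord = 2 sin(θ/2), where θ is the geodesic angle, so they produce the same nearest-neighbor ranking; geodesic distance stretches the larger separations, which changes how the distances spread over their range. This sketch is not HS-SNE. A rough way to feed such distances into an embedding is scikit-learn's `TSNE(metric="precomputed", init="random")` applied to `D_l2_geo`, but the paper's HS-SNE is its own extension of t-SNE and should not be assumed equivalent to that.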

Big Data Mining and Analytics

Cite this article:
Rong Z, Cai J, Qiu J, et al. Enhancing Distance Entropy Preservation via L2 Normalization and Geodesic Distances in High-dimensional Single-Cell Data Visualization. Big Data Mining and Analytics, 2025, https://doi.org/10.26599/BDMA.2025.9020085

Received: 15 February 2025
Revised: 16 June 2025
Accepted: 21 July 2025
Available online: 21 November 2025

© The author(s) 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).