Scholar - SciOpen

The interaction between circular RNAs (circRNAs) and microRNAs is one of the key mechanisms determining the functions of non-coding RNAs (ncRNAs) in biological processes such as DNA methylation and RNA-induced silencing. Studying these relationships can deepen our understanding of the roles these RNAs play in developing cancer vaccines and designing treatments. Therefore, we propose a knowledge graph enhanced pre-trained Large Language Model (LLM) for predicting circRNA-microRNA interactions. Our approach employs graph contrastive learning to represent, from multiple views, a knowledge graph consisting of circRNA and microRNA entities. The features of these entities are derived by fine-tuning a sequential LLM separately on the two types of ncRNAs. Finally, the embeddings are fed into a classifier for prediction. We employ an independent testing set to evaluate the model’s performance and compare our model against recently reported models on two datasets. Our model achieves approximately a 3% improvement in Area Under the Receiver Operating Characteristic Curve (AUROC), reaching 93.77% and 93.07%, respectively. The stability of our model is tested by performing 10-fold cross-validation on the remaining training set, where our model shows the best stability. In the ablation study, we comprehensively compare strategies for sequence processing and the effectiveness of each independent module. Finally, on a case study dataset derived from real-world scenarios, the model assigns scores to all candidates and ranks them accordingly. Among the top 10 highest-scoring results, 7 have been validated by wet-lab experiments, highlighting the model’s strong generalization capability.
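The two evaluation steps mentioned in this abstract, AUROC on a held-out set and counting validated interactions among the top-ranked candidates, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `top_k_hits`, the toy labels, and the toy scores are all assumptions made for illustration. AUROC is computed here by its pairwise definition, the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via its pairwise definition; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_hits(
    candidates: Sequence[Hashable],
    scores: Sequence[float],
    validated: Set[Hashable],
    k: int = 10,
) -> Tuple[List[Hashable], int]:
    """Rank candidates by descending score; count validated ones in the top k."""
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    top = [c for c, _ in ranked[:k]]
    return top, sum(1 for c in top if c in validated)


# Toy data only (hypothetical labels and scores, not from the paper).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)

toy_top, toy_hits = top_k_hits(["a", "b", "c"], [0.2, 0.9, 0.5], {"a", "b"}, k=2)
```

The quadratic pairwise loop is chosen for clarity; for large test sets a rank-based computation would be the usual choice.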

Open Access Issue

SpaCCC: Large Language Model-Based Cell-Cell Communication Inference for Spatially Resolved Transcriptomic Data

Boya Ji, Xiaoqi Wang, Debin Qiao, Liwen Xu, Shaoliang Peng

Big Data Mining and Analytics 2024, 7(4): 1129-1147

Published: 04 December 2024

Abstract

Drawing parallels between linguistic constructs and cellular biology, Large Language Models (LLMs) have achieved success in diverse downstream applications for single-cell data analysis. However, methods that take advantage of LLMs to infer Ligand-Receptor (LR)-mediated cell-cell communications for spatially resolved transcriptomic data are still lacking. Here, we propose SpaCCC to facilitate the inference of spatially resolved cell-cell communications. SpaCCC relies on our fine-tuned single-cell LLM and a functional gene interaction network to embed ligand and receptor genes into a unified latent space. LR pairs with a significantly closer distance in the latent space are taken to be more likely to interact with each other. Molecular diffusion and permutation test strategies are then employed, respectively, to calculate the communication strength and to filter out communications with low specificity. Benchmarked on real single-cell spatial transcriptomic datasets, SpaCCC shows superiority over other methods. SpaCCC also infers known LR pairs concealed by existing aggregative methods and then identifies communication patterns for specific cell types and their signaling pathways. Furthermore, SpaCCC provides various cell-cell communication visualization results at both single-cell and cell type resolution. In summary, SpaCCC provides a sophisticated and practical tool allowing researchers to decipher spatially resolved cell-cell communications and related communication patterns and signaling pathways based on spatial transcriptome data. SpaCCC is free and publicly available at https://github.com/jiboyalab/SpaCCC.
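The idea that an LR pair is "significantly closer" in latent space can be illustrated with a generic permutation-style test. The sketch below is not SpaCCC's implementation (see the repository above for that). The function name `closeness_p_value`, the Euclidean metric, the null distribution built from randomly drawn gene pairs, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closeness_p_value(
    embeddings: Dict[str, Sequence[float]],
    ligand: str,
    receptor: str,
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value that a random gene pair is at least as close as (ligand, receptor).

    The null distribution is built from distances between randomly drawn
    gene pairs (an assumption of this sketch). The +1 in numerator and
    denominator keeps the estimate strictly above zero.
    """
    rng = random.Random(seed)
    genes = list(embeddings)
    observed = euclidean(embeddings[ligand], embeddings[receptor])
    at_least_as_close = 0
    for _ in range(n_perm):
        g1, g2 = rng.sample(genes, 2)  # two distinct genes
        if euclidean(embeddings[g1], embeddings[g2]) <= observed:
            at_least_as_close += 1
    return (at_least_as_close + 1) / (n_perm + 1)


# Toy embeddings (hypothetical): L and R sit close together, the rest are spread out.
toy_embeddings = {
    "L": (0.0, 0.0),
    "R": (0.01, 0.0),
    "A": (5.0, 5.0),
    "B": (-5.0, 3.0),
    "C": (2.0, -7.0),
    "D": (-6.0, -6.0),
}
p_close = closeness_p_value(toy_embeddings, "L", "R")
p_far = closeness_p_value(toy_embeddings, "A", "D")
```

In this toy setting the pair ("L", "R") receives a small p-value, while ("A", "D"), the most distant pair, receives a p-value of 1. A pair would be retained only if its p-value falls below a chosen significance threshold.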
