Open Access

SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning

School of Computer Science and Engineering, Central South University, Changsha 410083, China
Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, Cleveland, OH 44106, USA
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, USA

Abstract

Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. While previous computational methods have made progress in predicting lncRNA subcellular localization, most of them ignore sequence order information because they rely on k-mer frequency features to encode lncRNA sequences. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn the node representations of each graph. Then, SGCL-LncLoc applies graph convolutional networks to learn a comprehensive graph representation. Additionally, we propose a computational method to map the attention weights of the graph nodes to the weights of nucleotides in the lncRNA sequence, allowing SGCL-LncLoc to serve as an interpretable deep learning model. Furthermore, SGCL-LncLoc employs a supervised contrastive learning strategy, which leverages the relationships between different samples together with label information to guide the model toward better representation learning for lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, showing its capability for accurate lncRNA subcellular localization prediction. Finally, we conduct a motif analysis, revealing that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc. The source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc.
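
To make the sequence-to-graph step concrete, the following is a minimal sketch of one common way to build a de Bruijn graph from a nucleotide sequence. It is not taken from the SGCL-LncLoc source code: the function name `build_de_bruijn_graph`, the default `k = 3`, and the choice to weight each edge by how often one k-mer is immediately followed by another are all assumptions made for illustration. The abstract does not state the k value or the edge definition used by the model.

```python
from collections import Counter


def build_de_bruijn_graph(sequence: str, k: int = 3):
    """Build a simple de Bruijn-style graph from a nucleotide sequence.

    Hypothetical helper for illustration only; k and the edge weighting
    are assumptions, not the settings used by SGCL-LncLoc.

    Nodes are the distinct k-mers in the sequence. A directed edge (u, v)
    exists when k-mer v starts one position after k-mer u, and its weight
    is the number of times that transition occurs.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")

    seq = sequence.upper()
    # Slide a window of width k across the sequence, keeping order.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]

    # Distinct k-mers become graph nodes (sorted for a stable order).
    nodes = sorted(set(kmers))
    # Consecutive k-mers overlap by k - 1 nucleotides; count each transition.
    edges = dict(Counter(zip(kmers, kmers[1:])))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_de_bruijn_graph("ACGACGA", k=3)
    print(nodes)
    print(edges)
```

Unlike a plain k-mer frequency vector, the edge set records which k-mers follow which, so some of the sequence order information that the abstract says frequency features discard is retained in the graph.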

Electronic Supplementary Material

BDMA-2023-0353_ESM.pdf (3.3 MB)


Big Data Mining and Analytics
Pages 765-780
Cite this article:
Li M, Zhao B, Li Y, et al. SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning. Big Data Mining and Analytics, 2024, 7(3): 765-780. https://doi.org/10.26599/BDMA.2024.9020002


Received: 22 November 2023
Revised: 28 December 2023
Accepted: 02 January 2024
Published: 28 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
