Protein Residue Contact Prediction Based on Deep Learning and Massive Statistical Features from Multi-Sequence Alignment

Huiling Zhang; Min Hao; Hao Wu; Hing-Fung Ting; Yihong Tang; Wenhui Xi; Yanjie Wei

doi:10.26599/TST.2021.9010064

Huiling Zhang†, Min Hao†, Hao Wu†, Hing-Fung Ting, Yihong Tang, Wenhui Xi, Yanjie Wei

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

University of Chinese Academy of Sciences, Beijing 100049, China

College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China

School of Software Engineering, University of Science and Technology of China, Hefei 230051, China

Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

†Huiling Zhang, Min Hao, and Hao Wu contributed equally to this work.

Abstract

Sequence-based protein tertiary structure prediction is of fundamental importance because the function of a protein ultimately depends on its 3D structure. An accurate residue-residue contact map is one of the essential elements of current ab initio protocols for 3D structure prediction. Recently, the combination of deep learning and direct coupling techniques has brought significant progress in residue contact prediction. However, a considerable number of current Deep-Learning (DL)-based prediction methods are time-consuming, mainly because they rely on different categories of data types and third-party programs. In this research, we transformed the complex biological problem into a pure computational problem through statistics and artificial intelligence. We accordingly proposed a feature extraction method that obtains various categories of statistical information from the multi-sequence alignment alone, and then trained a DL model for residue-residue contact prediction on this massive statistical information. The proposed method is robust across different test sets, shows high reliability in its model confidence score, achieves high computational efficiency, and attains prediction precision comparable with DL methods that rely on multi-source inputs.
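The abstract does not list the individual statistical features, so the following is only a minimal sketch of one classical statistic that can be computed from a multi-sequence alignment alone: the mutual information between two alignment columns. The function names (`mutual_information`, `pairwise_mi`), the `min_separation` parameter, and the toy alignment are hypothetical and chosen for illustration; this is not presented as the authors' feature set.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (strings).
    Illustrative statistic only; not taken from the paper.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)             # single-column counts
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)  # joint counts

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def pairwise_mi(msa, min_separation=1):
    """Map each column pair (i, j), i < j, to its mutual information."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    }


# Toy alignment (hypothetical): columns 0 and 1 co-vary perfectly,
# column 2 varies independently of column 0, column 3 is fully conserved.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
scores = pairwise_mi(toy_msa)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

In this toy case the co-varying pair (0, 1) scores highest, while pairs involving the conserved column score zero. A per-pair score map of this kind is the general shape of input that a contact-prediction model could consume, under the assumptions stated above.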

Keywords

feature extraction; Deep Learning (DL); multi-sequence alignment; residue-residue contact prediction; statistical information; high computational efficiency


Tsinghua Science and Technology

Volume 27, Issue 5, October 2022

Pages 843-854

DOI: 10.26599/TST.2021.9010064

Cite this article:

Zhang H, Hao M, Wu H, et al. Protein Residue Contact Prediction Based on Deep Learning and Massive Statistical Features from Multi-Sequence Alignment. Tsinghua Science and Technology, 2022, 27(5): 843-854. https://doi.org/10.26599/TST.2021.9010064


Received: 20 July 2021

Revised: 17 August 2021

Accepted: 20 August 2021

Published: 17 March 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).