Volume 6, Issue 1




RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human

Xin Liu1, Yaping Lu2, Liang Wang3, Wei Geng1, Xinyi Shi4, Xiao Zhang1
1 School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou 221000, China
2 College of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
3 Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
4 Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310005, China

Abstract

Identifying hepatitis C virus (HCV)-human protein interactions not only helps clarify the molecular mechanisms of related diseases but is also conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating computational studies. In this study, we propose a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict interactions between HCV and human proteins. Specifically, a PSSM is used to characterize each protein, two-dimensional principal component analysis (2DPCA) is then applied to extract features from the PSSM, and finally a rotation forest (RF) performs the classification. Ablation experiments show that, on independent datasets, RF-PSSM reaches an accuracy of 93.74% and an area under the curve (AUC) of 94.29%, outperforming almost all state-of-the-art methods. In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.

Keywords: protein-protein interactions, hepatitis C virus, position-specific scoring matrix, two-dimensional principal component analysis, rotation forest
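
The feature-extraction step named in the abstract, 2DPCA applied to a matrix-valued protein descriptor, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each protein has already been reduced to a fixed-size 20 x 20 matrix derived from its PSSM (an assumption, since the abstract does not say how variable-length PSSMs are handled). The function names `fit_2dpca` and `transform_2dpca` and the random toy data are hypothetical.

```python
import numpy as np


def fit_2dpca(matrices, n_components):
    """Estimate 2DPCA projection axes from a stack of equally sized matrices.

    matrices: array-like of shape (n_samples, n_rows, n_cols).
    Returns an array of shape (n_cols, n_components) with orthonormal columns.
    """
    X = np.asarray(matrices, dtype=float)
    centered = X - X.mean(axis=0)
    # Scatter matrix G = (1/n) * sum_n A_n^T A_n over the centered samples.
    G = np.einsum("nij,nik->jk", centered, centered) / X.shape[0]
    # G is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


def transform_2dpca(matrix, axes):
    """Project one matrix onto the learned axes and flatten it to a vector."""
    return (np.asarray(matrix, dtype=float) @ axes).ravel()


# Toy data standing in for PSSM-derived matrices (assumption: 20 x 20 each).
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 20, 20))

axes = fit_2dpca(toy, n_components=3)
features = transform_2dpca(toy[0], axes)
print(axes.shape, features.shape)  # (20, 3) (60,)
```

Unlike ordinary PCA, 2DPCA works on the matrices directly rather than on flattened vectors, so the scatter matrix here is only 20 x 20. The resulting feature vectors would then be passed to a classifier such as a rotation forest.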


Publication history

Received: 04 July 2022
Revised: 22 August 2022
Accepted: 30 August 2022
Published: 24 November 2022
Issue date: March 2023

Copyright

© The author(s) 2023.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
