Volume 25, Issue 4


NetEPD: A Network-Based Essential Protein Discovery Platform

Jiashuai Zhang, Wenkai Li, Min Zeng, Xiangmao Meng, Lukasz Kurgan, Fang-Xiang Wu, Min Li
School of Computer Science and Engineering, Central South University, Changsha 410083, China.
Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284-2512, USA.
Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada.

Abstract

Proteins drive virtually all cellular-level processes. Proteins that are critical to cell proliferation and survival are defined as essential. These essential proteins are implicated in key metabolic and regulatory networks and are important for rational drug design. The computational identification of essential proteins benefits from the proliferation of publicly available protein interaction datasets, and several algorithms have been developed that use these datasets to predict essential proteins. However, a comprehensive web platform that facilitates the analysis and prediction of essential proteins has been missing. In this study, we design, implement, and release NetEPD, a network-based essential protein discovery platform. This resource integrates data on Protein-Protein Interaction (PPI) networks, gene expression, subcellular localization, and a native set of essential proteins. It also computes a variety of node centrality measures, evaluates the predictions of essential proteins, and visualizes PPI networks. The platform implements four activities: collection of datasets, computation of centrality measures, evaluation, and visualization. The results produced by NetEPD are visualized on its website, sent to a user-provided email address, and available for download in a parsable format. The platform is freely available at http://bioinformatics.csu.edu.cn/netepd.

Keywords: visualization, essential proteins, centrality, data integration, evaluation
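
To make the centrality and evaluation activities described in the abstract concrete, the following is a minimal sketch and not NetEPD's actual implementation. It assumes a PPI network given as an undirected edge list, uses degree centrality as one simple example of a node centrality measure, and scores the ranking by the fraction of the top-k proteins that belong to a known essential set. The function names, the toy network, and the choice of top-k precision as the evaluation metric are all illustrative assumptions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_k_precision(scores, essential, k):
    """Fraction of the k highest-scoring proteins found in the essential set."""
    # Sort by descending score; ties are broken by protein name for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(p in essential for p in top) / len(top)


# Hypothetical toy network and essential set, for illustration only.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
essential = {"A", "C"}

scores = degree_centrality(edges)
precision = top_k_precision(scores, essential, k=2)
print(scores)
print(precision)
```

Other centrality measures can be swapped in by replacing `degree_centrality` with any function that returns a score per protein; the evaluation step does not depend on how the scores were produced.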

Publication history

Received: 11 September 2019
Accepted: 16 September 2019
Published: 13 January 2020
Issue date: August 2020

Copyright

© The author(s) 2020

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61832019, 61622213, and 61728211) and the 111 Project (No. B18059).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
