Volume 22, Issue 1

Framework to Identify Protein Complexes Based on Similarity Preclustering

Xiaoqing Peng, Xiaodong Yan, Jianxin Wang
School of Information Science and Engineering, Central South University, Changsha 410083, China.

Abstract

Proteins interact with each other to form protein complexes, and cell functionality depends on both protein interactions and these complexes. Based on the assumption that protein complexes are highly connected and correspond to dense regions in Protein-Protein Interaction Networks (PINs), many methods have been proposed to identify the dense regions in PINs. Because protein complexes may be formed by proteins with similar properties, such as topological and functional properties, we propose a protein complex identification framework (KCluster) based on similarity preclustering. In KCluster, a PIN is divided into K subnetworks using a K-means algorithm, so that each subnetwork comprises proteins of similar degree. We adopt a strategy based on the expected number of common neighbors to detect the protein complexes in each subnetwork. Moreover, we identify protein complexes spanning two subnetworks by combining closely linked protein complexes from different subnetworks. Finally, we refine the predicted protein complexes using protein subcellular localization information. We apply KCluster and nine existing methods to identify protein complexes from a highly reliable yeast PIN. The results show that KCluster achieves higher Sn and Sp values and f-measures than the other nine methods. Furthermore, the number of perfect matches predicted by KCluster is significantly higher than that of the other nine methods.
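
The preclustering step described above can be illustrated with a minimal sketch. It assumes the only feature fed to K-means is each protein's degree, that initial centroids are spread evenly over the sorted distinct degrees, and that an interaction is kept in a subnetwork only when both endpoints fall in the same cluster. None of these details is specified in the abstract, and the function names `node_degrees`, `kmeans_1d`, and `split_by_degree` are hypothetical, so this is an illustration of the idea and not the authors' implementation.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {protein: degree} for an undirected edge list.

    Self-loops and duplicate interactions are ignored.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {p: len(n) for p, n in neighbors.items()}


def kmeans_1d(values, k, max_iter=100):
    """Plain Lloyd-style K-means on scalars; returns one cluster index per value."""
    distinct = sorted(set(values))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Assumed initialisation: centroids spread evenly over the sorted distinct values.
    step = max(k - 1, 1)
    centroids = [float(distinct[i * (len(distinct) - 1) // step]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def split_by_degree(edges, k):
    """Partition proteins into k degree-based groups and collect within-group edges."""
    deg = node_degrees(edges)
    proteins = list(deg)
    labels = kmeans_1d([deg[p] for p in proteins], k)
    cluster_of = dict(zip(proteins, labels))
    groups = [set() for _ in range(k)]
    for protein, c in cluster_of.items():
        groups[c].add(protein)
    # Assumption: an interaction belongs to a subnetwork only if both ends share a cluster.
    sub_edges = [[] for _ in range(k)]
    for u, v in edges:
        if u != v and cluster_of[u] == cluster_of[v]:
            sub_edges[cluster_of[u]].append((u, v))
    return groups, sub_edges


if __name__ == "__main__":
    # Toy network: a triangle plus a hub with six leaves.
    toy = [("A", "B"), ("B", "C"), ("A", "C")] + [("H", f"L{i}") for i in range(1, 7)]
    for group, kept in zip(*split_by_degree(toy, 3)):
        print(sorted(group), kept)
```

Interactions that cross cluster boundaries are dropped in this sketch, which is why a later step that combines closely linked complexes from different subnetworks, as the abstract describes, is needed to recover complexes spanning two subnetworks.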

Keywords: protein complex, similarity preclustering, protein-protein interaction networks, K-means

Publication history

Received: 16 November 2015
Revised: 28 January 2016
Accepted: 03 February 2016
Published: 26 January 2017
Issue date: February 2017

Copyright

© The author(s) 2017

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61232001, 61379108, and 61472133).
