Framework to Identify Protein Complexes Based on Similarity Preclustering

Xiaoqing Peng; Xiaodong Yan; Jianxin Wang

doi:10.1109/TST.2017.7830894

Tsinghua Science and Technology 2017, 22(1): 42-51 https://doi.org/10.1109/TST.2017.7830894

Open Access | Issue | Published: 26 January 2017


School of Information Science and Engineering, Central South University, Changsha 410083, China.

Keywords: protein complex, similarity preclustering, protein-protein interaction networks, K-means

Cite this article:

Peng X, Yan X, Wang J. Framework to Identify Protein Complexes Based on Similarity Preclustering. Tsinghua Science and Technology, 2017, 22(1): 42-51. https://doi.org/10.1109/TST.2017.7830894


Abstract

Proteins interact with each other to form protein complexes, and cell functionality depends on both protein interactions and these complexes. Based on the assumption that protein complexes are highly connected and correspond to the dense regions in Protein-protein Interaction Networks (PINs), many methods have been proposed to identify the dense regions in PINs. Because protein complexes may be formed by proteins with similar properties, such as topological and functional properties, we propose a protein complex identification framework (KCluster) that groups similar proteins before searching for complexes. In KCluster, a PIN is divided into K subnetworks using a K-means algorithm, so that each subnetwork comprises proteins of similar degrees. We adopt a strategy based on the expected number of common neighbors to detect the protein complexes in each subnetwork. Moreover, we identify the protein complexes spanning two subnetworks by combining closely linked protein complexes from different subnetworks. Finally, we refine the predicted protein complexes using protein subcellular localization information. We apply KCluster and nine existing methods to identify protein complexes from a highly reliable yeast PIN. The results show that KCluster achieves higher Sn and Sp values and f-measures than the other nine methods. Furthermore, the number of perfect matches predicted by KCluster is significantly higher than that of the other nine methods.
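The preclustering step described in the abstract can be illustrated with a minimal sketch. It assumes that node degree is the only feature passed to K-means and that each subnetwork is the subgraph induced by one degree cluster; the function names (`build_adjacency`, `kmeans_1d`, `precluster_by_degree`) and the deterministic centroid initialization are illustrative choices, not taken from the paper. The later steps (detection based on the expected number of common neighbors, merging across subnetworks, and refinement by subcellular localization) are not shown.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def assign(values, centroids):
    """Assign each value to the index of its nearest centroid (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's K-means on scalar values with a deterministic initialization."""
    distinct = sorted(set(values))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Assumption: initial centroids are spread evenly over the sorted distinct values.
    if k == 1:
        centroids = [float(distinct[0])]
    else:
        centroids = [float(distinct[i * (len(distinct) - 1) // (k - 1)])
                     for i in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


def precluster_by_degree(edges, k):
    """Split a PIN into k induced subnetworks whose proteins have similar degrees."""
    adj = build_adjacency(edges)
    nodes = sorted(adj)
    labels, _ = kmeans_1d([len(adj[n]) for n in nodes], k)
    groups = [set() for _ in range(k)]
    for node, lab in zip(nodes, labels):
        groups[lab].add(node)
    subnetworks = []
    for group in groups:
        # Keep only interactions whose two endpoints fall in the same degree cluster.
        sub_edges = {tuple(sorted((u, v))) for u in group for v in adj[u] if v in group}
        subnetworks.append((group, sub_edges))
    return subnetworks


# Toy network: a 4-protein clique (degree 3) plus two isolated pairs (degree 1).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("E", "F"), ("G", "H")]
toy_subnetworks = precluster_by_degree(toy_edges, 2)
```

On this toy network the low-degree pairs and the high-degree clique land in separate subnetworks. Interactions that cross two degree clusters are dropped by the induced-subgraph step, which is why the framework later combines closely linked complexes from different subnetworks.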



About this article


Publication history

Received: 16 November 2015

Revised: 28 January 2016

Accepted: 03 February 2016

Published: 26 January 2017

Issue date: February 2017


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61232001, 61379108, and 61472133).