Journal Home > Volume 18 , Issue 5

Network motif is defined as a frequent and unique subgraph pattern in a network, and the search involves counting all the possible instances or listing all patterns, testing isomorphism known as NP-hard and large amounts of repeated processes for statistical evaluation. Although many efficient algorithms have been introduced, exhaustive search methods are still infeasible and feasible approximation methods are yet implausible. Additionally, the fast and continual growth of biological networks makes the problem more challenging. As a consequence, parallel algorithms have been developed and distributed computing has been tested in the cloud computing environment as well. In this paper, we survey current algorithms for network motif detection and existing software tools. Then, we show that some methods have been utilized for parallel network motif search algorithms with static or dynamic load balancing techniques. With the advent of cloud computing services, network motif search has been implemented with MapReduce in Hadoop Distributed File System (HDFS), and with Storm, but without statistical testing. In this paper, we survey network motif search algorithms in general, including existing parallel methods as well as cloud computing based search, and show the promising potentials for the cloud computing based motif search methods.


menu
Abstract
Full text
Outline
About this article

Network Motif Detection: Algorithms, Parallel and Cloud Computing, and Related Tools

Show Author's information Wooyoung Kim( )Martin DikoKeith Rawson
Department of Computing and Software Systems, University of Washington Bothell, Bothell, WA 98011, USA

Abstract

Network motif is defined as a frequent and unique subgraph pattern in a network, and the search involves counting all the possible instances or listing all patterns, testing isomorphism known as NP-hard and large amounts of repeated processes for statistical evaluation. Although many efficient algorithms have been introduced, exhaustive search methods are still infeasible and feasible approximation methods are yet implausible. Additionally, the fast and continual growth of biological networks makes the problem more challenging. As a consequence, parallel algorithms have been developed and distributed computing has been tested in the cloud computing environment as well. In this paper, we survey current algorithms for network motif detection and existing software tools. Then, we show that some methods have been utilized for parallel network motif search algorithms with static or dynamic load balancing techniques. With the advent of cloud computing services, network motif search has been implemented with MapReduce in Hadoop Distributed File System (HDFS), and with Storm, but without statistical testing. In this paper, we survey network motif search algorithms in general, including existing parallel methods as well as cloud computing based search, and show the promising potentials for the cloud computing based motif search methods.

Keywords: MapReduce, network motif, parallel search, HDFS, storm

References(59)

[1]
B. H.Junkerand F.Schreiber, Analysis of Biological Networks. Wiley, 2008.
[2]
R.Milo, S.Shen-Orr, S.Itzkovitz, N.Kashtan, D.Chklovskii, and U.Alon, Network motifs: Simple building blocks of complex networks, Science, vol. 298, no. 5594, pp. 824-827, 2002.
[3]
R.Milo, N.Kashtan, S.Itzkovitz, M. E. J.Newman, and U.Alon, On the uniform generation of random graphs with prescribed degree sequences, eprint arXiv:condmat/0312028, December 2003.
[4]
N.Przulj, D. G.Corneil, and I.Jurisica, Modeling interactome: Scale-free or geometric? Bioinformatics, vol. 20, no. 18, pp. 3508-3515, 2004.
[5]
M.Middendorf, E.Ziv, and C. H.Wiggins, Inferring network mechanisms: The Drosophila melanogaster protein interaction network, Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 9, pp. 3192-3197, 2005.
DOI
[6]
I.Albertand R.Albert, Conserved network motifs allow protein-protein interaction prediction, Bioinformatics, vol. 20, no. 18, pp. 3346-3352, 2004.
[7]
Y.Zhang, J.Xuan, B. G.de los Reyes, R.Clarke, and H. W.Ressom, Network motif-based identification of breast cancer susceptibility genes, in 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, August2008, pp. 5696-5699.
[8]
J. A.Grochowand M.Kellis, Network motif discovery using subgraph enumeration and symmetrybreaking, in Proceedings of the 11th Annual International Conference on Research in Computational Molecular Biology, RECOMB’07, Springer-Verlag, 2007, pp. 92-106.
DOI
[9]
M.Schatz, E.Cooper-Balis, and A.Bazinet, Parallel network motif finding, Technical report, University of Maryland Insitute for Advanced Computer Studies, 2008.
[10]
P.Ribeiro, F.Silva, and L.Lopes, Parallel discovery of network motifs, Journal of Parallel and Distributed Computing, vol. 7, no. 2, pp. 144-154, 2012.
[11]
S.Wernicke, Efficient detection of network motifs, IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 3, no. 4, pp. 347-359, 2006.
[12]
Y.Liu, X.Jiang, H.Chen, J.Ma, and X.Zhang, Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network, in Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies, APPT ’09, Springer-Verlag, 2009, pp. 341-355.
DOI
[13]
Storm: Distributed and fault-tolerant realtime computation, http://storm-project.net/, 2013.
[14]
M.Harrigan, P.Cunningham, and L.Coyle, Using storm to perform dynamic egocentric network motif analysis, in Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on, 2012, pp. 408-415.
DOI
[15]
E.Wong, B.Baur, S.Quader, and C.-H.Huang, Biological network motif detection: Principles and practice, Briefings in Bioinformatics, 2011.
[16]
S. S.Shen-Orr, R.Milo, S.Mangan, and U.Alon, Network motifs in the transcriptional regulation network of Escherichia coli, Nat. Genet., vol. 31, no. 1, pp. 64-68, 2002.
[17]
S.Maslov, K.Sneppen, and U.Alon, Correlation profiles and motifs in complex networks, in Handbook of Graphs and Networks. Wiley-VCH, 2003.
[18]
G.Cirielloand C.Guerra, A review on models and algorithms for motif discovery in protein-protein interaction networks, Brief Funct. Genomic Proteomic, vol. 7, no. 2, pp. 147-156, 2008.
[19]
B.McKay, Practical graph isomorphism, Congr. Numer., vol. 30, pp. 45-87, 1981.
[20]
N.Kashtan, S.Itzkovitz, R.Milo, and U.Alon, Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs, Bioinformatics, vol. 20, no. 11, pp. 1746-1758, 2004.
[21]
FSchreiberand HSchwobbermeyer, Mavisto: A tool for the exploration of network motifs, Bioinformatics, vol. 21, pp. 3572-3574, 2005.
[22]
SWernickeand FRasche, Fanmod: A tool for fast network motif detection, Bioinformatics, vol. 22, pp. 1152-1153, 2006.
[23]
Z.Kashani, H.Ahrabian, E.Elahi, A.Nowzari-Dalini, E.Ansari, S.Asadi, S.Mohammadi, F.Schreiber, and A.Masoudi-Nejad, Kavosh: A new algorithm for finding network motifs, BMC Bioinformatics, vol. 10, no. 1, p. 318, 2009.
[24]
J.Chen, W.Hsu, M. L.Lee, and S.-K.Ng, Nemofinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Dining, KDD ’06, New York, NY, USA, 2006, pp. 106-115.
[25]
J.Chen, W.Hsu, M. L.Lee, and S.-K.Ng, Labeling network motifs in protein interactomes for protein function prediction, in Data Engineering, International Conference on, 2007, pp. 546-555.
[26]
D. L. Kreher, Combinatorial Algorithms: Generation, Enumeration, and Search. CRC, December 1998.
DOI
[27]
P.Shannon, A.Markiel, O.Ozier, N. S.Baliga, J. T.Wang, D.Ramage, N.Amin, B.Schwikowski, and T.Ideker, Cytoscape: A software environment for integrated models of biomolecular interaction networks, Genome Research, vol. 13, no. 11, pp. 2498-2504, 2003.
[28]
S.Wuchty, Z. N.Oltvai, and A.Barabasi, Evolutionary conservation of motif constituents within the yeast protein interaction network, Nature Genetics, vol. 35, pp. 176-179, 2003.
[29]
W.-P.Lee, B.-C.Jeng, T.-W.Pai, C.-P.Tsai, C.-Y.Yu, and W.-S.Tzou, Differential evolutionary conservation of motif modes in the yeast protein interaction network, BMC Genomics, vol. 7, no. 1, p. 89, 2006.
[30]
L.Zhang, O.King, S.Wong, D.Goldberg, A.Tong, G.Lesage, B.Andrews, H.Bussey, C.Boone, and F.Roth, , themes and thematic maps of an integrated saccharomyces cerevisiae interaction network, Journal of Biology, vol. 4, no. 2, p. 6, 2005.
[31]
F.Hararyand E. M.Palmer, Graphical Enumeration. Academic Press, 1973.
DOI
[32]
S.Omidi, F.Schreiber, and A.Masoudi-Nejad, Moda: An efficient algorithm for network motif discovery in biological networks, Genes and Genetic Systems, vol. 84, no. 5, pp. 385-395, 2009.
[33]
S.Manganand U.Alon, Structure and function of the feedforward loop network motif, Proceedings of the National Academy of Sciences, vol. 100, no. 21, pp. 11980-11985, 2003.
[34]
S.Mangan, A.Zaslaver, and U.Alon, The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks, Journal of Molecular Biology, vol. 334, no. 2, pp. 197-204, 2003.
[35]
P.Ingram, M.Stumpf, and J.Stark, Network motifs: Structure does not determine function, BMC Genomics, vol. 7, no. 1, p. 108, 2006.
[36]
M.Kittisopikuland G. M.Suel, Biological role of noise encoded in a genetic network motif, Proceedings of the National Academy of Sciences, vol. 107, no. 30, pp. 13300-13304, 2010.
[37]
N.Bhardwajand H.Lu, Co-expression among constituents of a motif in the protein-protein interaction network, J. Bioinformatics and Computational Biology, vol. 7, no. 1, pp. 1-17, 2009.
[38]
R.J Prill, P. AIglesias, and A.Levchenko, Dynamic properties of network motifs contribute to biological network organization, PLoS Biol., vol. 3, no. 11, p. e343, 2005.
[39]
J.S.Hallinanand A.Wipat, Network motifs in context: An exploration of the evolution of oscillatory dynamics in transcriptional networks, in Computational Intelligence in Bioinformatics and Computational Biology, IEEE Symposium on, Sept. 2008, pp. 83-89.
[40]
R.Dobrin, Q. K.Beg, A.-L.Barabasi, and Z. N.Oltvai, Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network, BMC Bioinformatics, vol. 5, p.10, 2004.
[41]
Z.-R.Xieand M.-J.Hwang, An interactionmotif- based scoring function for protein-ligand docking, BMC Bioinformatics, vol. 11, no. 1, p. 298, 2010.
[42]
R.Milo, S.Itzkovitz, N.Kashtan, R.Levitt, S.Shen-Orr, I.Ayzenshtat, M.Sheffer, and U.Alon, Superfamilies of evolved and designed networks, Science, vol. 303, no. 5663, pp. 1538-1542, 2004.
[43]
Y.Artzy-Randrup, S. J.Fleishman, N.Ben-Tal, and L.Stone, Comment on “network motifs: Simple building blocks of complex networks” and “superfamilies of evolved and designed networks”, Science, vol. 305, no. 5687, p. 1107, 2004.
[44]
G. C.Conantand A.Wagner, Convergent evolution of gene circuits, Nature Genetics, vol. 34, pp. 244-266, 2003.
[45]
B.Andreopoulos, A.An, X.Wang, and M.Schroeder, A roadmap of clustering algorithms: Finding a match for a biomedical application, Briefings in Bioinformatics, vol. 10, no. 3, pp. 297-314, 2009.
[46]
O.Trelles, On the parallelisation of bioinformatics applications, Briefings in Bioinformatics, vol. 2, no. 2, pp. 181-194, 2001.
[47]
T.Wang, J. W.Touchman, W.Zhang, E. B.Suh, and G.Xue, A parallel algorithm for extracting transcription regulatory network motifs, in Bioinformatic and Bioengineering, IEEE International Symposium on, 2005, pp. 193-200.
[48]
P.Ribeiro, F.Silva, and L.Lopes, Efficient parallel subgraph counting using g-tries, in Cluster Computing (CLUSTER), 2010 IEEE International Conference on, Sept. 2010, pp. 217-226.
[49]
P.Ribeiroand F.Silva, g-tries: An efficient data structure for discovering network motifs, in Proceedings of the 2010 ACM Symposium on Applied Computing, SAC ’10, New York, NY, USA, 2010, pp. 1559-1566.
[50]
P.Ribeiro, F.Silva, and L.Lopes, Parallel calculation of subgraph census in biological networks, in BIOINFORMATICS 2010 - Proceedings of the First International Conference on Bioinformatics, Valencia, Spain, January, 2010, pp. 56-65.
[51]
M.Armbrust, A.Fox, R.Griffith, A. D.Joseph, R.Katz, A.Konwinski, G.Lee, D.Patterson, A.Rabkin, I.Stoica, and M.Zaharia, A view of cloud computing, Commun. ACM, vol. 53, no. 4, pp. 50-58, 2010.
[52]
J. P.Arraisand J. L.Oliveira, On the exploitation of cloud computing in bioinformatics, in Information Technology and Applications in Biomedicine (ITAB), 2010 10th IEEE International Conference on, Nov. 2010, pp. 1-4.
[53]
J.Deanand S.Ghemawat, MapReduce: A flexible data processing tool, Commun. ACM, vol. 53, no. 1, pp. 72-77, 2010.
[54]
T.Condie, N.Conway, P.Alvaro, J. M.Hellerstein, K.Elmeleegy, and R.Sears, MapReduce online, in Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI’10, Berkeley, CA, USA, 2010, p. 21.
[55]
D.Borthakur, The hadoop distributed file system: Architecture and design, http://hadoop.apache.org/docs/ r0.18.0/hdfs_design.pdf, 2007.
[56]
R Taylor,, An overview of the hadoop/mapreduce/hbase framework and its current applications in bioinformatics, BMC Bioinformatics, vol. 11, no. Suppl 12, 2010.
[57]
M.Kuramochiand G.Karypis, Finding frequent patterns in a large sparse graph, Data Mining and Knowledge Discovery, vol. 11, no. 3, pp. 243-271, 2005.
[58]
D.Eppsteinand E. S.Spiro, The h-index of a graph and its application to dynamic subgraph statistics, CoRR, abs/0904.3741, 2009.
[59]
D.Eppstein, M. T.Goodrich, D.Strash, and L.Trott, Extended h-index parameterized data structures for computing dynamic subgraph statistics, CoRR, abs/1009.0783, 2010.
Publication history
Copyright
Rights and permissions

Publication history

Received: 09 August 2013
Accepted: 10 August 2013
Published: 03 October 2013
Issue date: October 2013

Copyright

© The author(s) 2013

Rights and permissions

Return