Journal Home > Volume 6 , Issue 1

The problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous solving algorithms are used for these problems, and complex similarities and differences exist among these algorithms for the same problem, causing difficulty for researchers to select the appropriate one. To address this situation, combined with the formal partition-and-recur method, component technology, domain engineering, and generic programming, the paper presents a method for the development of a family of biological sequence analysis algorithms. It designs highly trustworthy reusable domain algorithm components and further assembles them to generate specifific biological sequence analysis algorithms. The experiment of the development of a dynamic programming based LCS algorithm family shows the proposed method enables the improvement of the reliability, understandability, and development efficiency of particular algorithms.


menu
Abstract
Full text
Outline
About this article

A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform

Show Author's information Haipeng Shi1Huan Chen2Qinghong Yang2Jun Wang2Haihe Shi2( )
School of Software, Jiangxi Normal University, Nanchang 330022, China
School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China

Abstract

The problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous solving algorithms are used for these problems, and complex similarities and differences exist among these algorithms for the same problem, causing difficulty for researchers to select the appropriate one. To address this situation, combined with the formal partition-and-recur method, component technology, domain engineering, and generic programming, the paper presents a method for the development of a family of biological sequence analysis algorithms. It designs highly trustworthy reusable domain algorithm components and further assembles them to generate specifific biological sequence analysis algorithms. The experiment of the development of a dynamic programming based LCS algorithm family shows the proposed method enables the improvement of the reliability, understandability, and development efficiency of particular algorithms.

Keywords: partition-and-recur (PAR), domain engineering, biological sequences, feature model, component assembly

References(21)

[1]
Y. Zhao, R. Gu, and S. Du, Current status and development trend of bioinformatics research, Journal of Medical Informatics, vol. 33, no. 5, pp. 2–6, 2012.
[2]
S. Liu, Y. P. Wang, W. N. Tong, and S. W. Wei, A fast and memory efficient MLCS algorithm by character merging for DNA sequences alignment, Bioinformatics, vol. 36, no. 4, pp. 1066–1073, 2020.
[3]
H. Shi and X. Zhang, Component-based design and assembly of heuristic multiple sequence alignment algorithms, Front. Genet, vol. 11, p. 105, 2020.
[4]
S. Liu, Design and implementation of first descending squad based flipping sorting algorithm, Master dissertation, School of Computer Science and Technology, Shandong University, Jinan, China, 2006.
[5]
M. D. Cao, S. H. Nguyen, D. Ganesamoorthy, A. G. Elliott, M. Cooper, and L. J. M. Coin, Scaffolding and completing genome assemblies in real-time with nanopore sequencing, Nature Communications, vol. 8, p. 14515, 2017.
[6]
R. A. Wagner and M. J. Fischer, The string-to-string correction problem, Journal of ACM, vol. 21, no. 1, pp. 168–173, 1974.
[7]
D. S. Hirschberg, Algorithms for the longest common subsequence problem, Journal of the ACM, vol. 24, no. 4, pp. 664–675, 1977.
[8]
Y. Li, Y. Wang, Z. Zhang, Y. Wang, D. Ma, and J. Huang, A novel fast and memory efficient parallel MLCS algorithm for long and large-scale sequences alignments, in Proc. 2016 IEEE 32nd International Conference on Data Engineering, Helsinki, Finland, 2016, pp. 1170–1181.
[9]
Y. Chen, A. Wan, and W. Liu, A fast parallel algorithm for finding the longest common sequence of multiple biosequences, BMC Bioinformatics, vol. 7, no. 4, p. S4, 2006.
[10]
Z. Peng and Y. Wang, A novel efficient graph model for the multiple longest common subsequences (MLCS) problem, Frontiers in Genetics, vol. 8, p. 104, 2017.
[11]
N. Nakatsu, Y. Kambayashi, and S. Yajima, A longest common subsequence algorithm suitable for similar text strings, Acta Informatica, vol. 18, no. 2, pp. 171–179, 1982.
[12]
E. W. Myers and W. Miller, Optimal alignments in linear space, Computer Applications in the Biosciences: CABIOS, vol. 4, no. 1, pp. 11–17, 1988.
[13]
Y. -T. Tsai, The constrained longest common subsequence problem, Information Processing Letters, vol. 88, pp. 173–176, 2003.
[14]
C. L. Peng, An approach for solving the constrained longestcommon subsequence problem, http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/search, 2003.
[15]
A. N. Arslan and Ö. Eǧecioǧlu, Algorithms for the constrained longest common subsequence problems, International Journal of Foundations of Computer Science, vol. 16, no. 6, pp. 1099–1109, 2005.
[16]
Y. C. Chen and K. M. Chao, On the generalized constrained longest common subsequence problems, Journal of Combinatorial Optimization, vol. 21, no. 3, pp. 383–392, 2011.
[17]
J. Xue, PAR method and its supporting platform, in Proc. 1st Asian Working Conference on Verified Software (AWCVS 2006), Macao, China, 2006, pp. 29–31.
[18]
J. Xue, Y. Zheng, Q. Hu, Z. You, W. Xie, and Z. Cheng, PAR: A practicable formal method and its supporting platform, in Proc. 20th International Conference on Formal Engineering Methods (ICFEM), Gold Coast, Australia, 2018, pp. 70–86.
[19]
H. Mili, F. Mili, and A. Mili, Reusing software: Issues and research directions, IEEE Trans. on Software Engineering, vol. 21, no. 6, pp. 528–562, 1995.
[20]
E. A. Karlsson, Software Reuse: A Holistic Approach. New York, NY, USA: John Wiley and Sons Ltd, 1995.
[21]
K. C. Kang, S. G. Cohen, J. A. Hess, W. E. Novak, and A. S. Peterson, Feature-oriented domain analysis (FODA) feasibility study, Tech. Rep. CMU/SEI-90-TR-21, Carnegie Mellon University, Pittsburgh, PA, USA, 1990.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 28 June 2022
Revised: 12 August 2022
Accepted: 17 August 2022
Published: 24 November 2022
Issue date: March 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62062039) and Natural Science Foundation of Jiangxi Province (Nos. 20202BAB202024 and 20212BAB202017).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return