A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform

Haipeng Shi; Huan Chen; Qinghong Yang; Jun Wang; Haihe Shi

doi:10.26599/BDMA.2022.9020030

Big Data Mining and Analytics 2023, 6(1): 11-20 https://doi.org/10.26599/BDMA.2022.9020030

Open Access | Issue | Published: 24 November 2022

A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform

Show Author's Information Hide Author's Information Haipeng Shi^¹, Huan Chen^², Qinghong Yang^², Jun Wang^², Haihe Shi^²(

)

1School of Software, Jiangxi Normal University, Nanchang 330022, China

2School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China

Keywords:

partition-and-recur (PAR), domain engineering, biological sequences, feature model, component assembly

Cite this article:

Shi H, Chen H, Yang Q, et al. A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform. Big Data Mining and Analytics, 2023, 6(1): 11-20. https://doi.org/10.26599/BDMA.2022.9020030

Download citation

EndNote(RIS)

BibTeX

509

Views

Downloads

Citations

Crossref

WoS

Scopus

CSCD

Abstract Full text About this article

Abstract

The problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous solving algorithms are used for these problems, and complex similarities and differences exist among these algorithms for the same problem, causing difficulty for researchers to select the appropriate one. To address this situation, combined with the formal partition-and-recur method, component technology, domain engineering, and generic programming, the paper presents a method for the development of a family of biological sequence analysis algorithms. It designs highly trustworthy reusable domain algorithm components and further assembles them to generate specifific biological sequence analysis algorithms. The experiment of the development of a dynamic programming based LCS algorithm family shows the proposed method enables the improvement of the reliability, understandability, and development efficiency of particular algorithms.

Full text

Abstract

Full text

Outline

About this article

A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform

Show Author's information Hide Author's Information Haipeng Shi^¹, Huan Chen^², Qinghong Yang^², Jun Wang^², Haihe Shi^²(

)

1School of Software, Jiangxi Normal University, Nanchang 330022, China

2School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China

Abstract

Keywords: partition-and-recur (PAR), domain engineering, biological sequences, feature model, component assembly

References(21)

[1]

Y. Zhao, R. Gu, and S. Du, Current status and development trend of bioinformatics research, Journal of Medical Informatics, vol. 33, no. 5, pp. 2–6, 2012.

Google Scholar

[2]

S. Liu, Y. P. Wang, W. N. Tong, and S. W. Wei, A fast and memory efficient MLCS algorithm by character merging for DNA sequences alignment, Bioinformatics, vol. 36, no. 4, pp. 1066–1073, 2020.

DOI Google Scholar

[3]

H. Shi and X. Zhang, Component-based design and assembly of heuristic multiple sequence alignment algorithms, Front. Genet, vol. 11, p. 105, 2020.

DOI Google Scholar

[4]

S. Liu, Design and implementation of first descending squad based flipping sorting algorithm, Master dissertation, School of Computer Science and Technology, Shandong University, Jinan, China, 2006.

[5]

M. D. Cao, S. H. Nguyen, D. Ganesamoorthy, A. G. Elliott, M. Cooper, and L. J. M. Coin, Scaffolding and completing genome assemblies in real-time with nanopore sequencing, Nature Communications, vol. 8, p. 14515, 2017.

DOI Google Scholar

[6]

R. A. Wagner and M. J. Fischer, The string-to-string correction problem, Journal of ACM, vol. 21, no. 1, pp. 168–173, 1974.

DOI Google Scholar

[7]

D. S. Hirschberg, Algorithms for the longest common subsequence problem, Journal of the ACM, vol. 24, no. 4, pp. 664–675, 1977.

DOI Google Scholar

[8]

Y. Li, Y. Wang, Z. Zhang, Y. Wang, D. Ma, and J. Huang, A novel fast and memory efficient parallel MLCS algorithm for long and large-scale sequences alignments, in Proc. 2016 IEEE 32^nd International Conference on Data Engineering, Helsinki, Finland, 2016, pp. 1170–1181.

DOI Google Scholar

[9]

Y. Chen, A. Wan, and W. Liu, A fast parallel algorithm for finding the longest common sequence of multiple biosequences, BMC Bioinformatics, vol. 7, no. 4, p. S4, 2006.

DOI Google Scholar

[10]

Z. Peng and Y. Wang, A novel efficient graph model for the multiple longest common subsequences (MLCS) problem, Frontiers in Genetics, vol. 8, p. 104, 2017.

DOI Google Scholar

[11]

N. Nakatsu, Y. Kambayashi, and S. Yajima, A longest common subsequence algorithm suitable for similar text strings, Acta Informatica, vol. 18, no. 2, pp. 171–179, 1982.

DOI Google Scholar

[12]

E. W. Myers and W. Miller, Optimal alignments in linear space, Computer Applications in the Biosciences: CABIOS, vol. 4, no. 1, pp. 11–17, 1988.

DOI Google Scholar

[13]

Y. -T. Tsai, The constrained longest common subsequence problem, Information Processing Letters, vol. 88, pp. 173–176, 2003.

DOI Google Scholar

[14]

C. L. Peng, An approach for solving the constrained longestcommon subsequence problem, http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/search, 2003.

[15]

A. N. Arslan and Ö. Eǧecioǧlu, Algorithms for the constrained longest common subsequence problems, International Journal of Foundations of Computer Science, vol. 16, no. 6, pp. 1099–1109, 2005.

DOI Google Scholar

[16]

Y. C. Chen and K. M. Chao, On the generalized constrained longest common subsequence problems, Journal of Combinatorial Optimization, vol. 21, no. 3, pp. 383–392, 2011.

DOI Google Scholar

[17]

J. Xue, PAR method and its supporting platform, in Proc. 1st Asian Working Conference on Verified Software (AWCVS 2006), Macao, China, 2006, pp. 29–31.

Google Scholar

[18]

J. Xue, Y. Zheng, Q. Hu, Z. You, W. Xie, and Z. Cheng, PAR: A practicable formal method and its supporting platform, in Proc. 20^th International Conference on Formal Engineering Methods (ICFEM), Gold Coast, Australia, 2018, pp. 70–86.

DOI Google Scholar

[19]

H. Mili, F. Mili, and A. Mili, Reusing software: Issues and research directions, IEEE Trans. on Software Engineering, vol. 21, no. 6, pp. 528–562, 1995.

DOI Google Scholar

[20]

E. A. Karlsson, Software Reuse: A Holistic Approach. New York, NY, USA: John Wiley and Sons Ltd, 1995.

[21]

K. C. Kang, S. G. Cohen, J. A. Hess, W. E. Novak, and A. S. Peterson, Feature-oriented domain analysis (FODA) feasibility study, Tech. Rep. CMU/SEI-90-TR-21, Carnegie Mellon University, Pittsburgh, PA, USA, 1990.

Google Scholar

About this article

Publication history

Acknowledgements

Rights and permissions

Publication history

Received: 28 June 2022

Revised: 12 August 2022

Accepted: 17 August 2022

Published: 24 November 2022

Issue date: March 2023

Copyright

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62062039) and Natural Science Foundation of Jiangxi Province (Nos. 20202BAB202024 and 20212BAB202017).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).