In proteomics, many methods for protein identification have been developed. However, identifying peptides from tandem mass spectral data remains challenging because known genome sequences are limited, the data are noisy, and ion series are often incomplete, all of which limit identification accuracy. Noise filtering and removal thus play a key role in accurate peptide identification from tandem mass spectra. In this paper, we employ a Bayesian model that identifies proteins using prior information on bond cleavages. A Markov chain Monte Carlo (MCMC) algorithm is used to simulate candidate peptides from the posterior distribution and to estimate the parameters of the Bayesian model. Our simulation and computational experiments show that the model identifies peptides with higher accuracy.
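
The idea of simulating candidate peptides from a posterior distribution can be illustrated with a minimal Metropolis sketch. This is not the paper's model: the five-residue alphabet, the peak-matching score in `log_posterior`, the `tol` and `weight` parameters, the uniform prior (in place of the bond-cleavage prior), and the single-residue proposal are all assumptions chosen to keep the example small, and parameter estimation is omitted.

```python
import math
import random

# Monoisotopic residue masses (Da) for a small illustrative alphabet.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "P": 97.05276, "V": 99.06841}
PROTON = 1.00728  # mass added for a singly charged fragment


def b_ions(peptide):
    """Singly charged b-ion m/z values for prefixes b1 .. b(n-1)."""
    total, ions = PROTON, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def log_posterior(peptide, peaks, tol=0.01, weight=5.0):
    """Toy unnormalized log-posterior (assumption): uniform prior plus a
    reward of `weight` for every theoretical b-ion found among the peaks."""
    matched = sum(1 for m in b_ions(peptide)
                  if any(abs(m - p) <= tol for p in peaks))
    return weight * matched


def metropolis(peaks, length, n_iter=200, seed=0):
    """Sample peptides of fixed length with a symmetric one-residue proposal."""
    rng = random.Random(seed)
    alphabet = sorted(RESIDUE_MASS)
    current = "".join(rng.choice(alphabet) for _ in range(length))
    current_lp = log_posterior(current, peaks)
    samples = []
    for _ in range(n_iter):
        pos = rng.randrange(length)
        proposal = current[:pos] + rng.choice(alphabet) + current[pos + 1:]
        proposal_lp = log_posterior(proposal, peaks)
        # Metropolis rule: accept with probability min(1, exp(delta)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Noise-free toy spectrum generated from a known peptide.
peaks = b_ions("GASP")
samples = metropolis(peaks, length=4)
```

Because b-ions only cover prefixes up to the second-to-last residue, this toy score cannot distinguish candidates that differ only in their final residue; a fuller score would also use complementary ion series.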


Mass spectrometry is one of the most widely used methods for studying protein function and composition. Recognizing mono-isotope patterns in large-scale protein mass spectral data requires computational algorithms and tools that speed up the analysis and improve its results. We used a naïve Bayes network as the classifier, under the assumption that the selected features are independent, to predict mono-isotope patterns from mass spectrometry data. Mono-isotopes detected from validated theoretical spectra served as prior information in the Bayes method. Three main features extracted from the dataset were employed as independent variables in our model. Applying the proposed algorithm to the publicMo dataset demonstrates that our naïve Bayes classifier outperforms existing methods in both accuracy and sensitivity.
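
A naïve Bayes classifier over three independent features can be sketched as follows. The abstract does not name the features or their distributions, so the Gaussian likelihoods, the three unnamed feature values per candidate peak, and the toy training data below are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def fit(X, y):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-joint under feature independence."""
    def log_joint(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_joint(model[label]))


# Hypothetical training data: three features per candidate peak,
# label 1 = mono-isotope, label 0 = not a mono-isotope.
X = [[0.90, 0.80, 0.10], [0.80, 0.90, 0.20], [0.95, 0.85, 0.15],
     [0.20, 0.10, 0.90], [0.10, 0.20, 0.80], [0.15, 0.25, 0.85]]
y = [1, 1, 1, 0, 0, 0]

model = fit(X, y)
label_a = predict(model, [0.85, 0.80, 0.20])
label_b = predict(model, [0.10, 0.15, 0.90])
```

The independence assumption is what lets the log-joint be written as a plain sum over features, which is why each feature needs only its own mean and variance per class.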