Abstract
The prediction of linear B cell epitopes is crucial for understanding the mechanisms of B cell immunity, accelerating the screening of B cell epitopes, and expediting the development of related drugs. Most current prediction methods rely on features such as amino acid composition and k-mers, combined with machine learning models. However, these methods usually ignore information hidden in linear B cell epitope sequences, such as the positions of amino acids and their physicochemical properties, which results in poor prediction performance. To address this limitation, we develop CGABepi, a deep learning framework based on amino acid and physicochemical feature encoding. CGABepi employs convolutional neural networks (CNNs) to capture local amino acid associations and a bidirectional gated recurrent unit (BiGRU) to capture contextual relationships in sequences. To verify the superiority of the CGABepi architecture, we conduct extensive, fair comparative experiments. We train CGABepi on the data of two existing methods (epitope1D and NetBCE), and in both cases it performs significantly better than the original method. An ablation study confirms the importance of each module in CGABepi, demonstrating that the architecture is well suited to predicting linear B cell epitopes. Additionally, we compare results on four independent test sets, and CGABepi achieves the best results on all of them. Finally, we successfully apply CGABepi to two epitope datasets for SARS-CoV-1 and SARS-CoV-2. Notably, of the 10 SARS-CoV-1 epitopes, 7 are screened with ultra-high confidence, with predicted scores exceeding 99.9%. Together, these results demonstrate that CGABepi is currently the state-of-the-art method for linear B cell epitope prediction.
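As a rough illustration of what encoding a peptide by amino acid identity together with a physicochemical property can look like, the following minimal sketch builds one feature row per residue, so that row order preserves the positional information of the sequence. It is not CGABepi's actual encoding: the function name `encode_peptide`, the alphabet order, and the choice of the Kyte-Doolittle hydropathy index as the single physicochemical property are all assumptions made for this example.

```python
# Illustrative sketch only. The alphabet order, the use of hydropathy as the
# physicochemical property, and the function name are assumptions, not
# details taken from CGABepi.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, one example of a physicochemical property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(seq):
    """Return one row per residue: 20 one-hot entries plus one hydropathy value."""
    rows = []
    for aa in seq.upper():
        if aa not in HYDROPATHY:
            raise ValueError(f"unknown residue: {aa!r}")
        # One-hot vector marks which of the 20 standard amino acids this is.
        one_hot = [1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
        # Append the physicochemical value as an extra feature channel.
        rows.append(one_hot + [HYDROPATHY[aa]])
    return rows


features = encode_peptide("ACDK")
print(len(features), len(features[0]))
```

A matrix of this shape (sequence length by feature count) is the kind of input that a convolutional layer can scan for local residue patterns and that a recurrent layer such as a BiGRU can read in both directions for sequence context.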