With the development of sequencing technologies, somatic mutation analysis has become an important component of cancer research and treatment. VarDict is a commonly used somatic variant caller for this task. Although the heuristic-based VarDict algorithm exhibits high sensitivity and versatility, it may report more false positive variants than other callers, which limits its clinical practicality. To address this problem, we propose DeepFilter, a deep-learning-based filter for VarDict that effectively removes the false positive variants detected by VarDict. Our approach trains two separate models, one for insertion-deletion mutations (InDels) and one for single nucleotide variants (SNVs). Experiments show that, in the commonly used tumor-normal paired mode, DeepFilter filters out at least 98.5% of false positive variants and retains 93.5% of true positive variants for both InDels and SNVs. Source code and pre-trained models are available at https://github.com/LeiHaoa/DeepFilter.
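The two-model design and the two reported metrics (fraction of false positives filtered, fraction of true positives retained) can be illustrated with a minimal Python sketch. It is not DeepFilter's implementation: it assumes each VarDict call carries a hypothetical `score` feature and a ground-truth label from a truth set, uses toy threshold rules as stand-ins for the trained networks, and splits SNVs from InDels with a simplified REF/ALT length rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Variant:
    ref: str
    alt: str
    score: float   # hypothetical per-call feature (assumption, not a VarDict field name)
    is_true: bool  # ground-truth label, e.g. from a truth set (assumption)


def variant_type(v: Variant) -> str:
    """Simplified rule: single-base REF and ALT is an SNV, anything else an InDel."""
    return "SNV" if len(v.ref) == 1 and len(v.alt) == 1 else "InDel"


def apply_filter(variants: Sequence[Variant],
                 models: Dict[str, Callable[[Variant], bool]]) -> List[bool]:
    """Route each call to its type-specific model; True means 'keep the call'."""
    return [models[variant_type(v)](v) for v in variants]


def filter_metrics(variants: Sequence[Variant], keep: Sequence[bool]) -> Tuple[float, float]:
    """Return (fraction of false positives removed, fraction of true positives kept)."""
    fp_keep = [k for v, k in zip(variants, keep) if not v.is_true]
    tp_keep = [k for v, k in zip(variants, keep) if v.is_true]
    fp_filtered = sum(not k for k in fp_keep) / len(fp_keep) if fp_keep else 0.0
    tp_retained = sum(k for k in tp_keep) / len(tp_keep) if tp_keep else 0.0
    return fp_filtered, tp_retained


# Toy usage: threshold rules stand in for the separately trained SNV and InDel models.
variants = [
    Variant("A", "G", 0.9, True),     # SNV, true
    Variant("C", "T", 0.2, False),    # SNV, false positive
    Variant("AT", "A", 0.8, True),    # deletion, true
    Variant("G", "GTT", 0.3, False),  # insertion, false positive
    Variant("T", "C", 0.4, True),     # SNV, true but low score
]
models = {"SNV": lambda v: v.score >= 0.5, "InDel": lambda v: v.score >= 0.6}
keep = apply_filter(variants, models)
fp_filtered, tp_retained = filter_metrics(variants, keep)
print(f"FP filtered: {fp_filtered:.1%}, TP retained: {tp_retained:.1%}")
```

Keeping one model per variant class lets each classifier specialize on the evidence patterns typical of that class, and the two metrics pull in opposite directions: a stricter rule removes more false positives but also discards more true calls.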
This work was partially supported by the National Natural Science Foundation of China (NSFC) (Nos. 62102231 and 61972231); the Shenzhen Basic Research Fund (No. JCYJ20180507182818013); the Key Project of Joint Fund of Shandong Province (No. ZR2019LZH007); the Shandong Provincial Natural Science Foundation (No. ZR2021QF089); the PPP project from CSC and DAAD; and the Engineering Research Center of Digital Media Technology, Ministry of Education, China.
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).