Scholar - SciOpen

Sequence clustering software is essential in bioinformatics. However, selecting an appropriate tool can be challenging because the available tools differ in their algorithms and targeted applications. This paper analyzes and evaluates eight representative software tools (algorithms) in terms of precision, sensitivity, speed, scale of running time, and memory consumption. Furthermore, it examines the effects of sequence count, sequence length, identity, thread count, and GPU use on these aspects. Sequence length and identity significantly impact clustering efficiency (speed and memory consumption): the fluctuations exceed an order of magnitude, and the effects are non-monotonic. The evaluation results are analyzed and summarized in tables for users’ reference.

Open Access Issue

Autism Spectrum Disorder Classification with Interpretability in Children Based on Structural MRI Features Extracted Using Contrastive Variational Autoencoder

Ruimin Ma, Ruitao Xie, Yanlin Wang, Jintao Meng, Yanjie Wei, Yunpeng Cai, Wenhui Xi, Yi Pan

Big Data Mining and Analytics 2024, 7(3): 781-793

Published: 28 August 2024

Abstract

Autism Spectrum Disorder (ASD) is a highly disabling mental disease that significantly impairs patients’ social interaction ability, making early screening and intervention critical. With the development of machine learning and neuroimaging technology, extensive research has been conducted on machine classification of ASD based on structural Magnetic Resonance Imaging (s-MRI). However, most studies use datasets in which participants are older than 5 years, and they lack interpretability. In this paper, we propose a machine learning method for ASD classification in children aged 0.92 to 4.83 years, based on s-MRI features extracted using a Contrastive Variational AutoEncoder (CVAE). A total of 78 s-MRIs, collected from Shenzhen Children’s Hospital, are used for training the CVAE, which consists of both an ASD-specific feature channel and a common-shared feature channel. The ASD participants represented by ASD-specific features can be easily discriminated from Typical Control (TC) participants represented by the common-shared features. To address degraded predictive accuracy when the data size is extremely small, a transfer learning strategy is proposed as a potential solution. Finally, we conduct neuroanatomical interpretation based on the correlation between s-MRI features extracted from the CVAE and the surface area of different cortical regions, which discloses potential biomarkers that could help target treatments of ASD in the future.

Open Access Issue

Protein Residue Contact Prediction Based on Deep Learning and Massive Statistical Features from Multi-Sequence Alignment

Huiling Zhang, Min Hao, Hao Wu, Hing-Fung Ting, Yihong Tang, Wenhui Xi, Yanjie Wei

Tsinghua Science and Technology 2022, 27(5): 843-854

Published: 17 March 2022

Abstract

Sequence-based protein tertiary structure prediction is of fundamental importance because the function of a protein ultimately depends on its 3D structure. An accurate residue-residue contact map is one of the essential elements of current ab initio protocols for 3D structure prediction. Recently, with the combination of deep learning and direct coupling techniques, residue contact prediction has made significant progress. However, a considerable number of current Deep-Learning (DL)-based prediction methods are time-consuming, mainly because they rely on different categories of data types and third-party programs. In this research, we transformed the complex biological problem into a purely computational problem through statistics and artificial intelligence. We accordingly proposed a feature extraction method that obtains various categories of statistical information from only the multi-sequence alignment, followed by training a DL model for residue-residue contact prediction based on the massive statistical information. The proposed method is robust across different test sets, shows high reliability in its model confidence score, and achieves high computational efficiency and prediction precision comparable to DL methods that rely on multi-source inputs.
