Open Access Issue
Molecular Generation and Optimization of Molecular Properties Using a Transformer Model
Big Data Mining and Analytics 2024, 7 (1): 142-155
Published: 25 December 2023

Generating novel molecules that satisfy specific properties is a challenging task in modern drug discovery, as it requires optimizing a specific objective while obeying chemical rules. Herein, we aim to optimize a given source molecule so that the generated molecule satisfies the specified properties. We use Matched Molecular Pairs (MMPs), each of which contains a source molecule and a target molecule, and select logD and solubility as the properties to optimize. The main innovation lies in the computations of a specific transformer, analyzed from the perspective of matrix dimensions. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. In the experiments, we screen the data based on the proportion of heavy atoms among all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with baseline models in their ability to generate molecules with specific properties. The results show that the transformer model can accurately optimize the source molecules to satisfy the specified properties.
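
The abstract does not give the actual threshold intervals or state definitions, so the following is only a minimal sketch, assuming that the logD change is binned into fixed-width intervals and that solubility is reduced to a low/high state. The function names, the ±1.0 logD range, the 0.2 bin width, the solubility threshold of 10.0 (units unspecified), and the character-level SMILES tokenization are all hypothetical. The resulting condition tokens are prepended to the source molecule, which is one common way to condition a sequence-to-sequence transformer on a desired property change.

```python
import math


def encode_logd_change(source_logd, target_logd, lo=-1.0, hi=1.0, width=0.2):
    """Map the logD change (target - source) to an interval token.

    lo, hi and width are hypothetical placeholders, not values from the paper.
    """
    delta = target_logd - source_logd
    if delta <= lo:
        return f"LogD_(-inf,{lo:.1f}]"
    if delta > hi:
        return f"LogD_({hi:.1f},inf)"
    # Bins are half-open intervals (left, left + width] between lo and hi.
    i = math.ceil((delta - lo) / width) - 1
    left = round(lo + i * width, 6) + 0.0  # "+ 0.0" turns -0.0 into 0.0
    return f"LogD_({left:.1f},{left + width:.1f}]"


def solubility_state(value, threshold=10.0):
    """Binary solubility state; the threshold and its units are hypothetical."""
    return "high" if value >= threshold else "low"


def encode_solubility_change(source_sol, target_sol, threshold=10.0):
    """Encode the solubility state change from source to target as one token."""
    src = solubility_state(source_sol, threshold)
    tgt = solubility_state(target_sol, threshold)
    return f"Solubility_{src}->{tgt}"


def build_model_input(source_smiles, source_props, target_props):
    """Prepend property-change tokens to a character-level SMILES token list."""
    tokens = [
        encode_logd_change(source_props["logD"], target_props["logD"]),
        encode_solubility_change(source_props["solubility"], target_props["solubility"]),
    ]
    return tokens + list(source_smiles)  # char-level tokenization is a simplification


src = {"logD": 1.0, "solubility": 5.0}
tgt = {"logD": 1.35, "solubility": 20.0}
print(build_model_input("CCO", src, tgt))
# ['LogD_(0.2,0.4]', 'Solubility_low->high', 'C', 'C', 'O']
```

During training, the target molecule would be the decoder output for such an input; at generation time, the desired change tokens are chosen by the user.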

Open Access Issue
Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest
Tsinghua Science and Technology 2022, 27 (1): 58-67
Published: 17 August 2021

Identifying associations between metabolites and diseases helps us understand the pathogenesis of diseases, which is of great significance for diagnosis and treatment. However, traditional biological experimental methods are time-consuming and expensive. Accordingly, we propose a new metabolite-disease association prediction algorithm based on DeepWalk and random forest (DWRF), which consists of the following key steps. First, the semantic similarity and information entropy similarity of diseases are integrated into the final disease similarity. Similarly, the molecular fingerprint similarity and information entropy similarity of metabolites are integrated into the final metabolite similarity. Then, DeepWalk is used to extract metabolite features from the metabolite-gene association network. Finally, a random forest algorithm is employed to infer metabolite-disease associations. The experimental results show that DWRF achieves good area under the curve (AUC) values in both leave-one-out cross-validation and five-fold cross-validation. Case studies also indicate that DWRF performs reliably in metabolite-disease association prediction.
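
DeepWalk turns a graph into a text-like corpus of truncated random walks and then learns node embeddings from that corpus with a skip-gram model. The sketch below covers only the walk-generation step on a metabolite-gene association network. The node identifiers (M1, G1, ...), the walk length, and the number of walks per node are hypothetical values, not settings reported for DWRF.

```python
import random


def build_adjacency(edges):
    """Undirected adjacency lists from (metabolite, gene) association pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def deepwalk_corpus(adj, walks_per_node=10, walk_length=8, seed=0):
    """Truncated uniform random walks, as in DeepWalk's corpus-generation step."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                # Every node comes from an edge, so its neighbour list is non-empty.
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical toy network: metabolites M1-M3, genes G1-G3.
edges = [("M1", "G1"), ("M1", "G2"), ("M2", "G2"), ("M3", "G3")]
adj = build_adjacency(edges)
walks = deepwalk_corpus(adj, walks_per_node=2, walk_length=5)
print(len(walks))  # 2 passes x 6 nodes = 12 walks
```

The walks can then be passed to a skip-gram implementation, for example gensim's `Word2Vec(sentences=walks, vector_size=64, window=5, min_count=1, sg=1)`, and the resulting metabolite vectors combined with the similarity features as input to a random forest such as scikit-learn's `RandomForestClassifier`. How DWRF actually combines these features is not stated in the abstract.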

Regular Paper Issue
Predicting CircRNA-Disease Associations Based on Improved Weighted Biased Meta-Structure
Journal of Computer Science and Technology 2021, 36 (2): 288-298
Published: 05 March 2021

Circular RNAs (circRNAs) are RNAs with a special closed-loop structure that play important roles in tumors and other diseases. Because biological experiments are time-consuming, computational methods for predicting associations between circRNAs and diseases have become a better choice. Given the limited number of verified circRNA-disease associations, we propose a method named CDWBMS, which integrates a small number of verified circRNA-disease associations with plentiful circRNA information to discover novel circRNA-disease associations. CDWBMS adopts an improved weighted biased meta-structure search algorithm on a heterogeneous network to predict associations between circRNAs and diseases. Under leave-one-out cross-validation (LOOCV), 10-fold cross-validation, and 5-fold cross-validation, CDWBMS yields area under the receiver operating characteristic curve (AUC) values of 0.9216, 0.9172, and 0.9005, respectively. Furthermore, case studies show that CDWBMS can predict unknown circRNA-disease associations. In conclusion, CDWBMS is an effective method for exploring disease-related circRNAs.
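
The AUC values quoted above can be read as the probability that a randomly chosen verified association is scored higher than a randomly chosen unverified pair. A minimal sketch of that pairwise (Mann-Whitney) computation follows; in LOOCV, each verified association is held out in turn, rescored, and ranked against unverified candidate pairs before the AUC is computed. The function name and the toy scores are illustrative only.

```python
def auc_score(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: 1 = verified association, 0 = unverified candidate pair.
print(auc_score([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The double loop is quadratic and only meant to make the definition explicit; rank-based formulas give the same value much faster on large candidate sets.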

Open Access Issue
CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization
Big Data Mining and Analytics 2020, 3 (4): 280-291
Published: 16 November 2020

Circular RNA (circRNA) is a novel class of endogenous non-coding RNA. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are being discovered with high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method, called PCD_MVMF, that predicts circRNA-disease associations from multiple integrated data sources based on metapath2vec++ and matrix factorization. To construct more reliable networks, various aspects are considered. First, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are used to construct disease functional and semantic similarity networks. Second, metapath2vec++ is applied to an integrated heterogeneous network to learn the embedded features and initial prediction scores. Finally, we apply matrix factorization, use the similarities as constraints, and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and the F-measure are adopted to evaluate the performance of PCD_MVMF, and these evaluations show that PCD_MVMF has better prediction performance than other methods. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.
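
The abstract does not give PCD_MVMF's objective function. As a hedged illustration of matrix factorization "with similarity as a constraint", the sketch below uses a common graph-regularized formulation that may differ from the paper's: it factorizes a circRNA-disease matrix A ≈ U V^T by gradient descent and adds Laplacian penalties that pull similar circRNAs (and similar diseases) toward similar latent vectors. The function names, hyperparameters, and toy matrices are hypothetical, and the metapath2vec++ initialization is omitted.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def laplacian(S):
    """Graph Laplacian L = D - S of a symmetric similarity matrix S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]


def similarity_constrained_mf(A, S_c, S_d, k=2, lam=0.01, beta=0.1,
                              lr=0.01, epochs=2000, seed=0):
    """Minimize 0.5*||A - U V^T||^2 + 0.5*lam*(||U||^2 + ||V||^2)
    + 0.5*beta*(tr(U^T L_c U) + tr(V^T L_d V)) by plain gradient descent."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    U = [[0.1 * rng.random() for _ in range(k)] for _ in range(m)]
    V = [[0.1 * rng.random() for _ in range(k)] for _ in range(n)]
    L_c, L_d = laplacian(S_c), laplacian(S_d)
    for _ in range(epochs):
        R = matmul(U, transpose(V))
        E = [[A[i][j] - R[i][j] for j in range(n)] for i in range(m)]  # residual
        EV, EtU = matmul(E, V), matmul(transpose(E), U)
        LcU, LdV = matmul(L_c, U), matmul(L_d, V)
        # Simultaneous update: both gradients are evaluated at the current U and V.
        U, V = (
            [[U[i][f] + lr * (EV[i][f] - lam * U[i][f] - beta * LcU[i][f])
              for f in range(k)] for i in range(m)],
            [[V[j][f] + lr * (EtU[j][f] - lam * V[j][f] - beta * LdV[j][f])
              for f in range(k)] for j in range(n)],
        )
    return matmul(U, transpose(V))  # predicted association scores


A = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # toy circRNA x disease associations
S_c = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]  # toy circRNA similarity
S_d = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]  # toy disease similarity
scores = similarity_constrained_mf(A, S_c, S_d)
```

The Laplacian term equals a similarity-weighted sum of squared distances between latent vectors, so entities that the similarity networks deem close are encouraged to receive similar prediction scores.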

Open Access Issue
Prediction of miRNA-circRNA Associations Based on k-NN Multi-Label with Random Walk Restart on a Heterogeneous Network
Big Data Mining and Analytics 2019, 2 (4): 261-272
Published: 05 August 2019

Circular RNAs (circRNAs) are essential non-coding RNAs that play important roles in various biological processes and affect transcriptional and post-transcriptional regulation of gene expression. Recently, many studies have shown that circRNAs can act as microRNA (miRNA) sponges and are associated with certain diseases. Therefore, efficient computational methods are needed to explore miRNA-circRNA interactions, yet very few computational methods for predicting the associations between miRNAs and circRNAs exist. In this study, we adopt an improved random walk method, named KRWRMC, to predict complex associations between miRNAs and circRNAs. Our major contributions are twofold. First, whereas the conventional Random Walk with Restart on Heterogeneous network (RWRH) algorithm simply converts the circRNA/miRNA similarity network into the transition probability matrix, we take the influence of each node's neighbors in the network into account, which can suggest or emphasize potential associations. Second, KRWRMC is the first computational model to calculate large numbers of miRNA-circRNA associations, which can be regarded as biomarkers for diagnosing certain diseases and can thus help us better understand complex diseases. The reliability of KRWRMC has been verified by leave-one-out cross-validation (LOOCV) and 10-fold cross-validation, and the results indicate that the method achieves excellent performance in predicting potential miRNA-circRNA associations.
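
The sketch below illustrates random walk with restart on a heterogeneous miRNA-circRNA network under a simplified reading of KRWRMC's neighbor step: each similarity network is first sparsified to its k nearest neighbors before the transition matrix is built. It is not the published KRWRMC algorithm. The block-matrix construction omits RWRH's inter-network jump probability, and the function names, restart probability (0.7), k, and toy matrices are all hypothetical.

```python
def knn_sparsify(S, k):
    """Keep each node's k most similar neighbours (symmetrized); a hypothetical neighbour step."""
    n = len(S)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: S[i][j], reverse=True)
        for j in others[:k]:
            W[i][j] = W[j][i] = S[i][j]
    return W


def heterogeneous_matrix(S_m, S_c, A):
    """Block matrix [[S_m, A], [A^T, S_c]]: miRNA nodes first, then circRNA nodes."""
    top = [row_m + row_a for row_m, row_a in zip(S_m, A)]
    bottom = [[A[i][j] for i in range(len(A))] + S_c[j] for j in range(len(S_c))]
    return top + bottom


def column_normalize(W):
    """Turn a non-negative weight matrix into a column-stochastic transition matrix."""
    n = len(W)
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)] for i in range(n)]


def random_walk_with_restart(M, seed_nodes, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) M p + r p0 until the L1 change drops below tol."""
    n = len(M)
    p0 = [1.0 / len(seed_nodes) if i in seed_nodes else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(M[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


S_m = [[1.0, 0.8], [0.8, 1.0]]  # toy miRNA similarity
S_c = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]  # toy circRNA similarity
A = [[1, 0, 0], [0, 0, 1]]  # toy known miRNA x circRNA associations
W = heterogeneous_matrix(knn_sparsify(S_m, 1), knn_sparsify(S_c, 1), A)
p = random_walk_with_restart(column_normalize(W), seed_nodes={0})
circ_scores = p[2:]  # steady-state scores of the three circRNAs for miRNA 0
```

Here miRNA 0 is the seed, and higher entries of `circ_scores` point to circRNAs that are more strongly connected to it through the sparsified similarity and association edges.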
