Regular Paper Issue

Multi-Source Data with Laplacian Eigenmaps and Denoising Autoencoder for Predicting Microbe-Disease Associations via Convolutional Neural Network

Xiu-Juan Lei, Ya-Li Chen, Yi Pan

Journal of Computer Science and Technology 2025, 40(2): 588-604

Published: 31 March 2025

Abstract

Identifying microbes associated with diseases is important for understanding the pathogenesis of diseases as well as for the diagnosis and treatment of diseases. In this article, we propose a method based on a multi-source association network to predict microbe-disease associations, named MMHN-MDA. First, a heterogeneous network of multi-molecule associations is constructed based on associations between microbes, diseases, drugs, and metabolites. Second, the graph embedding algorithm Laplacian eigenmaps is applied to the association network to learn the behavior features of microbe nodes and disease nodes. At the same time, the denoising autoencoder (DAE) is used to learn the attribute features of microbe nodes and disease nodes. Finally, attribute features and behavior features are combined to get the final embedding features of microbes and diseases, which are fed into the convolutional neural network (CNN) to predict the microbe-disease associations. Experimental results show that the proposed method is more effective than existing methods. In addition, case studies on bipolar disorder and schizophrenia demonstrate good predictive performance of the MMHN-MDA model, and further, the results suggest that gut microbes may influence host gene expression or compounds in the nervous system, such as neurotransmitters, or metabolites that alter the blood-brain barrier.

Research Article Issue

Drug-Target Interactions Prediction Based on Signed Heterogeneous Graph Neural Networks

Ming CHEN, Yajian JIANG, Xiujuan LEI, Yi PAN, Chunyan JI, Wei JIANG

Chinese Journal of Electronics 2024, 33(1): 231-244

Published: 05 January 2024

Abstract

Drug-target interaction (DTI) prediction plays an important role in drug discovery. Most computational methods treat it as a binary prediction problem, determining whether drugs and targets are connected while ignoring relation-type information. Because considering the positive or negative effects of DTIs facilitates the study of the comprehensive mechanisms of multiple drugs on a common target, in this work we model DTIs on signed heterogeneous networks by categorizing the interaction patterns of DTIs and additionally extracting interactions within drug pairs and target protein pairs. We propose signed heterogeneous graph neural networks (SHGNNs) and put forward an end-to-end framework for signed DTI prediction, called SHGNN-DTI, which not only adapts to signed bipartite networks but can also naturally incorporate auxiliary information from drug-drug interactions (DDIs) and protein-protein interactions (PPIs). For the framework, we solve the message passing and aggregation problem on signed DTI networks, and consider different training modes on the whole network consisting of DTIs, DDIs and PPIs. Experiments are conducted on two datasets extracted from DrugBank and related databases, under different settings of initial inputs, embedding dimensions and training modes. The prediction results show excellent performance on the evaluation metrics, and the feasibility is further verified by a case study with two drugs on breast cancer.

Research Article Issue

An Encoding-Decoding Framework Based on CNN for circRNA-RBP Binding Sites Prediction

Yajing GUO, Xiujuan LEI, Yi PAN

Chinese Journal of Electronics 2024, 33(1): 256-263

Published: 05 January 2024

Abstract

Predicting RNA binding protein (RBP) binding sites on circular RNAs (circRNAs) is a fundamental step in understanding their interaction mechanism. Numerous computational methods have been developed to solve this problem, but they cannot fully learn the features. Therefore, we propose circ-CNNED, a convolutional neural network (CNN)-based encoding and decoding framework. We first adopt two encoding methods to obtain two original matrices, and preprocess them with a CNN before fusion. To capture the feature dependencies, we use a temporal convolutional network (TCN) and a CNN to construct the encoding and decoding blocks, respectively. We then introduce global expectation pooling to learn latent information and enhance the robustness of circ-CNNED. We evaluate circ-CNNED on 37 datasets. The comparison and ablation experiments demonstrate that our method is superior. In addition, motif enrichment analysis on four datasets helps us explore the reason for the performance improvement of circ-CNNED.

Open Access Issue

SeaConvNeXt: A Lightweight Two-Branch Network Architecture for Efficient Prediction of Specific IHC Proteins and Antigens on Hematoxylin and Eosin (H&E) Images

Yuli Chen, Guoping Chen, Guoying Shi, Yao Zhou, Jiayang Bai, Germán Corredor, Cheng Lu, Xiujuan Lei

Big Data Mining and Analytics 2024, 7(4): 1212-1236

Published: 04 December 2024

Abstract

Immunohistochemistry (IHC) is a vital technique for detecting specific proteins and antigens in tissue sections using antibodies, aiding in the analysis of tumor growth and metastasis. However, IHC is costly and time-consuming, making it challenging to implement on a large scale. To address this issue, we introduce a method that enables virtual IHC staining directly on Hematoxylin and Eosin (H&E) images. Firstly, we have developed a novel registration technique, called Bi-stage Registration based on density Clustering (BiReC), to enhance the registration efficiency between H&E and IHC images. This method involves automatically generating numerous Regions Of Interest (ROI) labels on the H&E image for model training, with the labels being determined by the intensity of IHC staining. Secondly, we propose a novel two-branch network architecture, called SeaConvNeXt, which integrates a lightweight Squeeze-Enhanced Axial (SEA) attention mechanism to efficiently extract and fuse multi-level local and global features from H&E images for direct prediction of specific proteins and antigens. The SeaConvNeXt consists of a ConvNeXt branch and a global fusion branch. The ConvNeXt branch extracts multi-level local features at four stages, while the global fusion branch, including an SEA Transformer module and three global blocks, is designed for global feature extraction and multiple feature fusion. Our experiments demonstrate that SeaConvNeXt outperforms current state-of-the-art methods on two public datasets with corresponding IHC and H&E images, achieving an AUC of 90.7% on the HER2SC dataset and 82.5% on the CRC dataset. These results suggest that SeaConvNeXt has great potential for predicting virtual IHC staining on H&E images.

Open Access Issue

Molecular Generation and Optimization of Molecular Properties Using a Transformer Model

Zhongyin Xu, Xiujuan Lei, Mei Ma, Yi Pan

Big Data Mining and Analytics 2024, 7(1): 142-155

Published: 25 December 2023

Abstract

Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical rules. Herein, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated molecule. The Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization properties. The main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix dimension. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with the baseline models with respect to their abilities to generate molecules with specific properties. Results show that the transformer model can accurately optimize the source molecules to satisfy specific properties.

Open Access Issue

Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest

Jiaojiao Tie, Xiujuan Lei, Yi Pan

Tsinghua Science and Technology 2022, 27(1): 58-67

Published: 17 August 2021

Abstract

Identifying the associations between metabolites and diseases helps us understand the pathogenesis of diseases, which is of great significance for diagnosing and treating them. However, traditional biometric methods are time-consuming and expensive. Accordingly, we propose a new metabolite-disease association prediction algorithm based on DeepWalk and random forest (DWRF), which consists of the following key steps. First, the semantic similarity and information entropy similarity of diseases are integrated as the final disease similarity. Similarly, the molecular fingerprint similarity and information entropy similarity of metabolites are integrated as the final metabolite similarity. Then, DeepWalk is used to extract metabolite features based on the network of metabolite-gene associations. Finally, a random forest algorithm is employed to infer metabolite-disease associations. The experimental results show that DWRF performs well in terms of the area under the curve value under leave-one-out cross-validation and five-fold cross-validation. Case studies also indicate that DWRF performs reliably in metabolite-disease association prediction.
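
DeepWalk learns node features from truncated random walks over a network. The sketch below illustrates only that walk-generation step on a toy metabolite-gene graph. The node names, the function name generate_walks, and its parameters are illustrative assumptions and are not taken from the paper. The later stages (training a skip-gram model on the walks, then passing the features to a random forest) are omitted.

```python
import random


def generate_walks(graph, walks_per_node=10, walk_length=5, seed=0):
    """Generate truncated random walks over an undirected graph.

    `graph` maps each node to a list of its neighbours (hypothetical format).
    """
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        # Visit the start nodes in a fresh random order on every pass.
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy bipartite metabolite-gene network (made-up identifiers).
toy_graph = {
    "m1": ["g1", "g2"],
    "m2": ["g2"],
    "g1": ["m1"],
    "g2": ["m1", "m2"],
}

walks = generate_walks(toy_graph, walks_per_node=2, walk_length=4, seed=1)
for walk in walks:
    print(" -> ".join(walk))
```

Each walk can be treated as a "sentence" of node identifiers, which is how DeepWalk reuses word-embedding training to obtain node vectors.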

Regular Paper Issue

Predicting CircRNA-Disease Associations Based on Improved Weighted Biased Meta-Structure

Xiu-Juan Lei, Chen Bian, Yi Pan

Journal of Computer Science and Technology 2021, 36(2): 288-298

Published: 05 March 2021

Abstract

Circular RNAs (circRNAs) are RNAs with a special closed-loop structure, which play important roles in tumors and other diseases. Because biological experiments are time-consuming, computational methods for predicting associations between circRNAs and diseases have become a better choice. Taking the limited number of verified circRNA-disease associations into account, we propose a method named CDWBMS, which integrates a small number of verified circRNA-disease associations with abundant circRNA information to discover novel circRNA-disease associations. CDWBMS adopts an improved weighted biased meta-structure search algorithm on a heterogeneous network to predict associations between circRNAs and diseases. In terms of leave-one-out cross-validation (LOOCV), 10-fold cross-validation and 5-fold cross-validation, CDWBMS yields area under the receiver operating characteristic curve (AUC) values of 0.9216, 0.9172 and 0.9005, respectively. Furthermore, case studies show that CDWBMS can predict unknown circRNA-disease associations. In conclusion, CDWBMS is an effective method for exploring disease-related circRNAs.
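
The evaluation protocol named above (AUC under k-fold cross-validation, with LOOCV as the special case where k equals the number of samples) can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' evaluation code. The function names auc_from_scores and k_fold_indices and the toy labels and scores are hypothetical.

```python
import random


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k disjoint test folds after shuffling.

    Setting k == n_samples gives leave-one-out cross-validation.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy example: two known associations (label 1) and two unknown pairs (label 0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc_from_scores(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
print(k_fold_indices(10, 5))
```

In a full evaluation, each fold's known associations would be hidden in turn, the model retrained, and the held-out pairs scored before computing the AUC.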

Open Access Issue

CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization

Yuchen Zhang, Xiujuan Lei, Zengqiang Fang, Yi Pan

Big Data Mining and Analytics 2020, 3(4): 280-291

Published: 16 November 2020

Abstract

Circular RNAs (circRNAs) are a novel class of endogenous non-coding RNAs. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are discovered using high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method to predict circRNA-disease associations, which is based on metapath2vec++ and matrix factorization with integrated multiple data (called PCD_MVMF). To construct more reliable networks, various aspects are considered. Firstly, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are adopted to construct disease functional and semantic similarity networks. Secondly, metapath2vec++ is applied to an integrated heterogeneous network to learn the embedded features and an initial prediction score. Finally, we use matrix factorization, take similarity as a constraint, and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and F-measure are adopted to evaluate the performance of PCD_MVMF. These evaluation metrics verify that PCD_MVMF has better prediction performance than other methods. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.

Open Access Issue

Prediction of miRNA-circRNA Associations Based on k-NN Multi-Label with Random Walk Restart on a Heterogeneous Network

Zengqiang Fang, Xiujuan Lei

Big Data Mining and Analytics 2019, 2(4): 261-272

Published: 05 August 2019

Abstract

Circular RNAs (circRNAs) play important roles in various biological processes, as essential non-coding RNAs that affect transcriptional and posttranscriptional gene expression regulation. Recently, many studies have shown that circRNAs can be regarded as microRNA (miRNA) sponges, which are known to be associated with certain diseases. Therefore, efficient computational methods are needed to explore miRNA-circRNA interactions, but very few computational methods for predicting the associations between miRNAs and circRNAs exist. In this study, we adopt an improved random walk computational method, named KRWRMC, to express complicated associations between miRNAs and circRNAs. Our major contributions can be summed up in two points. First, the conventional Random Walk Restart Heterogeneous (RWRH) algorithm simply converts the circRNA/miRNA similarity network into the transition probability matrix; in contrast, we take the influence of each node's neighbors in the network into account, which can suggest or stress some potential associations. Second, our proposed KRWRMC is the first computational model to calculate large numbers of miRNA-circRNA associations, which can be regarded as biomarkers to diagnose certain diseases and can thus help us better understand complicated diseases. The reliability of KRWRMC has been verified by leave-one-out cross-validation (LOOCV) and 10-fold cross-validation, the results of which indicate that this method achieves excellent performance in predicting potential miRNA-circRNA associations.
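
For orientation, the basic random walk with restart iteration that such methods build on can be sketched as below. This is a plain version on a single small network, assuming a column-normalised adjacency matrix as the transition matrix. It does not reproduce KRWRMC's heterogeneous network or its neighbor-aware transition matrix. The function name, the restart probability of 0.3, and the toy graph are illustrative assumptions.

```python
def random_walk_with_restart(adj, seed_node, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    `adj` is a square adjacency matrix given as a list of lists;
    W is obtained by normalising each column of `adj` to sum to 1.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # trans[i][j] is the probability of stepping from node j to node i.
    trans = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed_node else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(trans[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph 0 - 1 - 2 - 3, with the walker restarting at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
probabilities = random_walk_with_restart(path_graph, seed_node=0)
print([round(x, 4) for x in probabilities])
```

The resulting probabilities rank nodes by proximity to the seed, which is how candidate associations are typically prioritised in random-walk-based predictors.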