Open Access Issue
Medical Knowledge Graph: Data Sources, Construction, Reasoning, and Applications
Big Data Mining and Analytics 2023, 6 (2): 201-217
Published: 26 January 2023

Medical knowledge graphs (MKGs) are the basis for intelligent health care and are used in a variety of intelligent medical applications. Thus, understanding the research and application development of MKGs will be crucial for future research in the biomedical field. To this end, we offer an in-depth review of MKGs in this work. Our review begins with an examination of four types of medical information sources, knowledge graph creation methodologies, and six major themes for MKG development. Furthermore, three popular models of reasoning are discussed from the viewpoint of knowledge reasoning. A reasoning implementation path (RIP) is proposed as a means of expressing the reasoning procedures for MKGs. In addition, we explore intelligent medical applications based on RIP and MKGs and classify them into nine major types. Finally, based on more than 130 publications, we summarize the current state of MKG research and discuss future challenges and opportunities.

Open Access Issue
Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest
Tsinghua Science and Technology 2022, 27 (1): 58-67
Published: 17 August 2021

Identifying the association between metabolites and diseases will help us understand the pathogenesis of diseases, which has great significance in diagnosing and treating diseases. However, traditional biometric methods are time-consuming and expensive. Accordingly, we propose a new metabolite-disease association prediction algorithm based on DeepWalk and random forest (DWRF), which consists of the following key steps: First, the semantic similarity and information entropy similarity of diseases are integrated as the final disease similarity. Similarly, the molecular fingerprint similarity and information entropy similarity of metabolites are integrated as the final metabolite similarity. Then, DeepWalk is used to extract metabolite features based on the network of metabolite-gene associations. Finally, a random forest algorithm is employed to infer metabolite-disease associations. The experimental results show that DWRF performs well in terms of the area under the curve value, leave-one-out cross-validation, and five-fold cross-validation. Case studies also indicate that DWRF performs reliably in metabolite-disease association prediction.
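
The DeepWalk step described above rests on truncated random walks over a graph. The following is a minimal sketch of that walk generation only, assuming a toy metabolite-gene network; the node names, the function random_walks, and its parameters are hypothetical and do not come from the paper. The skip-gram embedding that DeepWalk trains on such walks and the random forest classifier are omitted.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate truncated random walks over an undirected graph.

    adj maps each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit the start nodes in a fresh random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical toy metabolite-gene association network (illustration only).
toy_graph = {
    "metabolite_A": ["gene_1", "gene_2"],
    "metabolite_B": ["gene_2"],
    "gene_1": ["metabolite_A"],
    "gene_2": ["metabolite_A", "metabolite_B"],
}
walks = random_walks(toy_graph, walk_length=5, walks_per_node=3)
```

Each walk is a node sequence that a skip-gram model could treat as a sentence, so that nodes appearing in similar walk contexts receive similar feature vectors.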

Open Access Issue
CircRNA-Disease Associations Prediction Based on Metapath2vec++ and Matrix Factorization
Big Data Mining and Analytics 2020, 3 (4): 280-291
Published: 16 November 2020

Circular RNAs (circRNAs) are a novel class of non-coding endogenous RNAs. Evidence has shown that circRNAs are related to many biological processes and play essential roles in different biological functions. Although increasing numbers of circRNAs are being discovered using high-throughput sequencing technologies, these techniques are still time-consuming and costly. In this study, we propose a computational method for predicting circRNA-disease associations that is based on metapath2vec++ and matrix factorization with integrated multiple data (called PCD_MVMF). To construct more reliable networks, various aspects are considered. Firstly, circRNA annotation, sequence, and functional similarity networks are established, and disease-related genes and semantics are adopted to construct disease functional and semantic similarity networks. Secondly, metapath2vec++ is applied to an integrated heterogeneous network to learn the embedded features and the initial prediction score. Finally, we use matrix factorization, take similarity as a constraint, and optimize it to obtain the final prediction results. Leave-one-out cross-validation, five-fold cross-validation, and f-measure are adopted to evaluate the performance of PCD_MVMF. These evaluation metrics verify that PCD_MVMF has better prediction performance than other methods. To further illustrate the performance of PCD_MVMF, case studies of common diseases are conducted. Therefore, PCD_MVMF can be regarded as a reliable and useful circRNA-disease association prediction tool.

Open Access Issue
TW-Co-MFC: Two-Level Weighted Collaborative Fuzzy Clustering Based on Maximum Entropy for Multi-View Data
Tsinghua Science and Technology 2021, 26 (2): 185-198
Published: 24 July 2020

In recent years, multi-view clustering research has attracted considerable attention because of the rapidly growing demand for unsupervised analysis of multi-view data in practical applications. Despite the significant advances in multi-view clustering, two challenges still need to be addressed, i.e., how to make full use of the consistent and complementary information in multiple views and how to discriminate the contributions of different views and features in the same view to efficiently reveal the latent cluster structure of multi-view data for clustering. In this study, we propose a novel Two-level Weighted Collaborative Multi-view Fuzzy Clustering (TW-Co-MFC) approach to address the aforementioned issues. In TW-Co-MFC, a two-level weighting strategy is devised to measure the importance of views and features, and a collaborative working mechanism is introduced to balance the within-view clustering quality and the cross-view clustering consistency. Then an iterative optimization objective function based on the maximum entropy principle is designed for multi-view clustering. Experiments on real-world datasets show the effectiveness of the proposed approach.
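
As a rough illustration of the maximum entropy principle mentioned above, the sketch below computes fuzzy memberships for one sample in a single view. It assumes the standard entropy-regularized form, in which the membership of cluster k is proportional to exp(-d_k / gamma); the function name and the parameter gamma are hypothetical. It is not the TW-Co-MFC objective: the view weights, feature weights, and collaborative term are left out.

```python
import math

def max_entropy_memberships(distances, gamma):
    """Return fuzzy memberships u_k proportional to exp(-d_k / gamma).

    distances holds the distance from one sample to each cluster centre;
    gamma > 0 controls fuzziness (larger gamma gives flatter memberships).
    """
    d_min = min(distances)  # shift by the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / gamma) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

# One sample with squared distances 1, 4, and 9 to three cluster centres.
u = max_entropy_memberships([1.0, 4.0, 9.0], gamma=2.0)
```

The memberships sum to one, the nearest centre receives the largest share, and increasing gamma pushes the result toward a uniform assignment.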

Open Access Issue
Gradient Amplification: An Efficient Way to Train Deep Neural Networks
Big Data Mining and Analytics 2020, 3 (3): 196-207
Published: 16 July 2020

Improving the performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the neural networks. Such deeper networks not only increase training times but also suffer from the vanishing gradient problem during training. In this work, we propose a gradient amplification approach for training deep learning models to prevent vanishing gradients, and we also develop a training strategy to enable or disable the gradient amplification method across several epochs with different learning rates. We perform experiments on VGG-19 and Resnet models (Resnet-18 and Resnet-34), and study the impact of amplification parameters on these models in detail. Our proposed approach improves the performance of these deep learning models even at higher learning rates, thereby allowing these models to achieve higher performance with reduced training time.
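
The general idea of amplifying gradients for selected layers can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the layer names, the amplification factor, and the helper functions amplify_gradients and sgd_step are all hypothetical, and the paper's layer selection and epoch schedule are not reproduced.

```python
def amplify_gradients(grads, amplified_layers, factor):
    """Return a copy of grads with the selected layers' gradients scaled by factor."""
    return {
        name: [g * factor for g in values] if name in amplified_layers else list(values)
        for name, values in grads.items()
    }

def sgd_step(params, grads, lr):
    """Plain gradient descent update: p <- p - lr * g for every parameter."""
    return {
        name: [p - lr * g for p, g in zip(params[name], grads[name])]
        for name in params
    }

# Hypothetical two-layer model whose early layer has very small gradients.
params = {"conv1": [0.5, -0.5], "fc": [1.0, 1.0]}
grads = {"conv1": [1e-4, -2e-4], "fc": [0.1, -0.2]}

# Amplify only the early layer's gradients (the factor 2.0 is an arbitrary choice).
amplified = amplify_gradients(grads, amplified_layers={"conv1"}, factor=2.0)
new_params = sgd_step(params, amplified, lr=0.1)
```

Turning amplification on or off for a given epoch then amounts to choosing whether amplify_gradients is applied before the update.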

Open Access Issue
Applications of Deep Learning to MRI Images: A Survey
Big Data Mining and Analytics 2018, 1 (1): 1-18
Published: 25 January 2018

Deep learning provides exciting solutions in many fields, such as image analysis, natural language processing, and expert systems, and is seen as a key method for various future applications. On account of its non-invasiveness and good soft tissue contrast, Magnetic Resonance Imaging (MRI) has been attracting increasing attention in recent years. With the development of deep learning, many innovative deep learning methods have been proposed to improve MRI image processing and analysis performance. The purpose of this article is to provide a comprehensive overview of deep learning-based MRI image processing and analysis. First, a brief introduction to deep learning and the imaging modalities of MRI images is given. Then, common deep learning architectures are introduced. Next, deep learning applications for MRI images, such as image detection, image registration, image segmentation, and image classification, are discussed. Subsequently, the advantages and weaknesses of several common tools are discussed, and several deep learning tools used in MRI image applications are presented. Finally, an objective assessment of deep learning in MRI applications is presented, and future developments and trends with regard to deep learning for MRI images are addressed.

Open Access Issue
Genome-Wide Interaction-Based Association of Human Diseases — A Survey
Tsinghua Science and Technology 2014, 19 (6): 596-616
Published: 20 November 2014

Genome-Wide Association Studies (GWASs) aim to identify genetic variants that are associated with disease by assaying and analyzing hundreds of thousands of Single Nucleotide Polymorphisms (SNPs). Although traditional single-locus statistical approaches have been standardized and have led to many interesting findings, a substantial number of recent GWASs indicate that for most disorders, individual SNPs explain only a small fraction of the genetic causes. Consequently, exploring multi-SNP interactions in the hope of discovering more significant associations has attracted increasing attention. Due to the huge search space for complicated multi-locus interactions, many fast and effective methods have recently been proposed for detecting disease-associated epistatic interactions using GWAS data. In this paper, we provide a critical review and comparison of eight popular methods, i.e., BOOST, TEAM, epiForest, EDCF, SNPHarvester, epiMODE, MECPM, and MIC, which are used for detecting gene-gene interactions among genetic loci. In view of the model assumptions on the data and the search strategies, we divide the methods into seven categories. Moreover, the evaluation methodologies, including detection power, disease models for simulation, resources of real GWAS data, and the control of the false discovery rate, are elaborated as references for developers of new approaches. At the end of the paper, we summarize the methods and discuss future directions in genome-wide association studies for detecting epistatic interactions.

Open Access Issue
Guest Editorial: Special Issue on Bioinformatics and Computational Biology
Tsinghua Science and Technology 2013, 18 (5): 429-430
Published: 03 October 2013