Open Access Issue
Medical Knowledge Graph: Data Sources, Construction, Reasoning, and Applications
Big Data Mining and Analytics 2023, 6 (2): 201-217
Published: 26 January 2023

Medical knowledge graphs (MKGs) are the basis for intelligent health care and are already used in a variety of intelligent medical applications. Understanding the research and application development of MKGs is therefore crucial for future research in the biomedical field. To this end, we offer an in-depth review of MKGs in this work. We begin by examining four types of medical information sources, knowledge graph creation methodologies, and six major themes for MKG development. We then discuss three popular reasoning models from the viewpoint of knowledge reasoning, and propose a reasoning implementation path (RIP) as a means of expressing the reasoning procedures for MKGs. In addition, we explore intelligent medical applications based on RIP and MKGs and classify them into nine major types. Finally, based on more than 130 publications, we summarize the current state of MKG research along with future challenges and opportunities.

Open Access Issue
Fusion Model for Tentative Diagnosis Inference Based on Clinical Narratives
Tsinghua Science and Technology 2023, 28 (4): 686-695
Published: 06 January 2023

In general, physicians make a preliminary diagnosis based on patients’ admission narratives and admission conditions, relying largely on their experience and professional knowledge. An automatic and accurate tentative diagnosis based on clinical narratives would be of great value to physicians, particularly where medical resources are scarce. Despite this value, little work has been conducted on this diagnosis method. In this study, we therefore propose a fusion model that integrates the semantic and symptom features contained in the clinical text. The semantic features of the input text are first captured by an attention-based Bidirectional Long Short-Term Memory (BiLSTM) network. The symptom concepts recognized in the input text are then vectorized using the term frequency-inverse document frequency (TF-IDF) method based on the relations between symptoms and diseases. Finally, two fusion strategies are used to recommend the most likely candidate International Classification of Diseases (ICD) code. Model training and evaluation are performed on a public clinical dataset. The results show that both fusion strategies achieve promising performance, with the best configuration reaching a top-3 accuracy of 0.7412.
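The symptom vectorization step can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes one plausible reading: each disease is treated as a "document" whose terms are its related symptoms, and a patient's recognized symptoms are weighted by how specific they are to few diseases. The function names and the toy symptom-disease table are hypothetical.

```python
import math
from collections import Counter


def symptom_idf(disease_symptoms):
    """Inverse document frequency of each symptom, treating diseases as documents."""
    n_diseases = len(disease_symptoms)
    doc_freq = Counter()
    for symptoms in disease_symptoms.values():
        doc_freq.update(set(symptoms))
    return {s: math.log(n_diseases / c) for s, c in doc_freq.items()}


def tfidf_vector(recognized, idf, vocabulary):
    """TF-IDF vector for the symptoms recognized in one clinical narrative."""
    counts = Counter(recognized)
    total = sum(counts.values()) or 1
    return [counts[s] / total * idf.get(s, 0.0) for s in vocabulary]


# Hypothetical symptom-disease relations, for illustration only.
toy_relations = {
    "influenza": ["fever", "cough", "myalgia"],
    "pneumonia": ["fever", "cough", "dyspnea"],
    "migraine": ["headache", "nausea"],
}
idf = symptom_idf(toy_relations)
vocab = sorted(idf)
vec = tfidf_vector(["fever", "dyspnea", "fever"], idf, vocab)
```

In this toy setup, "dyspnea" is linked to a single disease and therefore carries a higher weight per mention than "fever", which is shared by two diseases.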

Open Access Issue
A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data
Tsinghua Science and Technology 2021, 26 (5): 772-789
Published: 20 April 2021

The emergence of single-cell RNA-sequencing (scRNA-seq) technology has made it possible to study biological problems at single-cell resolution. One of the critical steps in cellular heterogeneity analysis is cell type identification. Diverse scRNA-seq clustering methods have been proposed to partition cells into clusters. Among these methods, hierarchical clustering and spectral clustering are the most popular approaches for downstream clustering analysis, combined with different preprocessing strategies such as similarity learning, dropout imputation, and dimensionality reduction. In this study, we carry out a comprehensive analysis by combining different strategies with these two categories of clustering methods on scRNA-seq datasets under different biological conditions. The results show that methods using spectral clustering tend to perform better on datasets with continuous shapes in two dimensions, while those using hierarchical clustering achieve better results on datasets with obvious boundaries between clusters in two dimensions. Motivated by this finding, we develop a new strategy, called QRS, to quantitatively evaluate the latent representative shape of a dataset and determine whether it has clear boundaries. Finally, a data-driven clustering recommendation method, called DDCR, is proposed to recommend hierarchical clustering or spectral clustering for scRNA-seq data. We apply DDCR to two typical single-cell clustering methods, SC3 and RAFSIL, and the results show that DDCR recommends a more suitable downstream clustering method for different scRNA-seq datasets and obtains more robust and accurate results.

Open Access Issue
NetEPD: A Network-Based Essential Protein Discovery Platform
Tsinghua Science and Technology 2020, 25 (4): 542-552
Published: 13 January 2020

Proteins drive virtually all cellular-level processes. The proteins that are critical to cell proliferation and survival are defined as essential. These essential proteins are implicated in key metabolic and regulatory networks and are important in rational drug design. The computational identification of essential proteins benefits from the proliferation of publicly available protein interaction datasets, and several algorithms have been developed that use these datasets to predict essential proteins. However, a comprehensive web platform that facilitates the analysis and prediction of essential proteins is missing. In this study, we design, implement, and release NetEPD: a network-based essential protein discovery platform. This resource integrates data on Protein-Protein Interaction (PPI) networks, gene expression, subcellular localization, and a native set of essential proteins. It also computes a variety of node centrality measures, evaluates the predictions of essential proteins, and visualizes PPI networks. The platform is organized around four activities: collection of datasets, computation of centrality measures, evaluation, and visualization. The results produced by NetEPD are visualized on its website, sent to a user-provided email address, and available to download in a parsable format. This platform is freely available at
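The centrality and evaluation activities can be sketched in a few lines. This is not NetEPD's code: it assumes degree centrality as one representative node centrality measure and precision among the top-k ranked proteins as one simple evaluation, and the toy network, protein names, and essential set are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each protein divided by (number of proteins - 1)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {p: 0.0 for p in neighbours}
    return {p: len(nb) / (n - 1) for p, nb in neighbours.items()}


def precision_at_k(scores, essential, k):
    """Fraction of the k highest-scoring proteins that are known essential."""
    ranked = sorted(scores, key=lambda p: (-scores[p], p))  # ties broken by name
    return sum(p in essential for p in ranked[:k]) / k


# Hypothetical toy PPI network and reference set of essential proteins.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
toy_essential = {"A", "D"}

scores = degree_centrality(toy_edges)
p_at_2 = precision_at_k(scores, toy_essential, 2)
```

Other centrality measures would slot into the same ranking-and-evaluation step by replacing `degree_centrality` with a different scoring function.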

Open Access Issue
Clinical Big Data and Deep Learning: Applications, Challenges, and Future Outlooks
Big Data Mining and Analytics 2019, 2 (4): 288-305
Published: 05 August 2019

The explosion of digital healthcare data has led to a surge of data-driven medical research based on machine learning. In recent years, deep learning has gained a central position in machine learning as a powerful technique for big data, owing to its advantages in feature representation and pattern recognition. This article presents a comprehensive overview of studies that employ deep learning methods to deal with clinical data. First, based on an analysis of the characteristics of clinical data, various types of clinical data (e.g., medical images, clinical notes, lab results, vital signs, and demographic information) are discussed, and details of some public clinical datasets are provided. Second, common deep learning models and their characteristics are briefly reviewed. Then, considering the wide range of clinical research and the diversity of data types, several deep learning applications for clinical data are illustrated: auxiliary diagnosis, prognosis, early warning, and other tasks. Although challenges remain in applying deep learning techniques to clinical data, deep learning applications in clinical big data have a promising future in the direction of precision medicine.

Open Access Issue
A Novel Method of Gene Regulatory Network Structure Inference from Gene Knock-Out Expression Data
Tsinghua Science and Technology 2019, 24 (4): 446-455
Published: 07 March 2019

Inferring the structure of Gene Regulatory Networks (GRNs) from gene expression data has been a challenging problem in systems biology. Identifying the complicated regulatory relationships among genes is critical for understanding regulatory mechanisms in cells. Various methods based on information theory have been developed to infer GRNs. However, these methods introduce many redundant regulatory relationships in the network inference process due to external noise in the original data, topology sparseness in the network structure, and non-linear dependency among genes. In particular, the performance of these methods decreases dramatically as the network size increases. In this paper, a novel network structure inference method named Loc-PCA-CMI is proposed that first identifies local overlapped gene clusters, and then infers the local network structure for each cluster by a Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI). The final structure of the GRN, which represents dependence among genes, is obtained as an ensemble of the local network structures. Loc-PCA-CMI was evaluated on DREAM3 knock-out datasets, and its performance was compared to other information theory-based network inference methods including ARACNE, MRNET, PCA-CMI, and PCA-PMI. Experimental results demonstrate that Loc-PCA-CMI outperforms the other four methods on the DREAM3 datasets, especially for networks of size 50 and 100.
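The idea of pruning redundant edges with conditional mutual information can be illustrated with a small sketch. It assumes Gaussian-distributed expression values, under which mutual information and first-order conditional mutual information reduce to functions of the correlation and partial correlation; the abstract does not state which estimator the paper uses. The simulated genes, the threshold value, and all function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def gaussian_mi(rho):
    """Mutual information of two jointly Gaussian variables with correlation rho."""
    return -0.5 * math.log(max(1.0 - rho * rho, 1e-12))


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated regulatory chain x -> z -> y: x and y are dependent only through z.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
z = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
mi_xy = gaussian_mi(r_xy)                                     # zero-order dependence
cmi_xy_given_z = gaussian_mi(partial_corr(r_xy, r_xz, r_yz))  # first-order CMI

THRESHOLD = 0.05  # hypothetical cut-off, not taken from the paper
keep_edge = cmi_xy_given_z >= THRESHOLD
```

Here the marginal dependence between `x` and `y` is strong, but it nearly vanishes once `z` is conditioned on, so the indirect edge between `x` and `y` would be dropped.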

Open Access Issue
Applications of Deep Learning to MRI Images: A Survey
Big Data Mining and Analytics 2018, 1 (1): 1-18
Published: 25 January 2018

Deep learning provides exciting solutions in many fields, such as image analysis, natural language processing, and expert systems, and is seen as a key method for various future applications. Owing to its non-invasive nature and good soft-tissue contrast, Magnetic Resonance Imaging (MRI) has attracted increasing attention in recent years. With the development of deep learning, many innovative deep learning methods have been proposed to improve MRI image processing and analysis performance. The purpose of this article is to provide a comprehensive overview of deep learning-based MRI image processing and analysis. First, a brief introduction to deep learning and the imaging modalities of MRI is given. Then, common deep learning architectures are introduced. Next, deep learning applications to MRI images, such as image detection, image registration, image segmentation, and image classification, are discussed. Subsequently, the advantages and weaknesses of several common tools are discussed, and several deep learning tools used in MRI applications are presented. Finally, an objective assessment of deep learning in MRI applications is presented, and future developments and trends in deep learning for MRI images are addressed.

Open Access Issue
Computational Approaches for Prioritizing Candidate Disease Genes Based on PPI Networks
Tsinghua Science and Technology 2015, 20 (5): 500-512
Published: 13 October 2015

With the continuing development and improvement of genome-wide techniques, a great number of candidate genes have been discovered. Identifying the most likely disease genes among a large number of candidates has become a fundamental challenge in human health. A common view is that genes related to a specific disease or to similar diseases tend to reside in the same neighbourhood of biomolecular networks. Based on such observations, many methods have recently been developed to tackle this challenge. In this review, we first introduce the concept of disease genes, their properties, and the data available for identifying them. We then review recent computational approaches for prioritizing candidate disease genes based on Protein-Protein Interaction (PPI) networks and investigate their advantages and disadvantages. Furthermore, existing software and network resources are summarized. Finally, we discuss key issues in prioritizing candidate disease genes and point out some future research directions.
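The neighbourhood observation above suggests a very simple prioritization scheme, sketched here under stated assumptions. Each candidate is scored by the fraction of its direct PPI neighbours that are already known disease genes. This is only one of the simplest network-based scores and is not claimed to be any specific method covered by the review; the gene names, network, and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def neighbour_score(gene, adj, known_disease_genes):
    """Fraction of a gene's direct neighbours that are known disease genes."""
    neighbours = adj.get(gene, set())
    if not neighbours:
        return 0.0
    return len(neighbours & known_disease_genes) / len(neighbours)


def prioritize(candidates, adj, known_disease_genes):
    """Candidates sorted from most to least likely under the neighbour score."""
    return sorted(
        candidates,
        key=lambda g: (-neighbour_score(g, adj, known_disease_genes), g),
    )


# Hypothetical toy PPI network: D1 and D2 are known disease genes.
toy_edges = [
    ("G1", "D1"), ("G1", "D2"), ("G1", "G2"),
    ("G2", "D1"), ("G2", "G3"), ("G3", "G4"),
]
adj = build_adjacency(toy_edges)
ranking = prioritize(["G3", "G2", "G1"], adj, {"D1", "D2"})
```

In this toy network, `G1` interacts with both known disease genes and is ranked first, while `G3` has no disease-gene neighbours and is ranked last.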
