The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. Numerous recently introduced deep learning-based methods have shown remarkable potential for enhancing Molecular Property Prediction (MPP), especially in improving accuracy and providing insights into molecular structures. Yet two critical questions arise: does the integration of domain knowledge improve the accuracy of MPP, and does multi-modal data fusion yield more precise results than methods that rely on a single data source? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods on various benchmarks. We find that integrating molecular information significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured by reductions in Root Mean Square Error (RMSE), are up to 4.0%, while classification improvements, measured by the area under the receiver operating characteristic curve (ROC-AUC), are up to 1.7%. Additionally, we find that augmenting 2D graphs with 3D information improves classification performance, as measured by ROC-AUC, by up to 13.2%, and that enriching 2D graphs with 1D SMILES boosts multi-modal learning performance for regression tasks by up to 9.1%. These two consolidated insights offer crucial guidance for future advancements in drug discovery.
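As a reading aid for the two metrics named above, the following minimal Python sketch computes RMSE for a regression task and ROC-AUC for a binary classification task, and then expresses the difference between a baseline and an enhanced model as a percentage. The function names and the toy numbers are hypothetical, and the relative-change formula is an assumption: the abstract does not state whether its percentages are relative or absolute differences.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error: lower is better."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def relative_change_pct(baseline, new):
    """Percentage change of `new` relative to `baseline` (assumed convention)."""
    return 100.0 * (new - baseline) / baseline

# Toy data, purely illustrative (not taken from any benchmark).
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# A drop in RMSE from 1.00 to 0.96 is a reduction of about 4%.
print(relative_change_pct(1.00, 0.96))
```

Under this convention a negative change in RMSE is an improvement, whereas for ROC-AUC a positive change is an improvement.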
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).