Machine Learning for Selecting Important Clinical Markers of Imaging Subgroups of Cerebral Small Vessel Disease Based on a Common Data Model

Lan Lan; Guoliang Hu; Rui Li; Tingting Wang; Lingling Jiang; Jiawei Luo; Zhiwei Ji; Yilong Wang

doi:10.26599/TST.2023.9010092


Open Access


Lan Lan¹, Guoliang Hu², Rui Li¹, Tingting Wang², Lingling Jiang³, Jiawei Luo⁴, Zhiwei Ji⁵, Yilong Wang²

¹ IT Center, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China

² Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China

³ China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China

⁴ West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610044, China

⁵ College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China


Abstract

Differences among the imaging subgroups of cerebral small vessel disease (CSVD) need further exploration. First, we use propensity score matching to obtain balanced datasets. Random forest (RF) is then adopted to classify the subgroups and to select features, with support vector machine (SVM) and extreme gradient boosting (XGBoost) as comparison classifiers. The top 10 features by importance are entered into a stepwise logistic regression to obtain the odds ratio (OR) and 95% confidence interval (CI) of each. The dataset comprises 41 290 records of adult inpatients diagnosed with CSVD. The accuracy and area under the curve (AUC) of RF are close to 0.7, and RF performs best in classification compared with SVM and XGBoost. The ORs (95% CIs) of hematocrit for white matter lesions (WMLs), lacunes, microbleeds, atrophy, and enlarged perivascular space (EPVS) are 0.9875 (0.9857−0.9893), 0.9728 (0.9705−0.9752), 0.9782 (0.9740−0.9824), 1.0093 (1.0081−1.0106), and 0.9716 (0.9597−0.9832), respectively. The ORs (95% CIs) of red cell distribution width for WMLs, lacunes, atrophy, and EPVS are 0.9600 (0.9538−0.9662), 0.9630 (0.9559−0.9702), 1.0751 (1.0686−1.0817), and 0.9304 (0.8864−0.9755), respectively. The ORs (95% CIs) of platelet distribution width for WMLs, lacunes, and microbleeds are 1.1796 (1.1636−1.1958), 1.1663 (1.1476−1.1853), and 1.0416 (1.0152−1.0687), respectively. This study proposes a new analytical framework that uses machine learning on a common data model to select important clinical markers for CSVD; the framework offers low cost, fast speed, a large sample size, and continuous data sources.
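To make the reported ORs easier to read, the following minimal Python sketch shows how a logistic regression coefficient maps to an OR and a 95% CI, and how an approximate coefficient and standard error can be recovered from a reported OR and interval. It assumes a Wald-type interval, exp(β ± z·SE); the abstract does not state how the intervals were computed, so this is an illustration rather than the authors' procedure. The function names `odds_ratio_ci` and `coef_from_reported` are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    Assumption: a Wald interval, exp(beta - z*se) to exp(beta + z*se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(odds_ratio, lower, upper, z=Z_95):
    """Approximate (beta, se) from a reported OR and a Wald-type CI.

    The standard error is taken from the CI width on the log scale,
    so rounding in the reported values carries over into the result.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Hematocrit versus WMLs, using the values reported in the abstract.
beta, se = coef_from_reported(0.9875, 0.9857, 0.9893)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.5f}, se={se:.5f}")
print(f"OR={or_:.4f} (95% CI {lo:.4f}-{hi:.4f})")
```

Under this reading, an OR below 1 corresponds to a negative coefficient, that is, lower odds of the imaging subgroup per unit increase in the marker, provided the marker entered the model on its original scale.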

Keywords

common data model; machine learning; cerebral small vessel disease; imaging subgroups; clinical markers
