To efficiently mine threat intelligence from the large volume of open-source cybersecurity analysis reports available on the web, we develop a Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. First, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Next, the reports are annotated with five tactic category labels, producing a multi-label dataset for tactic classification. To address the low execution efficiency and poor scalability of the sequential deep forest algorithm, PDFMLC uses broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, which significantly increases its speedup ratio. In addition, PDFMLC incorporates the mutual information between labels in the constructed dataset as input features; this captures latent label correlations and significantly improves classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results show that the PDFMLC algorithm achieves strong node scalability and high execution efficiency, and that PDFMLC-TIM effectively classifies cybersecurity analysis reports and extracts tactic entities to construct comprehensive threat intelligence, which is output in the STIX 2.1 format.
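The abstract credits part of PDFMLC's speedup to broadcast variables combined with LZW, presumably to shrink the data shipped to worker nodes; it does not say exactly what is compressed or how. The following is a minimal, generic sketch of LZW compression and decompression over bytes, assuming report text or serialized data is UTF-8 encoded first. The function names are hypothetical, and the broadcast step itself (in Apache Spark, `SparkContext.broadcast`) is not shown.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes (dictionary starts with all 256 single bytes)."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the longest known prefix
        else:
            codes.append(dictionary[w])  # emit the code for the longest known prefix
            dictionary[wc] = next_code   # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[:1]  # code refers to the string being defined right now
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    report = "TOBEORNOTTOBEORTOBEORNOT".encode("utf-8")
    codes = lzw_compress(report)
    print(len(report), len(codes))  # 24 16
    assert lzw_decompress(codes) == report
```

On the classic input above, 24 bytes become 16 codes, and the saving grows with more repetitive text. The decoder rebuilds the dictionary itself, so only the codes need to be transmitted; a production encoder would also pack the codes into variable-width bit fields rather than Python integers.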
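The label mutual information mentioned in the abstract can be made concrete. For two binary labels A and B (for example, two tactic categories), I(A;B) = sum over a, b in {0, 1} of p(a,b) * log(p(a,b) / (p(a) * p(b))). The sketch below estimates this from empirical counts over an N x K binary label matrix and returns the K x K pairwise matrix. How PDFMLC turns these values into input features for the deep forest is not specified in the abstract, so that step is omitted; the function names and toy data are hypothetical.

```python
import math


def label_mutual_information(y: list[list[int]], i: int, j: int) -> float:
    """Empirical mutual information (in nats) between binary label columns i and j."""
    n = len(y)
    if n == 0:
        raise ValueError("label matrix is empty")
    # Joint counts of (label_i, label_j) over all samples.
    joint = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for row in y:
        joint[(row[i], row[j])] += 1
    # Marginal probabilities of each label.
    p_i = {a: (joint[(a, 0)] + joint[(a, 1)]) / n for a in (0, 1)}
    p_j = {b: (joint[(0, b)] + joint[(1, b)]) / n for b in (0, 1)}
    mi = 0.0
    for (a, b), count in joint.items():
        if count == 0:
            continue  # 0 * log 0 is taken as 0
        p_ab = count / n
        mi += p_ab * math.log(p_ab / (p_i[a] * p_j[b]))
    return mi


def label_mi_matrix(y: list[list[int]]) -> list[list[float]]:
    """K x K matrix of pairwise label mutual information for an N x K 0/1 label matrix."""
    k = len(y[0])
    return [[label_mutual_information(y, i, j) for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical toy data: 4 reports, 3 tactic labels (1 = tactic present).
    y = [
        [1, 1, 0],
        [0, 0, 1],
        [1, 1, 1],
        [0, 0, 0],
    ]
    for row in label_mi_matrix(y):
        print([round(v, 3) for v in row])
```

In the toy data, labels 0 and 1 always co-occur, so their mutual information equals the label entropy, log 2 (about 0.693 nats), while label 2 is independent of both and scores 0. Mutual information is symmetric and non-negative, so large entries flag label pairs that are strongly dependent, whether they tend to co-occur or to exclude each other.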
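The last step of PDFMLC-TIM outputs STIX 2.1 threat intelligence. As a minimal sketch, assuming each extracted tactic entity is mapped to a STIX 2.1 `attack-pattern` object whose tactic is recorded in `kill_chain_phases` (with the `mitre-attack` kill chain name used in MITRE ATT&CK's STIX data), the objects can be wrapped in a STIX `bundle`. The helper names and the example entity and tactic are hypothetical; the abstract does not list the five tactic labels or the exact STIX object types the method emits.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt: datetime) -> str:
    """RFC 3339 UTC timestamp ending in 'Z' (millisecond precision used here)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def make_attack_pattern(name: str, tactic: str, description: str) -> dict:
    """Build a STIX 2.1 attack-pattern object tagged with one ATT&CK tactic."""
    now = stix_timestamp(datetime.now(timezone.utc))
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
        "kill_chain_phases": [
            {"kill_chain_name": "mitre-attack", "phase_name": tactic}
        ],
    }


def make_bundle(objects: list[dict]) -> dict:
    """Wrap STIX objects in a bundle (bundles carry no spec_version in STIX 2.1)."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    # Hypothetical entity extracted from a report classified under "initial-access".
    pattern = make_attack_pattern(
        name="Spearphishing attachment",
        tactic="initial-access",
        description="Tactic entity extracted from an open-source analysis report.",
    )
    print(json.dumps(make_bundle([pattern]), indent=2))
```

Random version-4 UUIDs are used for the identifiers; a real pipeline would likely also emit a `report` object referencing the source analysis report and validate the output against the STIX 2.1 specification.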
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).