Unstructured Big Data Threat Intelligence Parallel Mining Algorithm

Zhihua Li; Xinye Yu; Tao Wei; Junhao Qian

doi:10.26599/BDMA.2023.9020032


Zhihua Li¹, Xinye Yu¹, Tao Wei¹, Junhao Qian²

¹ School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China

² School of IoT Engineering, Jiangnan University, Wuxi 214122, China


Abstract

To mine threat intelligence efficiently from the large volume of open-source cybersecurity analysis reports on the web, we propose the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. First, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Five tactics category labels are then annotated, producing a multi-label dataset for tactics classification. To address the low execution efficiency and limited scalability of the sequential deep forest algorithm, PDFMLC uses broadcast variables and the Lempel-Ziv-Welch (LZW) compression algorithm, which significantly improves its speedup ratio. PDFMLC also takes label mutual information computed from the dataset as input features; this captures latent label associations and significantly improves classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results show that the PDFMLC algorithm achieves strong node scalability and execution efficiency. The PDFMLC-TIM method, in turn, classifies the text of cybersecurity analysis reports and extracts tactics entities to construct comprehensive threat intelligence formatted in STIX 2.1.
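The abstract does not say how label mutual information is turned into input features, so the following is only a minimal sketch of the underlying quantity: the pairwise mutual information between binary label columns of a multi-label dataset. The function names, the toy label matrix, and the choice of three labels are assumptions made for illustration; they are not taken from the paper.

```python
import math


def mutual_information(x, y):
    """Mutual information in bits between two equal-length binary label columns."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            # Empirical joint and marginal probabilities.
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def label_mi_matrix(Y):
    """Symmetric matrix of pairwise MI between the label columns of Y (list of rows)."""
    k = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(k)]
    M = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            M[i][j] = M[j][i] = mutual_information(cols[i], cols[j])
    return M


# Hypothetical toy data: 6 reports, 3 tactic labels (illustrative only).
Y = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
M = label_mi_matrix(Y)
```

In this toy matrix the first two labels always co-occur, so their mutual information is much higher than that of the first and third labels. A score like this is one way to quantify the label associations the abstract refers to.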

Keywords

unstructured big data mining; parallel deep forest; multi-label classification algorithm; threat intelligence


Big Data Mining and Analytics

Volume 7, Issue 2, June 2024

Pages 531-546

DOI: 10.26599/BDMA.2023.9020032


Cite this article:

Li Z, Yu X, Wei T, et al. Unstructured Big Data Threat Intelligence Parallel Mining Algorithm. Big Data Mining and Analytics, 2024, 7(2): 531-546. https://doi.org/10.26599/BDMA.2023.9020032


Received: 09 August 2023

Revised: 23 October 2023

Accepted: 02 November 2023

Published: 22 April 2024

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).