Open Access Issue
Unstructured Big Data Threat Intelligence Parallel Mining Algorithm
Big Data Mining and Analytics 2024, 7 (2): 531-546
Published: 22 April 2024

To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. First, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Next, the reports are annotated with five tactics category labels, creating a multi-label dataset for tactics classification. To address the low execution efficiency and limited scalability of the sequential deep forest algorithm, the PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly improving its speedup ratio. In addition, the PDFMLC algorithm uses label mutual information from the established dataset as input features. This captures latent label associations and significantly improves classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm scales well with the number of nodes and executes efficiently. The PDFMLC-TIM method, in turn, classifies the text of cybersecurity analysis reports and extracts tactics entities to construct comprehensive threat intelligence, which is output in the STIX 2.1 format.
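The abstract names LZW compression as one of the two ingredients used to speed up the parallel algorithm. As a rough illustration of that technique only, the following is a minimal textbook LZW sketch in Python. The function names `lzw_compress` and `lzw_decompress` are hypothetical, and nothing here reflects how the paper integrates LZW with broadcast variables.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes (textbook variant)."""
    # The table starts with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the longest known phrase.
            current = candidate
        else:
            # Emit the code for the known phrase and learn the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the phrase being defined.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    packed = lzw_compress(sample)
    print(len(sample), "bytes ->", len(packed), "codes")
    print(lzw_decompress(packed) == sample)
```

Repetitive inputs, such as serialized feature vectors that share long substrings, compress into fewer codes than input bytes, which is the property that would reduce the amount of data shipped between nodes.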

Open Access Issue
Energy-efficient multiuser and multitask computation offloading optimization method
Intelligent and Converged Networks 2023, 4 (1): 76-92
Published: 20 March 2023

For dynamic application scenarios of Mobile Edge Computing (MEC), an Energy-efficient Multiuser and Multitask Computation Offloading (EMMCO) optimization method is proposed. Considering multiuser and multitask computation offloading, the EMMCO method first takes into account the dependencies among different tasks within an implementation, abstracts these dependencies as a Directed Acyclic Graph (DAG), and models the computation offloading problem as a Markov decision process. Subsequently, the task embedding sequence of the DAG is fed to an RNN encoder-decoder neural network combined with the attention mechanism, which captures the long-term dependencies among different tasks. Finally, the Improved Policy Loss Clip-based PPO2 (IPLC-PPO2) algorithm is developed and used to train the RNN encoder-decoder neural network. The loss function in the IPLC-PPO2 algorithm is utilized as a preference for the training process, and the neural network parameters are continuously updated to select the optimal offloading scheduling decisions. Simulation results demonstrate that the proposed EMMCO method achieves lower latency, lower energy consumption, and significantly better Quality of Service (QoS) than the compared algorithms under different mobile edge network conditions.
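For readers unfamiliar with the clipped policy loss that PPO2-style algorithms build on, the sketch below computes the standard PPO clipped surrogate loss in plain Python. It is the generic formulation, not the improved IPLC-PPO2 loss, whose details the abstract does not give. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, averaged over samples.

    Each argument is a sequence with one value per sampled action.
    This is the generic PPO formulation, not the paper's IPLC variant.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two surrogate objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize while the objective is maximized.
    return -total / len(advantages)


if __name__ == "__main__":
    # The action became twice as likely and had a positive advantage,
    # so the clipped term limits how much credit the update receives.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is what keeps training of the offloading policy stable.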

Open Access Issue
Multimodal Adaptive Identity-Recognition Algorithm Fused with Gait Perception
Big Data Mining and Analytics 2021, 4 (4): 223-232
Published: 26 August 2021

Existing identity-recognition technologies require assistive equipment, yet they are expensive and offer poor recognition accuracy. To overcome this deficiency, this paper proposes several gait feature identification algorithms. First, gait information of individuals is collected from the triaxial accelerometers on smartphones and preprocessed, and multimodal fusion with existing standard datasets yields a multimodal synthetic dataset. Then, based on the multimodal characteristics of the collected biological gait information, a Convolutional Neural Network based Gait Recognition (CNN-GR) model and a related scheme for the multimodal features are developed. Finally, building on the proposed CNN-GR model and scheme, a unimodal single-gait feature identification algorithm and a multimodal gait feature fusion identification algorithm are proposed. Experimental results show that the proposed algorithms perform well in terms of recognition accuracy, the confusion matrix, and the kappa statistic, and that they achieve better recognition scores and robustness than the compared algorithms; thus, the proposed algorithms show strong promise in practice.

Open Access Issue
Survey on Encoding Schemes for Genomic Data Representation and Feature Learning—From Signal Processing to Machine Learning
Big Data Mining and Analytics 2018, 1 (3): 191-210
Published: 24 May 2018

Data-driven machine learning, especially deep learning technology, is becoming an important tool for handling big data issues in bioinformatics. In machine learning, DNA sequences are often converted to numerical values for data representation and feature learning in various applications. A similar conversion occurs in Genomic Signal Processing (GSP), where genome sequences are transformed into numerical sequences for signal extraction and recognition. This kind of conversion is also called an encoding scheme. The choice of encoding scheme can greatly affect the performance of GSP applications and machine learning models. This paper aims to collect, analyze, discuss, and summarize the existing encoding schemes for genome sequences, particularly in GSP as well as in other genome analysis applications, to provide a comprehensive reference for genomic data representation and feature learning in machine learning.
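To make the idea of an encoding scheme concrete, the sketch below converts a DNA string into numbers in two common ways: a one-hot matrix, as often used for machine learning inputs, and four binary indicator sequences, as in the Voss representation used in signal processing. These two schemes are chosen here for illustration and are not quoted from the survey. The function names and the handling of unknown symbols (encoded as all zeros) are assumptions.

```python
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq: str) -> list[list[int]]:
    """Return one row per position, with one column per nucleotide.

    Symbols outside A, C, G, T (for example N) become an all-zero row;
    that choice is an assumption made for this sketch.
    """
    return [[1 if base == n else 0 for n in NUCLEOTIDES]
            for base in seq.upper()]


def voss_encode(seq: str) -> dict[str, list[int]]:
    """Return four binary indicator sequences, one per nucleotide."""
    return {n: [1 if base == n else 0 for base in seq.upper()]
            for n in NUCLEOTIDES}


if __name__ == "__main__":
    sequence = "ACGTTN"
    for row in one_hot_encode(sequence):
        print(row)
    for nucleotide, indicator in voss_encode(sequence).items():
        print(nucleotide, indicator)
```

The two outputs hold the same information in transposed layouts: the one-hot form suits models that read a sequence position by position, while the indicator form gives four separate signals that can each be passed to spectral analysis.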
