An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks

Selvam Saravanan; Uma Maheswari Balasubramanian

doi:10.26599/BDMA.2023.9020027

AI Chat Paper

Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.

Chat more with AI

| Sign up

Browse by Subject

Search for peer-reviewed journals with full access.

Journals A - Z

About Us

Discover the SciOpen Platform and Achieve Your Research Goals with Ease.

About Us

Publish with Us

Support

Search articles, authors, keywords, DOl and etc.

Published Date

Reset Search

{{expandStatus?'Exit ':''}}Advanced Search

Journals A - Z

About Us

Publish with Us

Support

PDF (1.3 MB)

Cite

EndNote(RIS) BibTeX

Collect

Submit Manuscript

AI Chat Paper

Show Outline

Outline

Show full outline

Hide outline

Outline

Show full outline

Hide outline

Open Access

An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks

Selvam Saravanan(

), Uma Maheswari Balasubramanian

Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India

Show Author Information

Abstract

The current large-scale Internet of Things (IoT) networks typically generate high-velocity network traffic streams. Attackers use IoT devices to create botnets and launch attacks, such as DDoS, Spamming, Cryptocurrency mining, Phishing, etc. The service providers of large-scale IoT networks need to set up a data pipeline to collect the vast network traffic data from the IoT devices, store it, analyze it, and report the malicious IoT devices and types of attacks. Further, the attacks originating from IoT devices are dynamic, as attackers launch one kind of attack at one time and another kind of attack at another time. The number of attacks and benign instances also vary from time to time. This phenomenon of change in attack patterns is called concept drift. Hence, the attack detection system must learn continuously from the ever-changing real-time attack patterns in large-scale IoT network traffic. To meet this requirement, in this work, we propose a data pipeline with Apache Kafka, Apache Spark structured streaming, and MongoDB that can adapt to the ever-changing attack patterns in real time and classify attacks in large-scale IoT networks. When concept drift is detected, the proposed system retrains the classifier with the instances that cause the drift and a representative subsample instances from the previous training of the model. The proposed approach is evaluated with the latest dataset, IoT23, which consists of benign and several attack instances from various IoT devices. Attack classification accuracy is improved from 97.8% to 99.46% by the proposed system. The training time of distributed random forest algorithm is also studied by varying the number of cores in Apache Spark environment.

Keywords

Internet of Things (IoT)Apache Spark Apache Kafka MongoDB streaming concept drift

References

【1】

Crossref Google Scholar

Big Data Mining and Analytics

Volume 7 Issue 2,
June 2024

Pages 500-511

DOI: 10.26599/BDMA.2023.9020027

	{{item.num}}
{{version.versionName}} Author Response
{{version.versionName}} Review comment

Comments on this article

Go to comment

< Back to all reports

Review Status: {{reviewData.commendedNum}} Commended , {{reviewData.revisionRequiredNum}} Revision Required , {{reviewData.notCommendedNum}} Not Commended Under Peer Review

Review Comment

Cite this Report

. . , , {{reviewData.reportCite.doi}}

Cite this article:

Saravanan S, Maheswari Balasubramanian U. An Adaptive Scalable Data Pipeline for Multiclass Attack Classification in Large-Scale IoT Networks. Big Data Mining and Analytics, 2024, 7(2): 500-511. https://doi.org/10.26599/BDMA.2023.9020027

961

Views

Downloads

Crossref

Web of Science

Scopus

CSCD

Google Scholar
Citation

Received: 20 April 2023

Revised: 15 September 2023

Accepted: 23 September 2023

Published: 22 April 2024

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).