Journal Home > Volume 3, Issue 3


Applying Big Data Based Deep Learning System to Intrusion Detection

Wei Zhong, Ning Yu, Chunyu Ai
Division of Math and Computer Science, University of South Carolina Upstate, Spartanburg, SC 29303, USA.
Department of Computing Sciences, State University of New York College at Brockport, Brockport, NY 14420, USA.

Abstract

With vast amounts of data generated daily and the ever-increasing interconnectivity of the world’s internet infrastructure, machine learning based Intrusion Detection Systems (IDS) have become a vital component in protecting economic and national security. Previous shallow learning and deep learning strategies adopt a single learning model for intrusion detection. A single model may struggle to capture the increasingly complicated data distributions of intrusion patterns. In particular, a single deep learning model may not effectively capture the unique patterns of intrusive attacks that have only a small number of samples. To further enhance the performance of machine learning based IDS, we propose the Big Data based Hierarchical Deep Learning System (BDHDLS). BDHDLS uses behavioral features and content features to capture both network traffic characteristics and the information stored in the payload. Each deep learning model in BDHDLS concentrates on learning the unique data distribution of one cluster. This strategy can increase the detection rate of intrusive attacks compared with previous single learning model approaches. With a parallel training strategy and big data techniques, the model construction time of BDHDLS is reduced substantially when multiple machines are deployed.

Keywords: deep learning, intrusion detection, convolution neural network, fully connected feedforward neural network, multi-level clustering algorithm
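
The cluster-then-specialize idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain single-level k-means stands in for the multi-level clustering algorithm, a nearest-class-mean classifier stands in for each per-cluster deep learning model, and every name here (`ClusteredDetector`, `CentroidClassifier`, the two toy features) is hypothetical. The data is synthetic and chosen so that attacks look different in each traffic regime.

```python
from collections import defaultdict
from math import dist


def mean(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest(x, centroids):
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(x, centroids[i]))


def kmeans(X, k, iters=20):
    """Single-level k-means with deterministic farthest-point seeding."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        # Next seed: the sample farthest from its closest existing seed.
        centroids.append(list(max(X, key=lambda x: min(dist(x, c) for c in centroids))))
    for _ in range(iters):
        groups = defaultdict(list)
        for x in X:
            groups[nearest(x, centroids)].append(x)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(groups[i]) if groups[i] else centroids[i] for i in range(k)]
    return centroids


class CentroidClassifier:
    """Assumed stand-in for one deep model: predicts the nearest class mean."""

    def fit(self, X, y):
        by_label = defaultdict(list)
        for x, label in zip(X, y):
            by_label[label].append(x)
        self.means = {label: mean(rows) for label, rows in by_label.items()}
        return self

    def predict(self, x):
        return min(self.means, key=lambda label: dist(x, self.means[label]))


class ClusteredDetector:
    """Partition the training data, then fit one model per cluster."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.centroids = kmeans(X, self.k)
        members = defaultdict(list)
        for x, label in zip(X, y):
            members[nearest(x, self.centroids)].append((x, label))
        self.models = {
            cid: CentroidClassifier().fit([x for x, _ in rows], [l for _, l in rows])
            for cid, rows in members.items()
        }
        return self

    def predict(self, x):
        # Route the sample to the closest cluster that actually has a model.
        cid = min(self.models, key=lambda i: dist(x, self.centroids[i]))
        return self.models[cid].predict(x)


# Synthetic samples: [regime feature, payload feature]. Attacks have a high
# second feature in the first regime and a low one in the second regime.
X = [[0, 0], [1, 0], [0, 1], [0, 3], [1, 3], [1, 4],
     [20, 3], [21, 3], [20, 4], [20, 0], [21, 0], [21, 1]]
y = ["normal"] * 3 + ["attack"] * 3 + ["normal"] * 3 + ["attack"] * 3

detector = ClusteredDetector(k=2).fit(X, y)
single = CentroidClassifier().fit(X, y)  # one model over all data, for contrast

for sample in ([0.5, 3.5], [20.5, 3.5]):
    print(sample, "clustered:", detector.predict(sample), "single:", single.predict(sample))
```

On this toy data the single model's class means nearly coincide, so it cannot separate the two regimes, while the per-cluster models can. Each per-cluster fit reads only its own rows, so in a sketch like this the fits are independent of one another and could be dispatched to separate workers.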


Publication history

Received: 08 March 2020
Revised: 27 March 2020
Accepted: 30 March 2020
Published: 16 July 2020
Issue date: September 2020

Copyright

© The author(s) 2020

Acknowledgements

This work was partially supported by the Research Initiative for Summer Engagement (RISE) from the Office of the Vice President for Research at the University of South Carolina.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
