Volume 26, Issue 4

DGA-Based Botnet Detection Toward Imbalanced Multiclass Learning

Yijing Chen, Bo Pang, Guolin Shao, Guozhu Wen, Xingshu Chen
College of Cybersecurity, Sichuan University, Chengdu 610065, China.
Cybersecurity Research Institute, Sichuan University, Chengdu 610065, China.

Abstract

Botnets that rely on a Domain Generation Algorithm (DGA) are difficult for mainstream detection methods to catch because they are well concealed and robust. The complexity of DGA families and the imbalance among their sample sizes continue to impede research on DGA detection. Existing work treats the sample size of each DGA family as the main determinant of the resampling proportion; it therefore ignores differences in the characteristics of the samples and does not achieve the best resampling effect. This paper proposes a Long Short-Term Memory-based Property and Quantity Dependent Optimization (LSTM.PQDO) method. The method uses an LSTM to learn the features of DGA domain names automatically. Starting from an initial solution, it iteratively updates the resampling proportion, taking into account both the original number and the characteristics of the samples, and heuristically searches around the current solution for a better one; the resampling proportion is thus optimized dynamically. Experimental results show that LSTM.PQDO performs better than existing models on imbalanced datasets, and the method can serve as a reference for resampling tasks in similar scenarios.

Keywords: botnet, Domain Generation Algorithm (DGA), multiclass imbalance, resampling
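
The idea of starting from a quantity-based resampling proportion and then searching nearby for a better one can be illustrated with a small sketch. This is not the LSTM.PQDO algorithm itself, whose details are not given in the abstract; it is a generic greedy local search under stated assumptions. The family names, the counts, the `preferred` sizes, and the `toy_score` function are all hypothetical. In practice the score would presumably come from validating a trained classifier on the resampled data.

```python
import math
from typing import Callable, Dict, Tuple

Ratios = Dict[str, float]

def initial_ratios(counts: Dict[str, int]) -> Ratios:
    """Quantity-only starting point: resample every family toward the
    median family size (ratio > 1 oversamples, ratio < 1 undersamples)."""
    sizes = sorted(counts.values())
    median = sizes[len(sizes) // 2]
    return {family: median / n for family, n in counts.items()}

def refine(ratios: Ratios,
           evaluate: Callable[[Ratios], float],
           step: float = 0.25,
           rounds: int = 20) -> Tuple[Ratios, float]:
    """Greedy local search: scale one family's ratio up or down by `step`
    and keep the change only if the overall score improves."""
    best = dict(ratios)
    best_score = evaluate(best)
    for _ in range(rounds):
        improved = False
        for family in list(best):
            for factor in (1 + step, 1 - step):
                candidate = dict(best)
                candidate[family] = best[family] * factor
                score = evaluate(candidate)
                if score > best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:
            break  # no neighbouring solution is better
    return best, best_score

# Hypothetical data: made-up family names and sample counts.
counts = {"family_a": 40000, "family_b": 2000, "family_c": 150}

# Hypothetical stand-in for "sample characteristics": the resampled size at
# which each family is assumed to be classified best.
preferred = {"family_a": 2000, "family_b": 3000, "family_c": 1000}

def toy_score(ratios: Ratios) -> float:
    """Toy objective (assumption): higher when each resampled family size
    is closer to its preferred size on a log scale."""
    return -sum(abs(math.log(ratios[f] * counts[f] / preferred[f]))
                for f in counts)

start = initial_ratios(counts)
best, best_score = refine(start, toy_score)
```

The search accepts only changes that raise the score, so the refined proportions are never worse than the quantity-only starting point under the chosen objective. Replacing `toy_score` with a real validation metric would make each evaluation as expensive as one training run, which is why the number of rounds is capped.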

Publication history

Received: 04 October 2019
Accepted: 05 November 2019
Published: 04 January 2021
Issue date: August 2021

Copyright

© The author(s) 2021

Acknowledgements

This work was partially funded by the National Natural Science Foundation of China (No. 61272447), the National Entrepreneurship & Innovation Demonstration Base of China (No. C700011), and the Key Research & Development Project of Sichuan Province of China (No. 2018G20100).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
