DGA-Based Botnet Detection Toward Imbalanced Multiclass Learning

Yijing Chen; Bo Pang; Guolin Shao; Guozhu Wen; Xingshu Chen

doi:10.26599/TST.2020.9010021


Open Access


College of Cybersecurity, Sichuan University, Chengdu 610065, China.

Cybersecurity Research Institute, Sichuan University, Chengdu 610065, China.


An erratum to this article is available online at:

https://doi.org/10.26599/TST.2021.9010004

Abstract

Botnets that rely on the Domain Generation Algorithm (DGA) mechanism are difficult for mainstream detection methods to catch because they are well concealed and robust. Research on DGA detection is further impeded by the complexity of DGA families and by the imbalance of samples among them. Existing work treats the sample size of each DGA family as the most important determinant of the resampling proportion; it therefore ignores differences in the characteristics of the samples and does not achieve the optimal resampling effect. This paper proposes a Long Short-Term Memory-based Property and Quantity Dependent Optimization (LSTM.PQDO) method. The method uses an LSTM to automatically mine comprehensive features of DGA domain names. Taking both the original number and the characteristics of the samples into account, it iteratively updates the resampling proportion, heuristically searching around the initial solution for a better one in a promising direction, so that the resampling proportion is optimized dynamically. Experimental results show that LSTM.PQDO performs better than existing models on imbalanced datasets, and that it can serve as a reference for resampling tasks in similar scenarios.
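The idea of starting from a quantity-based resampling proportion and then searching nearby for a better one can be illustrated with a minimal sketch. This is not the authors' LSTM.PQDO procedure: the function names, the per-family oversampling multipliers, the step size, and the plain hill-climbing rule are all assumptions made for illustration, and the `score` callback stands in for whatever validation metric a real pipeline would compute after retraining a classifier on the resampled data.

```python
import random
from typing import Callable, Dict


def initial_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Quantity-only starting point: the multiplier that would bring
    each family up to the size of the largest family."""
    largest = max(counts.values())
    return {family: largest / n for family, n in counts.items()}


def hill_climb(
    start: Dict[str, float],
    score: Callable[[Dict[str, float]], float],
    step: float = 0.1,
    iterations: int = 50,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical local search: perturb one family's multiplier at a
    time and keep the change only if the score improves."""
    rng = random.Random(seed)
    best = dict(start)
    best_score = score(best)
    families = sorted(best)
    for _ in range(iterations):
        candidate = dict(best)
        family = rng.choice(families)
        factor = 1 + rng.choice([-step, step])
        # Never go below 1.0: this sketch only oversamples.
        candidate[family] = max(1.0, candidate[family] * factor)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


if __name__ == "__main__":
    # Made-up family names and counts, for illustration only.
    counts = {"family_a": 1000, "family_b": 100, "family_c": 10}
    start = initial_proportions(counts)

    # Toy stand-in for a validation metric: prefers multipliers close
    # to an arbitrary target that differs from the quantity-only guess.
    target = {"family_a": 1.0, "family_b": 8.0, "family_c": 60.0}

    def toy_score(p: Dict[str, float]) -> float:
        return -sum((p[k] - target[k]) ** 2 for k in p)

    print(hill_climb(start, toy_score))
```

In a real setting the expensive part is the `score` call, since each candidate proportion implies resampling and retraining; the search loop itself stays this simple.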

Keywords

botnet; Domain Generation Algorithm (DGA); multiclass imbalance; resampling


Tsinghua Science and Technology, Volume 26, Issue 4, August 2021, Pages 387-402


Cite this article:

Chen Y, Pang B, Shao G, et al. DGA-Based Botnet Detection Toward Imbalanced Multiclass Learning. Tsinghua Science and Technology, 2021, 26(4): 387-402. https://doi.org/10.26599/TST.2020.9010021