Volume 7, Issue 2


Interpretable Detection of Malicious Behavior in Windows Portable Executables Using Multi-Head 2D Transformers

Sohail Khan¹ and Mohammad Nauman¹ (corresponding author)
¹Computer Science Department, Effat College of Engineering, Effat University, Jeddah 23341, Kingdom of Saudi Arabia

Abstract

Windows malware is an increasingly pressing problem as the volume of malware continues to grow and more sensitive information is stored on computer systems. A major challenge in tackling this problem is the complexity of malware analysis, which requires expertise from human analysts. Recent developments in machine learning have led to deep models for malware detection. However, these models often lack transparency, making it difficult to understand the reasoning behind their decisions, a limitation known as the black-box problem. To address these limitations, this paper presents a novel malware detection model that uses vision transformers to analyze the Operation Code (OpCode) sequences of more than 350,000 Windows Portable Executable (PE) malware samples drawn from real-world datasets. The model achieves an accuracy of 0.9864, surpassing previously reported results, while also providing insight into the reasoning behind each classification. It can pinpoint the specific instructions that lead to malicious behavior in malware samples, aiding human experts in their analysis and supporting further work in the field. We report our findings and show how causality can be established between malicious code and the classification made by a deep learning model, thereby opening up the black-box problem to deeper analysis.
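The abstract does not spell out how an OpCode sequence becomes input for a vision transformer, or how a per-patch decision is traced back to individual instructions. The sketch below shows one plausible path under stated assumptions; it is not the authors' implementation. It assumes mnemonics are mapped to integer IDs (with hypothetical reserved IDs `PAD = 0` and `UNK = 1`), padded or truncated into a square `side x side` grid in row-major order, and cut into non-overlapping `patch x patch` tiles as a ViT would consume them. It also assumes the model yields one relevance score per patch (for example, classification-token attention averaged over heads); those scores are supplied by hand here. All function names, the grid size, and the patch size are hypothetical.

```python
from typing import Dict, List, Sequence

PAD, UNK = 0, 1  # hypothetical reserved IDs: padding and unseen mnemonics


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct mnemonic to an integer ID, starting after the reserved IDs."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for op in seq:
            if op not in vocab:
                vocab[op] = len(vocab) + 2
    return vocab


def to_grid(ops: Sequence[str], vocab: Dict[str, int], side: int) -> List[List[int]]:
    """Encode, truncate or pad to side*side IDs, and reshape row-major into a square grid."""
    ids = [vocab.get(op, UNK) for op in ops][: side * side]
    ids += [PAD] * (side * side - len(ids))
    return [ids[r * side:(r + 1) * side] for r in range(side)]


def patch_offsets(idx: int, side: int, patch: int) -> List[int]:
    """Instruction offsets (row-major positions in the sequence) covered by patch idx."""
    per_row = side // patch
    pr, pc = divmod(idx, per_row)
    return [r * side + c
            for r in range(pr * patch, (pr + 1) * patch)
            for c in range(pc * patch, (pc + 1) * patch)]


def to_patches(grid: List[List[int]], patch: int) -> List[List[int]]:
    """Split the grid into non-overlapping patch x patch tiles, each flattened."""
    side = len(grid)
    if side % patch:
        raise ValueError("grid side must be divisible by patch size")
    n = (side // patch) ** 2
    flat = [v for row in grid for v in row]
    return [[flat[o] for o in patch_offsets(i, side, patch)] for i in range(n)]


def top_instructions(scores: Sequence[float], n_ops: int, side: int,
                     patch: int, k: int) -> List[int]:
    """Rank real (non-padding) instruction offsets by the score of their patch."""
    ranked = [(s, off)
              for i, s in enumerate(scores)
              for off in patch_offsets(i, side, patch)
              if off < n_ops]
    ranked.sort(key=lambda t: (-t[0], t[1]))  # highest score first, then by offset
    return [off for _, off in ranked[:k]]


if __name__ == "__main__":
    ops = ["push", "mov", "call", "xor", "mov", "jmp", "call", "ret", "xor", "ret"]
    vocab = build_vocab([ops])
    grid = to_grid(ops, vocab, side=4)
    patches = to_patches(grid, patch=2)
    scores = [0.1, 0.2, 0.6, 0.1]  # hypothetical per-patch relevance from a model
    top = top_instructions(scores, len(ops), side=4, patch=2, k=3)
    print(top, [ops[o] for o in top])  # prints [8, 9, 2] ['xor', 'ret', 'call']
```

Because the layout is deterministic, any patch-level score can be traced back to concrete instruction offsets, which is the property an instruction-level explanation depends on. Note that row-major 2D tiling groups instructions from adjacent grid rows, so a patch is not a contiguous run of the original sequence; a design that keeps locality in the sequence would instead use contiguous 1D windows.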

Keywords: machine learning, malware, vision transformers, Windows Portable Executable (PE)


Publication history

Received: 16 January 2023
Revised: 16 August 2023
Accepted: 08 September 2023
Published: 22 April 2024
Issue date: June 2024

Copyright

© The author(s) 2023.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
