SmartEagleEye: A Cloud-Oriented Webshell Detection System Based on Dynamic Gray-Box and Deep Learning

Xin Liu; Yingli Zhang; Qingchen Yu; Jiajun Min; Jun Shen; Rui Zhou; Qingguo Zhou

doi:10.26599/TST.2023.9010020

Tsinghua Science and Technology 2024, 29(3): 766-783 https://doi.org/10.26599/TST.2023.9010020

Open Access | Published: 04 December 2023

Xin Liu¹, Yingli Zhang¹, Qingchen Yu², Jiajun Min¹, Jun Shen³, Rui Zhou¹, Qingguo Zhou¹

¹ School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China

² College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China

³ School of Computing and Information Technology, University of Wollongong, Wollongong 2500, Australia

Keywords:

deep learning, detection, cloud, Webshell, web security

Cite this article:

Liu X, Zhang Y, Yu Q, et al. SmartEagleEye: A Cloud-Oriented Webshell Detection System Based on Dynamic Gray-Box and Deep Learning. Tsinghua Science and Technology, 2024, 29(3): 766-783. https://doi.org/10.26599/TST.2023.9010020

Abstract

Compared with traditional environments, the cloud environment exposes online services to additional vulnerabilities and cyber-attack threats, and the security of cloud platforms is becoming an increasingly prominent concern. Attackers often upload a piece of code known as a Webshell to target servers and use it to carry out multiple attacks, so preventing Webshell attacks has become an active research topic. However, traditional Webshell detectors are not built for the cloud, which makes it difficult for them to play a defensive role there. This paper proposes SmartEagleEye, a deep-learning-based Webshell detection system that has been successfully applied in various scenarios. The system contains two main components: a gray-box analyzer and a neural network analyzer. The gray-box analyzer defines a series of rules and algorithms that extract static and dynamic behaviors from the code and combine them to reach a joint decision. The neural network analyzer transforms suspicious code into operation code (OPCODE) sequences, turning detection into a classification problem. Comprehensive experimental results show that SmartEagleEye achieves an encouragingly high detection rate and an acceptable false-positive rate, which indicates that it can provide good protection for the cloud environment.
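To make the OPCODE-to-classification idea concrete, the following is a minimal sketch of how opcode sequences might be turned into fixed-length integer vectors that a sequence classifier could consume. It is not the authors' implementation: the helper names (`build_vocab`, `encode`), the padding scheme, and the two sample traces are assumptions made for illustration, and the opcode names are merely written in the style of Zend opcodes.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and for opcodes not seen in training


def build_vocab(sequences, min_count=1):
    """Map every opcode seen at least min_count times to an integer id."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for op in sorted(counts):  # sorted so that ids are deterministic
        if counts[op] >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Truncate or right-pad one opcode sequence to exactly max_len ids."""
    ids = [vocab.get(op, UNK) for op in seq[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical traces (assumed for illustration, not taken from the paper).
benign = ["ZEND_ASSIGN", "ZEND_ECHO", "ZEND_RETURN"]
suspicious = ["ZEND_FETCH_R", "ZEND_INCLUDE_OR_EVAL", "ZEND_RETURN"]

vocab = build_vocab([benign, suspicious])
X = [encode(s, vocab, max_len=5) for s in (benign, suspicious)]
y = [0, 1]  # 0 = benign, 1 = Webshell: the labels for a binary classifier
```

With inputs in this form, the pairs `(X, y)` could be fed to any sequence classifier; which model the paper actually uses is described in its full text, not here.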

About this article

Publication history

Received: 16 January 2023

Revised: 18 March 2023

Accepted: 19 March 2023

Published: 04 December 2023

Issue date: June 2024

Acknowledgements

Special thanks to Prof. Binbin Yong for his help with this work. This work was supported by the National Key R&D Program of China (No. 2020YFC0832500), the Science and Technology Plan of Gansu Province (Nos. 22ZD6GA048 and 22YF7GA004), and the Supercomputing Center of Lanzhou University.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).