Volume 29, Issue 3

SmartEagleEye: A Cloud-Oriented Webshell Detection System Based on Dynamic Gray-Box and Deep Learning

Xin Liu (1), Yingli Zhang (1), Qingchen Yu (2), Jiajun Min (1), Jun Shen (3), Rui Zhou (1), Qingguo Zhou (1)
(1) School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
(2) College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
(3) School of Computing and Information Technology, University of Wollongong, Wollongong 2500, Australia

Abstract

Compared with traditional environments, the cloud exposes online services to additional vulnerabilities and cyber-attack threats, making the security of cloud platforms an increasingly prominent concern. Attackers commonly upload a piece of code known as a Webshell to a target server and use it to launch further attacks, so preventing Webshell attacks has become an active research topic. However, traditional Webshell detectors were not built for the cloud and are difficult to deploy as an effective defense there. This paper proposes SmartEagleEye, a deep-learning-based Webshell detection system that has been successfully applied in various scenarios. The system has two main components: a gray-box analyzer and a neural network analyzer. The gray-box analyzer defines a series of rules and algorithms that extract static and dynamic behaviors from the code and uses them jointly to reach a decision. The neural network analyzer transforms suspicious code into Operation Code (OPCODE) sequences, turning detection into a classification problem. Comprehensive experimental results show that SmartEagleEye achieves an encouragingly high detection rate and an acceptable false-positive rate, indicating that it can provide good protection for the cloud environment.

Keywords: deep learning, detection, cloud, Webshell, web security
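
As a rough illustration of how OPCODE sequences might be prepared for a classifier, the sketch below maps opcode names to integer ids and pads each sequence to a fixed length. It is a minimal sketch under stated assumptions, not SmartEagleEye's implementation: the helper names (`build_vocab`, `encode`), the reserved padding and unknown ids, the `max_len` value, and the sample opcode sequences are all hypothetical, and the abstract does not describe the paper's actual encoding or model.

```python
from collections import Counter

# Reserved ids for padding and unseen opcodes (assumed convention, not from the paper).
PAD, UNK = 0, 1


def build_vocab(sequences, min_count=1):
    """Map each opcode name to an integer id, most frequent opcodes first."""
    counts = Counter(op for seq in sequences for op in seq)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for op, n in counts.most_common():
        if n >= min_count:
            vocab[op] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Turn one opcode sequence into a fixed-length list of integer ids."""
    ids = [vocab.get(op, UNK) for op in seq[:max_len]]  # truncate long sequences
    return ids + [PAD] * (max_len - len(ids))           # right-pad short ones


# Hypothetical opcode sequences, written in the style of Zend opcode names.
benign = ["ASSIGN", "ECHO", "RETURN"]
suspicious = ["FETCH_R", "INCLUDE_OR_EVAL", "RETURN"]

vocab = build_vocab([benign, suspicious])
x_benign = encode(benign, vocab, max_len=5)
x_suspicious = encode(suspicious, vocab, max_len=5)

# An opcode missing from the vocabulary falls back to the UNK id.
x_unseen = encode(["DO_FCALL", "RETURN"], vocab, max_len=5)
```

Fixed-length id lists of this kind are a common input format for sequence classifiers; which model consumes them, and how labels are assigned, is left open here because the abstract does not specify it.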

Publication history

Received: 16 January 2023
Revised: 18 March 2023
Accepted: 19 March 2023
Published: 04 December 2023
Issue date: June 2024

Copyright

© The Author(s) 2024.

Acknowledgements

Special thanks to Prof. Binbin Yong for his help with this work. This work was supported by the National Key R&D Program of China (No. 2020YFC0832500), the Science and Technology Plan of Gansu Province (Nos. 22ZD6GA048 and 22YF7GA004), and the Supercomputing Center of Lanzhou University.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
