Volume 4, Issue 4

A systematic review: Detecting phishing websites using data mining models

Dina Jibat¹, Sarah Jamjoom¹, Qasem Abu Al-Haija², Abdallah Qusef³
¹ Department of Business Intelligence Technology, Princess Sumaya University for Technology, Amman 11941, Jordan
² Department of Cybersecurity, Princess Sumaya University for Technology, Amman 11941, Jordan
³ Department of Software Engineering, Princess Sumaya University for Technology, Amman 11941, Jordan

Abstract

As the use of internet technology grows globally, phishing accounts for a considerable share of the threats facing individuals and organizations. It causes losses that range from personal and confidential information to substantial sums of money. Consequently, much recent research has been devoted to developing effective, robust mechanisms that can detect illegitimate web pages and distinguish them from legitimate sites as accurately as possible. To determine whether a universally accepted model can detect phishing attempts with 100% accuracy, we conduct a systematic review of research from 2018–2021 published in well-known journals and conference proceedings by Elsevier, IEEE, Springer, and Emerald. The reviewed studies examined various Data Mining (DM) algorithms: some proposed an entirely new model, others compared the performance of several algorithms, and still others combined two or more algorithms to improve detection performance. The results reveal that, while most algorithms achieve accuracies above 90%, only certain specific models reach 100% accuracy.

Keywords: classification, machine learning, data mining, algorithm, phishing
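The abstract's core task can be illustrated with a minimal, stdlib-only Python sketch. That task is labelling a URL as phishing or legitimate. The sketch also reflects the abstract's mention of studies that combine two or more algorithms. It assumes a handful of hypothetical lexical URL features and three hand-written rules that stand in for trained DM classifiers. It combines the rules by majority vote and scores the result with accuracy, defined as correct predictions divided by total predictions. The feature names, thresholds, rules, and toy URLs are illustrative assumptions. They are not taken from the review or from any study it covers.

```python
import re
from urllib.parse import urlparse


# Hypothetical lexical features, loosely modelled on URL cues common in the
# phishing literature; NOT the feature set of any reviewed study.
def extract_features(url: str) -> dict:
    parsed = urlparse(url)
    host = parsed.hostname or ""  # the real host, i.e. the part after any '@'
    return {
        "uses_ip": re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None,
        "has_at": "@" in url,          # userinfo before '@' can disguise the host
        "long_url": len(url) > 75,     # illustrative threshold, not tuned
        "dash_in_host": "-" in host,
        "no_https": parsed.scheme != "https",
    }


# Three hand-written rules standing in for trained base classifiers
# (assumption: a real study would learn these from labelled data).
def rule_host_tricks(f: dict) -> int:
    return int(f["uses_ip"] or f["has_at"])


def rule_url_shape(f: dict) -> int:
    return int(f["long_url"] or f["dash_in_host"])


def rule_feature_count(f: dict) -> int:
    return int(sum(f.values()) >= 2)


BASE_CLASSIFIERS = [rule_host_tricks, rule_url_shape, rule_feature_count]


def majority_vote(url: str) -> int:
    """Return 1 (phishing) if more than half of the base classifiers say so."""
    f = extract_features(url)
    votes = sum(clf(f) for clf in BASE_CLASSIFIERS)
    return int(2 * votes > len(BASE_CLASSIFIERS))


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy, hand-labelled URLs (1 = phishing, 0 = legitimate); not real data.
SAMPLES = [
    ("https://www.example.com/login", 0),
    ("http://192.168.10.5/secure/update", 1),
    ("http://bank.example.com@secure-login.example.net/verify", 1),
    ("https://docs.python.org/3/library/urllib.parse.html", 0),
    ("https://my-bank.example.com/account", 0),
    ("https://examp1e.com/signin", 1),
]

if __name__ == "__main__":
    labels = [y for _, y in SAMPLES]
    preds = [majority_vote(u) for u, _ in SAMPLES]
    print(preds)
    print(f"accuracy = {accuracy(labels, preds):.3f}")
```

Run as a script, it prints `[0, 1, 1, 0, 0, 0]` and `accuracy = 0.833`. The look-alike domain `examp1e.com` triggers none of the assumed features, so every rule misses it. Voting also stops the single dash rule from flagging `my-bank.example.com`. In practice, the base classifiers would be trained on labelled datasets rather than written by hand.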

Publication history

Received: 18 March 2023
Accepted: 04 September 2023
Published: 30 December 2023
Issue date: December 2023

Copyright

© All articles included in the journal are copyrighted to the ITU and TUP.

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
