Intelligent and Converged Networks 2023, 4(4): 326-341 https://doi.org/10.23919/ICN.2023.0027

Open Access | Published: 30 December 2023

A systematic review: Detecting phishing websites using data mining models

Dina Jibat¹, Sarah Jamjoom¹, Qasem Abu Al-Haija², Abdallah Qusef³

¹Department of Business Intelligence Technology, Princess Sumaya University for Technology, Amman 11941, Jordan

²Department of Cybersecurity, Princess Sumaya University for Technology, Amman 11941, Jordan

³Department of Software Engineering, Princess Sumaya University for Technology, Amman 11941, Jordan

Keywords:

classification, machine learning, data mining, algorithm, phishing

Cite this article:

Jibat D, Jamjoom S, Abu Al-Haija Q, et al. A systematic review: Detecting phishing websites using data mining models. Intelligent and Converged Networks, 2023, 4(4): 326-341. https://doi.org/10.23919/ICN.2023.0027


Abstract

As internet use rises globally, phishing accounts for a considerable share of the threats facing individuals and organizations, causing losses that range from personal and confidential information to substantial sums of money. Consequently, much research in recent years has been devoted to developing effective and robust mechanisms that identify illegitimate web pages and distinguish them from legitimate (non-phishing) sites as accurately as possible. To determine whether a universally accepted model can detect phishing attempts with 100% accuracy, we conduct a systematic review of research carried out in 2018–2021 and published in well-known journals from Elsevier, IEEE, Springer, and Emerald. The reviewed studies examined different Data Mining (DM) algorithms: some created an entirely new model, while others compared the performance of several algorithms. Some studies combined two or more algorithms to enhance detection performance. The results reveal that while most algorithms achieve accuracies higher than 90%, only certain specific models achieve 100% accuracy.

About this article

Publication history

Received: 18 March 2023

Accepted: 04 September 2023

Published: 30 December 2023

Issue date: December 2023

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/