Volume 27, Issue 1

A Novel Cross-Project Software Defect Prediction Algorithm Based on Transfer Learning

Shiqi Tang, Song Huang, Changyou Zheng, Erhu Liu, Cheng Zong, Yixian Ding
Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210000, China
Foreign Language College, Liaoning Technical University, Fuxin 123000, China

Abstract

Software Defect Prediction (SDP) is an effective tool for improving software system quality and has attracted much attention in recent years. However, cross-project prediction remains a challenge for traditional SDP methods because the training and testing datasets follow different distributions. Class imbalance is another major difficulty that must be addressed in Cross-Project Defect Prediction (CPDP). In this work, we propose a transfer-learning algorithm (TSboostDF) that considers both knowledge transfer and class imbalance for CPDP. Experimental results demonstrate that TSboostDF performs better than existing CPDP methods.

Keywords: transfer learning, Software Defect Prediction (SDP), imbalance class, cross-project


Publication history

Received: 28 August 2020
Accepted: 18 September 2020
Published: 17 August 2021
Issue date: February 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was partially supported by the Army Weapons and Equipment Internal Research (No. LJ20191C080690).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
