Volume 4, Issue 3

BDGOA: A bot detection approach for GitHub OAuth Apps

Zhifang Liao¹, Xuechun Huang¹, Bolin Zhang¹, Jinsong Wu², Yu Cheng³
¹ School of Computer Science and Engineering, Central South University, Changsha 410083, China
² School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China, and also with the Department of Electrical Engineering, University of Chile, Santiago 8320000, Chile
³ Hunan Glozeal Science and Technology Co., Ltd., Changsha 410083, China

Abstract

As software bots become widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading the empirical studies of software engineering researchers. Researchers have proposed several bot detection techniques, but most are limited to identifying bots that perform specific activities and do not distinguish between GitHub Apps and OAuth Apps. In this paper, we propose a bot detection technique for OAuth Apps, named BDGOA. BDGOA uses 24 features, which fall into three dimensions: account information, account activity, and text similarity. To better explore the behavioral features, we define a fine-grained classification of behavioral events and introduce self-similarity to quantify the repeatability of behavioral sequences. We evaluate five machine learning classifiers on the benchmark dataset and select random forest as the final classifier, which achieves the highest F1-score of 95.83%. Experimental comparisons with state-of-the-art approaches also demonstrate the superiority of BDGOA.

Keywords: machine learning, GitHub, DevBots, text similarity
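
The abstract names two kinds of signal, text similarity and the repeatability of behavioral sequences, without giving their definitions on this page. The following Python sketch is only an illustration of the general idea and is not BDGOA's feature set: the function names, the word-level Jaccard measure, the n-gram repeatability score, and the sample comments and event sequences are all assumptions made for this example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average Jaccard similarity over all pairs of texts (0.0 if fewer than two)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def sequence_repeatability(events: list[str], n: int = 2) -> float:
    """Fraction of n-grams in an event sequence that repeat an earlier n-gram.

    This is a hypothetical stand-in for a self-similarity score,
    not the measure defined in the paper.
    """
    ngrams = [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


# Illustrative data only: templated comments and a looping event pattern
# versus varied comments and a non-repeating event pattern.
bot_comments = ["Build passed for commit abc", "Build passed for commit def"]
human_comments = ["Thanks, looks good to me", "Could you add a test for this case?"]
bot_events = ["PushEvent", "IssueCommentEvent"] * 4
human_events = ["PushEvent", "IssuesEvent", "ForkEvent", "WatchEvent", "PullRequestEvent"]

print(mean_pairwise_similarity(bot_comments), mean_pairwise_similarity(human_comments))
print(sequence_repeatability(bot_events), sequence_repeatability(human_events))
```

In this toy data the templated comments and the looping event sequence score higher than the varied ones, which is the intuition behind using such scores as classifier inputs alongside account information.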

Publication history

Received: 23 February 2023
Revised: 08 March 2023
Accepted: 01 April 2023
Published: 30 September 2023
Issue date: September 2023

Copyright

© All articles included in the journal are copyrighted to the ITU and TUP.

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
