Intelligent and Converged Networks 2023, 4(3): 181-197 https://doi.org/10.23919/ICN.2023.0006

Open Access | Issue | Published: 30 September 2023

BDGOA: A bot detection approach for GitHub OAuth Apps

Zhifang Liao¹, Xuechun Huang¹, Bolin Zhang¹, Jinsong Wu², Yu Cheng³

¹ School of Computer Science and Engineering, Central South University, Changsha 410083, China

² School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin 541004, China, and also with the Department of Electrical Engineering, University of Chile, Santiago 8320000, Chile

³ Hunan Glozeal Science and Technology Co., Ltd., Changsha 410083, China

Keywords:

machine learning, GitHub, DevBots, text similarity

Cite this article:

Liao Z, Huang X, Zhang B, et al. BDGOA: A bot detection approach for GitHub OAuth Apps. Intelligent and Converged Networks, 2023, 4(3): 181-197. https://doi.org/10.23919/ICN.2023.0006

Abstract

As software bots become widely used in open source software repositories, some drawbacks are coming to light, such as giving newcomers non-positive feedback and misleading the empirical studies of software engineering researchers. Researchers have proposed several bot detection techniques, but most are limited to identifying bots that perform specific activities, and they do not distinguish between GitHub Apps and OAuth Apps. In this paper, we propose BDGOA, a bot detection technique for OAuth Apps. BDGOA uses 24 features that fall into three dimensions: account information, account activity, and text similarity. To better explore behavioral features, we define a fine-grained classification of behavioral events and introduce self-similarity to quantify the repeatability of behavioral sequences. We evaluate five machine learning classifiers on the benchmark dataset and select random forest, which achieves the highest F1-score of 95.83%. Experimental comparisons with state-of-the-art approaches also demonstrate the superiority of BDGOA.
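The abstract names text similarity and the repeatability of behavioral sequences as feature dimensions but does not give their formulas. The following is a minimal sketch of how such features might be computed, not the authors' implementation. It assumes token-set Jaccard similarity for text and a repeated n-gram ratio as a stand-in for self-similarity; the function names `jaccard`, `mean_pairwise_similarity`, and `sequence_repeatability`, as well as the event labels in the usage note, are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average Jaccard similarity over all unordered pairs of texts."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0  # fewer than two texts: no pair to compare
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def sequence_repeatability(events: list[str], n: int = 2) -> float:
    """Share of n-grams in an event sequence that duplicate another n-gram.

    Assumed stand-in for a self-similarity measure: 0.0 means every
    n-gram is unique, values near 1.0 mean the sequence is highly repetitive.
    """
    grams = [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]
    if not grams:
        return 0.0  # sequence shorter than n
    return 1.0 - len(set(grams)) / len(grams)
```

Under these assumptions, an account that alternates between the same two events, such as `["push", "comment", "push", "comment", "push"]`, scores higher on `sequence_repeatability` than one whose events never repeat, and templated comments score higher on `mean_pairwise_similarity` than free-form ones. Scores like these could then serve as inputs to a classifier such as random forest.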

About this article

Publication history

Received: 23 February 2023

Revised: 08 March 2023

Accepted: 01 April 2023

Published: 30 September 2023

Issue date: September 2023

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/