Volume 6, Issue 1

Predicting the Entrepreneurial Success of Crowdfunding Campaigns Using Model-Based Machine Learning Methods

Michael Safo Oduro¹, Han Yu¹, Hong Huang²
¹ Department of Applied Statistics and Research Methods, University of Northern Colorado, Greeley, CO 80639, USA
² School of Information, University of South Florida, Tampa, FL 33620-9951, USA

Abstract

Investors, companies, and entrepreneurs involved in crowdfunding, particularly on the Kickstarter website, are increasingly interested in identifying the metrics that make campaigns markedly successful. This study seeks to gauge the importance of key predictive variables (features) through statistical analysis, to identify, through performance assessment, model-based machine learning methods that predict the success of a campaign, and to compare the selected machine learning algorithms. To achieve these objectives and maximize insight into the dataset, feature engineering was performed. Machine learning models, including Logistic Regression (LR), Support Vector Machines (SVMs), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and tree-based ensemble methods (bagging and boosting), were then fitted and compared via three cross-validation approaches in terms of their test error rates, F1 scores, accuracy, precision, and recall. Across the three cross-validation approaches, the test error rates and the other classification metrics identified bagging and gradient boosting as the more robust methods for predicting the success of Kickstarter projects. The major research objectives have been achieved by assessing the performance of key statistical learning methods, which guides the choice among learning methods and gives a measure of the quality of the finally chosen model. Bayesian semi-parametric approaches are left for future research; these methods allow an infinite number of parameters to capture information about the underlying distributions of even more complex data.

Keywords: machine learning, crowdfunding, entrepreneurship, cross validation, Support Vector Machines (SVM)
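The abstract compares models by test error rate, accuracy, precision, recall, and F1 score under cross-validation. The following Python sketch shows how those quantities are commonly computed for a binary outcome and averaged over held-out folds. It is not the authors' code: the coding 1 = successful and 0 = failed is an assumption, plain k-fold splitting is an assumption (the abstract does not name its three cross-validation approaches), and `majority_class` is a hypothetical placeholder standing in for LR, SVM, LDA, QDA, bagging, or boosting.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "fit" function takes training rows and labels and returns a predictor.
Predictor = Callable[[List[float]], int]
FitFn = Callable[[List[List[float]], List[int]], Predictor]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Metrics for labels coded 1 = successful campaign, 0 = failed (assumed coding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no predicted or no actual positives) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,  # test error rate = 1 - accuracy
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n-1 into k contiguous folds; each fold is the test set once.

    Rows are not shuffled here; shuffle (or stratify) the data beforehand in practice.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def majority_class(X_train: List[List[float]], y_train: List[int]) -> Predictor:
    """Hypothetical placeholder model: always predicts the most frequent training label."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda x: label


def cross_validate(fit: FitFn, X: List[List[float]], y: List[int], k: int) -> Dict[str, float]:
    """Fit on k-1 folds, score on the held-out fold, and average each metric over the k folds."""
    totals: Dict[str, float] = {}
    for train, test in k_fold_indices(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores = binary_metrics([y[i] for i in test], [predict(X[i]) for i in test])
        for name, value in scores.items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / k for name, total in totals.items()}


if __name__ == "__main__":
    X = [[float(i)] for i in range(6)]  # toy feature rows, not the Kickstarter data
    y = [1, 1, 0, 1, 1, 0]
    print(cross_validate(majority_class, X, y, k=3))
```

Replacing `majority_class` with any function of the same `fit(X_train, y_train) -> predictor` shape lets one loop score several models on identical folds, which is the kind of side-by-side comparison the abstract describes.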

Publication history

Received: 28 January 2021
Revised: 12 April 2021
Accepted: 26 October 2021
Published: 15 April 2022
Issue date: April 2022

Copyright

© The author(s) 2022

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).