Purpose – Ensuring quality is one of the most significant challenges in microtask crowdsourcing. Aggregating the data collected from the crowd is an important step in inferring the correct answer, but existing studies are largely limited to single-step tasks. This study examines multiple-step classification tasks and how aggregation behaves in such cases, making it useful for assessing classification quality.
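To make the aggregation step concrete, the following is a minimal, illustrative sketch (not taken from the paper) of the most common baseline, majority voting, applied to a single-question task; all names and data are hypothetical.

    from collections import Counter

    def majority_vote(answers):
        # Return the most frequent label among redundant crowd answers
        # to a single question; ties resolve arbitrarily.
        return Counter(answers).most_common(1)[0][0]

    # Three workers answer the same single-step question.
    print(majority_vote(["spiral", "spiral", "elliptical"]))  # spiral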
Design/methodology/approach – The authors present a model that captures the workflow, questions and answers of both single- and multiple-question classification tasks. They extend the classic aggregation approach so that the model can handle tasks with several multiple-choice questions in general, rather than being tied to a specific domain or a particular hierarchical classification. They evaluate the approach on three representative tasks from existing citizen science projects for which an expert-created gold standard is available.
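As a rough illustration of how such a multi-step task might be represented, the sketch below applies majority voting independently at each step of a multi-question workflow. This is only a simple baseline under assumed data structures; the authors' adapted approach, which also models the workflow itself, is not reproduced here, and all identifiers are hypothetical.

    from collections import Counter, defaultdict

    def aggregate_workflow(responses):
        # responses maps worker id -> {question id: answer} for one item.
        # Plain majority voting is applied independently at each step;
        # the paper's adapted approach additionally exploits the workflow
        # structure, which is deliberately omitted here.
        per_question = defaultdict(list)
        for worker_answers in responses.values():
            for question, answer in worker_answers.items():
                per_question[question].append(answer)
        return {q: Counter(a).most_common(1)[0][0]
                for q, a in per_question.items()}

    # Hypothetical two-step task in the style of a galaxy classification.
    responses = {
        "w1": {"q1_shape": "smooth", "q2_roundedness": "completely"},
        "w2": {"q1_shape": "smooth", "q2_roundedness": "in between"},
        "w3": {"q1_shape": "features", "q2_roundedness": "completely"},
    }
    print(aggregate_workflow(responses))
    # {'q1_shape': 'smooth', 'q2_roundedness': 'completely'}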
Findings – The results show that the proposed approach significantly improves overall classification accuracy. The authors' analysis also shows that, for the same task, all algorithms achieve higher accuracy on volunteer-generated data sets than on paid-generated ones. Furthermore, the authors observed interesting patterns in the relationship between algorithm performance and workflow-specific factors, including the number of steps and the number of options available at each step.
Originality/value – Owing to the nature of crowdsourcing, aggregating the collected data is an essential step in understanding the quality of crowdsourcing results. Various inference algorithms have been studied for simple microtasks consisting of a single question with two or more possible answers, whereas real classification tasks typically contain several questions. The proposed method can be applied to a wide range of tasks, covering both single- and multiple-question classification tasks.
Qiong Bu, Elena Simperl, Adriane Chapman and Eddy Maddalena. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode