Research paper | Open Access

Application of keyword extraction on MOOC resources

Zhuoxuan Jiang 1, Chunyan Miao 2, Xiaoming Li 1
1 Peking University, Beijing, China
2 Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Nanyang Technological University, Singapore, Singapore and School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore

Abstract

Purpose

Recent years have witnessed the rapid development of massive open online courses (MOOCs). As instructors produce more courses and learners around the world take part in them, educational resources are accumulating on an unprecedented scale. On the teaching side, these resources include videos, subtitles, lecture notes, quizzes, etc.; on the learning side, they include forum content, wikis, logs of learning behavior, homework logs, etc. However, the data are both unstructured and diverse. Extracting keywords from these resources is an important step toward knowledge management and mining on MOOCs. This paper aims to adapt state-of-the-art techniques to MOOC settings and to evaluate their effectiveness on real data. On the practical side, it also tries to answer, for the first time, two questions: to what extent can MOOC resources support keyword extraction models, and how much human effort is required to make the models work well?

Design/methodology/approach

Depending on which side generates them, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine learning models trained with labels, while the approach used on learning resources is based on a graph model that needs no labels.
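As an illustration of the label-free, graph-based side of this design, the following is a minimal sketch that ranks words by PageRank over a word co-occurrence graph (a TextRank-style technique). The abstract does not specify the authors' graph model, so the function name `rank_keywords`, the window size, the damping factor and the toy input are all assumptions made for this example, not the paper's method.

```python
from collections import defaultdict


def rank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph.

    Hypothetical helper for illustration only; not the authors' model.
    """
    # Link each token to the tokens that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Power iteration: each word shares its score equally among its neighbors.
    n = len(neighbors)
    scores = {word: 1.0 / n for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) / n
            + damping * sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[word])
            for word in neighbors
        }

    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_keywords(post.split())` on a tokenized forum post returns the words ordered from most to least central in the co-occurrence graph. A realistic pipeline would also need word segmentation, stop-word filtering and phrase merging, which this sketch leaves out.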

Findings

On the teaching resources, the methods used by the authors can accurately extract keywords with only 10 per cent of the data labeled. The authors also find that resources of different forms, e.g. subtitles and PPTs, should be considered separately because they differ in modeling ability. On the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they do reflect the topics that are actively discussed in the forums, which instructors can use as feedback. The authors implement two applications with the extracted keywords: concept map generation and learning path generation. The visual demos show that these applications have the potential to improve learning efficiency when integrated into a real MOOC platform.

Research limitations/implications

Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain owing to copyright restrictions. Obtaining labeled data is also difficult because it usually requires expertise in the corresponding domain.

Practical implications

The experimental results indicate that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.

Originality/value

This paper presents a pioneering study of keyword extraction on MOOC resources and reports several new findings.

International Journal of Crowd Science
Pages 48-70
Cite this article:
Jiang Z, Miao C, Li X. Application of keyword extraction on MOOC resources. International Journal of Crowd Science, 2017, 1(1): 48-70. https://doi.org/10.1108/IJCS-12-2016-0003


Received: 02 December 2016
Revised: 17 January 2017
Accepted: 19 January 2017
Published: 06 March 2017
© The author(s)

Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
