
Leveraging Structured Information from a Passage to Generate Questions

Jian Xu1,4, Yu Sun2 (corresponding author), Jianhou Gan3, Mingtao Zhou2, and Di Wu1
1. Key Laboratory of Educational Informatization for Nationalities, Yunnan Normal University, Kunming 650500, China
2. School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China
3. Yunnan Key Laboratory of Smart Education, Yunnan Normal University, Kunming 650500, China
4. School of Information Engineering, Qujing Normal University, Qujing 655011, China

Abstract

Question Generation (QG) is the task of using Artificial Intelligence (AI) technology to generate questions that can be answered by a span of text within a given passage. Existing research on QG in the educational field struggles with two challenges: one is that mainstream QG models based on the sequence-to-sequence framework fail to utilize the structured information in the passage; the other is the lack of specialized educational QG datasets. To address these challenges, a specialized QG dataset, the ReAding Comprehension dataset from Examinations for QG (named RACE4QG), is reconstructed by applying a new answer tagging approach and a data-filtering strategy to the RACE dataset. Further, an end-to-end QG model that can exploit intra- and inter-sentence information to generate better questions is proposed. In our model, the encoder uses a Gated Recurrent Unit (GRU) network that takes the concatenation of word embedding, answer tagging, and Graph Attention neTwork (GAT) embedding as input. A gated self-attention is applied to the hidden states of the GRU to obtain the final passage-answer representation, which is then fed to the decoder. Results show that our model outperforms the baselines on both automatic metrics and human evaluation. Specifically, it improves on the baseline by 0.44, 1.32, and 1.34 on the BLEU-4, ROUGE-L, and METEOR metrics, respectively, indicating the effectiveness and reliability of our model. The remaining gap between its questions and human expectations also indicates the potential for further research.

Keywords: attention mechanism, automatic Question Generation (QG), RACE4QG dataset, Answer-Oriented GAT (AO-GAT), structured information
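
To make the encoder described in the abstract concrete, the following is a minimal PyTorch sketch of its three steps: concatenating word, answer-tag, and GAT embeddings, encoding them with a bi-directional GRU, and refining the hidden states with gated self-attention. This is an illustrative reading of the abstract, not the authors' released code; the class name, layer names, dimensions, and the binary answer-tag scheme are all assumptions.

    # Illustrative sketch of the encoder described in the abstract; names and
    # dimensions are assumptions, not the authors' implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class QGEncoder(nn.Module):
        def __init__(self, vocab_size, word_dim=300, tag_dim=16, gat_dim=64, hidden_dim=256):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)   # word embeddings (e.g., pre-trained in practice)
            self.tag_emb = nn.Embedding(2, tag_dim)              # answer tag: 0 = outside span, 1 = inside span (assumed scheme)
            self.gru = nn.GRU(word_dim + tag_dim + gat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
            # Gated self-attention over the GRU hidden states
            self.att = nn.Linear(2 * hidden_dim, 2 * hidden_dim, bias=False)
            self.gate = nn.Linear(4 * hidden_dim, 4 * hidden_dim)
            self.fuse = nn.Linear(4 * hidden_dim, 2 * hidden_dim)

        def forward(self, words, answer_tags, gat_emb):
            # words, answer_tags: (batch, seq_len); gat_emb: (batch, seq_len, gat_dim)
            x = torch.cat([self.word_emb(words), self.tag_emb(answer_tags), gat_emb], dim=-1)
            h, _ = self.gru(x)                                   # (batch, seq_len, 2*hidden_dim)
            scores = torch.matmul(self.att(h), h.transpose(1, 2))
            ctx = torch.matmul(F.softmax(scores, dim=-1), h)     # self-attended context per token
            f = torch.cat([h, ctx], dim=-1)
            g = torch.sigmoid(self.gate(f))                      # element-wise gate in [0, 1]
            return torch.tanh(self.fuse(g * f))                  # passage-answer representation for the decoder

    # Usage with random inputs (batch of 2 passages, 40 tokens each):
    # enc = QGEncoder(vocab_size=30000)
    # rep = enc(torch.randint(0, 30000, (2, 40)),
    #           torch.randint(0, 2, (2, 40)),
    #           torch.randn(2, 40, 64))                          # rep: (2, 40, 512)

In this reading, the gate decides per dimension how much of the self-attended context to mix into each hidden state before the final projection, so inter-sentence information gathered by self-attention is blended with the local GRU encoding rather than replacing it.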


Publication history

Received: 04 July 2022
Revised: 10 August 2022
Accepted: 22 August 2022
Published: 13 December 2022
Issue date: June 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62166050), the Yunnan Fundamental Research Projects (No. 202201AS070021), the Yunnan Innovation Team of Education Informatization for Nationalities, the Scientific Technology Innovation Team of Educational Big Data Application Technology in University of Yunnan Province, and the Yunnan Normal University Graduate Research and Innovation Fund in 2020 (No. ysdyjs2020006).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
