Enhanced Answer Selection in CQA Using Multi-Dimensional Features Combination

Hongjie Fan, Zhiyi Ma, Hongqiang Li, Dongsheng Wang, Junfei Liu
School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China.
School of Software and Microelectronics, Peking University, Beijing 100871, China.
National Engineering Research Center for Software Engineering, Peking University, Beijing 100871, China.

Abstract

Community Question Answering (CQA) in web forums, as a classic channel for user communication, provides a large number of high-quality, useful answers in comparison with traditional question answering. Developing methods to obtain good, honest answers to user questions is a challenging task in natural language processing: many answers are unrelated to the actual question or drift off topic, and this usually occurs in relatively long answers. In this paper, we enhance answer selection in CQA using multi-dimensional feature combination and similarity order. We make full use of the information in the answers to a question to determine the similarity between the question and each answer, and use the text-based description of an answer to determine whether it is a reasonable one. Our work includes two subtasks: (a) classifying each answer to a question as good, bad, or potential, and (b) deriving a YES/NO answer from the list of all answers to a question. The experimental results show that our approach is significantly more efficient than the baseline model, and its overall ranking is relatively high in comparison with other models.

Keywords: community question answering, information retrieval, multi-dimensional features extraction, similarity computation
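The abstract above combines question-answer similarity with text-based features of the answer itself and then classifies each answer. As a rough, minimal illustration of that general idea only, and not the paper's actual feature set or pipeline, the Python sketch below pairs an assumed TF-IDF cosine-similarity score with a simple answer-length feature and trains a random-forest classifier on hypothetical toy data to assign the good/potential/bad labels.

```python
# Minimal sketch, assuming TF-IDF cosine similarity and answer length as two
# example "dimensions"; the paper's real features and classifier may differ.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def qa_features(questions, answers, vectorizer):
    """Return [similarity, answer length] feature pairs for aligned Q/A lists."""
    q_mat = vectorizer.transform(questions)
    a_mat = vectorizer.transform(answers)
    sims = cosine_similarity(q_mat, a_mat).diagonal()
    lengths = [len(a.split()) for a in answers]
    return [[s, l] for s, l in zip(sims, lengths)]

# Hypothetical toy data with the Good / Potential / Bad labels used in the task.
train_q = ["How can I renew my work visa?",
           "Where can I buy cheap furniture?",
           "How can I renew my work visa?"]
train_a = ["Take your passport to the immigration office and renew it there.",
           "I really love the weather this time of year.",
           "Maybe ask your employer, they sometimes handle visa renewal."]
train_y = ["Good", "Bad", "Potential"]

vectorizer = TfidfVectorizer().fit(train_q + train_a)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(qa_features(train_q, train_a, vectorizer), train_y)

test_feats = qa_features(["How do I renew my visa?"],
                         ["Bring your passport to the immigration office."],
                         vectorizer)
print(clf.predict(test_feats))  # e.g., ['Good']
```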


Publication history

Received: 10 January 2018
Accepted: 15 February 2018
Published: 24 January 2019
Issue date: June 2019

Copyright

© The author(s) 2019

Acknowledgements

This research was conducted by the NLP601 group at the School of Electronics Engineering and Computer Science, Peking University, and supported by the National Natural Science Foundation of China (No. 61672046).
