


How Do Pronouns Affect Word Embedding

Tonglee Chung, Bin Xu, Yongbin Liu, Juanzi Li, and Chunping Ouyang
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
School of Computer Science and Technology, University of South China, Hengyang 421001, China.

Abstract

Word embedding has drawn considerable attention owing to its usefulness in many NLP tasks. So far, a handful of neural-network-based word embedding algorithms have been proposed without considering the effects of pronouns in the training corpus. In this paper, we propose using co-reference resolution to improve word embeddings by extracting better context. We evaluate four word embedding methods that incorporate co-reference resolution and compare the quality of the resulting embeddings on word analogy and word similarity tasks across multiple data sets. Experiments show that using co-reference resolution improves word analogy performance by around 1.88%. We find that words that are names of countries are affected the most, as expected.

Keywords: word embedding, co-reference resolution, representation learning
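
The full text is not reproduced here, but the idea the abstract describes can be illustrated with a short, hedged sketch. The code below is not the authors' implementation: the hand-written toy co-reference links (coref_links) stand in for the output of a real co-reference resolver, and gensim's Word2Vec is used as one example embedding method. Pronouns are replaced by the head words of their antecedents so that content words, rather than uninformative pronouns, appear in each other's context windows before training.

# Minimal sketch (illustrative only, not the paper's code): substitute
# pronouns with their resolved antecedents, then train embeddings on the
# pronoun-resolved corpus.

from gensim.models import Word2Vec  # gensim >= 4.0


def substitute_pronouns(tokens, links):
    """Return a copy of tokens with each linked pronoun position replaced
    by the head word of its antecedent mention."""
    resolved = list(tokens)
    for position, antecedent_head in links:
        resolved[position] = antecedent_head
    return resolved


# Toy corpus: after resolution, "paris" co-occurs with "france" instead of
# with the pronoun "it".
sentences = [
    ["france", "is", "a", "country", "in", "europe"],
    ["it", "has", "paris", "as", "its", "capital"],
]
# Assumed resolver output: sentence index -> (token position, antecedent head).
coref_links = {1: [(0, "france")]}

resolved_sentences = [
    substitute_pronouns(sent, coref_links.get(i, []))
    for i, sent in enumerate(sentences)
]

# Train skip-gram embeddings on the pronoun-resolved corpus.
model = Word2Vec(resolved_sentences, vector_size=100, window=5, min_count=1, sg=1)

# Word analogy queries use the usual vector-offset form ("a is to b as c is to ?").
# Results are meaningless on this toy corpus; on a real corpus the vectors would
# be scored on standard analogy and similarity benchmarks, as in the paper.
print(model.wv.most_similar(positive=["paris", "europe"], negative=["france"], topn=3))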


Publication history

Received: 31 December 2016
Revised: 29 March 2017
Accepted: 25 May 2017
Published: 14 December 2017
Issue date: December 2017

Copyright

© The author(s) 2017

Acknowledgements

This work was supported by the National High-Tech Research and Development (863) Program (No. 2015AA015401), the National Natural Science Foundation of China (Nos. 61533018 and 61402220), the State Scholarship Fund of CSC (No. 201608430240), the Philosophy and Social Science Foundation of Hunan Province (No. 16YBA323), and the Scientific Research Fund of Hunan Provincial Education Department (Nos. 16C1378 and 14B153).
