


How Do Pronouns Affect Word Embedding

Tonglee Chung, Bin Xu, Yongbin Liu, Juanzi Li, and Chunping Ouyang
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
School of Computer Science and Technology, University of South China, Hengyang 421001, China.

Abstract

Word embedding has drawn considerable attention owing to its usefulness in many NLP tasks. So far, a handful of neural-network-based word embedding algorithms have been proposed without considering the effects of pronouns in the training corpus. In this paper, we propose using co-reference resolution to improve word embeddings by extracting better context. We evaluate four word embedding methods that incorporate co-reference resolution and compare the quality of the resulting embeddings on word analogy and word similarity tasks across multiple data sets. Experiments show that using co-reference resolution improves word analogy performance by around 1.88%. We find that words that are names of countries are affected the most, as expected.

Keywords: word embedding, co-reference resolution, representation learning
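
The full text is not reproduced here, but the idea the abstract describes can be illustrated with a short, hedged sketch. The code below is not the authors' implementation: the hand-written toy co-reference links (coref_links) stand in for the output of a real co-reference resolver, and gensim's Word2Vec is used as one example embedding method. Pronouns are replaced by the head words of their antecedents so that content words, rather than uninformative pronouns, appear in each other's context windows before training.

# Minimal sketch (illustrative only, not the paper's code): substitute
# pronouns with their resolved antecedents, then train embeddings on the
# pronoun-resolved corpus.

from gensim.models import Word2Vec  # gensim >= 4.0


def substitute_pronouns(tokens, links):
    """Return a copy of tokens with each linked pronoun position replaced
    by the head word of its antecedent mention."""
    resolved = list(tokens)
    for position, antecedent_head in links:
        resolved[position] = antecedent_head
    return resolved


# Toy corpus: after resolution, "paris" co-occurs with "france" instead of
# with the pronoun "it".
sentences = [
    ["france", "is", "a", "country", "in", "europe"],
    ["it", "has", "paris", "as", "its", "capital"],
]
# Assumed resolver output: sentence index -> (token position, antecedent head).
coref_links = {1: [(0, "france")]}

resolved_sentences = [
    substitute_pronouns(sent, coref_links.get(i, []))
    for i, sent in enumerate(sentences)
]

# Train skip-gram embeddings on the pronoun-resolved corpus.
model = Word2Vec(resolved_sentences, vector_size=100, window=5, min_count=1, sg=1)

# Word analogy queries use the usual vector-offset form ("a is to b as c is to ?").
# Results are meaningless on this toy corpus; on a real corpus the vectors would
# be scored on standard analogy and similarity benchmarks, as in the paper.
print(model.wv.most_similar(positive=["paris", "europe"], negative=["france"], topn=3))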


Publication history

Received: 31 December 2016
Revised: 29 March 2017
Accepted: 25 May 2017
Published: 14 December 2017
Issue date: December 2017

Copyright

© The author(s) 2017

Acknowledgements

This work was supported by the National High-Tech Research and Development (863) Program (No. 2015AA015401), the National Natural Science Foundation of China (Nos. 61533018 and 61402220), the State Scholarship Fund of CSC (No. 201608430240), the Philosophy and Social Science Foundation of Hunan Province (No. 16YBA323), and the Scientific Research Fund of Hunan Provincial Education Department (Nos. 16C1378 and 14B153).
