
Document-level machine translation (MT) remains challenging because it is difficult to exploit document-level global context efficiently. In this paper, we propose a hierarchical model that learns global context for document-level neural machine translation (NMT): a sentence encoder captures intra-sentence dependencies, and a document encoder models inter-sentence consistency and coherence at the document level. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion, so that different translations of a word can be distinguished according to its specific surrounding context. Notably, we examine the effect of three popular attention functions during this information backward-distribution phase, to take a closer look at how our model distributes global context information. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we adopt a two-step training strategy that combines a large-scale corpus of out-of-domain parallel sentence pairs with a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. Experiments on Chinese-English and English-German corpora show that our model improves the Transformer baseline by 4.5 BLEU points on average, demonstrating the effectiveness of the proposed hierarchical model for document-level NMT.
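
As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of the hierarchical encoder: a sentence encoder for intra-sentence dependencies, a document encoder over pooled sentence representations, and a top-down attention step that distributes the global context back to each word. All module choices, layer counts, the mean pooling, and the use of scaled dot-product attention for the backward-distribution step are illustrative assumptions, not the authors' released implementation (the paper compares three attention functions for this step).

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Minimal sketch of a hierarchical document-context encoder.

    Hyper-parameters, the pooling operation, and the choice of
    scaled dot-product attention for the backward-distribution step
    are assumptions for illustration, not the paper's exact setup.
    """

    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=1)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=1)
        # One of several possible attention functions for feeding the global
        # context back to each word (here: multi-head scaled dot-product).
        self.backward_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, doc):
        # doc: (num_sents, sent_len, d_model), pre-embedded words of one document.
        # 1) Sentence encoder: intra-sentence dependencies.
        word_states = self.sent_encoder(doc)
        # 2) Pool each sentence into one vector (mean pooling as a stand-in).
        sent_repr = word_states.mean(dim=1)                    # (num_sents, d_model)
        # 3) Document encoder: inter-sentence consistency and coherence.
        global_ctx = self.doc_encoder(sent_repr.unsqueeze(0))  # (1, num_sents, d_model)
        # 4) Top-down backward distribution: every word attends over the
        #    document-level context states of all sentences.
        ctx = global_ctx.expand(word_states.size(0), -1, -1)
        enriched, _ = self.backward_attn(word_states, ctx, ctx)
        return word_states + enriched  # context-aware word representations

# Toy usage: a 6-sentence "document" with 20 words per sentence.
encoder = HierarchicalContextEncoder()
out = encoder(torch.randn(6, 20, 512))
print(out.shape)  # torch.Size([6, 20, 512])
```

In this sketch the residual connection in the last step lets each word keep its local, sentence-internal representation while mixing in however much document-level context the attention weights assign to it, which is one simple way to realize the "distinguish different translations of a word by its surrounding context" behavior the abstract describes.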


Publication history

Received: 09 March 2020
Accepted: 11 January 2021
Published: 31 March 2022
Issue date: March 2022

Copyright

© Institute of Computing Technology, Chinese Academy of Sciences 2022

Acknowledgements

The authors are grateful to the anonymous reviewers for their helpful comments and corrections. We would also like to thank Professor De-Yi Xiong (Tianjin University) for the discussion on this research.
