
Document-level machine translation (MT) remains challenging because it is difficult to exploit document-level global context efficiently. In this paper, we propose a hierarchical model that learns global context for document-level neural machine translation (NMT): a sentence encoder captures intra-sentence dependencies, and a document encoder models inter-sentence consistency and coherence at the document level. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion, so that different translations of a word can be distinguished according to its specific surrounding context. Notably, we examine the effect of three popular attention functions during this information backward-distribution phase, to take a closer look at how our model distributes global context information. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we adopt a two-step training strategy that combines a large-scale corpus of out-of-domain parallel sentence pairs with a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. Experiments on Chinese-English and English-German corpora show that our model improves the Transformer baseline by 4.5 BLEU points on average, demonstrating the effectiveness of the proposed hierarchical model for document-level NMT.
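
As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of the hierarchical encoder: a sentence encoder for intra-sentence dependencies, a document encoder over pooled sentence representations, and a top-down attention step that distributes the global context back to each word. All module choices, layer counts, the mean pooling, and the use of scaled dot-product attention for the backward-distribution step are illustrative assumptions, not the authors' released implementation (the paper compares three attention functions for this step).

```python
import torch
import torch.nn as nn

class HierarchicalContextEncoder(nn.Module):
    """Minimal sketch of a hierarchical document-context encoder.

    Hyper-parameters, the pooling operation, and the choice of
    scaled dot-product attention for the backward-distribution step
    are assumptions for illustration, not the paper's exact setup.
    """

    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=1)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=1)
        # One of several possible attention functions for feeding the global
        # context back to each word (here: multi-head scaled dot-product).
        self.backward_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, doc):
        # doc: (num_sents, sent_len, d_model), pre-embedded words of one document.
        # 1) Sentence encoder: intra-sentence dependencies.
        word_states = self.sent_encoder(doc)
        # 2) Pool each sentence into one vector (mean pooling as a stand-in).
        sent_repr = word_states.mean(dim=1)                    # (num_sents, d_model)
        # 3) Document encoder: inter-sentence consistency and coherence.
        global_ctx = self.doc_encoder(sent_repr.unsqueeze(0))  # (1, num_sents, d_model)
        # 4) Top-down backward distribution: every word attends over the
        #    document-level context states of all sentences.
        ctx = global_ctx.expand(word_states.size(0), -1, -1)
        enriched, _ = self.backward_attn(word_states, ctx, ctx)
        return word_states + enriched  # context-aware word representations

# Toy usage: a 6-sentence "document" with 20 words per sentence.
encoder = HierarchicalContextEncoder()
out = encoder(torch.randn(6, 20, 512))
print(out.shape)  # torch.Size([6, 20, 512])
```

In this sketch the residual connection in the last step lets each word keep its local, sentence-internal representation while mixing in however much document-level context the attention weights assign to it, which is one simple way to realize the "distinguish different translations of a word by its surrounding context" behavior the abstract describes.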


Publication history

Received: 09 March 2020
Accepted: 11 January 2021
Published: 31 March 2022
Issue date: March 2022

Copyright

© Institute of Computing Technology, Chinese Academy of Sciences 2022

Acknowledgements

The authors are grateful to the anonymous reviewers for their helpful comments and corrections. We would also like to thank Professor De-Yi Xiong (Tianjin University) for the discussion on this research.
