Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation

Mieradilijiang Maimaiti, Yang Liu, Huanbo Luan, and Maosong Sun
Institute for Artificial Intelligence, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Beijing Academy of Artificial Intelligence, Beijing Advanced Innovation Center for Language Resources, Beijing 100084, China

Abstract

Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora, which are easily obtainable for high-resource languages. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has developed into one of the most efficient methods. Yet it is difficult for a model trained on a high-resource language pair to capture information relevant to both the parent and child models, because the initially trained model contains only the lexicon features and word embeddings of the parent language, not those of the child languages. In this work, we address this issue by proposing a language-independent Hybrid Transfer Learning (HTL) method for LRLs that shares lexicon embeddings between parent and child languages without leveraging back translation or manually injecting noise. First, we train a parent model on the High-Resource Languages (HRLs) with their vocabularies. Then, we combine the parent and child language pairs using an oversampling method and train a hybrid model initialized with the previously trained parent model. Finally, we fine-tune the morphologically rich child model using the hybrid model. In addition, we report some interesting findings about the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods on two languages, Azerbaijani (Az) and Uzbek (Uz). Moreover, our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az→Zh and Uz→Zh, respectively.
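To make the three-stage HTL pipeline concrete, the following is a minimal, runnable Python sketch under our own assumptions: build_shared_vocab, oversample, and the stub train are hypothetical stand-ins for the paper's actual components (a Transformer-based NMT system trained on subword units), and the balancing ratio in oversample is one common reading of "the oversampling method", not the authors' exact recipe.

```python
import random

def build_shared_vocab(*corpora):
    """Collect a joint vocabulary over parent and child data so the
    lexicon embeddings can be shared across both models (sketch)."""
    vocab = set()
    for corpus in corpora:
        for src, tgt in corpus:
            vocab.update(src.split())
            vocab.update(tgt.split())
    return sorted(vocab)

def oversample(child_corpus, parent_corpus):
    """Repeat the smaller child corpus until it roughly matches the
    parent corpus size, then mix the two (an assumed balancing scheme,
    not necessarily the paper's exact ratio)."""
    ratio = max(1, len(parent_corpus) // max(1, len(child_corpus)))
    mixed = list(parent_corpus) + list(child_corpus) * ratio
    random.shuffle(mixed)
    return mixed

def train(corpus, vocab, init=None):
    """Stand-in for an NMT training loop; returns a toy 'model'
    record so the pipeline runs end to end."""
    return {"vocab": vocab, "data_size": len(corpus), "init": init}

# Toy (source, target) sentence pairs; the parent pair here is only
# a placeholder for some related high-resource language into Chinese.
parent_corpus = [("hrl sent %d" % i, "zh sent %d" % i) for i in range(1000)]
child_corpus = [("az sent %d" % i, "zh sent %d" % i) for i in range(50)]

# Stage 1: train the parent model on the high-resource pair,
# with a vocabulary built over BOTH parent and child data so
# lexicon embeddings can be shared later.
shared_vocab = build_shared_vocab(parent_corpus, child_corpus)
parent_model = train(parent_corpus, shared_vocab)

# Stage 2: train the hybrid model on the oversampled mixture,
# initialized from the parent model's parameters.
hybrid_model = train(oversample(child_corpus, parent_corpus),
                     shared_vocab, init=parent_model)

# Stage 3: fine-tune on the child pair alone, starting from the
# hybrid model, to obtain the final low-resource system.
child_model = train(child_corpus, shared_vocab, init=hybrid_model)
```

The key design point the sketch illustrates is that the child pair never starts from scratch: its lexicon is present in the shared vocabulary from stage 1, so the stage 3 fine-tuning inherits embeddings already exposed to child-language tokens during the hybrid stage.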

Keywords: artificial intelligence, natural language processing, transfer learning, neural network, low-resource languages, machine translation

Publication history

Received: 08 August 2020
Revised: 23 August 2020
Accepted: 31 August 2020
Published: 17 August 2021
Issue date: February 2022

Copyright

© The author(s) 2022

Acknowledgements

We would like to thank all the anonymous reviewers for their valuable comments and suggestions during both the major and minor revisions of this work. We would also like to thank Dr. Ivan Hajnal and Leah, an assistant researcher on the multi-lingual team of DAMO Academy, Alibaba Group, for patiently proofreading our revised paper after the major and minor revisions. This work was supported by the National Key R&D Program of China (No. 2017YFB0202204), the National Natural Science Foundation of China (Nos. 61925601, 61761166008, and 61772302), the Beijing Advanced Innovation Center for Language Resources (No. TYR17002), and the NExT++ project, which is supported by the National Research Foundation, Prime Minister's Office, Singapore under its IRC@Singapore Funding Initiative.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
