Volume 23, Issue 6


Integrating Pronunciation into Chinese-Vietnamese Statistical Machine Translation

Anh Tran Huu, Heyan Huang, Yuhang Guo, Shumin Shi, and Ping Jian
Department of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.

Abstract

Statistical machine translation for low-resource languages suffers from a lack of training corpora. Several methods, such as using a pivot language as a bridge between the source and target languages, have been proposed. However, errors accumulate along such extended translation pipelines. In this paper, we propose an approach to low-resource language translation that exploits the pronunciation correlations between languages. We find that pronunciation features improve translation quality in both the Chinese-Vietnamese and Vietnamese-Chinese directions. Experimental results show that the proposed model improves translation performance, raising the bilingual evaluation understudy (BLEU) score by up to 1.03.

Keywords: pronunciation integration, low-resource languages, Chinese-Vietnamese machine translation, Sino-Vietnamese words


Publication history

Received: 15 July 2017
Accepted: 07 August 2017
Published: 17 September 2018
Issue date: December 2018

Copyright

© The authors 2018

Acknowledgements

This work was supported by the National Key Basic Research and Development (973) Program of China (No. 2013CB329303), the National Natural Science Foundation of China (Nos. 61502035, 61132009, and 61671064), and the Beijing Advanced Innovation Center for Imaging Technology (No. BAICIT-2016007).
