Sort:
Open Access Issue
Enriching the Transfer Learning with Pre-Trained Lexicon Embedding for Low-Resource Neural Machine Translation
Tsinghua Science and Technology 2022, 27 (1): 150-163
Published: 17 August 2021
Downloads:44

Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. The large-scale parallel corpora for high-resource languages is easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has been developed into one of the most efficient methods. It is difficult to train the model on high-resource languages to include the information in both parent and child models, as well as the initially trained model that only contains the lexicon features and word embeddings of the parent model instead of the child languages feature. In this work, we aim to address this issue by proposing the language-independent Hybrid Transfer Learning (HTL) method for LRLs by sharing lexicon embedding between parent and child languages without leveraging back translation or manually injecting noises. First, we train the High-Resource Languages (HRLs) as the parent model with its vocabularies. Then, we combine the parent and child language pairs using the oversampling method to train the hybrid model initialized by the previously parent model. Finally, we fine-tune the morphologically rich child model using a hybrid model. Besides, we explore some exciting discoveries on the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods in two languages Azerbaijani (Az) and Uzbek (Uz). Meanwhile, our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for low-resource child languages Az Zh and Uz Zh, respectively.

Open Access Issue
Portraying User Life Status from Microblogging Posts
Tsinghua Science and Technology 2013, 18 (2): 182-195
Published: 30 April 2013
Downloads:19

Microblogging services provide a novel and popular communication scheme for Web users to share information and express opinions by publishing short posts, which usually reflect the users’ daily life. We can thus model the users’ daily status and interests according to their posts. Because of the high complexity and the large amount of the content of the microblog users’ posts, it is necessary to provide a quick summary of the users’ life status, both for personal users and commercial services. It is non-trivial to summarize the life status of microblog users, particularly when the summary is conducted over a long period. In this paper, we present a compact interactive visualization prototype, LifeCircle, as an efficient summary for exploring the long-term life status of microblog users. The radial visualization provides multiple views for a given microblog user, including annual topics, monthly keywords, monthly sentiments, and temporal trends of posts. We tightly integrate interactive visualization with novel and state-of-the-art microblogging analytics to maximize their advantages. We implement LifeCircle on Sina Weibo, the most popular microblogging service in China, and illustrate the effectiveness of our prototype with various case studies. Results show that our prototype makes users nostalgic and makes them reminiscent about past events, which helps them to better understand themselves and others.

total 2