Article | Open Access

Pretraining Enhanced RNN Transducer

Junyu Lu¹, Rongzhong Lian¹, Di Jiang¹, Yuanfeng Song¹, Zhiyang Su², Victor Junqiu Wei², Lin Yang²
WeBank Co., Ltd., Shenzhen 518000, China
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China

Abstract

The recurrent neural network transducer (RNN-T) is an important branch of current end-to-end automatic speech recognition (ASR). Various promising approaches have been designed to improve the RNN-T architecture; however, few studies have explored the effectiveness of pretrained methods in this framework. In this paper, we introduce the pretrained acoustic extractor (PAE) and the pretrained linguistic network (PLN) to enhance the Conformer long short-term memory (Conformer-LSTM) transducer. First, we construct the input of the acoustic encoder from two different latent representations: one extracted by the PAE from the raw waveform, and the other obtained from a filter-bank transformation. Second, we fuse an extra semantic feature from the PLN into the joint network to reduce illogical and homophonic errors. Compared with previous work, our approaches obtain pretrained representations that improve model generalization. Evaluation on two large-scale datasets demonstrates that the proposed approaches yield better performance than existing approaches.
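The two fusion points named in the abstract can be illustrated with a small shape-level sketch. This is not the authors' implementation: the abstract does not state how the representations are combined, so frame-wise concatenation is an assumption here, as is the premise that the PAE and filter-bank streams are already time-aligned. The function names (concat_frames, linear, joint_lattice), the toy dimensions, and the weights are all hypothetical, and the acoustic encoder itself is skipped.

```python
import math


def concat_frames(pae_feats, fbank_feats):
    """Concatenate two time-aligned feature streams frame by frame (assumed fusion)."""
    if len(pae_feats) != len(fbank_feats):
        raise ValueError("feature streams must have the same number of frames")
    return [p + f for p, f in zip(pae_feats, fbank_feats)]


def linear(x, weight):
    """Apply a bias-free linear map; weight holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def joint_lattice(enc_out, pred_out, sem_out, weight):
    """Return a T x U x V grid of scores from the fused joint-network input.

    For each frame t and label position u, the encoder, predictor, and
    semantic (PLN) vectors are concatenated, squashed with tanh, and projected.
    """
    lattice = []
    for enc_t in enc_out:
        row = []
        for pred_u, sem_u in zip(pred_out, sem_out):
            fused = [math.tanh(v) for v in enc_t + pred_u + sem_u]
            row.append(linear(fused, weight))
        lattice.append(row)
    return lattice


# Toy inputs: T = 3 frames, PAE dim 2, filter-bank dim 3.
pae_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
fbank_feats = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [2.0, 3.0, 4.0]]
encoder_input = concat_frames(pae_feats, fbank_feats)

# Stand-in: the real model would pass encoder_input through the acoustic encoder.
enc_out = encoder_input

# U = 2 label positions, predictor dim 2, semantic dim 2, vocabulary size V = 4.
pred_out = [[0.2, -0.1], [0.0, 0.3]]
sem_out = [[0.5, 0.5], [-0.2, 0.1]]
weight = [[0.01 * (i + j) for j in range(9)] for i in range(4)]

lattice = joint_lattice(enc_out, pred_out, sem_out, weight)
print(len(lattice), len(lattice[0]), len(lattice[0][0]))
```

In this sketch the only change from a plain transducer joint step is the extra semantic vector in the concatenation, which is one simple way an additional feature could enter the joint network.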

CAAI Artificial Intelligence Research
Article number: 9150039

Cite this article:
Lu J, Lian R, Jiang D, et al. Pretraining Enhanced RNN Transducer. CAAI Artificial Intelligence Research, 2024, 3: 9150039. https://doi.org/10.26599/AIR.2024.9150039

Views: 3052 | Downloads: 263 | Crossref: 1

Received: 22 February 2024
Revised: 28 June 2024
Accepted: 23 July 2024
Published: 11 September 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).