Article | Open Access

Pretraining Enhanced RNN Transducer

Junyu Lu^¹, Rongzhong Lian^¹, Di Jiang^¹, Yuanfeng Song^¹, Zhiyang Su^², Victor Junqiu Wei^², Lin Yang^²

^¹WeBank Co., Ltd., Shenzhen 518000, China

^²Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China


Abstract

The recurrent neural network transducer (RNN-T) is an important branch of current end-to-end automatic speech recognition (ASR). Various promising approaches have been designed to improve the RNN-T architecture; however, few studies have explored the effectiveness of pretrained methods in this framework. In this paper, we introduce a pretrained acoustic extractor (PAE) and a pretrained linguistic network (PLN) to enhance the Conformer long short-term memory (Conformer-LSTM) transducer. First, we construct the input of the acoustic encoder from two different latent representations: one extracted by the PAE from the raw waveform, and the other obtained from a filter-bank transformation. Second, we fuse an extra semantic feature from the PLN into the joint network to reduce illogical and homophonic errors. Compared with previous works, our approaches are able to obtain pretrained representations for better model generalization. Evaluation on two large-scale datasets demonstrates that the proposed approaches yield better performance than existing approaches.
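The two fusion points named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not say how the features are combined, so frame-wise concatenation of the PAE and filter-bank features, additive fusion of the PLN feature inside a tanh joint layer, all dimensions, and all function names below are assumptions made purely for illustration. The Conformer encoder and LSTM prediction network are omitted, and random vectors stand in for their outputs.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for trained weights or features."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weight):
    """Apply a bias-free linear layer: weight has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse_acoustic_inputs(pae_frames, fbank_frames):
    """Concatenate per-frame PAE and filter-bank features (assumed fusion).

    Both sequences are assumed to be already aligned to the same frame rate.
    """
    if len(pae_frames) != len(fbank_frames):
        raise ValueError("frame counts must match")
    return [p + f for p, f in zip(pae_frames, fbank_frames)]


def joint(enc, pred, sem, w_enc, w_pred, w_sem, w_out):
    """Return logits[t][u] over the vocabulary for each (frame, label) pair.

    The semantic feature sem[u] is added next to the usual encoder and
    prediction-network terms before the tanh (assumed additive fusion).
    """
    logits = []
    for enc_t in enc:
        a = linear(enc_t, w_enc)
        row = []
        for pred_u, sem_u in zip(pred, sem):
            b = linear(pred_u, w_pred)
            c = linear(sem_u, w_sem)
            hidden = [math.tanh(x + y + z) for x, y, z in zip(a, b, c)]
            row.append(linear(hidden, w_out))
        logits.append(row)
    return logits


rng = random.Random(0)
T, U = 4, 3                      # acoustic frames, label positions
D_PAE, D_FBANK = 6, 5            # hypothetical feature sizes
D_PRED, D_SEM, D_JOINT, VOCAB = 7, 8, 10, 12

pae = rand_matrix(T, D_PAE, rng)        # stand-in for PAE output
fbank = rand_matrix(T, D_FBANK, rng)    # stand-in for filter-bank features
pred = rand_matrix(U, D_PRED, rng)      # stand-in for prediction network output
sem = rand_matrix(U, D_SEM, rng)        # stand-in for PLN semantic feature

# In a real model an acoustic encoder would process enc_in; it is skipped here.
enc_in = fuse_acoustic_inputs(pae, fbank)
logits = joint(
    enc_in, pred, sem,
    rand_matrix(D_JOINT, D_PAE + D_FBANK, rng),
    rand_matrix(D_JOINT, D_PRED, rng),
    rand_matrix(D_JOINT, D_SEM, rng),
    rand_matrix(VOCAB, D_JOINT, rng),
)
print(len(logits), len(logits[0]), len(logits[0][0]))  # 4 3 12
```

The printed shape is the (frames, label positions, vocabulary) lattice over which an RNN-T loss is normally computed; the sketch only shows where the extra pretrained features could enter, not how well any particular fusion works.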

Keywords

pretraining; automatic speech recognition; self-supervised learning


CAAI Artificial Intelligence Research

Volume 3, 2024

Article number: 9150039

DOI: 10.26599/AIR.2024.9150039

Cite this article:

Lu J, Lian R, Jiang D, et al. Pretraining Enhanced RNN Transducer. CAAI Artificial Intelligence Research, 2024, 3: 9150039. https://doi.org/10.26599/AIR.2024.9150039

Views: 3374

Downloads: 267

Received: 22 February 2024

Revised: 28 June 2024

Accepted: 23 July 2024

Published: 11 September 2024

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).