Publications
Open Access Issue
Multi-Task ConvMixer Networks with Triplet Attention for Low-Resource Keyword Spotting
Tsinghua Science and Technology 2025, 30(2): 875-893
Published: 24 September 2024

Customized keyword spotting must adapt quickly from only a few user samples. Current methods primarily address the problem under moderate noise conditions. Recent work raises the difficulty of keyword detection by introducing keyword interference. However, the existing solution has been explored only on large models with many parameters, making it unsuitable for deployment on small devices. When that solution is applied to lightweight models with minimal training data, performance degrades compared to the baseline model. To address this challenge, we propose a lightweight multi-task architecture (fewer than 9.0 × 10^4 parameters) that integrates the triplet attention module into ConvMixer networks and adds a new auxiliary mixed-labeling encoding. Our experiments show that the proposed model outperforms similar lightweight models for keyword spotting, with accuracy gains ranging from 0.73% to 2.95% on a clean set and from 2.01% to 3.37% on a mixed set across training sets of different sizes. Furthermore, the model is robust across different low-resource language datasets and converges faster.
