Journal of Appliance Science & Technology ›› 2022, Vol. 0 ›› Issue (5): 22-25. DOI: 10.19784/j.cnki.issn1672-0172.2022.05.002


Gated residual DFSMN acoustic models for large vocabulary speech recognition

HUO Weiming, XU Hao   

  1. GD Midea Air-Conditioning Equipment Co., Ltd., Foshan 528311
  • Online: 2022-10-01  Published: 2022-11-01

Abstract: The Deep Feedforward Sequential Memory Network (DFSMN) is a powerful acoustic model in terms of recognition accuracy. It alleviates the vanishing-gradient problem by introducing skip connections between the memory blocks of adjacent layers. However, we find that optimizing very deep DFSMNs remains challenging: simply stacking more layers does not yield better networks. Residual learning is an effective way to help very deep networks converge more easily and quickly. We propose a novel network architecture named gated residual DFSMN (GR-DFSMN), which introduces additional gate-controlled shortcut paths from lower DFSMN blocks so that networks with very deep DFSMN structures can be trained efficiently. Experimental results show that GR-DFSMN outperforms the original DFSMN when training very deep models. On the 1000-hour English LibriSpeech task, the GR-DFSMN mono-phone CTC model achieves a 0.7% absolute improvement over the original DFSMN mono-phone CTC model.
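
To make the gating mechanism concrete, the sketch below shows one GR-DFSMN block in PyTorch. It is a minimal illustration assuming a vFSMN-style memory (learnable per-dimension taps over past and future frames, implemented as a depthwise temporal convolution) and a sigmoid gate computed from the current memory output and the lower block's output; the class name, layer sizes, and the exact gate parameterization are our assumptions for illustration, not necessarily the paper's configuration.

```python
# Minimal sketch of a gated residual DFSMN block (assumed structure,
# not the authors' exact implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualDFSMNBlock(nn.Module):
    def __init__(self, hidden_dim, proj_dim, lookback=20, lookahead=20):
        super().__init__()
        self.linear = nn.Linear(proj_dim, hidden_dim)            # projection up
        self.proj = nn.Linear(hidden_dim, proj_dim, bias=False)  # low-rank projection down
        # Per-dimension memory taps over past and future frames (vFSMN-style)
        self.mem_taps = nn.Parameter(
            torch.randn(proj_dim, lookback + 1 + lookahead) * 0.01)
        self.lookback, self.lookahead = lookback, lookahead
        # Gate on the shortcut path from the lower block's memory output
        self.gate = nn.Linear(2 * proj_dim, proj_dim)

    def forward(self, prev_mem):
        # prev_mem: (batch, time, proj_dim) memory output of the block below
        h = F.relu(self.linear(prev_mem))
        p = self.proj(h)                                         # (B, T, D)
        # Depthwise 1-D convolution over time implements the memory filter
        x = p.transpose(1, 2)                                    # (B, D, T)
        x = F.pad(x, (self.lookback, self.lookahead))
        filt = F.conv1d(x, self.mem_taps.unsqueeze(1), groups=p.size(-1))
        mem = p + filt.transpose(1, 2)                           # FSMN memory output
        # Gated residual: the gate controls how much of the lower block's
        # output flows through the shortcut path
        g = torch.sigmoid(self.gate(torch.cat([mem, prev_mem], dim=-1)))
        return mem + g * prev_mem
```

Stacking such blocks yields the very deep structure the abstract describes. Because the gate can open or close the shortcut per dimension and per frame, gradients from the CTC loss can flow directly to lower blocks, which is what eases optimization of very deep DFSMN stacks.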

Key words: speech recognition, DFSMN, gated residual, CTC
