家电科技 (Journal of Appliance Science & Technology) ›› 2022, Vol. 0 ›› Issue (5): 22-25. doi: 10.19784/j.cnki.issn1672-0172.2022.05.002

• Article

Gated residual DFSMN acoustic model for large-vocabulary speech recognition

霍伟明, 徐浩   

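  1. GD Midea Air-Conditioning Equipment Co., Ltd., Foshan, Guangdong 528311
  • Online: 2022-10-01 Published: 2022-11-01
  • Corresponding author: XU Hao, E-mail: xuhao25@midea.com.
  • About the author: HUO Weiming, bachelor's degree. Research interest: artificial intelligence. Address: No. 22 Lingang Road, Beijiao Town, Shunde District, Foshan. E-mail: weiming.huo@midea.com.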

Gated residual DFSMN acoustic models for large vocabulary speech recognition

HUO Weiming, XU Hao   

  1. GD Midea Air-Conditioning Equipment Co., Ltd., Foshan 528311
  • Online: 2022-10-01 Published: 2022-11-01

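Abstract: The Deep Feedforward Sequential Memory Network (DFSMN) is an acoustic model with high recognition accuracy; it introduces skip connections between adjacent memory blocks to alleviate the vanishing-gradient problem. Training a deeply stacked DFSMN nevertheless remains a very challenging task, and simply stacking more layers does not improve the model's performance. Residual learning is an effective method for building very deep neural networks and helps them converge more easily and quickly. A new network architecture named Gated Residual DFSMN (GR-DFSMN) is proposed. The model introduces additional gated shortcuts from lower DFSMN blocks so that networks with deep DFSMN structures can be trained effectively. Experimental results show that GR-DFSMN outperforms the plain DFSMN when training very deep models. On a large-scale 1000-hour English corpus task, when the depth reaches 40 layers, GR-DFSMN lowers the average word error rate over four test sets by 0.7% compared with DFSMN.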

Keywords: speech recognition, DFSMN, gated residual, CTC

Abstract: The Deep Feedforward Sequential Memory Network (DFSMN) is a powerful acoustic model in terms of recognition accuracy. It alleviates the vanishing-gradient problem by introducing skip connections between memory blocks in adjacent layers. However, we find that optimizing very deep DFSMNs is still challenging, and that simply stacking more layers does not lead to better networks. Residual learning is an effective method for helping neural networks converge more easily and quickly when building very deep structures. A novel network architecture named gated residual DFSMN (GR-DFSMN) is proposed. It introduces additional gate-controlled shortcut paths from lower DFSMN blocks for efficient training of networks with very deep DFSMN structures. Experimental results show that GR-DFSMN outperforms the original DFSMN when training very deep models. On the 1000-hour English LibriSpeech task, the GR-DFSMN mono-phone CTC model achieves a 0.7% absolute reduction in word error rate compared with the original DFSMN mono-phone CTC model.
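
The abstract names two mechanisms, a memory block that mixes neighbouring frames and a gate-controlled shortcut from a lower block, but does not give their equations. The following is only a minimal sketch under stated assumptions: the gate is taken to be a per-dimension sigmoid of the lower block's output (a highway-style choice, not confirmed by the source), the memory block is a simplified FSMN-style weighted sum of past and future frames, and every function and parameter name (`memory_block`, `gated_residual`, `gate_weight`, `gate_bias`) is hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for the assumed gate."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_block(frames, left_taps, right_taps):
    """Simplified FSMN-style memory (an assumption, not the paper's exact form).

    Each output frame is the input frame plus element-wise weighted
    past frames (left_taps) and future frames (right_taps). Frames that
    would fall outside the sequence are treated as zero.
    """
    num_frames, dim = len(frames), len(frames[0])
    out = []
    for t in range(num_frames):
        vec = list(frames[t])
        for i, taps in enumerate(left_taps, start=1):    # look-back context
            if t - i >= 0:
                for d in range(dim):
                    vec[d] += taps[d] * frames[t - i][d]
        for j, taps in enumerate(right_taps, start=1):   # look-ahead context
            if t + j < num_frames:
                for d in range(dim):
                    vec[d] += taps[d] * frames[t + j][d]
        out.append(vec)
    return out


def gated_residual(block_out, lower_out, gate_weight, gate_bias):
    """Hypothetical gated shortcut: block_out + g * lower_out,
    with g = sigmoid(gate_weight * lower_out + gate_bias) per dimension."""
    result = []
    for cur, low in zip(block_out, lower_out):
        row = []
        for d in range(len(cur)):
            g = sigmoid(gate_weight[d] * low[d] + gate_bias[d])
            row.append(cur[d] + g * low[d])
        result.append(row)
    return result


# Toy input: 3 frames of 2-dimensional features.
frames = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
hidden = memory_block(frames, left_taps=[[0.5, 0.5]], right_taps=[[0.25, 0.25]])
# With zero gate weights and biases the gate is 0.5 everywhere,
# so half of the lower block's output is added back.
combined = gated_residual(hidden, frames, gate_weight=[0.0, 0.0], gate_bias=[0.0, 0.0])
```

In this sketch the gate lets each dimension decide how much of the lower block's output to pass through, whereas a plain residual connection would always add it in full.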

Key words: Speech recognition, DFSMN, Gated residual, CTC

CLC number: