Publications

Attention based on-device streaming speech recognition with large speech corpus

Published

IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

Date

2019.12.14

Research Areas

Abstract

In this paper, we present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models. The performance of the MoChA models finally surpassed that of the previous conventional ASR systems through joint connectionist temporal classification (CTC) and cross entropy (CE) training, layer-wise pretraining, and minimum word error rate (MWER) training. In addition, we reduced the model size by around 4.7 times using a hyper low-rank approximation (LRA) method with minimal loss in recognition accuracy. The memory footprint was further reduced to a quarter of its size using 8-bit quantization, bringing the final model size down to around 40 MB. For the personalized on-device ASR system, we fused n-gram models with the outputs of the MoChA models using a weighted finite state transducer (WFST) based method to improve on-demand, user-context-based speech recognition.
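
As an illustration of the compression steps mentioned in the abstract, the sketch below (not the authors' implementation) factorizes a single weight matrix with a truncated SVD, one common way to realize a low-rank approximation, and then applies symmetric 8-bit quantization to show the resulting size arithmetic. The matrix shape, the rank of 128, and the function names are assumptions chosen only for illustration.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) as A @ B via truncated SVD, cutting storage
    from m*n parameters to rank*(m + n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # fold the singular values into A
    B = Vt[:rank, :]
    return A.astype(np.float32), B.astype(np.float32)

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor 8-bit quantization: int8 weights plus one
    float scale, roughly 4x smaller than float32 storage."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Illustrative projection matrix; the shape and rank are assumed,
# not values from the paper.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)

A, B = low_rank_factorize(W, rank=128)
qA, _ = quantize_int8(A)
qB, _ = quantize_int8(B)

print(f"float32 full matrix : {W.nbytes / 2**20:.2f} MB")
print(f"after LRA (rank 128): {(A.nbytes + B.nbytes) / 2**20:.2f} MB")
print(f"after LRA + int8    : {(qA.nbytes + qB.nbytes) / 2**20:.2f} MB")
```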
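
The paper's WFST-based fusion of the n-gram models with the MoChA outputs is not reproduced here; as a simplified stand-in, the sketch below shows a plain log-linear rescoring of beam hypotheses with an external n-gram language model score. The function names, hypothesis format, and the lm_weight value are assumptions for illustration, not part of the paper's method.

```python
from typing import Callable, List, Tuple

def rescore_hypotheses(
    hyps: List[Tuple[List[str], float]],          # (token sequence, ASR log-prob)
    ngram_logprob: Callable[[List[str]], float],  # external n-gram LM scorer
    lm_weight: float = 0.3,
) -> List[Tuple[List[str], float]]:
    """Combine ASR and LM scores log-linearly and re-rank the beam."""
    fused = [
        (tokens, asr_score + lm_weight * ngram_logprob(tokens))
        for tokens, asr_score in hyps
    ]
    return sorted(fused, key=lambda h: h[1], reverse=True)
```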