Publications

Multi-stage progressive compression of conformer models

Published

Annual Conference of the International Speech Communication Association (INTERSPEECH)

Date

2022.06.15

Research Areas

Abstract

The smaller memory bandwidth of smart devices prompts the development of smaller Automatic Speech Recognition (ASR) models. To obtain a smaller model, one can employ model compression techniques. Knowledge distillation (KD) is a popular model compression approach that has been shown to achieve a smaller model size with relatively little degradation in model performance. In this approach, knowledge is distilled from a trained, large teacher model to a smaller student model. Transducer-based models have recently been shown to perform well for the on-device streaming ASR task, while conformer models are efficient at handling long-term dependencies. Hence, in this work we employ a streaming transducer architecture with a conformer encoder. We propose a multi-stage progressive approach to compress the conformer transducer model using KD, where the teacher model is progressively replaced by the distilled student model at each stage. On the standard LibriSpeech dataset, our experiments achieve compression rates greater than 60% without significant degradation in performance compared to the larger teacher model.
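
The multi-stage progressive KD loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a toy frame-level classifier stands in for the conformer transducer, and the per-stage hidden sizes, the KL/cross-entropy loss mix (`alpha`), the temperature `T`, and the random data are all assumptions made for the example. The key idea it shows is that the distilled student of one stage becomes the teacher of the next.

```python
# Minimal sketch of multi-stage progressive knowledge distillation (KD).
# Assumptions (not from the paper): TinyAcousticModel, the loss weighting
# alpha, the temperature T, and the toy data are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class TinyAcousticModel(nn.Module):
    """Stand-in for a conformer-transducer acoustic model."""
    def __init__(self, feat_dim=80, hidden=256, vocab=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, vocab),
        )

    def forward(self, x):           # x: (batch, time, feat_dim)
        return self.net(x)          # logits: (batch, time, vocab)


def distill_one_stage(teacher, student, loader, epochs=1, alpha=0.5, T=2.0):
    """Train `student` against the teacher's soft targets plus hard labels."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        for feats, labels in loader:
            with torch.no_grad():
                t_logits = teacher(feats)
            s_logits = student(feats)
            # KD term: KL divergence between temperature-softened distributions.
            kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                          F.softmax(t_logits / T, dim=-1),
                          reduction="batchmean") * T * T
            # Hard-label term on the frame labels.
            ce = F.cross_entropy(s_logits.transpose(1, 2), labels)
            loss = alpha * kd + (1.0 - alpha) * ce
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


# Toy data: random features and frame labels, purely for illustration.
feats = torch.randn(64, 50, 80)
labels = torch.randint(0, 128, (64, 50))
loader = DataLoader(TensorDataset(feats, labels), batch_size=8)

# Multi-stage progressive compression: the distilled student of stage k
# becomes the teacher of stage k+1, with a smaller student at each stage.
teacher = TinyAcousticModel(hidden=512)
for stage_hidden in (256, 128):      # shrinking capacity per stage (assumed)
    student = TinyAcousticModel(hidden=stage_hidden)
    student = distill_one_stage(teacher, student, loader)
    teacher = student                # progressive teacher update
```

In the paper's setting the models are streaming conformer transducers trained with transducer losses rather than frame-level cross-entropy; the sketch above only mirrors the progressive teacher-update structure, not the actual training objective.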