Conformer-1's architecture

A model that leverages Transformer and Convolutional layers for speech recognition.

The Conformer is a neural net for speech recognition that was published by Google Brain in 2020. The Conformer builds upon the now-ubiquitous Transformer architecture, which is famous for its parallelizability and heavy use of the attention mechanism. By integrating convolutional layers into the Transformer architecture, the Conformer can capture both local and global dependencies while being a relatively size-efficient neural net architecture.

While the Conformer architecture has shown state-of-the-art performance in speech recognition, its main downside lies in its computational and memory efficiency. The attention mechanism at the core of the Conformer, essential for capturing and retaining long-term information in an input sequence, is well known to be a computational bottleneck. This makes the original Conformer architecture slow at both training and inference time compared to other existing architectures, and it poses an engineering challenge for deployment within large-scale ASR systems.

With Conformer-1, our goal was to train a production-ready speech recognition model that can be deployed at extremely large scale and that maximally leverages the original Conformer architecture's outstanding modeling capabilities. To do this, we introduce a number of modifications to the original Conformer architecture.

In particular, our Conformer-1 is built on top of the Efficient Conformer, a modification of the original Conformer architecture that introduces the following technical changes (a brief illustrative sketch of both ideas follows the list):

Progressive Downsampling – A progressive reduction scheme for the length of the encoded sequence, inspired by ContextNet.

Grouped Attention – A modified version of the attention mechanism that makes its cost less sensitive to sequence length.
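To make these two ideas concrete, here is a minimal, hypothetical PyTorch sketch of how progressive downsampling and grouped attention can shrink the attention bottleneck. This is not Conformer-1's actual implementation: the layer names, kernel size, stride, and group size are illustrative assumptions, and the grouping here simply folds neighboring frames into the feature dimension before standard multi-head attention.

import torch
import torch.nn as nn


class ProgressiveDownsampling(nn.Module):
    """Illustrative sketch: halve the encoded sequence length with a strided
    depthwise convolution so that later blocks attend over fewer frames."""

    def __init__(self, dim: int, stride: int = 2):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, stride=stride,
                              padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) -> (batch, time // stride, dim)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)


class GroupedSelfAttention(nn.Module):
    """Illustrative sketch: concatenate `group_size` neighboring frames before
    attention, so the attention matrix scales with (time / group_size)^2
    instead of time^2."""

    def __init__(self, dim: int, num_heads: int = 4, group_size: int = 3):
        super().__init__()
        self.group_size = group_size
        self.attn = nn.MultiheadAttention(dim * group_size, num_heads,
                                          batch_first=True)
        self.proj_out = nn.Linear(dim * group_size, dim * group_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        g = self.group_size
        pad = (-t) % g                              # right-pad so time is divisible by g
        x = nn.functional.pad(x, (0, 0, 0, pad))
        xg = x.reshape(b, (t + pad) // g, d * g)    # fold each group into the feature dim
        out, _ = self.attn(xg, xg, xg)              # attention over t/g groups, not t frames
        out = self.proj_out(out).reshape(b, t + pad, d)
        return out[:, :t]                           # drop the padding again

if __name__ == "__main__":
    feats = torch.randn(2, 100, 64)                 # (batch, frames, features)
    feats = ProgressiveDownsampling(64)(feats)      # -> (2, 50, 64)
    feats = GroupedSelfAttention(64)(feats)         # attention over ceil(50 / 3) groups
    print(feats.shape)                              # torch.Size([2, 50, 64])

Under these assumptions, downsampling by 2 and grouping by 3 means the attention matrix has roughly (T/6)^2 entries instead of T^2, which is the kind of saving that makes the architecture far cheaper to run on long audio.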