Learning Efficient Representations for Keyword Spotting with Triplet Loss | ✓ Link | 98.56 | 98.37 | | | 97.0 | | | | | | | | | TripletLoss-res15 | 2021-01-12 |
Broadcasted Residual Learning for Efficient Keyword Spotting | ✓ Link | 98.0 | 98.7 | | | | | | | | | | | | BC-ResNet-8 | 2021-06-08 |
Wav2KWS: Transfer Learning from Speech Representations for Keyword Spotting | ✓ Link | 97.9 | 98.5 | | 97.8 | | | | | | | | | | Wav2KWS | 2021-05-10 |
Howl: A Deployed, Open-Source Wake Word Detection System | ✓ Link | 97.8 | | | | | | | | | | | | | res 8 | 2020-08-21 |
Keyword Transformer: A Self-Attention Model for Keyword Spotting | ✓ Link | 97.49 ±0.15 | 98.56 ±0.07 | | | 97.69 ±0.09 | | | | | | | | | KWT-3 | 2021-04-01 |
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition | ✓ Link | 97.48 | 97.63 | | | | | | | | | | | | MatchboxNet-3x2x64 | 2020-04-21 |
ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting | ✓ Link | 97.3 | 98.2 | | | | | | | | | | | | ConvMixer | 2022-01-15 |
Keyword Transformer: A Self-Attention Model for Keyword Spotting | ✓ Link | 97.27 ±0.08 | 98.43 ±0.08 | | | 97.74 ±0.03 | | | | | | | | | KWT-2 | 2021-04-01 |
Keyword Transformer: A Self-Attention Model for Keyword Spotting | ✓ Link | 97.26 ±0.18 | 98.08 ±0.10 | | | 96.95 ±0.14 | | | | | | | | | KWT-1 | 2021-04-01 |
Streaming keyword spotting on mobile devices | ✓ Link | 97.2 | 98 | | | | | | | | | | | | MHAtt-RNN | 2020-05-14 |
Neural Architecture Search For Keyword Spotting | | 97.06 | | | | | | | | | | | | | NAS1 | 2020-09-01 |
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | ✓ Link | 96.9 | | | | 97.4 | | | | | | | | | SSAMBA | 2024-05-20 |
A neural attention model for speech command recognition | ✓ Link | 95.6 | 96.9 | 99.4 | 94.5 | 93.9 | 99.2 | 94.1 | 94.3 | | | | | | Attention RNN | 2018-08-27 |
Hello Edge: Keyword Spotting on Microcontrollers | ✓ Link | 94.4 | | | | | | | | | | | | | DS-CNN | 2017-11-20 |
Hello Edge: Keyword Spotting on Microcontrollers | ✓ Link | 93.5 | | | | | | | | | | | | | GRU | 2017-11-20 |
Hello Edge: Keyword Spotting on Microcontrollers | ✓ Link | 92.9 | | | | | | | | | | | | | LSTM | 2017-11-20 |
Hello Edge: Keyword Spotting on Microcontrollers | ✓ Link | 92.0 | | | | | | | | | | | | | Basic LSTM | 2017-11-20 |
Hello Edge: Keyword Spotting on Microcontrollers | ✓ Link | 91.6 | | | | | | | | | | | | | DNN | 2017-11-20 |
Hello Edge: Keyword Spotting on Microcontrollers | ✓ Link | 84.6 | | | | | | | | | | | | | CNN | 2017-11-20 |
Work in Progress: Linear Transformers for TinyML | | | 98.8 | | | 99.1 | | | | | | | | | WaveFormer | 2024-03-25 |
EdgeCRNN: an edge-computing oriented model of acoustic feature enhancement for keyword spotting | | | 98.05 | | | | | | | | | | | | EdgeCRNN 2.0× | 2021-03-14 |
Training Keyword Spotters with Limited and Synthesized Speech Data | ✓ Link | | 97.7 | | | | | | | | | | | | Embedding + Head | 2020-01-31 |
Training Keyword Spotters with Limited and Synthesized Speech Data | ✓ Link | | 97.4 | | | | | | | | | | | | Head without Embedding | 2020-01-31 |
Temporal Convolution for Real-time Keyword Spotting on Mobile Devices | ✓ Link | | 96.6 | | | | | | | | | | | | TC-ResNet14-1.5 | 2019-04-08 |
End-to-end Keyword Spotting using Neural Architecture Search and Quantization | | | 95.55 | | | | | | | | | | | | End-to-end KWS model | 2021-04-14 |
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers | ✓ Link | | 95.3 | | | | | | | | | | | | MicroNet-KWS-L | 2020-10-21 |
Effective Combination of DenseNet and BiLSTM for Keyword Spotting | | | | | 96.6 | | | | | | | | | | DenseNet-BiLSTM | 2019-01-19 |
Multi-layer Attention Mechanism for Speech Keyword Recognition | | | | | 93.72 | | | | | | | | | | LSTM | 2019-07-10 |
Towards on-Device Keyword Spotting using Low-Footprint Quaternion Neural Models | ✓ Link | | | | | 98.60 | | | | | | | | 98.53 | QNN | 2023-09-15 |
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input | ✓ Link | | | | | 98.5 | | | | | | | | | M2D | 2022-10-26 |
End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | ✓ Link | | | | | 98.15 | | | | | | | | | EAT-S | 2022-04-25 |
AST: Audio Spectrogram Transformer | ✓ Link | | | | | 98.11 | | | | | | | | | Audio Spectrogram Transformer | 2021-04-05 |
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | ✓ Link | | | | | 98.0 | | | | | | | | | HTS-AT | 2022-02-02 |
Attention-Free Keyword Spotting | ✓ Link | | | | | 97.56 | | | | | | | | | KW-MLP | 2021-10-14 |
ImportantAug: a data augmentation agent for speech | ✓ Link | | | | | 95 | | | | | | 86.7 | | | ImportantAug | 2021-12-14 |
Neural Architecture Search For Keyword Spotting | | | | | | | | | | 97.22 | | | | | NAS2 | 2020-09-01 |
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition | ✓ Link | | | | | | | | | | 95.12 | | | | Quantum CNN | 2020-10-26 |
Efficient keyword spotting using time delay neural networks | | | | | | | | | | | 94.3 | | | | TDNN | 2018-08-28 |
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification | | | | | | | | | | | 92.37 | | | | PATE-AAE (Differentially-Private) | 2021-04-02 |
SubSpectral Normalization for Neural Audio Data Processing | | | | | | | | | | | | | 95.4 ±0.22 | | res8 w/ SSN(S=4, A=Sub) | 2021-03-25 |
SubSpectral Normalization for Neural Audio Data Processing | | | | | | | | | | | | | 96.8 ±0.13 | | res15 w/ SSN(S=4, A=Sub) | 2021-03-25 |
SubSpectral Normalization for Neural Audio Data Processing | | | | | | | | | | | | | 97.5 ±0.15 | | res15 w/ SSN(S=4, A=Sub) (2019) | 2021-03-25 |