keyword-spotting-on-google-speech-commands

Keyword Spotting

Leaderboard

Paper	Code	Google Speech Commands V1 12	Google Speech Commands V2 12	Google Speech Commands V2 2	Google Speech Commands V2 20	Google Speech Commands V2 35	Google Speech Commands V1 2	Google Speech Commands V1 20	Google Speech Commands V1 35	Google Speech Commands V1 6	10-keyword Speech Commands dataset	Google Speech Command-Musan	% Test Accuracy	Google Speech Commands	ModelName	ReleaseDate
Learning Efficient Representations for Keyword Spotting with Triplet Loss	✓ Link	98.56	98.37			97.0									TripletLoss-res15	2021-01-12
Broadcasted Residual Learning for Efficient Keyword Spotting	✓ Link	98.0	98.7												BC-ResNet-8	2021-06-08
Wav2KWS: Transfer Learning from Speech Representations for Keyword Spotting	✓ Link	97.9	98.5		97.8										Wav2KWS	2021-05-10
Howl: A Deployed, Open-Source Wake Word Detection System	✓ Link	97.8													res 8	2020-08-21
Keyword Transformer: A Self-Attention Model for Keyword Spotting	✓ Link	97.49 ±0.15	98.56 ±0.07			97.69 ±0.09									KWT-3	2021-04-01
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition	✓ Link	97.48	97.63												MatchboxNet-3x2x64	2020-04-21
ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting	✓ Link	97.3	98.2												ConvMixer	2022-01-15
Keyword Transformer: A Self-Attention Model for Keyword Spotting	✓ Link	97.27 ±0.08	98.43 ±0.08			97.74 ±0.03									KWT-2	2021-04-01
Keyword Transformer: A Self-Attention Model for Keyword Spotting	✓ Link	97.26 ±0.18	98.08 ±0.10			96.95 ±0.14									KWT-1	2021-04-01
Streaming keyword spotting on mobile devices	✓ Link	97.2	98												MHAtt-RNN	2020-05-14
Neural Architecture Search For Keyword Spotting		97.06													NAS1	2020-09-01
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model	✓ Link	96.9				97.4									SSAMBA	2024-05-20
A neural attention model for speech command recognition	✓ Link	95.6	96.9	99.4	94.5	93.9	99.2	94.1	94.3						Attention RNN	2018-08-27
Hello Edge: Keyword Spotting on Microcontrollers	✓ Link	94.4													DS-CNN	2017-11-20
Hello Edge: Keyword Spotting on Microcontrollers	✓ Link	93.5													GRU	2017-11-20
Hello Edge: Keyword Spotting on Microcontrollers	✓ Link	92.9													LSTM	2017-11-20
Hello Edge: Keyword Spotting on Microcontrollers	✓ Link	92.0													Basic LSTM	2017-11-20
Hello Edge: Keyword Spotting on Microcontrollers	✓ Link	91.6													DNN	2017-11-20
Hello Edge: Keyword Spotting on Microcontrollers	✓ Link	84.6													CNN	2017-11-20
Work in Progress: Linear Transformers for TinyML			98.8			99.1									WaveFormer	2024-03-25
EdgeCRNN: an edge-computing oriented model of acoustic feature enhancement for keyword spotting			98.05												EdgeCRNN 2.0×	2021-03-14
Training Keyword Spotters with Limited and Synthesized Speech Data	✓ Link		97.7												Embedding + Head	2020-01-31
Training Keyword Spotters with Limited and Synthesized Speech Data	✓ Link		97.4												Head without Embedding	2020-01-31
Temporal Convolution for Real-time Keyword Spotting on Mobile Devices	✓ Link		96.6												TC-ResNet14-1.5	2019-04-08
End-to-end Keyword Spotting using Neural Architecture Search and Quantization			95.55												End-to-end KWS model	2021-04-14
MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers	✓ Link		95.3												MicroNet-KWS-L	2020-10-21
Effective Combination of DenseNet and BiLSTM for Keyword Spotting					96.6										DenseNet-BiLSTM	2019-01-19
Multi-layer Attention Mechanism for Speech Keyword Recognition					93.72										LSTM	2019-07-10
Towards on-Device Keyword Spotting using Low-Footprint Quaternion Neural Models	✓ Link					98.60								98.53	QNN	2023-09-15
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input	✓ Link					98.5									M2D	2022-10-26
End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network	✓ Link					98.15									EAT-S	2022-04-25
AST: Audio Spectrogram Transformer	✓ Link					98.11									Audio Spectrogram Transformer	2021-04-05
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection	✓ Link					98.0									HTS-AT	2022-02-02
Attention-Free Keyword Spotting	✓ Link					97.56									KW-MLP	2021-10-14
ImportantAug: a data augmentation agent for speech	✓ Link					95						86.7			ImportantAug	2021-12-14
Neural Architecture Search For Keyword Spotting										97.22					NAS2	2020-09-01
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition	✓ Link										95.12				Quantum CNN	2020-10-26
Efficient keyword spotting using time delay neural networks											94.3				TDNN	2018-08-28
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification											92.37				PATE-AAE (Differentially-Private)	2021-04-02
SubSpectral Normalization for Neural Audio Data Processing													95.4% ±0.22		res8 w/ SSN(S=4, A=Sub)	2021-03-25
SubSpectral Normalization for Neural Audio Data Processing													96.8% ±0.13		res15 w/ SSN(S=4, A=Sub)	2021-03-25
SubSpectral Normalization for Neural Audio Data Processing													97.5% ±0.15		res15 w/ SSN(S=4, A=Sub) (2019)	2021-03-25
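
The accuracy cells above mix plain numbers (`98.7`), values with a spread (`98.56 ±0.07`), and values with a percent sign (`95.4% ±0.22`), so they need normalizing before rows can be compared. The following is a minimal Python sketch, assuming each row has already been split on tabs into a dictionary keyed by the header names. `parse_accuracy` and `rank_by` are hypothetical helper names chosen for illustration and are not part of any leaderboard tooling.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches cells such as "98.56", "97.49 ±0.15", "98.43±0.08" or "95.4% ±0.22".
_CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*%?\s*(?:±\s*(\d+(?:\.\d+)?))?\s*$")


def parse_accuracy(cell: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (accuracy, spread) for one cell, or None for an empty or unparseable cell."""
    match = _CELL.match(cell)
    if match is None:
        return None
    accuracy = float(match.group(1))
    spread = float(match.group(2)) if match.group(2) else None
    return accuracy, spread


def rank_by(rows: List[Dict[str, str]], column: str) -> List[str]:
    """Return model names sorted by accuracy in `column`, best first.

    Rows with no value in that column are skipped rather than ranked last.
    """
    scored = []
    for row in rows:
        parsed = parse_accuracy(row.get(column, ""))
        if parsed is not None:
            scored.append((parsed[0], row["ModelName"]))
    return [name for _, name in sorted(scored, reverse=True)]


# Three rows taken from the table above, reduced to the columns used here.
COLUMN = "Google Speech Commands V2 12"
rows = [
    {"ModelName": "KWT-3", COLUMN: "98.56 ±0.07"},
    {"ModelName": "BC-ResNet-8", COLUMN: "98.7"},
    {"ModelName": "DS-CNN", COLUMN: ""},  # reported on V1 12 only
]
ranked = rank_by(rows, COLUMN)
```

The sketch ranks on the reported accuracy alone and ignores the spread, so two models whose intervals overlap are still given a strict order. Scores in different columns come from different dataset versions and label sets and should not be ranked against each other.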

OpenCodePapers