Paper | Code | Sentiment Consistency | Speaker Consistency | Gender Consistency | Background (Domain) Consistency | Background (Random) Consistency | Room Consistency | Sentiment Alignment | Background Alignment | Model | Release Date |
---|---|---|---|---|---|---|---|---|---|---|---|
Spirit LM: Interleaved Spoken and Written Language Model | ✓ Link | 73.5 | 81.0 | 85.0 | 55.0 | 64.0 | 54.5 | 52.0 | 59.5 | Spirit-LM (Expr.) | 2024-02-08 |
LAST: Language Model Aware Speech Tokenization | | 65.0 | 64.5 | 68.5 | 56.0 | 61.0 | 62.5 | 53.5 | 53.0 | LAST 1.3B | 2024-09-05 |
LAST: Language Model Aware Speech Tokenization | | 64.0 | 63.0 | 70.5 | 55.5 | 60.5 | 61.0 | 51.5 | 54.5 | LAST 350M | 2024-09-05 |
Textually Pretrained Speech Language Models | ✓ Link | 61.5 | 71.0 | 70.0 | 55.0 | 60.5 | 62.0 | 51.5 | 54.5 | TWIST 7B | 2023-05-22 |
Textually Pretrained Speech Language Models | ✓ Link | 61.5 | 69.0 | 69.5 | 55.5 | 60.5 | 59.0 | 53.0 | 56.5 | TWIST 1.3B | 2023-05-22 |
Textually Pretrained Speech Language Models | ✓ Link | 59.0 | 69.5 | 68.0 | 54.0 | 61.5 | 59.0 | 51.5 | 56.5 | TWIST 350M | 2023-05-22 |
| | 59.0 | 68.0 | 70.5 | 61.0 | TASLM 1B (token) | ||||||
| | 57.5 | 67.0 | 75.5 | 50.0 | TASLM 1B (embedding) | ||||||
Spirit LM: Interleaved Spoken and Written Language Model | ✓ Link | 54.5 | 69.5 | 67.0 | 53.5 | 55.5 | 54.5 | 48.0 | 51.5 | Spirit-LM (base) | 2024-02-08 |
Text-Free Prosody-Aware Generative Spoken Language Modeling | ✓ Link | 40.5 | 83.0 | 88.5 | 57.0 | 66.0 | 53.5 | 55.5 | 53.5 | pGSLM | 2021-09-07 |
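Each score in the table is a percentage. The column names match the SALMon benchmark's subsets, which compare a matched pair of recordings, one acoustically consistent (or aligned) and one not. A model counts as correct on a pair when it assigns the higher likelihood to the consistent recording, so chance level is 50%. The sketch below shows that pairwise-accuracy computation under this assumption. The function name `pairwise_accuracy` and the log-likelihood values are hypothetical and are not taken from any model in the table.

```python
from typing import Sequence


def pairwise_accuracy(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Return the percentage of pairs where the consistent sample scores higher.

    Assumes a SALMon-style protocol: pos_scores[i] and neg_scores[i] are the
    model's log-likelihoods for the consistent and inconsistent recordings of
    pair i. Ties are counted as incorrect.
    """
    if len(pos_scores) != len(neg_scores) or len(pos_scores) == 0:
        raise ValueError("need equal-length, non-empty score lists")
    wins = sum(1 for p, n in zip(pos_scores, neg_scores) if p > n)
    return 100.0 * wins / len(pos_scores)


# Hypothetical log-likelihoods for four pairs (illustrative only).
pos = [-120.4, -98.7, -150.2, -110.0]
neg = [-125.1, -97.9, -160.8, -115.3]
print(pairwise_accuracy(pos, neg))  # 75.0
```

Three of the four hypothetical pairs favour the consistent recording, which gives 75.0. Under this protocol, a score near 50 (for example, Spirit-LM (base) at 48.0 on Sentiment Alignment) indicates roughly chance-level sensitivity to that acoustic property.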