Paper | Code | Perplexity | Model Name | Release Date |
---|---|---|---|---|
Transformer Quality in Linear Time | ✓ Link | 14.998 | FLASH-Quad-8k | 2022-02-21 |
Combiner: Full Attention Transformer with Sparse Computation Cost | ✓ Link | 16.49 | Combiner-Axial-8k | 2021-07-12 |
Combiner: Full Attention Transformer with Sparse Computation Cost | ✓ Link | 16.60 | Combiner-Fixed-8k | 2021-07-12 |