OpenCodePapers

Language Modelling on Wiki-40B

Language Modelling
Dataset Link
Leaderboard
| Paper | Code | Perplexity | Model Name | Release Date |
|---|---|---|---|---|
| Transformer Quality in Linear Time | ✓ Link | 14.998 | FLASH-Quad-8k | 2022-02-21 |
| Combiner: Full Attention Transformer with Sparse Computation Cost | ✓ Link | 16.49 | Combiner-Axial-8k | 2021-07-12 |
| Combiner: Full Attention Transformer with Sparse Computation Cost | ✓ Link | 16.60 | Combiner-Fixed-8k | 2021-07-12 |