Paper | Code | eval_loss | ModelName | ReleaseDate |
---|---|---|---|---|
Polynomial, trigonometric, and tropical activations | ✓ Link | 2.91 | GPT2-Hermite | 2025-02-03 |
Loop Neural Networks for Parameter Sharing | | 3.11 | GPT2-81M-LOOP | 2024-09-21 |
Language Models are Unsupervised Multitask Learners | ✓ Link | 3.12 | GPT2-124M | 2019-02-14 |