Paper | Code | 1k | 2k | 4k | 6k | 8k | 12k | 16k | 32k | 64k | 128k | ModelName | ReleaseDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GPT-4 Technical Report | ✓ Link | 74.0 | 73.5 | 67.5 | 59.5 | 53.5 | 49.5 | 44.0 | 16.0 | 0.0 | 0.0 | GPT-4-Turbo-1106 | 2023-03-15 |
GPT-4 Technical Report | ✓ Link | 73.5 | 73.5 | 65.5 | 63.0 | 56.5 | 52.0 | 44.5 | 30.0 | 0.0 | 0.0 | GPT-4-Turbo-0125 | 2023-03-15 |
 | | 65.0 | 43.5 | 23.5 | 15.0 | 17.0 | 12.0 | 11.0 | 4.0 | 0.0 | | Claude-2 | |
 | | 61.5 | 48.5 | 41.5 | 29.5 | 17.0 | 2.5 | 2.5 | | | | GPT-3.5-Turbo-1106 | |
InternLM2 Technical Report | ✓ Link | 58.6 | 49.5 | 33.9 | 12.3 | 13.4 | 2.0 | 0.8 | 0.5 | 0.5 | 0.0 | InternLM2-7b | 2024-03-26 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | ✓ Link | 53.4 | 29.2 | 13.1 | 4.3 | 2.2 | 1.4 | 0.9 | | | | Vicuna-13b-v1.5-16k | 2023-06-09 |
GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 39.8 | 18.8 | 9.0 | 5.0 | 3.4 | 0.9 | 0.5 | | | | ChatGLM3-6b-32k | 2022-10-05 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | ✓ Link | 37.0 | 11.1 | 5.8 | 3.2 | 1.8 | 1.9 | 1.0 | | | | Vicuna-7b-v1.5-16k | 2023-06-09 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | ✓ Link | 32.4 | 10.7 | 5.7 | 3.1 | 1.9 | 1.6 | 0.8 | | | | LongChat-7b-v1.5-32k | 2023-06-09 |
GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 31.2 | 10.9 | 4.5 | 1.6 | 1.6 | 0.0 | 0.3 | | | | ChatGLM2-6b-32k | 2022-10-05 |