Paper | Code | 1k | 2k | 4k | 6k | 8k | 12k | 16k | 32k | 64k | 128k | ModelName | ReleaseDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GPT-4 Technical Report | ✓ Link | 74.0 | 73.5 | 67.5 | 59.5 | 53.5 | 49.5 | 44.0 | 16.0 | 0.0 | 0.0 | GPT-4-Turbo-1106 | 2023-03-15 |
GPT-4 Technical Report | ✓ Link | 73.5 | 73.5 | 65.5 | 63.0 | 56.5 | 52.0 | 44.5 | 30.0 | 0.0 | 0.0 | GPT-4-Turbo-0125 | 2023-03-15 |
 | | 65.0 | 43.5 | 23.5 | 15.0 | 17.0 | 12.0 | 11.0 | 4.0 | 0.0 | | Claude-2 | |
 | | 61.5 | 48.5 | 41.5 | 29.5 | 17.0 | 2.5 | 2.5 | | | | GPT-3.5-Turbo-1106 | |
InternLM2 Technical Report | ✓ Link | 58.6 | 49.5 | 33.9 | 12.3 | 13.4 | 2.0 | 0.8 | 0.5 | 0.5 | 0.0 | InternLM2-7b | 2024-03-26 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | ✓ Link | 53.4 | 29.2 | 13.1 | 4.3 | 2.2 | 1.4 | 0.9 | | | | Vicuna-13b-v1.5-16k | 2023-06-09 |
GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 39.8 | 18.8 | 9.0 | 5.0 | 3.4 | 0.9 | 0.5 | | | | ChatGLM3-6b-32k | 2022-10-05 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | ✓ Link | 37.0 | 11.1 | 5.8 | 3.2 | 1.8 | 1.9 | 1.0 | | | | Vicuna-7b-v1.5-16k | 2023-06-09 |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | ✓ Link | 32.4 | 10.7 | 5.7 | 3.1 | 1.9 | 1.6 | 0.8 | | | | LongChat-7b-v1.5-32k | 2023-06-09 |
GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 31.2 | 10.9 | 4.5 | 1.6 | 1.6 | 0.0 | 0.3 | | | | ChatGLM2-6b-32k | 2022-10-05 |