OpenCodePapers
Sentence Ordering on EconLogicQA
Sentence Ordering
Dataset Link
Leaderboard
Paper (all entries): EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning

Model                     Accuracy  Release date  Code
GPT-4-Turbo               0.5692    2024-05-13    ✓ Link
GPT-4                     0.5538    2024-05-13    ✓ Link
GPT-3.5-Turbo             0.3769    2024-05-13    ✓ Link
Llama-3-8B-Instruct       0.3462    2024-05-13    ✓ Link
Mistral-7B-Instruct-v0.2  0.3154    2024-05-13    ✓ Link
Mistral-7B-v0.1           0.2615    2024-05-13    ✓ Link
Mistral-7B-v0.2           0.2615    2024-05-13    ✓ Link
Llama-3-8B                0.2385    2024-05-13    ✓ Link
Zephyr-7B-Alpha           0.2308    2024-05-13    ✓ Link
Yi-6B-Chat                0.2077    2024-05-13    ✓ Link
Zephyr-7B-Beta            0.1769    2024-05-13    ✓ Link
Mistral-7B-Instruct-v0.1  0.1538    2024-05-13    ✓ Link
Llama-2-13B-Chat          0.1462    2024-05-13    ✓ Link
Llama-2-7B-Chat           0.0923    2024-05-13    ✓ Link
Gemma-2B-IT               0.0846    2024-05-13    ✓ Link
Yi-6B                     0.0385    2024-05-13    ✓ Link
Gemma-7B-IT               0.0231    2024-05-13    ✓ Link
Llama-2-7B                0.0077    2024-05-13    ✓ Link