text-to-sql-on-bird-big-bench-for-large-scale

Text-To-SQL

Leaderboard

Paper	Code	Execution Accuracy % (Test)	Execution Accuracy % (Dev)	Execution Accuracy % (Human)	ModelName	ReleaseDate
A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL	✓ Link	75.63	73.34		XiYan-SQL	2024-11-13
		74.12	74.32		DSAIR + GPT-4o
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL		74.06	73.14		CHASE-SQL + Gemini	2024-10-02
		73.17	72.43		ExSL + granite-34b-code
		72.28	69.3		OpenSearch-SQL+ v2 + GPT-4o
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models		71.83	67.21		Distillery + GPT-4o	2024-08-14
		70.26	72.16		Insights AI
		70.21	68.12		PURPLE + RED + GPT-4o
		69.40	68.91		MCTS-SQL
		69.03	66.95		RECAP + Gemini
		68.87	65.45		ByteBrain
		67.86	65.38		ExSL + granite-20b-code
CHESS: Contextual Harnessing for Efficient SQL Synthesis	✓ Link	66.69	65		CHESS	2024-05-27
		66.21	67.99		Arcwise + GPT-4o
		65.45	63.36		MCS-SQL + GPT-4
		65.23	64.73		SCL-SQL
		64.95	61.34		OpenSearch-SQL v1 + GPT-4
		64.84	60.5		PB-SQL v1
		64.51	62.97		PURPLE + GPT-4o
		64.00	66.82		MSL-SQL + DeepSeek-V2.5
		63.39	55.48		SENSE-13B
		63.39	55.48		SENSE
		63.22	62.58		GRA-SQL
		62.66	58.5		SuperSQL
		60.71	59.71		Dubo-SQL, v1
		60.37	58.47		SFT CodeS-15B
MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL	✓ Link	59.59	57.56		MAC-SQL + GPT-4	2023-12-18
		59.25	57.17		SFT CodeS-7B
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation	✓ Link	57.41	54.76		DAIL-SQL + GPT-4	2023-08-29
DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction	✓ Link	55.90	50.72		DIN-SQL + GPT-4	2023-04-21
Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?	✓ Link	54.89	46.35		GPT-4 (Baseline)	2023-09-28
Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?	✓ Link	49.02	42.70		Claude-2 (Baseline)	2023-09-28
		47.74	37.68		Open SQL-7B
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	✓ Link	40.08	36.64		CoT + ChatGPT	2023-05-04
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	✓ Link	39.30	37.22		ChatGPT (Baseline)	2023-05-04
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	✓ Link	36.47	34.35		Codex (Baseline)	2023-05-04
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	✓ Link	33.04	27.38		Palm-2 (Baseline)	2023-05-04
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation	✓ Link		65.6		MSc-SQL	2024-10-16
			64.62		SFT CodeS-15B + SQLFixAgent
Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM	✓ Link		48.92		DELLM + MAC-SQL	2024-02-18
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs	✓ Link			92.96	Human Performance	2023-05-04
