Question Answering on HotpotQA


[Figure: Results over time]

Leaderboard

Paper	Code	JOINT-F1	ANS-EM	ANS-F1	SUP-EM	SUP-F1	JOINT-EM	ModelName	ReleaseDate
End-to-End Beam Retrieval for Multi-Hop Question Answering	✓ Link	0.775	0.727	0.850	0.663	0.901	0.505	Beam Retrieval	2023-08-17
Big Bird: Transformers for Longer Sequences	✓ Link	0.736		0.755		0.891		BigBird-etc	2020-07-28
Adaptive Information Seeking for Open-Domain Question Answering	✓ Link	0.720	0.675	0.805	0.612	0.860	0.449	AISO	2021-09-14
Chain-of-Skills: A Configurable Model for Open-domain Question Answering	✓ Link	0.717	0.674	0.801	0.613	0.853	0.457	Chain-of-Skills	2023-05-04
		0.708	0.670	0.795	0.594	0.843	0.444	TPRR
HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions		0.706	0.671	0.799	0.574	0.835	0.432	HopRetriever + Sp-search	2020-12-31
		0.700	0.662	0.793	0.573	0.840	0.420	EBS-Large
		0.698	0.671	0.799	0.572	0.826	0.431	HopRetriever
Answering Open-Domain Questions of Varying Reasoning Steps from Text	✓ Link	0.696	0.663	0.791	0.569	0.832	0.428	IRRR+	2020-10-23
		0.689	0.655	0.786	0.559	0.831	0.409	EBS-SH
Answering Open-Domain Questions of Varying Reasoning Steps from Text	✓ Link	0.686	0.657	0.782	0.559	0.821	0.421	IRRR	2020-10-23
		0.678	0.648	0.778	0.561	0.818	0.410	HopRetriever-V2
		0.670	0.646	0.778	0.557	0.812	0.411	AFSGraph-retriever
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval	✓ Link	0.666	0.623	0.753	0.575	0.809	0.418	Recursive Dense Retriever	2020-09-27
		0.662	0.630	0.754	0.546	0.800	0.404	Step-by-Step Retriever
Answering Any-hop Open-domain Questions with Iterative Document Reranking		0.639	0.625	0.759	0.510	0.789	0.360	DDRQA	2020-09-16
		0.639	0.608	0.739	0.531	0.793	0.380	HopRetriever-V1
		0.630	0.620	0.753	0.499	0.778	0.354	DR model large
		0.629	0.617	0.746	0.500	0.772	0.368	Model name
		0.629	0.617	0.746	0.500	0.772	0.368	HopAns
		0.629	0.604	0.732	0.520	0.771	0.380	Anonymous
		0.624	0.615	0.746	0.503	0.772	0.362	Multi-dimensional-AFSGraph
		0.623	0.597	0.714	0.510	0.774	0.379	HGN-albert + SemanticRetrievalMRS IR
		0.617	0.603	0.731	0.499	0.768	0.359	Tree-shaped-cluster
		0.617	0.601	0.730	0.500	0.769	0.359	AFSgraph
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering	✓ Link	0.612	0.600	0.730	0.491	0.764	0.354	Robustly Fine-tuned Graph-based Recurrent Retriever	2019-11-24
		0.609	0.601	0.730	0.485	0.759	0.350	AFSgraph model
		0.607	0.579	0.699	0.510	0.768	0.372	HGN-large + SemanticRetrievalMRS IR
		0.602	0.598	0.727	0.480	0.749	0.345	RoBERTa-DenseRetriever-Fast
		0.602	0.598	0.727	0.480	0.749	0.345	DPR-recurrent
		0.601	0.596	0.724	0.479	0.748	0.345	RoBERTa-DenseRetriever
Hierarchical Graph Network for Multi-hop Question Answering	✓ Link	0.599	0.567	0.692	0.500	0.764	0.356	HGN + SemanticRetrievalMRS IR	2019-11-09
Dynamically Fused Graph Network for Multi-hop Reasoning	✓ Link	0.5982						DFGN	2019-05-16
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering	✓ Link	0.598	0.589	0.716	0.480	0.757	0.345	SAFSR model	2018-09-25
		0.569	0.582	0.709	0.429	0.713	0.310	GraphRR-Fast
		0.568	0.588	0.717	0.416	0.725	0.293	DR model
A Simple Yet Strong Pipeline for HotpotQA		0.562	0.555	0.675	0.456	0.730	0.329	Quark + SemanticRetrievalMRS IR	2020-04-14
		0.561	0.523	0.648	0.490	0.747	0.330	GAR-BERT
		0.553	0.560	0.689	0.441	0.730	0.292	Graph-based Recurrent Retriever
		0.548	0.529	0.648	0.428	0.720	0.312	MIR+EPS+BERT
		0.530	0.482	0.613	0.483	0.739	0.306	GAR
Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention	✓ Link	0.513	0.516	0.641	0.409	0.714	0.261	Transformer-XH-final	2020-05-01
		0.496	0.490	0.608	0.417	0.700	0.271	Transformer-XH
Revealing the Importance of Semantic Retrieval for Machine Reading at Scale	✓ Link	0.476	0.453	0.573	0.387	0.708	0.251	SemanticRetrievalMRS	2019-09-17
		0.429	0.421	0.517	0.371	0.598	0.247	DrKIT
		0.392	0.418	0.531	0.263	0.573	0.170	Entity-centric BERT Pipeline
		0.391	0.433	0.538	0.219	0.596	0.145	PR-Bert
Answering Complex Open-domain Questions Through Iterative Query Generation	✓ Link	0.391	0.379	0.486	0.307	0.642	0.180	GoldEn Retriever	2019-10-15
		0.370	0.394	0.514	0.242	0.585	0.133	SAFSr-Bert
Cognitive Graph for Multi-Hop Reading Comprehension at Scale	✓ Link	0.349	0.371	0.489	0.228	0.577	0.124	Cognitive Graph QA	2019-05-14
		0.334	0.475	0.606	0.076	0.448	0.049	GAR-NOSF
		0.304	0.358	0.453	0.160	0.512	0.115	IKFGraph
		0.291	0.369	0.460	0.153	0.468	0.115	AnonymousQ
		0.284	0.335	0.427	0.156	0.493	0.110	HGN Model-reproduce
Multi-Hop Paragraph Retrieval for Open-Domain Question Answering	✓ Link	0.270	0.306	0.403	0.167	0.473	0.109	MUPPET	2019-06-15
		0.258	0.299	0.391	0.132	0.497	0.083	GRN + BERT
		0.255	0.354	0.463	0.001	0.432	0.000	Entity-centric IR
Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network		0.247	0.277	0.372	0.127	0.472	0.070	KGNN	2019-11-06
		0.245	0.284	0.386	0.147	0.472	0.086	SAQA
		0.236	0.273	0.365	0.122	0.488	0.074	GRN
Answering while Summarizing: Multi-task Learning for Multi-hop QA with Evidence Extraction		0.231	0.287	0.381	0.142	0.444	0.087	QFE	2019-05-21
		0.209	0.289	0.391	0.080	0.406	0.041	SAFSr_model
		0.175	0.236	0.320	0.056	0.400	0.033	SuppBERT
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering	✓ Link	0.162	0.240	0.329	0.039	0.377	0.019	Baseline Model	2018-09-25
		0.011	0.074	0.121	0.000	0.078	0.000	tes
		0.000	0.581	0.711	0.000	0.000	0.000	PromptRank-fewshot-2-demo
		0.000	0.581	0.710	0.000	0.000	0.000	graph-recurrent-retriever+roberta-base w. S/R-pretraining
		0.000	0.360	0.474	0.000	0.000	0.000	TPReasoner w/o BERT
		0.000	0.307	0.402	0.000	0.000	0.000	MultiQA
Multi-hop Reading Comprehension through Question Decomposition and Rescoring	✓ Link	0.000	0.300	0.407	0.000	0.000	0.000	DecompRC	2019-06-07
		0.000	0.300	0.407	0.000	0.000	0.000
		0.000	0.080	0.221	0.000	0.000	0.000	Mistral multi hop with very large sources
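
The leaderboard rows are tab-separated, and some metric cells are blank (BigBird-etc, for instance, lists only JOINT-F1, ANS-F1 and SUP-F1). A minimal sketch of loading such rows and ranking them by one metric could look like the following. The helper names `parse_rows` and `rank_by` are hypothetical, the choice to treat a blank cell as "not reported" is an assumption, and the three sample rows are copied from the table above.

```python
import csv
import io

# Column names as they appear in the leaderboard header.
HEADER = ["Paper", "Code", "JOINT-F1", "ANS-EM", "ANS-F1",
          "SUP-EM", "SUP-F1", "JOINT-EM", "ModelName", "ReleaseDate"]
METRICS = HEADER[2:8]

# Three rows copied from the leaderboard (tab-separated).
SAMPLE = (
    "End-to-End Beam Retrieval for Multi-Hop Question Answering\t✓ Link\t"
    "0.775\t0.727\t0.850\t0.663\t0.901\t0.505\tBeam Retrieval\t2023-08-17\n"
    "Big Bird: Transformers for Longer Sequences\t✓ Link\t"
    "0.736\t\t0.755\t\t0.891\t\tBigBird-etc\t2020-07-28\n"
    "Adaptive Information Seeking for Open-Domain Question Answering\t✓ Link\t"
    "0.720\t0.675\t0.805\t0.612\t0.860\t0.449\tAISO\t2021-09-14\n"
)


def parse_rows(text):
    """Parse tab-separated rows into dicts; blank metric cells become None."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for cells in reader:
        # Pad short rows (e.g. a missing release date) to the header length.
        cells = cells + [""] * (len(HEADER) - len(cells))
        row = dict(zip(HEADER, cells))
        for metric in METRICS:
            # Assumption: an empty cell means the metric was not reported.
            row[metric] = float(row[metric]) if row[metric] else None
        rows.append(row)
    return rows


def rank_by(rows, metric):
    """Return rows that report `metric`, sorted from best to worst."""
    scored = [r for r in rows if r[metric] is not None]
    return sorted(scored, key=lambda r: r[metric], reverse=True)
```

With the sample rows, `rank_by(parse_rows(SAMPLE), "ANS-F1")` orders the models as Beam Retrieval, AISO, BigBird-etc, whereas ranking by `JOINT-EM` leaves out BigBird-etc because that cell is blank.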

OpenCodePapers