OpenCodePapers

dialogue-safety-prediction-on-rt-inod

Dialogue Understanding > Dialogue Safety Prediction
Leaderboard
Paper | Code | Best-of | Model Name | Release Date
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ Link | 0.92 | Baseline | 2024-04-15
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ Link | 0.91 | GPT-4 | 2024-04-15
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ Link | 0.91 | Gemma | 2024-04-15
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ Link | 0.87 | Mistral | 2024-04-15
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ Link | 0.86 | Llama2 | 2024-04-15