OpenCodePapers
dialogue-safety-prediction-on-rt-inod
Leaderboard
| Paper | Code | Best-of | ModelName | ReleaseDate |
| --- | --- | --- | --- | --- |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ | 0.92 | Baseline | 2024-04-15 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ | 0.91 | GPT-4 | 2024-04-15 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ | 0.91 | Gemma | 2024-04-15 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ | 0.87 | Mistral | 2024-04-15 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | ✓ | 0.86 | Llama2 | 2024-04-15 |