GPT-4 Technical Report | ✓ Link | 0.59 | | | | | | | | | | GPT-4 (RLHF) | 2023-03-15 |
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | ✓ Link | 0.56 | 0.75 | | | | | | | | | Mistral-7B-Instruct-v0.2 + TruthX | 2024-02-27 |
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | ✓ Link | 0.54 | 0.74 | | | | | | | | | LLaMa-2-7B-Chat + TruthX | 2024-02-27 |
Representation Engineering: A Top-Down Approach to AI Transparency | ✓ Link | 0.54 | | | | | | | | | | LLaMA-2-Chat-13B + Representation Control (Contrast Vector) | 2023-10-02 |
Representation Engineering: A Top-Down Approach to AI Transparency | ✓ Link | 0.48 | | | | | | | | | | LLaMA-2-Chat-7B + Representation Control (Contrast Vector) | 2023-10-02 |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | | 0.389 | | 88.6 | 83.5 | | | | | | | Vicuna 7B + Inference-Time Intervention (ITI) | 2023-06-06 |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | | 0.319 | | 66.6 | 97.7 | | | | | | | Alpaca 7B + Inference-Time Intervention (ITI) | 2023-06-06 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 0.295 | | | | | | | | | | Gopher 280B (zero-shot, Our Prompt + Choices) | 2021-12-08 |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | | 0.288 | | 45.1 | 93.8 | | | | | | | LLaMA 7B + Inference-Time Intervention (ITI) | 2023-06-06 |
Galactica: A Large Language Model for Science | ✓ Link | 0.26 | | | | | | | | | | GAL 120B | 2022-11-16 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 0.25 | | | | | | | | | | Gopher 7.1B (zero-shot, QA prompts) | 2021-12-08 |
Galactica: A Large Language Model for Science | ✓ Link | 0.24 | | | | | | | | | | GAL 30B | 2022-11-16 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 0.23 | | | | | | | | | | Gopher 7.1B (zero-shot, Our Prompt + Choices) | 2021-12-08 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 0.23 | | | | | | | | | | Gopher 1.4B (zero-shot, QA prompts) | 2021-12-08 |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | ✓ Link | 0.22 | 0.39 | 29.50 | 89.84 | 29.87 | -0.25 | -9.41 | -4.91 | | | GPT-2 1.5B | 2021-09-08 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 0.217 | | | | | | | | | | Gopher 1.4B (zero-shot, Our Prompt + Choices) | 2021-12-08 |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | ✓ Link | 0.21 | 0.33 | 20.44 | 97.55 | 20.56 | -0.56 | -17.75 | -17.38 | | | GPT-3 175B | 2021-09-08 |
Galactica: A Large Language Model for Science | ✓ Link | 0.21 | | | | | | | | | | OPT 175B | 2022-11-16 |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | ✓ Link | 0.20 | 0.36 | 26.68 | 89.96 | 27.17 | -0.31 | -11.35 | -7.58 | | | GPT-J 6B | 2021-09-08 |
TruthfulQA: Measuring How Models Mimic Human Falsehoods | ✓ Link | 0.19 | 0.35 | 53.86 | 64.50 | 53.24 | 0.08 | 1.76 | -0.16 | | | UnifiedQA 3B | 2021-09-08 |
Galactica: A Large Language Model for Science | ✓ Link | 0.19 | | | | | | | | | | GAL 125M | 2022-11-16 |
Galactica: A Large Language Model for Science | ✓ Link | 0.19 | | | | | | | | | | GAL 1.3B | 2022-11-16 |
Galactica: A Large Language Model for Science | ✓ Link | 0.19 | | | | | | | | | | GAL 6.7B | 2022-11-16 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 0.27 | | | | | | | | | | Gopher 280B (zero-shot, QA prompts) | 2021-12-08 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | | 57 | 53 | | | | | | | LLaMA 65B | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | | 52 | 48 | | | | | | | LLaMA 33B | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | | 47 | 41 | | | | | | | LLaMA 13B | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | | 33 | 29 | | | | | | | LLaMA 7B | 2023-02-27 |
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | ✓ Link | | | | | | | | | 67.3 | | CoA | 2024-03-26 |
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | ✓ Link | | | | | | | | | 66.6 | | ToT | 2023-05-17 |
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models | ✓ Link | | | | | | | | | 63.3 | | CoA w/o actions | 2024-03-26 |
Automatic Chain of Thought Prompting in Large Language Models | ✓ Link | | | | | | | | | 42.2 | | Auto-CoT | 2022-10-07 |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | | | | | | | | | | 68.4 | Shakti-LLM (2.5B) | 2024-10-15 |