Paper | Code | Instruction-level loose accuracy (%) | Instruction-level strict accuracy (%) | Prompt-level loose accuracy (%) | Prompt-level strict accuracy (%) | Model | Release date |
---|---|---|---|---|---|---|---|
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | ✓ Link | 90.4 | 86.7 | 85.6 | 80.2 | AutoIF (Llama3 70B) | 2024-06-19 |
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | ✓ Link | 88.0 | 86.1 | 82.3 | 80.2 | AutoIF (Qwen2 72B) | 2024-06-19 |
Instruction-Following Evaluation for Large Language Models | ✓ Link | 85.37 | 83.57 | 79.3 | 76.89 | GPT-4 | 2023-11-14 |
Instruction-Following Evaluation for Large Language Models | ✓ Link | 59.11 | 55.76 | 46.95 | 43.07 | PaLM 2 S | 2023-11-14 |
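
The four accuracy columns differ in how they aggregate the same per-instruction pass/fail checks. The following is a minimal sketch of the two aggregation levels, not the benchmark's own code: the function names and the toy data are hypothetical, and each inner list is assumed to hold the check outcomes for one prompt. The strict and loose variants use the same aggregation; in IFEval the loose variant also accepts a response if a lightly transformed copy of it (for example, with markdown markers or the first or last line removed) passes the check, which this sketch does not implement.

```python
from typing import List


def instruction_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    flat = [ok for prompt in results for ok in prompt]
    return sum(flat) / len(flat)


def prompt_level_accuracy(results: List[List[bool]]) -> float:
    """Fraction of prompts for which every instruction was followed."""
    return sum(all(prompt) for prompt in results) / len(results)


# Hypothetical outcomes for three prompts (made-up data, not taken from the table).
toy_results = [[True, True], [True, False], [True]]

inst_acc = instruction_level_accuracy(toy_results)  # 4 of 5 instructions followed
prompt_acc = prompt_level_accuracy(toy_results)     # 2 of 3 prompts fully followed
print(f"instruction-level: {inst_acc:.3f}, prompt-level: {prompt_acc:.3f}")
```

Because a single missed instruction fails the whole prompt, prompt-level accuracy can never exceed instruction-level accuracy on the same set of checks, which matches the ordering seen in every row of the table.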