Paper | Code | Log odds-ratio (pythia-6.9b) | Method | Release date |
---|---|---|---|---|
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 9.95 | DAS | 2024-02-19 |
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 3.42 | Linear probe | 2024-02-19 |
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 2.91 | Difference-in-means | 2024-02-19 |
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 1.87 | k-means | 2024-02-19 |
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 1.81 | PCA | 2024-02-19 |
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 0.27 | LDA | 2024-02-19 |
CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ Link | 0.01 | Random | 2024-02-19 |