OpenCodePapers


Interpretability Techniques for Deep Learning
Leaderboard
| Paper | Code | Log odds-ratio (pythia-6.9b) | Method | Release date |
|---|---|---|---|---|
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 9.95 | DAS | 2024-02-19 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 3.42 | Linear probe | 2024-02-19 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 2.91 | Difference-in-means | 2024-02-19 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 1.87 | k-means | 2024-02-19 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 1.81 | PCA | 2024-02-19 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 0.27 | LDA | 2024-02-19 |
| CausalGym: Benchmarking causal interpretability methods on linguistic tasks | ✓ | 0.01 | Random | 2024-02-19 |
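The leaderboard ranks methods by log odds-ratio on pythia-6.9b, but the page does not define the metric. Below is a minimal Python sketch of one plausible formulation. It is an assumption and is not taken from the CausalGym code. For a single intervention, it compares the log odds of the source-consistent next token over the base-consistent one after intervening against the same log odds with no intervention. The function names, tokens, and probabilities are hypothetical.

```python
import math


def log_odds(p_a: float, p_b: float) -> float:
    """Log odds of token a over token b, given their next-token probabilities."""
    return math.log(p_a) - math.log(p_b)


def log_odds_ratio(base_probs: dict, intervened_probs: dict,
                   src_token: str, base_token: str) -> float:
    """Hypothetical helper (assumed formulation, not the paper's code).

    Returns how far an intervention shifts the model's preference toward the
    source-consistent token, measured in log odds:
        after  = log p_int(src) - log p_int(base)
        before = log p_base(src) - log p_base(base)
        result = after - before
    """
    before = log_odds(base_probs[src_token], base_probs[base_token])
    after = log_odds(intervened_probs[src_token], intervened_probs[base_token])
    return after - before


# Illustrative numbers only: an agreement-style example where intervening
# should flip the model from "is" (base) toward "are" (source).
base = {"is": 0.8, "are": 0.1}
intervened = {"is": 0.2, "are": 0.6}
score = log_odds_ratio(base, intervened, src_token="are", base_token="is")
print(round(score, 3))
```

Under this reading, a score near 0 (as for the Random baseline) means the intervention barely moves the model's preference, and larger positive scores mean a stronger causal effect. A leaderboard figure would presumably average such per-example scores over a task's examples, though the page does not say how the reported numbers are aggregated.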