Paper | Code | Top 1 Accuracy | L1 | Top 5 Accuracy | Model Name | Release Date |
---|---|---|---|---|---|---|
Learning What and Where: Disentangling Location and Identity Tracking Without Supervision | ✓ Link | 90.7 | 0.14 | 98.5 | Loci | 2022-05-26 |
TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | | 79.7 | 0.47 | 95.5 | TFCNet | 2022-03-11 |
Learning Object Permanence from Video | ✓ Link | 74.8 | 0.54 | | OPNet | 2020-03-23 |
Attention over learned object embeddings enables complex visual reasoning | ✓ Link | 74.0 | 0.44 | 94.0 | Aloe | 2020-12-15 |
Hopper: Multi-hop Transformer for Spatiotemporal Reasoning | ✓ Link | 73.2 | 0.85 | 93.8 | Hopper | 2021-03-19 |
INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision | | 71.7 | | 88.9 | Inferno | 2021-09-29 |
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ Link | 60.2 | 1.2 | 81.8 | I3D-50 + LSTM | 2017-05-22 |