Vision and Language Navigation
Leaderboard
| Paper | Code | Success | Length | Error | Oracle success | SPL | Model name | Release date |
|---|---|---|---|---|---|---|---|---|
| | | 0.86 | 11.85 | 1.61 | 0.9 | 0.76 | human | |
| | | 0.79 | 686.45 | 2.5 | 0.99 | 0.01 | Lily | |
| Airbert: In-domain Pretraining for Vision-and-Language Navigation | ✓ | 0.78 | 686.54 | 2.58 | 0.99 | 0.01 | Airbert | 2021-08-20 |
| | | 0.74 | 686.86 | 2.99 | 0.99 | 0.01 | Global Normalization | |
| | | 0.74 | 625.27 | 3.55 | 0.99 | 0.01 | explore@40 beam-search | |
| Improving Vision-and-Language Navigation with Image-Text Pairs from the Web | ✓ | 0.73 | 686.62 | 3.09 | 0.99 | 0.01 | VLN-Bert | 2020-04-30 |
| | | 0.73 | 15.87 | 3.13 | 0.81 | 0.62 | BEVBert | |
| | | 0.73 | 14.43 | 3.35 | 0.8 | 0.62 | GMap | |
| | | 0.73 | 10.2 | 3.0 | 0.8 | 0.69 | Gloabl Normalization pre-explore | |
| | | 0.72 | 1250.89 | 3.05 | 1.0 | 0.01 | FOAM-Beam Search | |
| | | 0.72 | 16.14 | 3.44 | 0.79 | 0.6 | Lily | |
| | | 0.72 | 15.24 | 3.33 | 0.78 | 0.61 | ReadNet | |
| | | 0.71 | 176.22 | 3.07 | 0.94 | 0.05 | Active Exploration (Beam Search) | |
| Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks | | 0.71 | 40.85 | 3.24 | 0.81 | 0.21 | Self-Supervised Auxiliary Reasoning Tasks (Beam Search) | 2019-11-18 |
| | | 0.71 | 15.47 | 3.38 | 0.79 | 0.59 | HOC | |
| | | 0.71 | 14.25 | 3.57 | 0.77 | 0.61 | metaexplore | |
| | | 0.71 | 10.21 | 3.26 | 0.77 | 0.67 | sponge | |
| | | 0.7 | 690.61 | 3.21 | 0.99 | 0.01 | SERL (Beam_Search) | |
| | | 0.7 | 14.6 | 3.61 | 0.77 | 0.59 | lxyict | |
| | | 0.7 | 14.39 | 3.55 | 0.76 | 0.6 | DUET+PASTS | |
| | | 0.7 | 11.79 | 3.52 | 0.75 | 0.65 | Single-run | |
| | | 0.7 | 9.85 | 3.3 | 0.77 | 0.68 | Active Exploration (Pre-explore) | |
| | | 0.69 | 786.35 | 3.31 | 0.99 | 0.01 | ADad | |
| Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout | ✓ | 0.69 | 686.82 | 3.26 | 0.99 | 0.01 | null | 2019-04-08 |
| | | 0.69 | 14.73 | 3.65 | 0.76 | 0.59 | CVPR22 | |
| | | 0.69 | 11.86 | 3.24 | 0.76 | 0.62 | CMC-AAL2 | |
| | | 0.68 | 11.9 | 3.59 | 0.73 | 0.64 | EnvEdit+PT | |
| Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks | | 0.68 | 10.43 | 3.69 | 0.75 | 0.65 | Self-Supervised Auxiliary Reasoning Tasks (Pre-explore) | 2019-11-18 |
| | | 0.67 | 13.55 | 3.74 | 0.73 | 0.61 | DDL | |
| | | 0.67 | 12.07 | 3.41 | 0.76 | 0.6 | CMG-AAL | |
| | | 0.66 | 19.62 | 3.85 | 0.74 | 0.55 | VLN-TreeTrans | |
| | | 0.66 | 16.41 | 3.77 | 0.71 | 0.59 | sliu_team | |
| | | 0.66 | 15.89 | 3.73 | 0.73 | 0.6 | Single-Run, No Pre-Explore | |
| | | 0.66 | 14.78 | 3.68 | 0.72 | 0.6 | TD-STP | |
| | | 0.66 | 13.07 | 3.67 | 0.73 | 0.6 | VLN-BERT-Aug | |
| | | 0.66 | 12.75 | 3.86 | 0.72 | 0.6 | Fortest | |
| | | 0.66 | 11.89 | 3.77 | 0.72 | 0.63 | ESceme Single-run | |
| | | 0.65 | 15.9 | 3.78 | 0.71 | 0.59 | WIN | |
| | | 0.65 | 15.9 | 3.78 | 0.71 | 0.59 | WIN + RecVLN BERT | |
| Vision-Language Navigation with Random Environmental Mixup | ✓ | 0.65 | 13.11 | 3.87 | 0.72 | 0.59 | single-run | 2021-06-15 |
| | | 0.65 | 12.71 | 3.82 | 0.72 | 0.6 | Single-Run | |
| | | 0.65 | 12.27 | 3.93 | 0.72 | 0.6 | HAMT | |
| | | 0.65 | 12.22 | 3.86 | 0.71 | 0.61 | coefficient | |
| | | 0.65 | 11.91 | 4.0 | 0.7 | 0.6 | bin | |
| | | 0.65 | 10.24 | 3.76 | 0.71 | 0.62 | Greedy, No Pre-explore | |
| | | 0.64 | 13.75 | 3.97 | 0.71 | 0.58 | DDL | |
| | | 0.64 | 12.84 | 3.9 | 0.7 | 0.58 | clin | |
| | | 0.64 | 12.6 | 3.88 | 0.71 | 0.59 | single-run | |
| | | 0.64 | 12.31 | 3.86 | 0.71 | 0.59 | PANDA-TingLiu | |
| | | 0.64 | 12.3 | 3.94 | 0.71 | 0.59 | single-run | |
| | | 0.64 | 9.79 | 3.97 | 0.7 | 0.61 | Back Translation with Environmental Dropout (exploring unseen environments before testing) | |
| | | 0.63 | 357.62 | 4.03 | 0.96 | 0.02 | Reinforced Cross-Modal Matching (optimized for SR; with beam search) | |
| | | 0.63 | 16.44 | 4.0 | 0.69 | 0.58 | Hikari | |
| | | 0.63 | 13.57 | 4.02 | 0.71 | 0.57 | SEvol_lzy | |
| | | 0.63 | 13.54 | 3.98 | 0.7 | 0.58 | Colab_buaa | |
| | | 0.63 | 13.02 | 4.04 | 0.7 | 0.58 | Geo | |
| | | 0.63 | 12.77 | 3.96 | 0.7 | 0.57 | YBYB | |
| | | 0.63 | 12.62 | 3.99 | 0.71 | 0.58 | hellohellohello | |
| | | 0.63 | 12.51 | 4.16 | 0.69 | 0.58 | ed | |
| | | 0.63 | 12.35 | 4.09 | 0.7 | 0.57 | Single-Run, No Pre-Explore | |
| | | 0.63 | 12.35 | 4.09 | 0.7 | 0.57 | reg | |
| | | 0.63 | 12.35 | 4.09 | 0.7 | 0.57 | ART | |
| | | 0.63 | 12.3 | 4.07 | 0.7 | 0.58 | binbin | |
| | | 0.62 | 16.94 | 4.27 | 0.72 | 0.49 | CCC(ssm) | |
| | | 0.62 | 10.22 | 4.18 | 0.67 | 0.58 | MARVAL | |
| | | 0.61 | 373.09 | 4.48 | 0.97 | 0.02 | Self-Aware Co-Grounded Model | |
| Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation | ✓ | 0.61 | 196.53 | 4.29 | 0.9 | 0.03 | Tactical Rewind - long | 2019-03-06 |
| | | 0.61 | 20.39 | 4.57 | 0.7 | 0.46 | SSM | |
| | | 0.61 | 15.83 | 4.3 | 0.68 | 0.55 | single-run | |
| | | 0.61 | 13.43 | 4.32 | 0.69 | 0.55 | zq | |
| | | 0.61 | 13.2 | 4.22 | 0.69 | 0.56 | GBSE | |
| | | 0.61 | 12.73 | 4.26 | 0.67 | 0.55 | hellohello | |
| | | 0.61 | 12.13 | 4.27 | 0.67 | 0.56 | homebody | |
| | | 0.61 | 10.66 | 4.02 | 0.72 | 0.55 | GraphBert | |
| | | 0.6 | 21.03 | 4.34 | 0.71 | 0.43 | Single Run | |
| | | 0.6 | 9.48 | 4.21 | 0.67 | 0.59 | SIL-R2 | |
| | | 0.59 | 14.29 | 4.26 | 0.66 | 0.55 | Envdrop+SEVol+BT | |
| | | 0.59 | 13.07 | 4.53 | 0.66 | 0.53 | a new baseline | |
| | | 0.59 | 12.22 | 4.49 | 0.67 | 0.54 | trysth | |
| | | 0.59 | 10.31 | 4.71 | 0.64 | 0.55 | SEA features + AuxRN (single-run) | |
| | | 0.59 | 10.21 | 4.52 | 0.64 | 0.56 | PREVALENT | |
| | | 0.58 | 13.0 | 4.45 | 0.67 | 0.53 | SQANv1, No Pre-explore | |
| Neighbor-view Enhanced Model for Vision and Language Navigation | ✓ | 0.58 | 12.98 | 4.37 | 0.66 | 0.54 | MM2021 | 2021-07-15 |
| | | 0.58 | 10.71 | 4.95 | 0.65 | 0.55 | without pre-explore, beam-search | |
| | | 0.57 | 13.16 | 4.61 | 0.65 | 0.5 | CMG-AAL-TCSVT | |
| | | 0.57 | 12.34 | 4.59 | 0.65 | 0.53 | Single-Run, No Pre-Explore | |
| | | 0.57 | 10.99 | 4.57 | 0.65 | 0.5 | test-sf | |
| | | 0.57 | 10.52 | 4.53 | 0.63 | 0.53 | Greedy | |
| | | 0.56 | 1214.94 | 4.57 | 0.96 | 0.01 | null | |
| | | 0.56 | 15.74 | 4.84 | 0.69 | 0.48 | liuer | |
| | | 0.56 | 12.19 | 4.65 | 0.62 | 0.52 | reward-vln | |
| | | 0.56 | 11.39 | 5.29 | 0.65 | 0.53 | OAG(without pre-explore, beam-search) | |
| | | 0.56 | 10.58 | 5.17 | 0.63 | 0.52 | jmebs | |
| | | 0.56 | 10.18 | 4.89 | 0.63 | 0.53 | SEA features + Env-Dropout (single-run) | |
| | | 0.55 | 12.96 | 4.9 | 0.62 | 0.5 | Single-Run | |
| | | 0.55 | 10.9 | 5.32 | 0.63 | 0.51 | WQ_Pretrain | |
| | | 0.55 | 10.29 | 4.75 | 0.61 | 0.52 | Lang-Vis-Entity VLN (Single-Run) | |
| Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation | ✓ | 0.54 | 22.08 | 5.14 | 0.64 | 0.41 | Tactical Rewind - short | 2019-03-06 |
| | | 0.54 | 14.31 | 5.24 | 0.64 | 0.46 | single run | |
| | | 0.54 | 12.51 | 4.96 | 0.61 | 0.48 | DCMT | |
| | | 0.54 | 11.74 | 5.35 | 0.64 | 0.5 | map1 | |
| | | 0.54 | 10.51 | 5.3 | 0.61 | 0.51 | PREVALENT | |
| | | 0.54 | 10.06 | 5.11 | 0.62 | | 0.5227k | |
| | | 0.53 | 1257.38 | 4.87 | 0.96 | 0.01 | Speaker-Follower | |
| | | 0.53 | 15.02 | 5.34 | 0.61 | 0.42 | GVLN | |
| | | 0.53 | 12.13 | 5.63 | 0.61 | 0.49 | SERL | |
| | | 0.53 | 10.4 | 5.3 | 0.61 | 0.52 | m-path | |
| | | 0.53 | 10.0 | 5.37 | 0.59 | 0.5 | single-run | |
| | | 0.52 | 10.65 | 5.45 | 0.6 | 0.48 | SYSU-ISE | |
| | | 0.51 | 13.05 | 5.14 | 0.6 | 0.45 | licr19 | |
| Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout | ✓ | 0.51 | 11.66 | 5.23 | 0.59 | 0.47 | Back Translation with Environmental Dropout (no beam search) | 2019-04-08 |
| | | 0.51 | 11.47 | 5.7 | 0.57 | 0.47 | SERL (no_augmented) | |
| | | 0.51 | 11.15 | 5.45 | 0.57 | 0.47 | Single-Run, No Pre-Explore | |
| | | 0.49 | 14.42 | 5.5 | 0.57 | 0.41 | testliu | |
| Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | ✓ | 0.48 | 18.04 | 5.67 | 0.59 | 0.35 | Self-Monitoring Navigation Agent (no beam search; Progress Inference) | 2019-01-10 |
| The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation | ✓ | 0.48 | 13.69 | 5.69 | 0.56 | 0.4 | The Regretful Agent (no beam search; greedy action selection) | 2019-03-05 |
| Transferable Representation Learning in Vision-and-Language Navigation | | 0.48 | 10.27 | 5.49 | 0.56 | 0.45 | ALTR | 2019-08-09 |
| | | 0.47 | 16.73 | 5.8 | 0.57 | 0.36 | selfmoni0 | |
| | | 0.47 | 16.03 | 5.56 | 0.57 | 0.35 | DoubleAttn | |
| | | 0.47 | 14.07 | 5.42 | 0.55 | 0.4 | SEA features + Speaker-Follower (single-run) | |
| | | 0.47 | 10.42 | 5.64 | 0.53 | 0.43 | naive | |
| Environment-agnostic Multitask Learning for Natural Language Grounded Navigation | ✓ | 0.45 | 13.35 | 6.03 | 0.56 | 0.4 | Environment-Agnostic Multitask Learning | 2020-03-01 |
| | | 0.45 | 10.47 | 6.01 | 0.53 | 0.4 | speaker_follower_tesk2 | |
| | | 0.43 | 11.97 | 6.12 | 0.5 | 0.38 | Reinforced Cross-Modal Matching (single trajectory; NO beam search) | |
| | | 0.4 | 10.17 | 6.17 | 0.47 | 0.36 | PTA | |
| | | 0.37 | 13.08 | 6.46 | 0.45 | 0.3 | Khanh Nguyen | |
| | | 0.36 | 13.85 | 6.61 | 0.46 | 0.3 | HAIL | |
| | | 0.36 | 10.83 | 6.67 | 0.44 | 0.33 | rcm_test | |
| | | 0.36 | 10.44 | 6.95 | 0.43 | 0.31 | ai like samurai with PNasNet5Large | |
| | | 0.35 | 9.81 | 6.55 | 0.45 | 0.31 | Dynamic Convolutional Filters | |
| | | 0.34 | 8.32 | 6.89 | 0.41 | 0.32 | AnonymousTeam | |
| | | 0.33 | 15.75 | 6.68 | 0.43 | 0.25 | base_1 | |
| | | 0.31 | 15.1 | 7.03 | 0.4 | 0.25 | fuse_1 | |
| | | 0.3 | 22.14 | 7.63 | 0.61 | 0.2 | no | |
| | | 0.29 | 15.9 | 7.2 | 0.4 | 0.23 | base_0 | |
| | | 0.26 | 10.92 | 7.47 | 0.34 | 0.21 | test_an | |
| | | 0.25 | 9.15 | 7.53 | 0.32 | 0.23 | Look Before You Leap | |
| | | 0.24 | 9.48 | 8.56 | 0.32 | 0.22 | X-Modal | |
| | | 0.2 | 8.26 | 7.99 | 0.26 | 0.18 | zhangyong | |
| | | 0.2 | 8.13 | 7.85 | 0.27 | 0.18 | Seq2Seq Baseline | |
| | | 0.18 | 9.56 | 8.82 | 0.24 | 0.16 | zy123 | |
| | | 0.14 | 9.91 | 9.8 | 0.19 | 0.12 | zzzzzzzz55768 | |
| | | 0.13 | 9.89 | 9.79 | 0.18 | 0.12 | Random Agent | |
| | | 0.07 | 45.13 | 12.29 | 0.49 | 0.02 | 1111 | |
| | | 0.0 | 0.0 | 9.93 | 0.0 | 0.0 | 15458 | |