LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
IntermediateShijie Lian, Bin Yu et al.Jan 21arXiv
Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.
#Vision-Language-Action#Bayesian decomposition#Latent Action Queries