ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
Intermediate | Linqing Zhong, Yi Liu et al. | Jan 16 | arXiv
Robots usually think in words and pictures, but their hands need exact motions, so there is a gap between understanding and doing.
#Vision-Language-Action #Action Chain-of-Thought #Explicit Action Reasoner