An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Intermediate · Chao Xu, Suyu Zhang et al. · Dec 12 · arXiv
Vision-Language-Action (VLA) models are robots’ “see–think–do” brains that connect cameras (vision), words (language), and motors (action).
#Vision-Language-Action #Embodied AI #Multimodal Alignment