PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Intermediate · Xiaopeng Lin, Shijie Lian et al. · Dec 18 · arXiv
Robots learn best from what they would actually see, a first-person (egocentric) view, but most AI models are trained on third-person videos and struggle when given egocentric input.
#egocentric vision #first-person video #vision-language model