Papers2

#first-person video

MIND: Benchmarking Memory Consistency and Action Control in World Models

MIND is a new benchmark that fairly tests two core skills of world models: remembering the world over time (memory consistency) and following controls exactly (action control).

#world models#memory consistency#action control

Not triaged yet

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Intermediate

Xiaopeng Lin, Shijie Lian et al.Dec 18arXiv

Robots learn best from what they would actually see, which is a first-person (egocentric) view, but most AI models are trained on third-person videos and get confused.

#egocentric vision#first-person video#vision-language model

Not triaged yet