WoG (World Guidance) teaches a robot to imagine just the right bits of the near future and use those bits to pick better actions.
Hepato-LLaVA is a special AI that reads giant microscope pictures of the liver and answers medical questions about cancer.
Robots usually need very detailed, step-by-step directions, but real life often gives only short, simple goals like βfind the red bench.β
The paper tackles a big blind spot in vision-language models: understanding how objects move and relate in 3D over time (dynamic spatial reasoning, or DSR).