Robots often learn good hand motions during training but get confused when the scene or the instructions change even slightly at test time.
DynamicVLA is a small and fast robot brain that sees, reads, and acts while things are moving.
Being-H0.5 is a robot brain that learns from huge amounts of human video and robot demos so it can work on many different robots, not just one.
This paper introduces Self-E, a text-to-image model that learns from scratch and can generate good pictures in any number of steps, from just a few to many.
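To make the "any number of steps" idea concrete, here is a minimal, hypothetical sketch of a step-count-flexible sampler: the same refinement loop runs whether you ask for 4 steps or 40. This is a generic illustration, not Self-E's actual training or sampling procedure, and `model_step` is a stand-in for the real network.

```python
import numpy as np

def sample(model_step, latent_shape, num_steps: int, rng=None):
    """Generic iterative-refinement sampler (hypothetical sketch).

    model_step: stand-in callable (x, t) -> refined x for the real model.
    latent_shape: shape of the image/latent to generate.
    num_steps: how many refinement steps to run; the loop is the same for few or many.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(latent_shape)   # start from noise
    for i in range(num_steps):
        t = 1.0 - i / num_steps             # "time" decreasing from 1 toward 0
        x = model_step(x, t)                # one refinement step
    return x
```

Calling `sample(step_fn, (64, 64, 3), num_steps=4)` or `num_steps=50` goes through the exact same code path; only the loop length changes.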
SpotEdit is a training-free way to edit only the parts of an image that actually change, instead of re-generating the whole picture.
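As a rough sketch of the general "edit only what changes" idea (not SpotEdit's actual method), one can compute a mask over the region the instruction refers to, generate new pixels only there, and composite them back over the untouched original. All names below are hypothetical placeholders.

```python
import numpy as np

def composite_edit(original: np.ndarray, edited: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep original pixels where mask is 0, take edited pixels where mask is 1.

    original, edited: H x W x 3 arrays with values in [0, 1]
    mask: H x W array in [0, 1] marking the region that should change
    """
    m = mask[..., None]                       # broadcast mask over the color channels
    return m * edited + (1.0 - m) * original  # untouched regions stay bit-identical
```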
Before this work, big vision-language models (VLMs) were great at understanding pictures and words together but not at making new pictures.