LLaDA-o is a new AI model that understands both pictures and text and can also generate images, all within a single system.
AI can already make short videos that look sharp and lively, but long videos need storytelling and memory, and there isn't much training data for that.
WoG (World Guidance) teaches a robot to imagine just the right bits of the near future and use those bits to pick better actions.
Robots learn faster and more flexibly when they can use human touch data, but humans and robots feel touch with very different sensors.
UniReason is a single, unified model that plans with world knowledge before making an image and then edits its own result to fix mistakes, like a student drafting and revising an essay.
Diffusion models make pictures from noise but often miss what people actually want in the prompt or what looks good to humans.
ShapeR builds clean, correctly scaled 3D objects from messy, casual videos shot on phones or smart glasses, combining images, camera poses, sparse SLAM points, and short text captions.
This paper turns a video model into a step-by-step visual thinker that makes one final, high-quality picture from a text prompt.
MorphAny3D is a training-free way to smoothly change one 3D object into another, even if they are totally different (like a bee into a biplane).
ProEdit is a training-free, plug-and-play method that fixes a common problem in image and video editing: the model clings too tightly to the original picture and refuses to make the change you asked for.
Robots learn best from what they actually see, a first-person (egocentric) view, but most AI models are trained on third-person videos and get confused.
This paper shows how to protect your photos from being misused by new AI image editors that can copy your face or style from just one picture.