Deepfakes have become realistic enough that simple binary (real vs. fake) detectors are failing, especially when attackers add tiny, imperceptible perturbations.
DEER is a new way to speed up large language models: a diffusion model drafts many tokens at once, and an autoregressive model verifies them.
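The draft-then-verify idea can be illustrated with a minimal sketch of greedy verification as used in speculative decoding. This is a generic technique, not DEER's published algorithm: the drafter (a diffusion model in DEER) is abstracted away as a given list of draft tokens, the target AR model is a hypothetical `target_next` function returning its greedy next token, and `verify_draft` is an illustrative name. DEER's actual acceptance rule may differ, for example by using probabilistic rather than exact-match acceptance.

```python
from typing import Callable, List

# Hypothetical stand-in for the target AR model: given a token prefix,
# return the token it would emit next under greedy decoding.
NextToken = Callable[[List[int]], int]


def verify_draft(prefix: List[int], draft: List[int], target_next: NextToken) -> List[int]:
    """Greedy draft-and-verify (assumed scheme, not DEER's exact rule).

    Keep the longest prefix of `draft` that the target model agrees with.
    At the first disagreement, substitute the target's own token and stop.
    If every drafted token matches, the target adds one bonus token.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            # First mismatch: the target's choice replaces the drafted token.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        context.append(tok)
    # Whole draft accepted; the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted


if __name__ == "__main__":
    # Toy "target model" that always predicts the previous token + 1 (mod 10).
    counting_model = lambda ctx: (ctx[-1] + 1) % 10
    print(verify_draft([1, 2], [3, 4, 9, 6], counting_model))  # [3, 4, 5]
    print(verify_draft([1, 2], [3, 4, 5], counting_model))     # [3, 4, 5, 6]
```

For readability this calls the target once per draft position. Real systems score all draft positions in a single parallel forward pass, which is where the speedup comes from: one target pass can confirm several tokens at once.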
This paper tests whether Nano Banana Pro, a popular text-to-image model, can restore degraded photos without any extra training.
This paper teaches vision-language models to reason about images by training on puzzles instead of expensive human annotations.
HERBench is a new benchmark that checks whether video models can combine several clues spread across time, rather than guessing from a single frame or from language priors.
MemFlow helps a model that generates a long video story remember the relevant earlier parts while it keeps producing new ones, so characters and scenes stay consistent.
TimeLens studies how to teach models not just what happens in a video but exactly when it happens, a task called video temporal grounding (VTG).
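To make "exactly when" concrete: VTG predictions are commonly scored by temporal IoU, the overlap between a predicted time segment and the ground-truth segment divided by their union, and reported as recall at an IoU threshold such as 0.5. The sketch below shows that common metric. It is not necessarily TimeLens's evaluation protocol, and the function names are illustrative.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec), assuming start <= end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time segments on a 1-D timeline."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Segment], gts: List[Segment], threshold: float = 0.5) -> float:
    """Fraction of queries whose prediction reaches IoU >= threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    # Prediction 2-8 s vs. ground truth 4-10 s: overlap 4 s, union 8 s.
    print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
    print(recall_at_iou([(2.0, 8.0), (0.0, 3.0)], [(4.0, 10.0), (5.0, 9.0)]))  # 0.5
```

A prediction that names the right event but the wrong span (e.g. 0 to 3 s when the event occurs at 5 to 9 s) scores zero, which is why VTG is a stricter test than recognizing "what happens."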
This paper shows a simple, mathematically principled way to turn image patches into discrete tokens by mapping them to points spread evenly on a sphere.
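As a rough illustration of tokenizing with evenly spread sphere points: build a fixed codebook of well-separated points on the unit sphere, normalize each patch embedding onto the sphere, and use the index of the nearest codebook point as its token. This sketch swaps in a 3-D Fibonacci lattice as the "even" point set purely for illustration; real image tokenizers work in much higher dimensions, and the paper's actual lattice and training details may differ. `fibonacci_sphere` and `quantize` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fibonacci_sphere(n: int) -> List[Vec3]:
    """Return n points spread roughly evenly on the unit sphere (Fibonacci lattice).

    Illustrative stand-in for a spherical codebook; not the paper's construction.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points: List[Vec3] = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the horizontal circle at height z
        theta = golden_angle * i           # rotate by the golden angle each step
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def quantize(vec: Sequence[float], codebook: List[Vec3]) -> int:
    """Project `vec` onto the unit sphere and return the index of the nearest code.

    For unit vectors, the smallest Euclidean distance corresponds to the
    largest dot product, so we take the argmax of dot products.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto the sphere")
    u = [x / norm for x in vec]
    return max(
        range(len(codebook)),
        key=lambda k: sum(a * b for a, b in zip(u, codebook[k])),
    )


if __name__ == "__main__":
    codebook = fibonacci_sphere(64)
    # A (hypothetical) patch embedding pointing in the same direction as code 10.
    patch = [5.0 * c for c in codebook[10]]
    print(quantize(patch, codebook))  # 10
```

Because the codebook is fixed and evenly spread, no codes cluster together or go unused, which is one motivation for a geometry-driven codebook over a freely learned one.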
CRISP turns an ordinary phone video of a person into a clean 3D scene and a virtual human that can move within it without violating physics.
MMGR is a new benchmark that checks whether AI image and video generators follow real-world rules, not just whether their outputs look pretty.
Autoregressive (AR) models normally write one token at a time, which is accurate but slow for long answers.
Robots usually learn by imitating many demonstrations, which are expensive to collect and leave robots brittle when conditions change.