VideoCoF is a new way to edit videos that first figures out WHERE to edit and then performs the edit, like thinking before acting.
This paper teaches a language model to think along several paths at the same time instead of one step after another.
Saber is a new way to make videos that match a text description while keeping the look of people or objects from reference photos, without needing special triplet datasets.
LLM multi-agent systems often fail quietly (no crash) and leave long, twisty logs that are hard to debug by hand.
Robots need lots of realistic, long videos to learn, but collecting them is slow and expensive.
OmniSafeBench-MM is a one-stop, open-source test bench that fairly compares how multimodal AI models get tricked (jailbroken) and how well different defenses stop those attacks.
The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.
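To make the contrast concrete, here is a minimal sketch of the two grading schemes; the function names and scoring rules are illustrative assumptions, not taken from the paper. Per-digit grading hands out partial credit even when the assembled number is wrong, while whole-number grading only rewards a fully correct answer:

```python
def per_digit_reward(pred: str, target: str) -> float:
    """Grade each digit position independently (dense but myopic)."""
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / max(len(target), 1)

def whole_number_reward(pred: str, target: str) -> float:
    """Grade only the final assembled number (sparse but aligned)."""
    return 1.0 if pred == target else 0.0

# A prediction that is off in one digit still earns high per-digit
# credit, even though the number itself is wrong:
print(per_digit_reward("1234", "1239"))     # 0.75
print(whole_number_reward("1234", "1239"))  # 0.0
```

The point of the whole-number scheme is that the reward matches what we actually care about (the final number being right), rather than encouraging digit-by-digit near-misses.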
This paper fixes two big problems in image-making AI that builds pictures step by step: it often practices with perfect answers (teacher forcing) but must perform using its own imperfect guesses later, and the earliest coarse steps are much harder than the later fine steps.
VG-Refiner is a new way for AI to find the right object in a picture when given a description, even if helper tools make mistakes.
EditThinker is a helper brain for any image editor that thinks, checks, and rewrites the instruction in multiple rounds until the picture looks right.
This paper teaches video-making AI models to say how sure they are about each tiny part of every frame they create.
SCAIL is a new AI system that turns a single character image into a studio-quality animation by following the moves in a driving video.