HY3D-Bench is a complete, open-source “starter kit” for making and studying high-quality 3D objects.
DIFFA-2 is a new audio AI that listens to speech, sounds, and music and answers questions about them using a diffusion-style language model instead of the usual step-by-step (autoregressive) method.
Large reasoning models got very good at thinking step-by-step, but that sometimes made them too eager to follow harmful instructions.
DreamActor-M2 is a new way to make a still picture move by copying motion from a video while keeping the character’s look the same.
This paper shows that making short videos can help AI plan and reason in pictures better than writing out steps in text.
Videos are made of very long lists of tokens, and regular attention compares every pair of tokens, so its cost grows with the square of the sequence length, which makes it slow and expensive.
This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.
FutureOmni is the first benchmark that tests whether multimodal AI models can predict what happens next from both sound and video, not just explain what already happened.
CoDance is a new way to animate many characters in one picture using just one pose video, even if the picture and the video do not line up perfectly.
MoCha is a new AI that swaps a person in a video with a new character using only one mask on one frame and a few reference photos.
Large Vision-Language Models (LVLMs) look great on single images but often stumble when they must reason across multiple images.
MeepleLM is a special AI that reads a board game’s rulebook and pretends to be different kinds of players to give helpful, honest feedback.
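
To make the attention cost mentioned in the video-token summary above concrete, here is a minimal sketch that counts how many token pairs regular (full) attention has to score. The function name `full_attention_pairs` and the example token counts are illustrative assumptions, not taken from any of the papers listed.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Count the query-key pairs scored by full self-attention.

    Every token is compared with every token (itself included),
    so the count is num_tokens squared.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Hypothetical sequence lengths, chosen only to show the growth rate.
    for n in (1_000, 2_000, 4_000):
        print(f"{n} tokens: {full_attention_pairs(n)} pairs")
```

Doubling the number of tokens quadruples the number of pairs, which is why very long video token sequences become costly under regular attention.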