TreeGRPO teaches image generators using a smart branching tree so each training run produces many useful learning signals instead of just one.
The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.
Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.
UnityVideo is a single, unified model that learns from many kinds of video information at once—like colors (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation—to make smarter, more realistic videos.
This paper shows that we can turn big, smart vision features into a small, easy-to-use code for image generation with just one attention layer.
GRAPE is a new way to tell Transformers where each word is in a sentence by using neat math moves called group actions.
OneStory is a new way to make long, multi-shot videos that keep the story, characters, and places consistent across time.
The paper asks when reinforcement learning (RL) really makes language models better at reasoning beyond what they learned in pre-training.
This paper shows a new way to teach an autoencoder to shape its hidden space (the 'latent space') to look like any distribution we want, not just a simple bell curve.
DeepCode is an AI coding system that turns long, complicated papers into full, working code repositories.
LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.
Big language models use RoPE to remember word order, but standard attention keeps only the real half of RoPE's complex numbers and throws the imaginary half away.
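The "thrown-away imaginary half" is a standard property of RoPE, easy to check directly: rotating a query/key feature pair is complex multiplication by e^{i·pos·θ}, and the usual real-valued dot-product score equals only the real part of the complex product q·conj(k). A minimal NumPy sketch (all names and values here are illustrative, not from the paper):

```python
import numpy as np

theta = 0.3          # rotation frequency for this feature pair (illustrative)
m, n = 5, 2          # positions of the query and key tokens

q = np.array([0.7, -0.2])   # query feature pair (q1, q2)
k = np.array([0.4, 0.9])    # key feature pair (k1, k2)

def rope_rotate(x, pos, theta):
    """Standard RoPE 2-D rotation of a feature pair by angle pos*theta."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

# Real-valued attention score after RoPE: a plain dot product.
score_real = rope_rotate(q, m, theta) @ rope_rotate(k, n, theta)

# Complex view: the same pairs as complex numbers rotated by e^{i*pos*theta}.
qc = (q[0] + 1j * q[1]) * np.exp(1j * m * theta)
kc = (k[0] + 1j * k[1]) * np.exp(1j * n * theta)
score_complex = qc * np.conj(kc)

# The usual score is exactly the real part; the imaginary part is discarded.
assert np.isclose(score_real, score_complex.real)
```

The imaginary part of `score_complex` carries extra relative-position information that the ordinary dot product never sees, which is the gap the paper targets.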