Speculative decoding speeds up big language models by letting a small helper model guess several next words and having the big model check them all at once.
Decoding (how a language model picks the next word) isn’t a bag of tricks; it’s a clean optimisation problem over probabilities.
Unified Latents (UL) is a way to learn the hidden code (latents) for images and videos by training three parts together: an encoder, a diffusion prior, and a diffusion decoder.
VESPO is a new, stable way to train language models with reinforcement learning even when training data comes from older or mismatched policies.
The paper shows a three-way no-win situation: an AI society cannot be closed off, keep learning forever, and stay perfectly safe for humans all at the same time.
The paper studies how to teach a smaller language model using a bigger one by focusing only on the most useful bits instead of everything.
The paper teaches vision-language models (AIs that look and read) to pay attention to the right parts of a picture without needing extra tools at answer time.
The paper teaches a video generator to move things realistically by borrowing motion knowledge from a strong video tracker.
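
The guess-and-check loop in the first summary (speculative decoding) can be sketched in a few lines. This is a toy illustration under stated assumptions, not any paper's implementation: the "models" are made-up functions over integers (toy_target, toy_draft), the names speculative_step and generate are hypothetical, and only the greedy case is shown, where a guess is kept if it exactly matches what the big model would have picked. Versions that sample use a probabilistic accept/reject rule instead, and a real system checks all guesses in one batched pass of the big model rather than in a Python loop.

```python
from typing import Callable, List

# A "model" here is just a function from the tokens so far to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One round: the helper guesses k tokens, the big model checks them."""
    # 1. The small helper model guesses k tokens one after another.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The big model checks each guess in order (conceptually all at once).
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First wrong guess: keep the big model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every guess was right, so the big model adds one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, n_new: int) -> List[int]:
    """Repeat guess-and-check rounds until n_new tokens have been added."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prefix) + n_new]


def greedy(prefix: List[int], target: NextToken, n_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding with the big model, for comparison."""
    out = list(prefix)
    for _ in range(n_new):
        out.append(target(out))
    return out


# Hypothetical toy models: the target counts upward modulo 10; the draft
# agrees with it except right after a 4, where it guesses wrong.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


fast = generate([0], toy_draft, toy_target, k=3, n_new=8)
slow = greedy([0], toy_target, n_new=8)
```

In this sketch the output of the guess-and-check loop matches plain greedy decoding with the big model; the helper's guesses affect only how many rounds are needed, not which tokens come out.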