MemEvolve teaches AI agents not only to remember past experiences but also to improve the way they remember, like a student who upgrades their study habits over time.
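To make the analogy concrete, here is a toy sketch (not MemEvolve's actual design) of a memory module that stores lessons and also revises the prompt it uses to write them; `llm` is a hypothetical text-in/text-out helper.

```python
class EvolvingMemory:
    """Toy sketch: stores lessons AND revises the prompt it uses to write them."""
    def __init__(self, llm):
        self.llm = llm
        self.lessons = []                                                  # what gets remembered
        self.write_prompt = "Summarize the key lesson from this episode:"  # how it gets remembered

    def store(self, episode):
        self.lessons.append(self.llm(f"{self.write_prompt}\n{episode}"))

    def evolve(self):
        # Revise the memory-writing habit itself, judged by the memories it produced.
        self.write_prompt = self.llm(
            "The prompt below produced these stored lessons. "
            "Improve the prompt so future lessons are more useful.\n"
            f"Prompt: {self.write_prompt}\nRecent lessons: {self.lessons[-5:]}"
        )
```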
This paper shows that strong image-understanding features alone are not enough to generate good images; you also need strong pixel-level detail.
Robust-R1 teaches vision-language models to notice how a picture is damaged, think through what that damage hides, and then answer as if the picture were clear.
This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch’s embedding in an image sequence, just like language models predict the next word.
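A minimal sketch of the next-patch idea, assuming 16x16 RGB patches, a small causal Transformer, and an MSE objective; the paper's actual patchifier, target encoder, and loss may differ.

```python
import torch
import torch.nn as nn

class NextPatchPredictor(nn.Module):
    def __init__(self, patch_dim=16 * 16 * 3, embed_dim=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)              # flattened patch -> embedding
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(embed_dim, embed_dim)               # predict the next embedding

    def forward(self, patches):                                   # patches: (B, N, patch_dim)
        x = self.embed(patches)                                   # (B, N, D)
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.backbone(x, mask=causal)                         # patch i only sees patches <= i
        pred = self.head(h[:, :-1])                               # predictions for positions 1..N-1
        target = x[:, 1:].detach()                                # the actual next-patch embeddings
        return nn.functional.mse_loss(pred, target)

# Toy usage: 2 images, each cut into 196 flattened 16x16 RGB patches.
loss = NextPatchPredictor()(torch.randn(2, 196, 16 * 16 * 3))
loss.backward()
```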
This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to focus only on the most useful information using a hierarchical search that keeps the cost growing only log-linearly with sequence length.
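The toy two-level version below gives the flavor of such a layered search (pool keys into blocks, pick the best blocks, then pick the best tokens inside them); the block size, top-k values, and number of levels here are assumptions, not LLSA's actual algorithm.

```python
import torch
import torch.nn.functional as F

def hierarchical_sparse_attention(q, k, v, block=64, top_blocks=4, top_tokens=16):
    # q: (L_q, d); k, v: (L_k, d); single attention head for clarity.
    L_k, d = k.shape
    k_blocks = k[: L_k // block * block].view(-1, block, d).mean(dim=1)   # coarse key per block
    block_scores = q @ k_blocks.T                                         # (L_q, num_blocks)
    picked = block_scores.topk(min(top_blocks, k_blocks.size(0)), dim=-1).indices

    outputs = []
    for i in range(q.size(0)):                       # per-query loop: clarity over speed
        idx = (picked[i, :, None] * block + torch.arange(block)).reshape(-1)  # candidate tokens
        scores = q[i] @ k[idx].T / d ** 0.5
        keep = scores.topk(min(top_tokens, idx.numel())).indices              # fine-grained pick
        attn = F.softmax(scores[keep], dim=-1)
        outputs.append(attn @ v[idx[keep]])
    return torch.stack(outputs)                      # (L_q, d)

out = hierarchical_sparse_attention(torch.randn(8, 32), torch.randn(512, 32), torch.randn(512, 32))
```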
This paper builds a big, fair test called Hearing to Translate to check how well different speech translation systems work in the real world.
This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.
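One common way to change the fill-in order is to unmask the most confident positions first; the sketch below shows that generic idea with a placeholder model and mask id, not the paper's specific schedule.

```python
import torch

MASK_ID = 0  # placeholder id for the [MASK] token

def decode(model, seq, steps=8):
    """seq: (L,) LongTensor with MASK_ID at the positions still to be filled in."""
    for _ in range(steps):
        masked = (seq == MASK_ID).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        logits = model(seq.unsqueeze(0))[0]                  # (L, vocab); placeholder signature
        probs = logits.softmax(dim=-1)
        probs[:, MASK_ID] = 0.0                              # never predict the mask token itself
        conf, tokens = probs.max(dim=-1)                     # best token and confidence per slot
        k = max(1, masked.numel() // 2)                      # fill half of the remaining masks
        order = conf[masked].argsort(descending=True)[:k]    # most confident masked slots first
        chosen = masked[order]
        seq[chosen] = tokens[chosen]
    return seq

# Toy run with a random "model" standing in for a real dLLM.
dummy = lambda x: torch.randn(x.size(0), x.size(1), 100)
print(decode(dummy, torch.zeros(16, dtype=torch.long)))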
SCOPE lets AI agents rewrite their own instructions while they are working, so they can fix mistakes and get smarter on the next step, not just the next task.
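A bare-bones sketch of that loop, with `llm` and `env` as hypothetical callables; SCOPE's actual prompts, update rule, and stopping criterion are not described in this summary.

```python
def run_agent(llm, env, task, max_steps=10):
    instructions = "Solve the task step by step."
    for _ in range(max_steps):
        action = llm(f"Instructions: {instructions}\nTask: {task}\nNext action:")
        observation, done = env(action)
        if done:
            return observation
        # Rewrite the working instructions from what just happened,
        # so the very next step benefits, not just the next task.
        instructions = llm(
            f"Current instructions: {instructions}\n"
            f"Last action: {action}\nResult: {observation}\n"
            "Rewrite the instructions to avoid repeating the mistake above:"
        )
    return None
```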
HERBench is a new test that checks whether video AI models can combine several clues spread across time, not just guess from a single frame or from language priors.
Sparse-LaViDa makes diffusion-style AI models much faster by skipping unhelpful masked tokens during generation while keeping quality the same.
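The saving comes from running the Transformer only on the tokens that matter at a given step and scattering the results back; the toy below assumes a simple keep/drop mask rather than Sparse-LaViDa's actual selection rule.

```python
import torch
import torch.nn as nn

def sparse_forward(backbone, hidden, keep_mask):
    """hidden: (B, L, D); keep_mask: (B, L) bool. Runs backbone only on kept tokens."""
    out = hidden.clone()
    for b in range(hidden.size(0)):                        # per-sample loop keeps the gather simple
        kept = hidden[b, keep_mask[b]].unsqueeze(0)        # (1, L_kept, D): a shorter sequence
        out[b, keep_mask[b]] = backbone(kept).squeeze(0)   # scatter results back into place
    return out

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
hidden = torch.randn(2, 128, 64)
keep = torch.rand(2, 128) < 0.25                           # only ~25% of positions need compute
with torch.no_grad():                                      # generation time: no gradients needed
    out = sparse_forward(backbone, hidden, keep)
```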
Olmo 3 is a family of fully open AI language models (7B and 32B) where every step, from raw data to training code and checkpoints, is released.
Diffusion Preview is a two-step “preview-then-refine” workflow that shows you a fast draft image first and only spends full compute after you like the draft.
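A workflow sketch under assumptions: `generate` stands in for any diffusion sampler, and reusing the seed so the refined image resembles the approved draft is an assumption here, not a detail from the paper.

```python
import random

def preview_then_refine(generate, prompt, preview_steps=8, full_steps=50):
    seed = random.randrange(2**31)
    draft = generate(prompt, steps=preview_steps, seed=seed)    # fast, rough draft
    print(f"Draft ready for '{prompt}': {draft}")
    if input("Refine this draft with full compute? [y/n] ").strip().lower() != "y":
        return draft                                            # keep the cheap draft, stop here
    return generate(prompt, steps=full_steps, seed=seed)        # spend full compute only now
```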