Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.
Metric Anything is a new way to teach AI models real-world, ruler-like distances (metric depth) from very mixed and noisy 3D data.
The paper shows a simple way to teach AI models what not to learn by removing only the exact words (tokens) related to unwanted topics during pretraining.
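To make the idea concrete, here is a minimal, hypothetical sketch of token-level filtering: only the exact tokens tied to an unwanted topic are dropped from the training stream, while the surrounding text is kept. The blocked vocabulary and function names are illustrative, not from the paper.

```python
# Hypothetical unwanted-topic vocabulary (illustrative only).
BLOCKED_TOKENS = {"explosive", "detonator"}

def filter_tokens(tokens):
    """Remove only the blocked tokens; every other token stays in the data."""
    return [t for t in tokens if t.lower() not in BLOCKED_TOKENS]

text = "the recipe used an explosive mixture".split()
print(filter_tokens(text))  # → ['the', 'recipe', 'used', 'an', 'mixture']
```

The point of the sketch is the granularity: instead of deleting whole documents, only the offending tokens are removed, so the model still sees the rest of the text.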
SERA is a new, low-cost way to train coding helpers (agents) that learn the style and secrets of your own codebase.
This paper builds a fair, big playground (a benchmark) to test many EEG foundation models side-by-side on the same rules.
The paper shows that big language models often end up with weight magnitudes set by training hyperparameters instead of by the data, which quietly hurts performance.
The paper teaches a game-playing AI to copy good human players (behavior cloning) and shows that simply scaling up the model and the data makes the AI reason more causally (it pays attention to what truly causes outcomes on screen).
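Behavior cloning at its simplest is supervised learning on human (state, action) pairs. Here is a toy, hypothetical sketch (states, actions, and function names are made up for illustration): count which action humans took in each situation and predict the most common one.

```python
from collections import Counter, defaultdict

def fit_policy(demonstrations):
    """Count which action humans took in each state; imitate the most common one."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Toy human gameplay demonstrations.
demos = [("enemy_left", "dodge_right"), ("enemy_left", "dodge_right"),
         ("enemy_left", "jump"), ("power_up_ahead", "run_forward")]
policy = fit_policy(demos)
print(policy["enemy_left"])  # → dodge_right
```

Real behavior cloning replaces the counting table with a large neural network, which is what the paper scales up.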
Language is lumpy: easy stretches and tricky jumps are mixed together, but old models spend the same effort on every word.
This paper asks if large language models (LLMs) can act like "world models" that predict what happens next in text-based environments, not just the next word in a sentence.
The paper tackles a paradox: visual tokenizers that get great pixel reconstructions often make worse images when used for generation.
Different programming languages scale differently when training code AI models, so treating them all the same wastes compute and lowers performance.
This paper studies how a newer kind of language model, called a discrete diffusion language model (DLM), gets better as we give it more data, bigger models, and more compute.