The paper argues that the fairest way to measure an AI's general intelligence is to see how quickly and how well it learns many different human-made games, given the same time and practice as a person.
The paper builds a Computer-Using World Model (CUWM) that lets an AI “imagine” what a desktop app (such as Word, Excel, or PowerPoint) will look like after a click or keystroke, before doing it for real.
The paper studies Mamba-2 (a fast, linear-time attention method) and pares it down to the pieces that truly boost accuracy.
This paper shows, step by step, how to train a 1.36-billion-parameter science-focused language model directly from raw arXiv LaTeX files using only 2 A100 GPUs.
Unified Latents (UL) is a way to learn the hidden code (latents) for images and videos by training three parts together: an encoder, a diffusion prior, and a diffusion decoder.
Robots learn better when they predict short, meaningful summaries of future images instead of drawing every pixel of the future scene.
Trinity is a family of open language models that are huge on the inside but only wake up a few 'experts' for each word, so they are fast and affordable to run.
This paper builds Conv-FinRe, a new test that checks if AI financial advisors give advice that fits a person’s true goals, not just what they clicked before.
This paper speeds up image and video generators called diffusion transformers by changing how big their puzzle pieces (patches) are at each step.
The paper shows how a code-writing AI (a large language model) can invent brand‑new multi‑agent learning algorithms instead of humans having to hand‑design them.
SimToolReal teaches a robot hand to use many different tools by practicing in simulation and then working in the real world without extra training.
This paper explains, in detail, how a simple two-layer neural network learns to add numbers on a clock (modular addition) by building and combining wave-like patterns called Fourier features.
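To make the Fourier-feature idea in the modular addition summary concrete, here is a minimal hand-written sketch, not the paper's trained network. It assumes only the standard angle-addition identities: cosine and sine features of a and b are multiplied to get features of a + b, and the answer is the residue c whose own features line up best with them. The function name `fourier_mod_add` and the choice of frequencies are illustrative assumptions.

```python
import math


def fourier_mod_add(a: int, b: int, p: int) -> int:
    """Return (a + b) mod p using only cos/sin features of a, b, and c.

    Hypothetical illustration: the integer sum a + b is never formed.
    """
    best_c, best_score = 0, -math.inf
    for c in range(p):
        score = 0.0
        # One wave per frequency k = 1 .. floor(p / 2).
        for k in range(1, p // 2 + 1):
            w = 2 * math.pi * k / p
            # Angle-addition identities: features of (a + b) built from
            # products of the separate features of a and of b.
            cos_ab = math.cos(w * a) * math.cos(w * b) - math.sin(w * a) * math.sin(w * b)
            sin_ab = math.sin(w * a) * math.cos(w * b) + math.cos(w * a) * math.sin(w * b)
            # Equals cos(w * (a + b - c)); it is 1 at every frequency
            # exactly when c is congruent to a + b modulo p.
            score += cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c


if __name__ == "__main__":
    p = 7
    # Compare against ordinary modular arithmetic for every input pair.
    ok = all(fourier_mod_add(a, b, p) == (a + b) % p
             for a in range(p) for b in range(p))
    print("matches (a + b) % p for all pairs:", ok)
```

The waves interfere constructively only at the correct residue and largely cancel elsewhere, which is why picking the highest score recovers the sum "on the clock".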