Big models are often used to grade AI answers, but they are expensive, slow, and depend too much on tricky prompts.
RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'
Millions of public AI models exist, but downloads are concentrated on a tiny set of “official” checkpoints, which are not always the best performers.
This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers using far less training data (about 2.3B tokens).
This paper trains AI agents more effectively by grading not just their final answers, but also how they think and use tools along the way.
DynamicVLA is a small and fast robot brain that sees, reads, and acts while things are moving.
Large language models usually learn by guessing the next word, then get a tiny bit of instruction-following practice; this paper flips that by turning massive web documents into instruction-and-answer pairs at huge scale.
This paper shows a simple, one-model way to dub videos that keeps the new voice and the lip movements naturally in sync.
Training big AI models uses lots of memory because most methods still keep a hidden full-precision copy of the weights, called master weights.
This paper introduces GANPO, a new way to train language models from human preferences by guiding the model using its hidden thoughts (latent space) instead of just its visible words (token space).
This paper shows a new way to help AI think through long problems faster by turning earlier text steps into small pictures the AI can reread.
The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.