DIFFA-2 is a new audio AI that listens to speech, sounds, and music and answers questions about them using a diffusion-style language model instead of the usual step-by-step (autoregressive) method.
Large reasoning models got very good at thinking step-by-step, but that sometimes made them too eager to follow harmful instructions.
Golden Goose turns messy internet text into clean multiple-choice puzzles that computers can learn from and get automatic rewards for.
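Here is a minimal sketch of how a passage could be turned into a puzzle with an automatic reward. It assumes a puzzle is built by hiding the middle sentence of a passage and mixing that sentence with distractor sentences taken from other texts. The names `make_mc_question`, `MCQuestion`, and `reward` are hypothetical, and Golden Goose's actual recipe may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class MCQuestion:
    context: str          # passage with one sentence replaced by [MASK]
    choices: list[str]    # the hidden sentence mixed with distractors
    answer_index: int     # position of the hidden sentence in `choices`


def make_mc_question(passage: str, distractors: list[str], seed: int = 0) -> MCQuestion:
    """Hide one sentence of the passage and ask which option fills the gap."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    if len(sentences) < 3:
        raise ValueError("need at least three sentences")
    hidden = len(sentences) // 2  # assumption: always hide the middle sentence
    answer = sentences[hidden]
    context = ". ".join(sentences[:hidden] + ["[MASK]"] + sentences[hidden + 1:]) + "."
    choices = [answer] + list(distractors)
    random.Random(seed).shuffle(choices)  # fixed seed keeps the puzzle reproducible
    return MCQuestion(context, choices, choices.index(answer))


def reward(question: MCQuestion, predicted_index: int) -> float:
    """Automatic, checkable reward: 1.0 for the right option, 0.0 otherwise."""
    return 1.0 if predicted_index == question.answer_index else 0.0


passage = ("Water boils at 100 C at sea level. Lower air pressure lowers the boiling point. "
           "Food therefore cooks slower at altitude.")
q = make_mc_question(passage, ["Ice floats on liquid water", "Copper conducts electricity well"])
print(q.context)  # Water boils at 100 C at sea level. [MASK]. Food therefore cooks slower at altitude.
```

Because the correct option is known when the puzzle is built, grading a model's answer needs no human judge.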
Diffusion language models (dLLMs) generate several tokens at once, but at each step they usually throw away many helpful clues. RCD keeps those clues and reuses them.
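The toy sketch below contrasts the two behaviors. It assumes that the discarded clues are the probability distributions predicted for positions that are still masked after a step, and that the model can take those distributions back as extra input on the next step. The `decode` loop and `toy_model` are hypothetical, and RCD's actual way of reusing the clues may differ.

```python
import math
from typing import Callable, Optional

Dist = list[float]  # probability distribution over a toy vocabulary
# A model maps (partially decoded tokens, carried clues) to one distribution per position.
Model = Callable[[list[Optional[int]], list[Optional[Dist]]], list[Dist]]


def softmax(logits: list[float]) -> Dist:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(model: Model, length: int, tokens_per_step: int,
           reuse_clues: bool = True) -> list[int]:
    """Masked-diffusion style decoding that commits several tokens per step."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    tokens: list[Optional[int]] = [None] * length  # None means still masked
    clues: list[Optional[Dist]] = [None] * length  # distributions kept from the last step
    while any(t is None for t in tokens):
        dists = model(tokens, clues if reuse_clues else [None] * length)
        masked = [i for i, t in enumerate(tokens) if t is None]
        # Commit the most confident masked positions, using their argmax token.
        masked.sort(key=lambda i: max(dists[i]), reverse=True)
        for i in masked[:tokens_per_step]:
            tokens[i] = max(range(len(dists[i])), key=dists[i].__getitem__)
            clues[i] = None
        # Plain decoding would throw these predictions away; here they are kept.
        for i in masked[tokens_per_step:]:
            clues[i] = dists[i]
    return [t for t in tokens if t is not None]


def toy_model(tokens: list[Optional[int]], clues: list[Optional[Dist]]) -> list[Dist]:
    """Hypothetical model: fixed per-position logits, nudged by any carried clue."""
    base = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5], [1.5, 0.0, 0.0]]
    out = []
    for pos, logits in enumerate(base):
        clue = clues[pos]
        if clue is not None:
            logits = [x + c for x, c in zip(logits, clue)]
        out.append(softmax(logits))
    return out


print(decode(toy_model, length=4, tokens_per_step=2))  # [0, 1, 2, 0]
```

With `reuse_clues=False`, the loop behaves like plain parallel decoding: predictions for uncommitted positions are recomputed from scratch on the next step instead of being carried forward.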
DINO-SAE is a new autoencoder that keeps both the meaning of an image (semantics) and tiny textures (fine details) at the same time.
This paper builds a smart team of AI helpers, called MEnvAgent, that automatically sets up the right computer environments for code projects in many programming languages.
BatCoder teaches a code model to write both code and its documentation by doing a round trip: from code to docs and back to code.
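A rough sketch of the round trip, assuming each direction is a model call passed in as a function and that the round trip is scored by plain text similarity between the original code and the rebuilt code. `difflib.SequenceMatcher` is only a stand-in for whatever signal BatCoder actually uses, and the toy "models" are hypothetical.

```python
import difflib
from typing import Callable


def round_trip(code: str,
               code_to_doc: Callable[[str], str],
               doc_to_code: Callable[[str], str]) -> tuple[str, str, float]:
    """Code -> documentation -> code again, scored by how close the rebuild is."""
    doc = code_to_doc(code)        # hypothetical model call: write the docs
    rebuilt = doc_to_code(doc)     # hypothetical model call: rewrite code from the docs
    score = difflib.SequenceMatcher(None, code, rebuilt).ratio()  # 1.0 means identical
    return doc, rebuilt, score


# Toy stand-ins for the two model directions.
original = "def add(a, b):\n    return a + b\n"
good_doc = lambda code: "Return the sum of a and b."
faithful = lambda doc: "def add(a, b):\n    return a + b\n"
vague_doc = lambda code: "A function."
lossy = lambda doc: "def f(x):\n    pass\n"

_, _, high = round_trip(original, good_doc, faithful)
_, _, low = round_trip(original, vague_doc, lossy)
print(high, low < high)  # 1.0 True
```

Good documentation lets the code come back intact and earns a high score. Vague documentation loses information, so the rebuilt code drifts and the score drops.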
This paper fixes a hidden mismatch in image generation: tokenizers produce tokens with no built-in order, but generators need an order to predict the next token well.
VisionTrim makes picture-and-text AI models run much faster by keeping only the most useful visual pieces (tokens) and smartly merging the rest.
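A small sketch of keep-and-merge token pruning. It assumes each visual token already has an importance score (for example, how much the text attends to it), and that each dropped token is averaged into the kept token it is most similar to by cosine similarity. The function name `trim_tokens` and this merge rule are illustrative assumptions, not VisionTrim's published method.

```python
import math

Vec = list[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def trim_tokens(tokens: list[Vec], scores: list[float], keep: int) -> list[Vec]:
    """Keep the `keep` highest-scoring tokens; merge every other token into its closest kept one."""
    if not 0 < keep <= len(tokens):
        raise ValueError("keep must be between 1 and len(tokens)")
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept, dropped = order[:keep], order[keep:]
    groups = {k: [tokens[k]] for k in kept}
    for d in dropped:
        target = max(kept, key=lambda k: cosine(tokens[d], tokens[k]))
        groups[target].append(tokens[d])
    merged = []
    for k in sorted(kept):  # preserve the original order of the kept tokens
        members = groups[k]
        merged.append([sum(vals) / len(members) for vals in zip(*members)])
    return merged


tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
scores = [0.9, 0.8, 0.1, 0.2]
print(trim_tokens(tokens, scores, keep=2))
```

Merging, rather than simply deleting, lets the information in low-scoring tokens survive in averaged form while the model still sees only `keep` tokens.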
Large language models sometimes reach the right answer for the wrong reasons, which is risky and confusing.
Real attackers can try many prompts in parallel until a model slips, so testing safety with only one try badly underestimates risk.
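A back-of-the-envelope calculation shows the size of the gap. If each attempt succeeds independently with the same probability p, the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. Real attacks are often adaptive and their attempts correlated, so this independence assumption gives only a rough picture.

```python
def attack_success_rate(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - p_single) ** attempts


for k in (1, 10, 100):
    print(k, round(attack_success_rate(0.01, k), 3))
# 1 0.01
# 10 0.096
# 100 0.634
```

A model that slips on only 1% of single tries would, under this assumption, be broken about 63% of the time by an attacker allowed 100 tries.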
TTCS lets a model teach itself at test time: it first writes easier practice questions similar to the real hard question, then learns from them.