DINO-SAE is a new autoencoder that preserves both the meaning of an image (semantics) and its tiny textures (fine details).
This paper builds a smart team of AI helpers, called MEnvAgent, that automatically sets up the right computer environments for code projects in many languages.
BatCoder teaches a code model to write both code and its documentation by doing a round trip: from code to docs and back to code.
This paper fixes a hidden mismatch in image generation: tokenizers produce tokens with no inherent order, but generators need an order to predict the next token well.
This paper shows how to train big language models faster and cheaper by using 4-bit floating-point numbers (NVFP4) without losing much accuracy.
VisionTrim makes picture-and-text AI models run much faster by keeping only the most useful visual pieces (tokens) and smartly merging the rest.
Large language models sometimes reach the right answer for the wrong reasons, which is risky and confusing.
Real attackers can try many prompts in parallel until a model slips, so testing safety with only one try badly underestimates risk.
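To see why a single try understates risk, here is a minimal sketch. It assumes every attempt succeeds independently with the same probability `p`, which is an idealization (real prompts are correlated), and the function name `attack_success_within` is illustrative rather than taken from the paper.

```python
def attack_success_within(p: float, k: int) -> float:
    """Probability that at least one of k attempts succeeds.

    Assumes (hypothetically) that attempts are independent and that each
    one succeeds with the same per-attempt probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Complement rule: all k attempts fail with probability (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Toy per-attempt rate of 1%, chosen only for illustration.
    for k in (1, 10, 100, 1000):
        print(f"k={k:>4}: {attack_success_within(0.01, k):.3f}")
```

Under these assumptions, a model that slips on only 1% of single tries is broken about 63% of the time by an attacker who gets 100 tries.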
TTCS is a way for a model to teach itself at test time: it first makes easier practice questions that are similar to the real hard question, then learns from them.
Big models are often used to grade AI answers, but they are expensive, slow, and depend too much on tricky prompts.
Most reinforcement learning agents only get a simple pass/fail reward, which hides how good or bad their attempts really were.
RAPTOR is a simple, fast way to find a direction (a concept vector) inside a frozen language model that points toward a concept like 'sarcasm' or 'positivity.'
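The summary does not say how RAPTOR computes its vector, so the sketch below does not reproduce it. It shows a common baseline for the same goal, the difference of mean activations between examples that have the concept and examples that do not. The function names and the toy activation values are assumptions made for illustration only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(pos: Sequence[Sequence[float]],
                      neg: Sequence[Sequence[float]]) -> Vector:
    """Unit vector from the mean 'without concept' activation to the
    mean 'with concept' activation (difference-of-means baseline)."""
    mu_pos = mean_vector(pos)
    mu_neg = mean_vector(neg)
    diff = [a - b for a, b in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two groups have identical means")
    return [d / norm for d in diff]


def projection(h: Sequence[float], direction: Sequence[float]) -> float:
    """How far activation h points along the concept direction."""
    return sum(a * b for a, b in zip(h, direction))


if __name__ == "__main__":
    # Made-up 2-D "hidden states" from a frozen model, not real data.
    with_concept = [[2.0, 0.0], [4.0, 0.0]]
    without_concept = [[0.0, 0.0], [0.0, 0.0]]
    d = concept_direction(with_concept, without_concept)
    print(d, projection([5.0, 7.0], d))
```

Because the model stays frozen, a direction like this only reads from (or is added to) existing activations; no weights are updated.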