Youtu-VL is a new kind of vision-language model that learns to predict both words and tiny image pieces, not just words.
AACR-Bench is a new test set that checks how well AI can do code reviews using the whole project, not just one file.
Selective Steering is a new way to gently nudge a language model’s inner thoughts without breaking its flow or skills.
In large language model (LLM) post-training, the work is spread unevenly across GPUs because some text sequences are much longer than others.
Innovator-VL is a new multimodal AI model that understands both pictures and text to help solve science problems without needing mountains of special data.
LLMs are usually trained by treating every question the same and giving each one the same number of tries, which wastes compute on easy problems and neglects hard ones.
SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.
This paper teaches a model to be its own teacher so it can climb out of a learning plateau on very hard math problems.
TSRBench is a giant test that checks if AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.
Large language models often learn one-size-fits-all preferences, but people are different, so we need personalization.
The paper finds almost 300 accepted NLP papers (mostly from 2025) that include at least one fake or non-existent reference, which the authors call a HalluCitation.
LingBot-VLA is a robot brain that listens to language, looks at the world, and decides on smooth actions to get tasks done.