Papers (924)

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Youtu-VL is a new kind of vision-language model that learns to predict both words and tiny image pieces, not just words.

#Vision-Language Models#Unified Autoregressive Supervision#Visual Tokenization

AACR-Bench: Evaluating Automatic Code Review with Holistic Repository-Level Context

Intermediate

Lei Zhang, Yongda Yu et al. · Jan 27 · arXiv

AACR-Bench is a new test set that checks how well AI can do code reviews using the whole project, not just one file.

#Automated Code Review#Benchmark#Repository-level Context

Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

Intermediate

Quy-Anh Dang, Chris Ngo · Jan 27 · arXiv

Selective Steering is a new way to gently nudge a language model’s inner thoughts without breaking its flow or skills.

#Selective Steering#Activation Steering#Angular Steering

Revisiting Parameter Server in LLM Post-Training

Intermediate

Xinyi Wan, Penghui Qi et al. · Jan 27 · arXiv

Large language model (LLM) post-training has uneven work per GPU because some text sequences are much longer than others.

#On-Demand Communication#Fully Sharded Data Parallel#Parameter Server

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Intermediate

Zichen Wen, Boxue Yang et al. · Jan 27 · arXiv

Innovator-VL is a new multimodal AI model that understands both pictures and text to help solve science problems without needing mountains of special data.

#Innovator-VL#multimodal large language model#scientific reasoning

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Intermediate

Kishan Panaganti, Zhenwen Liang et al. · Jan 27 · arXiv

LLMs are usually trained by treating every question the same and giving each one the same number of tries, which wastes compute on easy problems and neglects hard ones.

#LLM reasoning#Reinforcement Learning (RL)#GRPO

Towards Pixel-Level VLM Perception via Simple Points Prediction

Intermediate

Tianhui Song, Haoyu Lu et al. · Jan 27 · arXiv

SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.

#SimpleSeg#multimodal large language model#decoder-free segmentation

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Intermediate

Shobhita Sundaram, John Quan et al. · Jan 26 · arXiv

This paper teaches a model to be its own teacher so it can climb out of a learning plateau on very hard math problems.

#meta-reinforcement learning#teacher-student self-play#grounded rewards

TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models

Beginner

Fangxu Yu, Xingang Guo et al. · Jan 26 · arXiv

TSRBench is a giant test that checks if AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.

#time series reasoning#multimodal benchmark#perception

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Intermediate

Hongru Cai, Yongqi Li et al. · Jan 26 · arXiv

Large language models often learn one-size-fits-all preferences, but people are different, so we need personalization.

#personalized alignment#reward modeling#meta-learning

HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences

Beginner

Yusuke Sakai, Hidetaka Kamigaito et al. · Jan 26 · arXiv

The paper finds almost 300 accepted NLP papers (mostly in 2025) that include at least one fake or non-existent reference, which the authors call a HalluCitation.

#HalluCitation#hallucinated citations#citation verification

A Pragmatic VLA Foundation Model

Intermediate

Wei Wu, Fan Lu et al. · Jan 26 · arXiv

LingBot-VLA is a robot brain that listens to language, looks at the world, and decides smooth actions to get tasks done.

#Vision‑Language‑Action#foundation model#Flow Matching
