CLI-Gym is a new way to create lots of realistic computer-fixing tasks for AI by safely breaking and then repairing software environments inside containers.
FeatureBench is a new benchmark that tests AI coding agents on building real software features, not just fixing small bugs.
This paper builds a new audio tokenizer, called MOSS-Audio-Tokenizer, that turns sound into tiny tokens the way text tokenizers turn sentences into words.
Most image search systems judge each photo by itself, which fails when clues are split across many photos taken over time.
VESPO is a new, stable way to train language models with reinforcement learning even when training data comes from older or mismatched policies.
Decoder-only language models can be great at making user profiles (embeddings), but the attention mask, which controls what parts of the sequence the model can look at, changes how good those profiles are.
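A minimal sketch of the idea (illustrative, not the paper's actual setup): a causal mask lets each position see only earlier tokens, while a bidirectional mask lets every position see the whole sequence, which changes how much context flows into a pooled embedding.

```python
import numpy as np

# How much of the sequence each position can attend to under two masks.
seq_len = 4

# Causal mask: position i attends only to positions 0..i
# (the default for decoder-only language models).
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Bidirectional mask: every position attends to the whole sequence,
# so a pooled embedding can mix information from past and future items.
bidirectional = np.ones((seq_len, seq_len), dtype=bool)

print(causal.sum(axis=1))         # [1 2 3 4] — early tokens see little context
print(bidirectional.sum(axis=1))  # [4 4 4 4] — every token sees everything
```

Under the causal mask, early positions are embedded with almost no context, which is one intuition for why the masking choice matters.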
Training big language models with reinforcement learning can wobble because the per-token importance-sampling (IS) ratios swing wildly.
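A hedged sketch of the instability (toy numbers, not from the paper): the per-token IS ratio is the new policy's probability of a token divided by the old policy's, and a single rare token can blow the ratio up; the standard PPO-style clip bounds how far any one token can swing the update.

```python
import numpy as np

# Per-token importance-sampling ratios and a PPO-style clip.
logp_new = np.array([-0.5, -0.1, -1.2, -0.3])  # log-probs under current policy
logp_old = np.array([-0.6, -5.0, -1.0, -0.4])  # log-probs under sampling policy

ratios = np.exp(logp_new - logp_old)           # one IS ratio per token
# ratios[1] = exp(4.9) ~ 134: one rare token dominates the whole gradient.

eps = 0.2
clipped = np.clip(ratios, 1 - eps, 1 + eps)    # bounded trust region
print(ratios.round(2))
print(clipped.round(2))
```

The clip is the textbook remedy; methods in this space differ mainly in how they bound or smooth these ratios.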
Step 3.5 Flash is a huge but efficient AI that keeps 196 billion total parameters but only wakes up about 11 billion per token, so it stays smart while running fast.
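Back-of-envelope arithmetic plus a toy router (hypothetical, not the model's actual routing) shows why this is cheap: in a sparse Mixture-of-Experts model, only the top-k experts run per token, so compute tracks the active parameters, not the total.

```python
import numpy as np

# Fraction of weights that actually run per token.
total_params = 196e9
active_params = 11e9
fraction_active = active_params / total_params
print(f"{fraction_active:.1%} of weights used per token")  # ~5.6%

# Toy top-k gating: pick the k highest-scoring experts for this token.
rng = np.random.default_rng(0)
n_experts, k = 8, 2
gate_logits = rng.normal(size=n_experts)       # router scores for one token
topk = np.argsort(gate_logits)[-k:]            # indices of the chosen experts
print(f"experts used: {sorted(topk.tolist())} ({k}/{n_experts} active)")
```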
Pictures can hide deeper meanings, like a wilted plant meaning someone feels burned out; most AI models miss these hints.
Long texts overwhelm many language models, which forget important bits and slow down as the context grows.
The paper fixes a common mistake in training language models for multi-part tasks: giving the same reward signal to every token, even when different text parts aim at different goals.
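A tiny illustration of the mistake and one possible fix (the segment labels and reward values here are invented, not the paper's scheme): broadcasting one scalar reward to every token blurs credit, while a segment-aware assignment gives each span of the output its own signal.

```python
# Different parts of one generated response serve different goals.
tokens   = ["<think>", "step", "</think>", "answer"]
segments = ["reasoning", "reasoning", "reasoning", "final"]
rewards  = {"reasoning": 0.2, "final": 1.0}   # hypothetical per-goal rewards

# The common mistake: one blended scalar for every token.
uniform = [0.6] * len(tokens)

# Segment-aware assignment: each token gets its own segment's reward.
per_segment = [rewards[s] for s in segments]
print(per_segment)  # [0.2, 0.2, 0.2, 1.0]
```

With the uniform signal, tokens that helped the final answer and tokens that did not are rewarded identically, which is exactly the credit-assignment problem the paper targets.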
The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.