Papers (915)

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Intermediate

Shobhita Sundaram, John Quan et al. · Jan 26 · arXiv

This paper teaches a model to be its own teacher so it can climb out of a learning plateau on very hard math problems.

#meta-reinforcement learning #teacher-student self-play #grounded rewards

TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models

Beginner

Fangxu Yu, Xingang Guo et al. · Jan 26 · arXiv

TSRBench is a giant test that checks if AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.

#time series reasoning #multimodal benchmark #perception

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Intermediate

Hongru Cai, Yongqi Li et al. · Jan 26 · arXiv

Large language models often learn one-size-fits-all preferences, but people are different, so we need personalization.

#personalized alignment #reward modeling #meta-learning

HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences

Beginner

Yusuke Sakai, Hidetaka Kamigaito et al. · Jan 26 · arXiv

The paper finds almost 300 accepted NLP papers (mostly in 2025) that include at least one fake or non-existent reference, which the authors call a HalluCitation.

#HalluCitation #hallucinated citations #citation verification

A Pragmatic VLA Foundation Model

Intermediate

Wei Wu, Fan Lu et al. · Jan 26 · arXiv

LingBot-VLA is a robot brain that listens to language, looks at the world, and decides smooth actions to get tasks done.

#Vision‑Language‑Action #foundation model #Flow Matching

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

Intermediate

Mingyang Song, Haoyu Sun et al. · Jan 26 · arXiv

AdaReasoner teaches AI to pick the right visual tools, use them in the right order, and stop using them when they aren’t helping.

#AdaReasoner #dynamic tool orchestration #multimodal large language models

Self-Refining Video Sampling

Intermediate

Sangwon Jang, Taekyung Ki et al. · Jan 26 · arXiv

This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.

#video generation #flow matching #denoising autoencoder

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Intermediate

Dongrui Liu, Qihan Ren et al. · Jan 26 · arXiv

AgentDoG is a new ‘diagnostic guardrail’ that watches AI agents step-by-step and explains exactly why a risky action happened.

#AgentDoG #AI agent safety #diagnostic guardrail

daVinci-Dev: Agent-native Mid-training for Software Engineering

Intermediate

Ji Zeng, Dayuan Fu et al. · Jan 26 · arXiv

This paper teaches code AIs to work more like real software engineers by adding a mid-training stage built on real development workflows.

#agentic mid-training #agent-native data #contextually-native trajectories

TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment

Intermediate

Zhewen Tan, Wenhan Yu et al. · Jan 26 · arXiv

TriPlay-RL is a three-role self-play training loop (attacker, defender, evaluator) that teaches AI models to be safer with almost no manual labels.

#LLM safety alignment #self-play reinforcement learning #adversarial prompt generation

TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

Intermediate

Elena Bruches, Vadim Alperovich et al. · Jan 26 · arXiv

This paper introduces TAM-Eval, a new way to test how well AI models can create, fix, and update unit tests for real software projects.

#unit test maintenance #LLM for software engineering #reference-free evaluation

Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

Intermediate

Haotian Li, Shijun Yang et al. · Jan 26 · arXiv

This paper builds an AI agent that learns new skills while working, like a kid who learns new tricks during recess without a teacher telling them what to do.

#in-situ self-evolution #tool evolution #parallel batch evolution
