X-Coder shows that models can learn expert-level competitive programming using data that is 100% synthetic—no real contest problems needed.
VideoDR is a new benchmark that tests if AI can watch a video, pull out key visual clues, search the open web, and chain the clues together to find one verifiable answer.
ET-Agent is a training framework that teaches AI agents to use tools (like search and code) more wisely, not just to get the right answer.
This paper introduces Laser, a new way for vision-language models to think in their hidden space before speaking, so they see the whole “forest” before picking out the “trees.”
MemGovern teaches code agents to learn from past human fixes on GitHub by turning messy discussions into clean, reusable 'experience cards.'
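To make the 'experience card' idea concrete, here is a minimal sketch of what such a card might look like as a data structure. The field names (problem_summary, root_cause, fix_pattern, applicability) are assumptions for illustration, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceCard:
    """Hypothetical 'experience card' distilled from a GitHub issue/PR thread."""
    problem_summary: str   # what went wrong, in a sentence or two
    root_cause: str        # the underlying bug class the discussion converged on
    fix_pattern: str       # the reusable fix, stated generally
    applicability: list[str] = field(default_factory=list)  # cues for when to retrieve this card

# Example card a code agent could retrieve when it hits a similar failure:
card = ExperienceCard(
    problem_summary="CI fails intermittently on tests that touch the filesystem.",
    root_cause="Tests share a hard-coded temp directory and race each other.",
    fix_pattern="Give each test its own temp directory via the test framework's fixtures.",
    applicability=["flaky test", "filesystem", "parallel CI"],
)
```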
This paper teaches AI models not just how to solve problems but also how to tell when their own answers might be wrong.
The paper shows that friendly, people-pleasing language can trick even advanced language models into agreeing with wrong answers.
BabyVision is a new test that checks if AI can handle the same basic picture puzzles that young children can do, without leaning on language tricks.
ArenaRL teaches AI agents by comparing their answers against each other, like a sports tournament, instead of giving each answer a single noisy score.
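A toy sketch of the tournament idea, assuming round-robin pairwise comparisons with win counts as the ranking signal (the judge function here is a stand-in, not the paper's actual scoring recipe):

```python
import itertools

def tournament_scores(answers, prefer):
    """Rank candidate answers by pairwise wins instead of one absolute score each.
    Pairwise comparisons tend to give a less noisy ranking signal than single scores."""
    wins = {i: 0 for i in range(len(answers))}
    for i, j in itertools.combinations(range(len(answers)), 2):
        winner = i if prefer(answers[i], answers[j]) else j
        wins[winner] += 1
    return wins

# Hypothetical judge: prefers the longer answer; a real setup would use an LLM judge.
scores = tournament_scores(["short", "a longer answer", "a mid one"],
                           prefer=lambda a, b: len(a) > len(b))
print(scores)  # {0: 0, 1: 2, 2: 1}
```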
Real instructions often contain logic like "and," "first-then," and "if-else," and this paper teaches models to notice and obey that logic.
This paper builds BizFinBench.v2, a big bilingual (Chinese–English) test that checks how well AI models really handle finance using real business data from China and the U.S.
The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.
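To see why outcome-only rewards invite shortcuts, here is an illustrative contrast between rewarding just the final answer and also checking that the agent's retrieved evidence supports it. This is a sketch of the general idea, not the paper's specific reward design; the `supports` check is a hypothetical stand-in for something like an entailment judge.

```python
def outcome_only_reward(answer, gold):
    """Pays out for the right final answer alone; a lucky guess or hallucination still scores."""
    return 1.0 if answer == gold else 0.0

def process_aware_reward(answer, gold, retrieved_evidence, supports):
    """Pays out only when the answer is right AND some retrieved document supports it,
    so skipping or faking the search step stops being profitable."""
    correct = answer == gold
    grounded = any(supports(doc, answer) for doc in retrieved_evidence)
    return 1.0 if (correct and grounded) else 0.0

# Trivial stand-in for the support check: the answer string appears in the document.
r = process_aware_reward("1879", "1879",
                         ["Einstein was born in 1879 in Ulm."],
                         supports=lambda doc, ans: ans in doc)
print(r)  # 1.0
```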