Papers26

#test-time scaling

CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Zhongyuan Peng, Caijun Xu et al.Feb 2arXiv

CoDiQ is a recipe for making hard-but-solvable math and coding questions on purpose, and it controls how hard they get while you generate them.

#controllable difficulty#test-time scaling#question generation

FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

Intermediate

Chiwei Zhu, Benfeng Xu et al.Feb 2arXiv

FS-Researcher is a two-agent system that lets AI do very long research by saving everything in a computer folder so it never runs out of memory.

#FS-Researcher#file-system agents#external memory

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Intermediate

Chengzu Li, Zanyi Wang et al.Jan 28arXiv

This paper shows that making short videos can help AI plan and reason in pictures better than writing out steps in text.

#video reasoning#visual planning#test-time scaling

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

Intermediate

Yuxuan Wan, Tianqing Fang et al.Jan 22arXiv

DeepVerifier is a plug-in checker that helps Deep Research Agents catch and fix their own mistakes while they are working, without retraining.

#Deep Research Agents#verification asymmetry#rubrics-based feedback

Inference-time Physics Alignment of Video Generative Models with Latent World Models

Intermediate

Jianhao Yuan, Xiaofeng Zhang et al.Jan 15arXiv

This paper teaches video-making AIs to follow real-world physics better without retraining them.

#video generation#physics plausibility#latent world model

Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

Intermediate

Jiajie Zhang, Xin Lv et al.Jan 9arXiv

The paper fixes a big problem in training web-searching AI: rewarding only the final answer makes agents cut corners and sometimes hallucinate.

#deep search agents#reinforcement learning#rubric rewards

DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies

Intermediate

Renke Wang, Zhenyu Zhang et al.Jan 5arXiv

DiffProxy turns tricky multi-camera photos of a person into a clean 3D body and hands by first painting a precise 'map' on each pixel and then fitting a standard body model to that map.

#human mesh recovery#SMPL-X#dense correspondence

SWE-RM: Execution-free Feedback For Software Engineering Agents

Intermediate

KaShun Shum, Binyuan Hui et al.Dec 26arXiv

Coding agents used to fix software rely on feedback; unit tests give only pass/fail signals that are often noisy or missing.

#execution-free feedback#reward model#software engineering agents

UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Intermediate

Tanghui Jia, Dongyu Yan et al.Dec 24arXiv

UltraShape 1.0 is a two-step 3D generator that first makes a simple overall shape and then zooms in to add tiny details.

#3D diffusion#coarse-to-fine generation#voxel-based refinement

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

Intermediate

Rang Li, Lei Li et al.Dec 19arXiv

Visual grounding is when an AI finds the exact thing in a picture that a sentence is talking about, and this paper shows today’s big vision-language AIs are not as good at it as we thought.

#visual grounding#multimodal large language models#benchmark

State over Tokens: Characterizing the Role of Reasoning Tokens

Intermediate

Mosh Levy, Zohar Elyoseph et al.Dec 14arXiv

Reasoning tokens (the words a model writes before its final answer) help the model think better, but they are not a trustworthy diary of how it really thought.

#State over Tokens#reasoning tokens#chain-of-thought

DuetSVG: Unified Multimodal SVG Generation with Internal Visual Guidance

Intermediate

Peiying Zhang, Nanxuan Zhao et al.Dec 11arXiv

DuetSVG is a new AI that learns to make SVG graphics by generating an image and the matching SVG code together, like sketching first and then tracing neatly.

#DuetSVG#multimodal generation#SVG generation

1 2 3