Papers (1262)

Learning to Repair Lean Proofs from Compiler Feedback

This paper teaches AI how to fix broken Lean math proofs by learning from the compiler’s feedback, not just from finished, perfect proofs.

#Lean proof repair #compiler feedback #APRIL dataset

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Intermediate

Haocheng Xi, Shuo Yang et al. · Feb 3 · arXiv

Auto-regressive video models make videos one chunk at a time but run out of GPU memory because the KV-cache grows with history.

#Quant VideoGen (QVG) #KV-cache quantization #2-bit quantization

FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights

Intermediate

Zhen Wang, Fan Bai et al. · Feb 2 · arXiv

FIRE-Bench is a new test that checks whether AI agents can fully redo real scientific discoveries, step by step, not just guess answers.

#FIRE-Bench #scientific agents #rediscovery benchmark

AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process

Intermediate

Xintong Zhang, Xiaowen Zhang et al. · Feb 2 · arXiv

AdaptMMBench is a new test that checks if AI models know when to just look and think, and when to use extra visual tools like zooming or brightening an image.

#Adaptive Multimodal Reasoning #Vision-Language Models #Tool Invocation

MARS: Modular Agent with Reflective Search for Automated AI Research

Intermediate

Jiefeng Chen, Bhavana Dalvi Mishra et al. · Feb 2 · arXiv

MARS is an AI agent that runs AI research like a careful scientist and a thrifty engineer at the same time.

#MARS #budget-aware MCTS #reflective memory

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Intermediate

Zehong Ma, Ruihan Xu et al. · Feb 2 · arXiv

PixelGen is a new image generator that works directly with pixels and uses what-looks-good-to-people guidance (perceptual loss) to improve quality.

#pixel diffusion #perceptual loss #LPIPS

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

Beginner

Yinjie Wang, Tianbao Xie et al. · Feb 2 · arXiv

RLAnything is a new reinforcement learning (RL) framework that trains three things together: the policy (the agent), the reward model (the judge), and the environment (the tasks).

#reinforcement learning #closed-loop optimization #reward modeling

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Intermediate

Jialiang Zhu, Gongrui Zhang et al. · Feb 2 · arXiv

Re-TRAC is a new way for AI search agents to learn from each try, write a clean summary of what happened, and then use that summary to do better on the next try.

#Re-TRAC #trajectory compression #deep research agents

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Intermediate

Xiao Liang, Zhong-Zhi Li et al. · Feb 2 · arXiv

The paper trains language models to solve hard problems by first breaking them into smaller parts and then solving those parts, instead of only thinking in one long chain.

#divide-and-conquer reasoning #chain-of-thought #reinforcement learning

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Intermediate

Haozhen Zhang, Quanyu Long et al. · Feb 2 · arXiv

MemSkill turns memory operations for AI agents into learnable skills instead of fixed, hand-made rules.

#memory skills #LLM agents #skill bank

SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

Intermediate

Qifan Yu, Xinyu Ma et al. · Feb 2 · arXiv

This paper shows how to safely make a neural network wider in the middle of training without destabilizing it.

#Progressive Learning #Width Expansion #RMS scale

Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling

Intermediate

Andong Chen, Wenxin Zhu et al. · Feb 2 · arXiv

This paper shows that comics (multi-panel pictures with words) can help AI think through problems step by step, much as a student shows their work.

#multimodal reasoning #visual storytelling #comics
